For a long time, computer architecture and systems have been optimized for the efficient execution of machine learning (ML) models. Now, it is time to reconsider the relationship between ML and systems and let ML transform the way that computer architecture and systems are designed. This embraces a twofold meaning: improvement of designers' productivity and completion of the virtuous cycle. In this article, we present a comprehensive review of work that applies ML to computer architecture and system design. First, we perform a high-level taxonomy by considering the typical role that ML techniques take in architecture/system design, i.e., either for fast predictive modeling or as the design methodology. Then, we summarize the common problems in computer architecture/system design that can be solved by ML techniques and the typical ML techniques employed to resolve each of them. In addition to emphasizing computer architecture in a narrow sense, we adopt the concept that data centers can be recognized as warehouse-scale computers; brief discussions are provided on adjacent computer systems, such as code generation and compilers; we also give attention to how ML techniques can aid and transform design automation. We further provide a vision of future opportunities and potential directions and envision that applying ML to computer architecture and systems will thrive in the community.
CCS Concepts: • Computing methodologies → Machine learning; • Computer systems organization→ Architectures; • General and reference → Surveys and overviews;
Additional Key Words and Phrases: Machine learning for computer architecture, machine learning for systems
ACM Reference format:
Nan Wu and Yuan Xie. 2022. A Survey of Machine Learning for Computer Architecture and Systems. ACM Comput. Surv. 55, 3, Article 54 (February 2022), 39 pages.
1 INTRODUCTION
Machine learning (ML) has been doing wonders in many fields. As people seek better artificial intelligence (AI), there is a trend towards larger, more expressive, and more complex models. According to data reported by OpenAI [21], from 1959 to 2012, the amount of compute used in the largest AI training runs doubled every two years; since 2012, deep learning has taken off, and the required amount of compute has been increasing exponentially with a 3.4-month doubling period. By comparison, Moore's law [168], the principle that has powered the integrated-circuit revolution since the 1960s, doubles the transistor density every 18 months. As Moore's law approaches its end [228], more pressure is put on innovations in computer architecture and systems to keep up with the compute demand of AI applications.
Conventionally, computer architecture/system designs are made by human experts based on intuitions and heuristics, which requires expertise in both ML and architecture/system. Meanwhile, these heuristic-based designs cannot guarantee scalability and optimality, especially in the case of increasingly complicated systems. As such, it seems natural to move towards more automated and powerful methodologies for computer architecture and system design, and the relationship between ML and system design is being reconsidered. Over the past decade, architecture and systems have been optimized to accelerate the execution and improve the performance of ML models. Recently, there have been signs of the emergence of applying ML for computer architecture and systems, which embraces a twofold meaning: 1⃝ the reduction of the burden on human experts who design systems manually, improving designers' productivity, and 2⃝ the closing of the positive feedback loop, i.e., architecture/systems for ML and simultaneously ML for architecture/systems, forming a virtuous cycle that encourages improvements on both sides.
Existing work related to applying ML for computer architecture and system design falls into two categories: 1⃝ ML techniques are employed for fast and accurate system modeling, which involves performance metrics or some criteria of interest (e.g., power consumption, latency, throughput). During the process of designing systems, it is necessary to make fast and accurate predictions of system behaviors. Traditionally, system modeling is achieved through cycle-accurate or functional virtual platforms and instruction set simulators (e.g., gem5 [18]). Even though these methods provide accurate estimations, they incur expensive computation costs associated with performance modeling, which limits their scalability to large-scale and complex systems; meanwhile, the long simulation time often dominates design iteration, making it impossible to fully explore the design space. By contrast, ML-based modeling and performance prediction are capable of balancing simulation cost and prediction accuracy. 2⃝ ML techniques are employed as a design methodology to directly enhance architecture/system design. ML techniques are skilled at extracting features that might be implicit to human experts, making decisions without explicit programming, and improving themselves automatically with accumulated experience. Therefore, applying ML techniques as design tools can explore design spaces proactively and intelligently, and manage resources through a better understanding of the complicated and non-linear interactions between workloads and systems, making it possible to deliver truly optimal solutions.
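As a toy illustration of the first category, a learned performance model can replace repeated simulation with a cheap analytic prediction. The sketch below fits an ordinary-least-squares model on entirely synthetic data; the architecture parameters (core count, cache size, clock frequency), coefficients, and latency model are invented for illustration and are not drawn from any cited work.

```python
import numpy as np

# Hypothetical setting: 200 design points, each described by three
# architecture parameters, with a noisy synthetic "simulated" latency.
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 16.0, size=(200, 3))      # sampled configurations
true_w = np.array([-2.0, -1.5, -4.0])          # assumed ground-truth effects
latency = 100.0 + X @ true_w + rng.normal(0, 0.5, 200)

# Fit weights and intercept by least squares on an augmented design matrix.
A = np.hstack([X, np.ones((200, 1))])
coef, *_ = np.linalg.lstsq(A, latency, rcond=None)

# Predicting a new configuration is now a single dot product --
# far cheaper than a cycle-accurate simulation of the same design point.
pred = np.array([8.0, 4.0, 3.0, 1.0]) @ coef
```

The learned model trades a small amount of accuracy (here, the residual noise) for predictions that cost nanoseconds instead of simulation hours, which is exactly the balance the text describes.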
In this article, we present a comprehensive overview of applying ML for computer architecture and systems. As depicted in Figure 1, we first perform a high-level taxonomy by considering the typical role that ML techniques take in architecture/system design, i.e., either for fast predictive modeling or as the design methodology; then, we summarize the common problems in architecture/system design that can be solved by ML techniques and the typical ML techniques employed to resolve each of them. In addition to emphasizing computer architecture in a narrow sense, we adopt the concept that data centers can be recognized as warehouse-scale computers [15] and review studies associated with data center management; we provide brief discussions on adjacent computer systems, such as code generation and compilers; we also give attention to how ML techniques can aid and transform design automation involving both analog and digital circuits. At the end of the article, we discuss challenges and future prospects of applying ML for architecture/system design, aiming to convey insights into design considerations.
2 DIFFERENT ML TECHNIQUES
There are three general frameworks in ML: supervised learning, unsupervised learning, and reinforcement learning. These frameworks mainly differ in what data are sampled and how these sampled data are used to build learning models. Table 1 summarizes the ML techniques commonly used for computer architecture and system design. Sometimes, multiple learning models may work well for a given problem, and the appropriate selection can be made based on available hardware resources and data, implementation overheads, performance targets, and so on.
2.1 Supervised Learning
Supervised learning is the process of learning a set of rules able to map an input to an output based on labeled datasets. These learned rules can be generalized to make predictions for unseen inputs. We briefly introduce several prevalent techniques in supervised learning, as shown in Figure 2.
• Regression is a process for estimating the relationships between a dependent variable and one or more independent variables. The most common form is linear regression [204], and other forms include various types of non-linear regression [193]. Regression techniques are primarily used for two purposes: prediction/forecasting and inference of causal relationships.
• Support vector machines (SVMs) [202] try to find the best hyperplanes to separate data classes by maximizing margins. One variant is support vector regression (SVR), which can perform regression tasks. Predictions or classifications of new inputs are decided by their relative positions to these hyperplanes.
• Decision trees are a representative logical learning method that uses tree structures to build regression or classification models. The final result is a tree with decision nodes and leaf nodes. Each decision node represents a feature, and the branches of this node represent possible values of the corresponding feature. Starting from the root node, input instances are classified by sequentially passing through nodes and branches until they reach leaf nodes, which represent either classification results or numerical values.
• Artificial neural networks (ANNs) [197] are capable of approximating a broad family of functions: a single-layer perceptron is usually used for linear regression; complex DNNs [71] consisting of multiple layers, such as the multi-layer perceptron (MLP), are able to approximate non-linear functions; variants of DNNs achieve excellent performance in specific fields by exploiting certain computation operations, e.g., convolutional neural networks (CNNs) with convolution operations leveraging spatial features, and recurrent neural networks (RNNs) with recurrent connections enabling learning from sequences and histories.
• Ensemble learning [199] employs multiple models that are strategically combined to solve a particular problem, with the primary goal of achieving better predictive performance than could be obtained from any of the constituent models alone. Common types of ensembles include random forests and gradient boosting.
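To make the decision-node/leaf-node structure concrete, the sketch below fits the smallest possible decision tree, a one-level "stump", on synthetic one-dimensional data. The data, labels, and fitting routine are invented for illustration; real decision-tree learners use criteria such as information gain rather than raw error counts.

```python
import numpy as np

# Synthetic data: one feature, labels split cleanly at 6.0.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, 100)
y = (x > 6.0).astype(int)

def fit_stump(x, y):
    """Pick the threshold that minimizes training misclassifications."""
    best_t, best_err = None, len(y) + 1
    for t in np.unique(x):
        pred = (x > t).astype(int)
        err = int((pred != y).sum())
        if err < best_err:
            best_t, best_err = t, err
    return best_t

t = fit_stump(x, y)
# The learned tree: one decision node asking "x > t?", with two leaves
# holding the classification results 0 and 1.
```

Deeper trees repeat this split recursively inside each branch, which is how the sequential node-and-branch traversal described above arises.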
Different learning models have different preferences for input features: SVMs and ANNs generally perform much better with multi-dimensional and continuous features, while logic-based systems tend to perform better when dealing with discrete/categorical features. In system design, supervised learning is commonly used for performance modeling, configuration prediction, or predicting higher-level features/behaviors from lower-level features. One thing worth noting is that supervised learning techniques need well-labeled training data prior to the training phase, which usually requires tremendous human expertise and engineering.
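As a concrete illustration of the bagging flavor of ensemble learning mentioned above, the sketch below trains many least-squares regressors on bootstrap resamples of synthetic data and averages them; all data and model choices are invented for illustration.

```python
import numpy as np

# Synthetic regression data: y = 3x plus noise.
rng = np.random.default_rng(2)
x = np.linspace(0, 1, 50)
y = 3.0 * x + rng.normal(0, 0.3, 50)

def fit_line(xs, ys):
    """Least-squares slope/intercept on one bootstrap sample."""
    A = np.vstack([xs, np.ones_like(xs)]).T
    return np.linalg.lstsq(A, ys, rcond=None)[0]

# Bagging: train each member on a resample drawn with replacement.
models = []
for _ in range(25):
    idx = rng.integers(0, len(x), len(x))   # bootstrap resample
    models.append(fit_line(x[idx], y[idx]))

# The ensemble's prediction is the average of its members; here we
# average the learned slopes.
slope = np.mean([m[0] for m in models])
```

Random forests apply the same resample-and-average idea to decision trees, additionally randomizing the features each tree may split on.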
2.2 Unsupervised Learning
Unsupervised learning is the process of finding previously unknown patterns based on unlabeled datasets. Two prevailing methods are clustering analysis [88] and principal component analysis (PCA) [237], as depicted in Figure 3.
• Clustering is a process of grouping data objects into disjoint clusters based on a measure of similarity, such that data objects in the same cluster are similar while data objects in different
clusters share low similarities. The goal of clustering is to classify raw data reasonably and to find possibly existing hidden structures or patterns in datasets. One of the most popular and simple clustering algorithms is K-means clustering.
• PCA is essentially a coordinate transformation leveraging information from data statistics. It aims to reduce the dimensionality of the high-dimensional variable space by representing it with a few orthogonal (linearly uncorrelated) variables that capture most of its variability.
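The two methods above can be sketched compactly on synthetic data: a minimal K-means loop (K = 2) alternating the assign/update steps, followed by PCA via a singular value decomposition of the centered data. The blob locations, initial centers, and iteration count are arbitrary illustrative choices.

```python
import numpy as np

# Two well-separated synthetic blobs in 2-D.
rng = np.random.default_rng(3)
blob_a = rng.normal(0.0, 0.5, (50, 2))
blob_b = rng.normal(5.0, 0.5, (50, 2))
X = np.vstack([blob_a, blob_b])

# K-means: alternate assignment and update until (effectively) converged.
centers = np.array([[1.0, 1.0], [4.0, 4.0]])   # rough initial guesses
for _ in range(20):
    # Assignment step: each point joins its nearest center's cluster.
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    # Update step: each center moves to the mean of its cluster.
    centers = np.array([X[labels == k].mean(axis=0) for k in range(2)])

# PCA: project the same data onto its top principal component.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Vt[0]                      # direction of maximal variance
projected = Xc @ pc1             # 1-D representation of the 2-D data
```

Here the first principal component aligns with the axis separating the two blobs, so the one-dimensional projection preserves the cluster structure, which is the variability-preserving dimensionality reduction described above.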
Since there is no label in unsupervised learning, it is difficult to simultaneously measure the performance of learning models and decide when to stop the learning process. One potential workaround is semi-supervised learning [273], which uses a small amount of labeled data together with a large amount of unlabeled data. This approach stands between unsupervised and supervised learning, requiring less human effort while producing higher accuracy. The unlabeled data are used to either fine-tune or re-prioritize hypotheses obtained from the labeled data alone.
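One common semi-supervised scheme (self-training, which is only one of several) can be sketched as follows: fit a classifier on the few labeled points, pseudo-label the unlabeled points with it, then refit on everything. The one-feature threshold classifier and all data below are synthetic illustrations, not methods from any cited work.

```python
import numpy as np

# Four labeled points and 100 unlabeled points from two regions.
rng = np.random.default_rng(4)
x_lab = np.array([0.5, 1.0, 8.5, 9.0])
y_lab = np.array([0, 0, 1, 1])
x_unlab = np.concatenate([rng.uniform(0, 3, 50), rng.uniform(7, 10, 50)])

def fit_threshold(xs, ys):
    """Midpoint between the two class means -- a one-feature classifier."""
    return (xs[ys == 0].mean() + xs[ys == 1].mean()) / 2.0

t = fit_threshold(x_lab, y_lab)            # hypothesis from labels alone
pseudo = (x_unlab > t).astype(int)         # pseudo-label the unlabeled data
t2 = fit_threshold(np.concatenate([x_lab, x_unlab]),
                   np.concatenate([y_lab, pseudo]))   # refit on everything
```

The refit threshold t2 is anchored by the bulk of the (pseudo-labeled) data rather than by four points, which is how unlabeled data re-prioritize the initial hypothesis.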