Evolution of Programming Approaches for High-Performance
Heterogeneous Systems
Jacob Lambert
University of Oregon
jlambert@cs.uoregon.edu
Advisor: Allen D. Malony
University of Oregon
malony@cs.uoregon.edu
External Advisor: Seyong Lee
Oak Ridge National Lab
lees2@ornl.gov
Area Exam Report
Committee Members: Allen Malony, Boyana Norris, Hank Childs
Computer Science
University of Oregon
United States
December 14, 2020
ABSTRACT
Nearly all contemporary high-performance systems rely on heterogeneous computation. As a result, scientific application developers are increasingly pushed to explore heterogeneous programming approaches. In this project, we discuss the long history of heterogeneous computing and analyze the evolution of heterogeneous programming approaches, from distributed systems to grid computing to accelerator-based supercomputers.

1 INTRODUCTION
Heterogeneous computing is paramount to today’s high-performance systems. The top and next generation of supercomputers all employ heterogeneity, and even desktop workstations can be configured to utilize heterogeneous execution. The explosion of activity and interest in heterogeneous computing, as well as the exploration and development of heterogeneous programming approaches, may seem like a recent trend. However, heterogeneous programming has been a topic of research and discussion for nearly four decades. Many of the issues faced by contemporary heterogeneous programming approach designers have long histories, and have many connections with now antiquated projects.

In this project, we explore the evolution and history of heterogeneous computing, with a focus on the development of heterogeneous programming approaches. In Section 2, we do a deep dive into the field of distributed heterogeneous programming, the first application of hardware heterogeneity in computing. In Section 3, we briefly explore the resolutions of distributed heterogeneous systems and approaches, and discuss the transitional period for the field of heterogeneous computing. In Section 4, we provide a broad exploration into contemporary accelerator-based heterogeneous computing, specifically analyzing the different programming approaches developed and employed across different accelerator architectures. Finally, in Section 5, we take a zoomed-out look at the development of heterogeneous programming approaches, introspect on some important takeaways and topics, and speculate about the future of next-generation heterogeneous systems.

2 DISTRIBUTED HETEROGENEOUS SYSTEMS 1980-1995
Even 40 years ago, computer scientists realized heterogeneity was needed due to diminishing returns in homogeneous systems. In the literature, the first references to the term "heterogeneous computing" revolved around the distinction between single instruction, multiple data (SIMD) and multiple instruction, multiple data (MIMD) machines in a distributed computing environment.

Several machines dating back to the 1980s were created and advertised as heterogeneous computers. Although these machines were conceptually different than today’s heterogeneous machines, they still were created to address the same challenges: using optimized hardware to execute specific algorithmic patterns.

The Partitionable SIMD/MIMD System (PASM) [270] machine, developed at Purdue University in 1981, was initially designed for image processing and pattern recognition applications. PASM was unique in that it could be dynamically reconfigured into either a SIMD or MIMD machine, or a combination thereof. The goal was to create a machine that could be optimized for different image processing and pattern recognition tasks, configuring either more SIMD or MIMD capabilities depending on the requirements of the application.

However, like many early heterogeneous computing systems, programmability was not the primary concern. The programming environment for PASM required the design of a new procedure-based structured language similar to TRANQUIL [2], the development of a custom compiler, and even the development of a custom operating system.

Another early heterogeneous system was TRAC, the Texas Reconfigurable Array Computer [264], built in 1980. Like PASM, TRAC could weave between SIMD and MIMD execution modes. But also like PASM, programmability was not a primary or common concern with the TRAC machine, as it relied on now-arcane Job Control Languages and APL source code [197].

The lack of focus on programming approaches for early heterogeneous systems is evident in some ways by the difficulty in finding information on how the machines were typically programmed. However, as the availability of heterogeneous computing environments increased throughout the 1990s, so did the research and development of programming environments.

Throughout the 80s and early 90s, this environment expanded to include vector processors, scalar processors, graphics machines, etc. To this end, in this first major section we explore distributed heterogeneous computing.

Although the first heterogeneous machines consisted of mixed-mode machines like PASM and TRAC, mixed-machine heterogeneous systems became the more popular and accessible option throughout the 1990s. Instead of a single machine with the ability to switch between a synchronous SIMD mode and an asynchronous MIMD mode, mixed-machine systems contained a variety of different processing machines connected by a high-speed interconnect. Examples of machines used in mixed-machine systems include graphics and rendering-specific machines like the Pixel Planes 5 and Silicon Graphics 340 VGX, SIMD and vector machines like the MasPar MP-series and the CM 200/2000, and coarse-grained MIMD machines like the CM-5, Vista, and Sequent machines.

It was well understood that different classes of machines (SIMD, MIMD, vector, graphics, sequential) excelled at different tasks (parallel computation, statistical analysis, rendering, display), and that these machines could be networked together in a single system. However, coordinating these distributed systems to execute a single
application presented significant challenges, which many of the projects in the next section began to address.

In this section, we explore different programming frameworks developed to utilize these distributed heterogeneous systems. In Section 2.1, we review several surveys to gain a contextualized insight into the research consensus during the time period. Then in Section 2.2, we review the most prominent and impactful programming systems introduced during this time. Finally, in Section 2.3, we discuss the evolution of distributed heterogeneous computing, and how it relates to the subsequent sections.

2.1 Distributed Heterogeneous Architectures, Concepts, and Themes
For insight into high-level perspectives, opinions, and the general state of the area of early distributed heterogeneous computing, we include discussions from several survey works published during the targeted time period. We aim to extract general trends and overarching concepts that drove the development of early systems and early heterogeneous programming approaches.

The work by Ercegovac [106], Heterogeneity in Supercomputer Architectures, represents one of the first published works specifically surveying the state of high performance heterogeneous computing. They define heterogeneity as the combination of different architectures and system design styles into one system or machine, and their motivation for heterogeneous systems is summed up well by the following direct quote:

"Heterogeneity in the design (of supercomputers) needs to be considered when a point of diminishing returns in a homogeneous architecture is reached."

As we see throughout this work, this drive for specialization to counter diminishing returns from existing hardware repeatedly resurfaces, and this motivation for heterogeneous systems is very much relevant today.

Ercegovac’s work defines four distinct avenues for heterogeneity:
(1) System Level - The combination of a CPU and an I/O channel processor, or a host and special processor, or a master/slave multiprocessor system.
(2) Operating System Level - The operating system in a distributed architecture, and how it handles functionality and performance for a diverse set of nodes.
(3) Program Level - Within a program, tasks need to be defined as concurrent, either by a programmer or compiler, and then those tasks are allocated and executed on different processors.
(4) Instruction Level - Specialized units, like arithmetic vector pipelines, are used to provide optimal cost/performance ratios. These units execute specialized instructions to achieve higher performance than possible with a generalized unit, at an extra cost.

At the time of Ercegovac’s work, there existed three primary homogeneous processing approaches in high-performance computing: (1) vector pipeline and array processors, (2) multiprocessors and multi-computers following the MIMD model, and (3) attached SIMD processors. These approaches were ubiquitous across all the early surveyed works related to distributed heterogeneous computing, and they heavily influenced the heterogeneous systems created and the heterogeneous software and programming approaches used. Ercegovac [106] lists how, at the time, the three different approaches were combined in different ways to form the five following heterogeneous approaches:
(1) Mainframes with integrated vector units, programmed using a single instruction set augmented by vector instructions.
(2) Vector processors having two distinct types of instructions and processors, scalar and vector. An early example includes the SAXPY system, which could be classified as a Matrix Processing Unit.
(3) Specialized processors attached to the host machine (AP). This approach closely resembles accelerator-based heterogeneous computing, the subject of Section 4. The ST-100 and ST-50 are early examples of this approach.
(4) Multiprocessor Systems with vector processors as nodes, or scalar processors augmented with vector units as nodes. For example, in PASM, mentioned earlier in this Section, the operating system supported multi-tasking at the FORTRAN level, and the programmer could use the fork/join API calls to exploit MIMD-level parallelism. CEDAR [172] represented another example of a multiprocessor cluster with eight processors, each processor modified with an Alliant FX/8 mini-supercomputer. This allowed heterogeneity within clusters, among clusters, and at the level of instructions, supporting vector processing, multiprocessing, and parallel processing.
(5) Special-purpose architectures that could contain heterogeneity at both the implementation and function levels. The Navier-Stokes computer (NSC) [262] is an example. The nodes could be personalized via firmware to respond to interior or boundary nodes.

Five years later, another relevant survey, Heterogeneous Computing: Challenges and Opportunities, was published by Khokhar et al. [166]. Where the previous survey focused on heterogeneous computing as a means to improve performance over homogeneous systems, this work offers an additional motivation: instead of replacing existing costly multiprocessor systems, they propose to leverage heterogeneous computing to use existing systems in an integrated environment. Conceptually, this motivation aligns closely with the goals of grid and metacomputing, discussed in Section 3.

The authors present ten primary issues facing the developing heterogeneous computing systems, which also serve as a high-level road map of the required facilities of a mature heterogeneous programming environment:
(1) Algorithm Design - Should existing algorithms be manually refactored to exploit heterogeneity, or automatically profiled to determine types of heterogeneous parallelism?
(2) Code-type Profiling - The process of determining code properties (vectorizable, SIMD/MIMD parallel, scalar, special purpose).
(3) Analytical Benchmarking - A quantitative method for determining which code patterns and properties most appropriately map to which heterogeneous components in a heterogeneous system.
(4) Partitioning - The process of dividing up and assigning an application to a heterogeneous system, informed by the code-type profiling and analytical benchmarking steps.
(5) Machine Selection - Given an array of available heterogeneous machines, what is the process for selecting the most appropriate machine for a given application? Typically, the goal of machine selection methods and algorithms, for example the Heterogeneous Optimal Selection Theory (HOST) algorithm [65], was to select the least expensive machine while respecting a maximal execution time.
(6) Scheduling - A heterogeneous system-level scheduler needs to be aware of the different heterogeneous components and schedule accordingly.
(7) Synchronization - Communication between senders and receivers, shared data structures, and collective operations presented novel challenges in heterogeneous systems.
(8) Network - The interconnection network itself between heterogeneous machines presented challenges.
(9) Programming Environments - Unlike today, where programmability and productivity lie at the forefront of heterogeneous system discussions, in this work the discussion of programming environments almost seems like an afterthought. This is not unusual in works exploring early heterogeneous systems, however, as hardware system-level issues were typically the primary focus. They do mention that a programming language would need to be independent, portable, and include cross-parallel compilers and debuggers.
(10) Performance Evaluation - Finally, they discuss the need for development of novel performance evaluation tools specifically designed for heterogeneous systems.

In summary, the authors call for better tools to identify parallelism, improved high-speed networking and communication protocols, standards for interfaces between machines, efficient partitioning and mapping strategies, and user-friendly interfaces and programming environments. Many of these issues are addressed by the programming approaches and implementations discussed throughout this work. However, as more heterogeneous and specialized processors emerge (Sections 4.8 and 4.9), many of these issues resurface and remain as outstanding issues and challenges with today’s high-performance heterogeneous computing.

In the guest editor’s introduction of the 1993 ACM Computer journal, a special edition on Heterogeneous Processing, Freund and Siegel offer a high-level perspective on the then-current state of high-performance heterogeneous computing [117].

They offer several motivations for heterogeneous processing. Different types of tasks inherently contain different computational characteristics requiring different types of processors, and forcing all problem sets to map to the same fixed processor is unnatural. They also consider the notion that the primary goal of heterogeneous computing should be to maximize usable performance as opposed to peak performance, by means of using all available hardware in a heterogeneous way instead of maximizing performance on a specific processor.

Freund and Siegel also offer two potential programming paradigms: (1) the adaptation of existing languages for heterogeneous environments and (2) explicitly designed languages with heterogeneity in mind. They discuss advantages and disadvantages of both paradigms. This discussion of balance between specificity and generality in heterogeneous program paradigms continues today, with contention between specific approaches like CUDA and general approaches like OneAPI. Additionally, the authors depart from the opinion that there would be one true compiler, architecture, operating system, and tool set to handle all heterogeneous tasks well, insisting that a variety of options will likely be beneficial depending on the application and context.

In the conclusion, the authors predict that heterogeneity will always be necessary for wide classes of HPC problems; computational demands will always exceed capacity and grow faster than hardware capabilities. This has certainly proven to be true, as heterogeneous computing is a staple in today’s high-performance computing.

The 1994 work by Weems et al., Linguistic Support for Heterogeneous Parallel Processing: A Survey and an Approach [292], is particularly interesting in the context of this project. As previously mentioned, programming approaches and methodologies are typically a minor consideration in many early heterogeneous computing works. However, this work explored the existing options for heterogeneous programming and the challenges and requirements for heterogeneous languages.

The authors define three essential criteria for evaluating the suitability of languages for heterogeneous computing: (1) efficiency and performance, (2) ease of implementation, and (3) portability. They discuss how languages would need to support an orthogonal combination of different programming models, including sequential, control (task) parallelism, coarse and fine-grained data parallelism, and shared and distributed memory. They stress that heterogeneous programming languages must be extendable to avoid limitations on their adaptability, and that abstractions over trivialities must be provided in order to not overwhelm programmers, while still providing access to details needed by system software. Furthermore, they discuss the need for an appropriate model of parallelism at different levels, i.e., control parallelism at a high level, and data parallelism at a lower level. These kinds of considerations and concerns are still relevant today. For example, the ubiquitous MPI+X approach has long been the de facto solution for this kind of tiered parallelism, but requires interfacing with two standards and two implementations.

Weems et al. then survey the existing languages, and discuss their limitations with respect to their vision of a truly heterogeneous language. They include Cluster-M [107], HPF [170], Delirium [203], Linda [62], PCN [115], PVM [278], p4 [60], C**, PC++, ABCL/1 [302], Jade [254], Ada9x [276], Charm++ [161], and Mentat [126] in the discussion, some of which are explored in this project in Section 2.2. They further detail six features of an ideal heterogeneous programming language:
(1) supports any mode of parallelism
(2) supports any grain size
(3) supports both implicit and explicit communication
(4) users can define and abstract synchronizations