Evolution of Programming Approaches for High-Performance Heterogeneous Systems

Jacob Lambert
University of Oregon
jlambert@cs.uoregon.edu

Advisor: Allen D. Malony
University of Oregon
malony@cs.uoregon.edu

External Advisor: Seyong Lee
Oak Ridge National Lab
lees2@ornl.gov

Area Exam Report
Committee Members: Allen Malony, Boyana Norris, Hank Childs
Computer Science
University of Oregon
United States
December 14, 2020
ABSTRACT

Nearly all contemporary high-performance systems rely on heterogeneous computation. As a result, scientific application developers are increasingly pushed to explore heterogeneous programming approaches. In this project, we discuss the long history of heterogeneous computing and analyze the evolution of heterogeneous programming approaches, from distributed systems to grid computing to accelerator-based supercomputers.

1 INTRODUCTION

Heterogeneous computing is paramount to today's high-performance systems. The top and next generation of supercomputers all employ heterogeneity, and even desktop workstations can be configured to utilize heterogeneous execution. The explosion of activity and interest in heterogeneous computing, as well as the exploration and development of heterogeneous programming approaches, may seem like a recent trend. However, heterogeneous programming has been a topic of research and discussion for nearly four decades. Many of the issues faced by contemporary heterogeneous programming approach designers have long histories and many connections with now-antiquated projects.

In this project, we explore the evolution and history of heterogeneous computing, with a focus on the development of heterogeneous programming approaches. In Section 2, we take a deep dive into the field of distributed heterogeneous programming, the first application of hardware heterogeneity in computing. In Section 3, we briefly explore the resolutions of distributed heterogeneous systems and approaches, and discuss the transitional period for the field of heterogeneous computing. In Section 4, we provide a broad exploration of contemporary accelerator-based heterogeneous computing, specifically analyzing the different programming approaches developed and employed across different accelerator architectures. Finally, in Section 5, we take a zoomed-out look at the development of heterogeneous programming approaches, introspect on some important takeaways and topics, and speculate about the future of next-generation heterogeneous systems.

2 DISTRIBUTED HETEROGENEOUS SYSTEMS 1980-1995

Even 40 years ago, computer scientists realized heterogeneity was needed due to diminishing returns in homogeneous systems. In the literature, the first references to the term "heterogeneous computing" revolved around the distinction between single instruction, multiple data (SIMD) and multiple instruction, multiple data (MIMD) machines in a distributed computing environment.

Several machines dating back to the 1980s were created and advertised as heterogeneous computers. Although these machines were conceptually different than today's heterogeneous machines, they were still created to address the same challenges: using optimized hardware to execute specific algorithmic patterns.

The Partitionable SIMD/MIMD System (PASM) [270], developed at Purdue University in 1981, was initially developed for image processing and pattern recognition applications. PASM was unique in that it could be dynamically reconfigured into either a SIMD or MIMD machine, or a combination thereof. The goal was to create a machine that could be optimized for different image processing and pattern recognition tasks, configuring either more SIMD or more MIMD capability depending on the requirements of the application.

However, like many early heterogeneous computing systems, programmability was not the primary concern. The programming environment for PASM required the design of a new procedure-based structured language similar to TRANQUIL [2], the development of a custom compiler, and even the development of a custom operating system.

Another early heterogeneous system was TRAC, the Texas Reconfigurable Array Computer [264], built in 1980. Like PASM, TRAC could weave between SIMD and MIMD execution modes. But also like PASM, programmability was not a primary or common concern with the TRAC machine, as it relied on now-arcane Job Control Languages and APL source code [197].

The lack of focus on programming approaches for early heterogeneous systems is evident in the difficulty of finding information on how the machines were typically programmed. However, as the availability of heterogeneous computing environments increased throughout the 1990s, so did the research and development of programming environments.

Throughout the 80s and early 90s, this environment expanded to include vector processors, scalar processors, graphics machines, and more. To this end, in this first major section we explore distributed heterogeneous computing.

Although the first heterogeneous machines consisted of mixed-mode machines like PASM and TRAC, mixed-machine heterogeneous systems became the more popular and accessible option throughout the 1990s. Instead of a single machine with the ability to switch between a synchronous SIMD mode and an asynchronous MIMD mode, mixed-machine systems contained a variety of different processing machines connected by a high-speed interconnect.

Examples of machines used in mixed-machine systems include graphics and rendering-specific machines like the Pixel Planes 5 and Silicon Graphics 340 VGX, SIMD and vector machines like the MasPar MP-series and the CM 200/2000, and coarse-grained MIMD machines like the CM-5, Vista, and Sequent machines.

It was well understood that different classes of machines (SIMD, MIMD, vector, graphics, sequential) excelled at different tasks (parallel computation, statistical analysis, rendering, display), and that these machines could be networked together in a single system. However, coordinating these distributed systems to execute a single
application presented significant challenges, which many of the projects in the next section began to address.

In this section, we explore different programming frameworks developed to utilize these distributed heterogeneous systems. In Section 2.1, we review several surveys to gain a contextualized insight into the research consensus during the time period. Then in Section 2.2, we review the most prominent and impactful programming systems introduced during this time. Finally, in Section 2.3, we discuss the evolution of distributed heterogeneous computing and how it relates to the subsequent sections.

2.1 Distributed Heterogeneous Architectures, Concepts, and Themes

For insight into high-level perspectives, opinions, and the general state of the area of early distributed heterogeneous computing, we include discussions from several survey works published during the targeted time period. We aim to extract general trends and overarching concepts that drove the development of early systems and early heterogeneous programming approaches.

The work by Ercegovac [106], Heterogeneity in Supercomputer Architectures, represents one of the first published works specifically surveying the state of high-performance heterogeneous computing. They define heterogeneity as the combination of different architectures and system design styles into one system or machine, and their motivation for heterogeneous systems is summed up well by the following direct quote:

    Heterogeneity in the design (of supercomputers) needs to be considered when a point of diminishing returns in a homogeneous architecture is reached.

As we see throughout this work, this drive for specialization to counter diminishing returns from existing hardware repeatedly resurfaces, and this motivation for heterogeneous systems is very much relevant today.

Ercegovac's work defines four distinct avenues for heterogeneity:

(1) System Level - The combination of a CPU and an I/O channel processor, a host and a special processor, or a master/slave multiprocessor system.
(2) Operating System Level - The operating system in a distributed architecture, and how it handles functionality and performance for a diverse set of nodes.
(3) Program Level - Within a program, tasks need to be defined as concurrent, either by a programmer or a compiler, and then those tasks are allocated and executed on different processors.
(4) Instruction Level - Specialized units, like arithmetic vector pipelines, are used to provide optimal cost/performance ratios. These units execute specialized instructions to achieve higher performance than possible with a generalized unit, at an extra cost.

At the time of Ercegovac's work, there existed three primary homogeneous processing approaches in high-performance computing: (1) vector pipeline and array processors, (2) multiprocessors and multi-computers following the MIMD model, and (3) attached SIMD processors. These approaches were ubiquitous across all the early surveyed works related to distributed heterogeneous computing, and they heavily influenced the heterogeneous systems created and the heterogeneous software and programming approaches used. Ercegovac [106] lists how, at the time, the three different approaches were combined in different ways to form the five following heterogeneous approaches:

(1) Mainframes with integrated vector units, programmed using a single instruction set augmented by vector instructions.
(2) Vector processors having two distinct types of instructions and processors, scalar and vector. An early example includes the SAXPY system, which could be classified as a Matrix Processing Unit.
(3) Specialized processors attached to the host machine (AP). This approach closely resembles accelerator-based heterogeneous computing, the subject of Section 4. The ST-100 and ST-50 are early examples of this approach.
(4) Multiprocessor systems with vector processors as nodes, or scalar processors augmented with vector units as nodes. For example, in PASM, mentioned earlier in this section, the operating system supported multi-tasking at the FORTRAN level, and the programmer could use fork/join API calls to exploit MIMD-level parallelism. CEDAR [172] represented another example: a multiprocessor cluster with eight processors, each processor modified with an Alliant FX/8 mini-supercomputer. This allowed heterogeneity within clusters, among clusters, and at the level of instructions, supporting vector processing, multiprocessing, and parallel processing.
(5) Special-purpose architectures that could contain heterogeneity at both the implementation and function levels. The Navier-Stokes computer (NSC) [262] is an example; its nodes could be personalized via firmware to serve as interior or boundary nodes.

Five years later, another relevant survey, Heterogeneous Computing: Challenges and Opportunities, was published by Khokhar et al. [166]. Where the previous survey focused on heterogeneous computing as a means to improve performance over homogeneous systems, this work offers an additional motivation: instead of replacing existing costly multiprocessor systems, they propose to leverage heterogeneous computing to use existing systems in an integrated environment. Conceptually, this motivation aligns closely with the goals of grid and metacomputing, discussed in Section 3.

The authors present ten primary issues facing developing heterogeneous computing systems, which also serve as a high-level road map of the required facilities of a mature heterogeneous programming environment:

(1) Algorithm Design - Should existing algorithms be manually refactored to exploit heterogeneity, or automatically profiled to determine types of heterogeneous parallelism?
(2) Code-type Profiling - The process of determining code properties (vectorizable, SIMD/MIMD parallel, scalar, special purpose).
(3) Analytical Benchmarking - A quantitative method for determining which code patterns and properties most appropriately map to which heterogeneous components in a heterogeneous system.
(4) Partitioning - The process of dividing up and assigning an application to a heterogeneous system, informed by the code-type profiling and analytical benchmarking steps.
(5) Machine Selection - Given an array of available heterogeneous machines, what is the process for selecting the most appropriate machine for a given application? Typically, the goal of machine selection methods and algorithms, for example the Heterogeneous Optimal Selection Theory (HOST) algorithm [65], was to select the least expensive machine while respecting a maximal execution time.
(6) Scheduling - A heterogeneous system-level scheduler needs to be aware of the different heterogeneous components and schedule accordingly.
(7) Synchronization - Communication between senders and receivers, shared data structures, and collective operations presented novel challenges in heterogeneous systems.
(8) Network - The interconnection network itself between heterogeneous machines presented challenges.
(9) Programming Environments - Unlike today, where programmability and productivity lie at the forefront of heterogeneous system discussions, in this work the discussion of programming environments almost seems like an afterthought. This is not unusual in works exploring early heterogeneous systems, however, as hardware system-level issues were typically the primary focus. Still, they do mention that a programming language would need to be independent and portable, and include cross-parallel compilers and debuggers.
(10) Performance Evaluation - Finally, they discuss the need for development of novel performance evaluation tools specifically designed for heterogeneous systems.

In summary, the authors call for better tools to identify parallelism, improved high-speed networking and communication protocols, standards for interfaces between machines, efficient partitioning and mapping strategies, and user-friendly interfaces and programming environments. Many of these issues are addressed by the programming approaches and implementations discussed throughout this work. However, as more heterogeneous and specialized processors emerge (Sections 4.8 and 4.9), many of these issues resurface and remain outstanding challenges in today's high-performance heterogeneous computing.

In the guest editors' introduction to a 1993 special issue of Computer on Heterogeneous Processing, Freund and Siegel offer a high-level perspective on the then-current state of high-performance heterogeneous computing [117].

They offer several motivations for heterogeneous processing. Different types of tasks inherently contain different computational characteristics requiring different types of processors, and forcing all problem sets to map to the same fixed processor is unnatural. They also consider the notion that the primary goal of heterogeneous computing should be to maximize usable performance rather than peak performance, by using all available hardware in a heterogeneous way instead of maximizing performance on a specific processor.

Freund and Siegel also offer two potential programming paradigms: (1) the adaptation of existing languages for heterogeneous environments and (2) languages explicitly designed with heterogeneity in mind. They discuss advantages and disadvantages of both paradigms. This discussion of the balance between specificity and generality in heterogeneous programming paradigms continues today, with contention between specific approaches like CUDA and general approaches like OneAPI. Additionally, the authors reject the idea that there would be one true compiler, architecture, operating system, and tool set to handle all heterogeneous tasks well, insisting that a variety of options will likely be beneficial depending on the application and context.

In the conclusion, the authors predict that heterogeneity will always be necessary for wide classes of HPC problems; computational demands will always exceed capacity and grow faster than hardware capabilities. This has certainly proven true, as heterogeneous computing is a staple of today's high-performance computing.

The 1994 work by Weems et al., Linguistic Support for Heterogeneous Parallel Processing: A Survey and an Approach [292], is particularly interesting in the context of this project. As previously mentioned, programming approaches and methodologies are typically a minor consideration in many early heterogeneous computing works. However, this work explored the existing options for heterogeneous programming and the challenges and requirements for heterogeneous languages.

The authors define three essential criteria for evaluating the suitability of languages for heterogeneous computing: (1) efficiency and performance, (2) ease of implementation, and (3) portability. They discuss how languages would need to support an orthogonal combination of different programming models, including sequential execution, control (task) parallelism, coarse- and fine-grained data parallelism, and shared and distributed memory. They stress that heterogeneous programming languages must be extendable to avoid limitations on their adaptability, and that abstractions over trivialities must be provided in order not to overwhelm programmers, while still providing access to details needed by system software. Furthermore, they discuss the need for an appropriate model of parallelism at different levels, i.e., control parallelism at a high level and data parallelism at a lower level. These kinds of considerations and concerns are still relevant today. For example, the ubiquitous MPI+X approach has long been the de facto solution for this kind of tiered parallelism, but it requires interfacing with two standards and two implementations.

Weems et al. then survey the existing languages and discuss their limitations with respect to their vision of a truly heterogeneous language. They include Cluster-M [107], HPF [170], Delirium [203], Linda [62], PCN [115], PVM [278], p4 [60], C**, pC++, ABCL/1 [302], Jade [254], Ada9x [276], Charm++ [161], and Mentat [126] in the discussion, some of which are explored in this project in Section 2.2. They further detail six features of an ideal heterogeneous programming language:

(1) supports any mode of parallelism
(2) supports any grain size
(3) supports both implicit and explicit communication
(4) users can define and abstract synchronizations
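The last of these captured features, user-defined synchronization abstractions, is easiest to see in code. The following is a minimal sketch, not drawn from any of the surveyed languages, written in modern Python purely for illustration (Python's standard library already ships threading.Barrier; hand-building one here only demonstrates the idea): a reusable counting barrier is assembled from lower-level primitives, then used to keep asynchronous MIMD-style workers phase-aligned.

```python
import threading

class Barrier:
    """A user-defined synchronization abstraction: a reusable counting
    barrier assembled from a condition variable and a counter."""
    def __init__(self, parties):
        self.parties = parties
        self.count = 0
        self.generation = 0
        self.cond = threading.Condition()

    def wait(self):
        with self.cond:
            gen = self.generation
            self.count += 1
            if self.count == self.parties:
                # Last arriver releases everyone and resets for reuse.
                self.count = 0
                self.generation += 1
                self.cond.notify_all()
            else:
                while gen == self.generation:
                    self.cond.wait()

# Usage: three asynchronous workers, but phase 2 never begins
# until every worker has finished phase 1.
results = []
barrier = Barrier(3)
results_lock = threading.Lock()

def worker(i):
    with results_lock:
        results.append(("phase1", i))
    barrier.wait()  # synchronization point defined by the user, not the runtime
    with results_lock:
        results.append(("phase2", i))

threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# All phase-1 records precede all phase-2 records.
assert [p for p, _ in results[:3]] == ["phase1"] * 3
assert [p for p, _ in results[3:]] == ["phase2"] * 3
```

The point Weems et al. raise is that a heterogeneous language should let programmers define, name, and reuse synchronization abstractions like this one, rather than scattering raw primitive operations throughout application code.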