With the rise of multicore processors and general-purpose computing on graphics processing units, the way parallel computers are programmed and used has changed drastically, making a comprehensive treatment of how to use such machines, written by specialists in the field, all the more important. The book presents recent research results in high-performance computing on complex environments, shows how to efficiently exploit heterogeneous and hierarchical architectures and distributed systems, offers detailed studies of the impact of applying heterogeneous computing practices to real problems, and covers applications ranging from remote sensing to tomography. The content spans topics such as Numerical Analysis for Heterogeneous and Multicore Systems; Optimization of Communication for High-Performance Heterogeneous and Hierarchical Platforms; Efficient Exploitation of Heterogeneous Architectures, Hybrid CPU + GPU, and Distributed Systems; Energy Awareness in High-Performance Computing; and Applications of Heterogeneous High-Performance Computing.
* Covers cutting-edge research in HPC on complex environments, the result of an international collaboration of members of ComplexHPC
* Explains how to efficiently exploit heterogeneous and hierarchical architectures and distributed systems
* Twenty-three chapters and over 100 illustrations cover domains such as numerical analysis, communication and storage, applications, GPUs and accelerators, and energy efficiency
Emmanuel Jeannot is a Senior Research Scientist at INRIA. He received his PhD in computer science from the École Normale Supérieure de Lyon. His main research interests are process placement, scheduling for heterogeneous environments and grids, data redistribution, and algorithms and models for parallel machines. Julius Žilinskas is a Principal Researcher and Head of Department at Vilnius University in Vilnius, Lithuania. His research interests include parallel computing, optimization, data analysis, and visualization.
Contributors xxiii
Preface xxvii
Part I Introduction 1
1. Summary of the Open European Network for High-Performance Computing in Complex Environments 3
Emmanuel Jeannot and Julius Žilinskas
1.1 Introduction and Vision 4
1.2 Scientific Organization 6
1.2.1 Scientific Focus 6
1.2.2 Working Groups 6
1.3 Activities of the Project 6
1.3.1 Spring Schools 6
1.3.2 International Workshops 7
1.3.3 Working Groups Meetings 7
1.3.4 Management Committee Meetings 7
1.3.5 Short-Term Scientific Missions 7
1.4 Main Outcomes of the Action 7
1.5 Contents of the Book 8
Acknowledgment 10
Part II Numerical Analysis for Heterogeneous and Multicore Systems 11
2. On the Impact of the Heterogeneous Multicore and Many-Core Platforms on Iterative Solution Methods and Preconditioning Techniques 13
Dimitar Lukarski and Maya Neytcheva
2.1 Introduction 14
2.2 General Description of Iterative Methods and Preconditioning 16
2.2.1 Basic Iterative Methods 16
2.2.2 Projection Methods: CG and GMRES 18
2.3 Preconditioning Techniques 20
2.4 Defect-Correction Technique 21
2.5 Multigrid Method 22
2.6 Parallelization of Iterative Methods 22
2.7 Heterogeneous Systems 23
2.7.1 Heterogeneous Computing 24
2.7.2 Algorithm Characteristics and Resource Utilization 25
2.7.3 Exposing Parallelism 26
2.7.4 Heterogeneity in Matrix Computation 26
2.7.5 Setup of Heterogeneous Iterative Solvers 27
2.8 Maintenance and Portability 29
2.9 Conclusion 30
Acknowledgments 31
References 31
3. Efficient Numerical Solution of 2D Diffusion Equation on Multicore Computers 33
Matjaž Depolli, Gregor Kosec, and Roman Trobec
3.1 Introduction 34
3.2 Test Case 35
3.2.1 Governing Equations 35
3.2.2 Solution Procedure 36
3.3 Parallel Implementation 39
3.3.1 Intel PCM Library 39
3.3.2 OpenMP 40
3.4 Results 41
3.4.1 Results of Numerical Integration 41
3.4.2 Parallel Efficiency 42
3.5 Discussion 45
3.6 Conclusion 47
Acknowledgment 47
References 47
4. Parallel Algorithms for Parabolic Problems on Graphs in Neuroscience 51
Natalija Tumanova and Raimondas Čiegis
4.1 Introduction 51
4.2 Formulation of the Discrete Model 53
4.2.1 The 𝜃-Implicit Discrete Scheme 55
4.2.2 The Predictor–Corrector Algorithm I 57
4.2.3 The Predictor–Corrector Algorithm II 58
4.3 Parallel Algorithms 59
4.3.1 Parallel 𝜃-Implicit Algorithm 59
4.3.2 Parallel Predictor–Corrector Algorithm I 62
4.3.3 Parallel Predictor–Corrector Algorithm II 63
4.4 Computational Results 63
4.4.1 Experimental Comparison of Predictor–Corrector Algorithms 66
4.4.2 Numerical Experiment of Neuron Excitation 68
4.5 Conclusions 69
Acknowledgments 70
References 70
Part III Communication and Storage Considerations in High-Performance Computing 73
5. An Overview of Topology Mapping Algorithms and Techniques in High-Performance Computing 75
Torsten Hoefler, Emmanuel Jeannot, and Guillaume Mercier
5.1 Introduction 76
5.2 General Overview 76
5.2.1 A Key to Scalability: Data Locality 77
5.2.2 Data Locality Management in Parallel Programming Models 77
5.2.3 Virtual Topology: Definition and Characteristics 78
5.2.4 Understanding the Hardware 79
5.3 Formalization of the Problem 79
5.4 Algorithmic Strategies for Topology Mapping 81
5.4.1 Greedy Algorithm Variants 81
5.4.2 Graph Partitioning 82
5.4.3 Schemes Based on Graph Similarity 82
5.4.4 Schemes Based on Subgraph Isomorphism 82
5.5 Mapping Enforcement Techniques 82
5.5.1 Resource Binding 83
5.5.2 Rank Reordering 83
5.5.3 Other Techniques 84
5.6 Survey of Solutions 85
5.6.1 Algorithmic Solutions 85
5.6.2 Existing Implementations 85
5.7 Conclusion and Open Problems 89
Acknowledgment 90
References 90
6. Optimization of Collective Communication for Heterogeneous HPC Platforms 95
Kiril Dichev and Alexey Lastovetsky
6.1 Introduction 95
6.2 Overview of Optimized Collectives and Topology-Aware Collectives 97
6.3 Optimizations of Collectives on Homogeneous Clusters 98
6.4 Heterogeneous Networks 99
6.4.1 Comparison to Homogeneous Clusters 99
6.5 Topology- and Performance-Aware Collectives 100
6.6 Topology as Input 101
6.7 Performance as Input 102
6.7.1 Homogeneous Performance Models 103
6.7.2 Heterogeneous Performance Models 105
6.7.3 Estimation of Parameters of Heterogeneous Performance Models 106
6.7.4 Other Performance Models 106
6.8 Non-MPI Collective Algorithms for Heterogeneous Networks 106
6.8.1 Optimal Solutions with Multiple Spanning Trees 107
6.8.2 Adaptive Algorithms for Efficient Large-Message Transfer 107
6.8.3 Network Models Inspired by BitTorrent 108
6.9 Conclusion 111
Acknowledgments 111
References 111
7. Effective Data Access Patterns on Massively Parallel Processors 115
Gabriele Capannini, Ranieri Baraglia, Fabrizio Silvestri, and Franco Maria Nardini
7.1 Introduction 115
7.2 Architectural Details 116
7.3 K-Model 117
7.3.1 The Architecture 117
7.3.2 Cost and Complexity Evaluation 118
7.3.3 Efficiency Evaluation 119
7.4 Parallel Prefix Sum 120
7.4.1 Experiments 125
7.5 Bitonic Sorting Networks 126
7.5.1 Experiments 131
7.6 Final Remarks 132
Acknowledgments 133
References 133
8. Scalable Storage I/O Software for Blue Gene Architectures 135
Florin Isaila, Javier Garcia, and Jesús Carretero
8.1 Introduction 135
8.2 Blue Gene System Overview 136
8.2.1 Blue Gene Architecture 136
8.2.2 Operating System Architecture 136
8.3 Design and Implementation 138
8.3.1 The Client Module 139
8.3.2 The I/O Module 141
8.4 Conclusions and Future Work 142
Acknowledgments 142
References 142
Part IV Efficient Exploitation of Heterogeneous Architectures 145
9. Fair Resource Sharing for Dynamic Scheduling of Workflows on Heterogeneous Systems 147
Hamid Arabnejad, Jorge G. Barbosa, and Frédéric Suter
9.1 Introduction 148
9.1.1 Application Model 148
9.1.2 System Model 151
9.1.3 Performance Metrics 152
9.2 Concurrent Workflow Scheduling 153
9.2.1 Offline Scheduling of Concurrent Workflows 154
9.2.2 Online Scheduling of Concurrent Workflows 155
9.3 Experimental Results and Discussion 160
9.3.1 DAG Structure 160
9.3.2 Simulated Platforms 160
9.3.3 Results and Discussion 162
9.4 Conclusions 165
Acknowledgments 166
References 166
10. Systematic Mapping of Reed–Solomon Erasure Codes on Heterogeneous Multicore Architectures 169
Roman Wyrzykowski, Marcin Wozniak, and Lukasz Kuczynski
10.1 Introduction 169
10.2 Related Works 171
10.3 Reed–Solomon Codes and Linear Algebra Algorithms 172
10.4 Mapping Reed–Solomon Codes on Cell/B.E. Architecture 173
10.4.1 Cell/B.E. Architecture 173
10.4.2 Basic Assumptions for Mapping 174
10.4.3 Vectorization Algorithm and Increasing its Efficiency 175
10.4.4 Performance Results 177
10.5 Mapping Reed–Solomon Codes on Multicore GPU Architectures 178
10.5.1 Parallelization of Reed–Solomon Codes on GPU Architectures 178
10.5.2 Organization of GPU Threads 180
10.6 Methods of Increasing the Algorithm Performance on GPUs 181
10.6.1 Basic Modifications 181
10.6.2 Stream Processing 182
10.6.3 Using Shared Memory 184
10.7 GPU Performance Evaluation 185
10.7.1 Experimental Results 185
10.7.2 Performance Analysis using the Roofline Model 187
10.8 Conclusions and Future Works 190
Acknowledgments 191
References 191
11. Heterogeneous Parallel Computing Platforms and Tools for Compute-Intensive Algorithms: A Case Study 193
Daniele D’Agostino, Andrea Clematis, and Emanuele Danovaro
11.1 Introduction 194
11.2 A Low-Cost Heterogeneous Computing Environment 196
11.2.1 Adopted Computing Environment 199
11.3 First Case Study: The N-Body Problem 200
11.3.1 The Sequential N-Body Algorithm 201
11.3.2 The Parallel N-Body Algorithm for Multicore Architectures 203
11.3.3 The Parallel N-Body Algorithm for CUDA Architectures 204
11.4 Second Case Study: The Convolution Algorithm 206
11.4.1 The Sequential Convolver Algorithm 206
11.4.2 The Parallel Convolver Algorithm for Multicore Architectures 207
11.4.3 The Parallel Convolver Algorithm for GPU Architectures 208
11.5 Conclusions 211
Acknowledgments 212
References 212
12. Efficient Application of Hybrid Parallelism in Electromagnetism Problems 215
Alejandro Álvarez-Melcón, Fernando D. Quesada, Domingo Giménez, Carlos Pérez-Alcaraz, José-Ginés Picón, and Tomás Ramírez
12.1 Introduction 215
12.2 Computation of Green’s functions in Hybrid Systems 216
12.2.1 Computation in a Heterogeneous Cluster 217
12.2.2 Experiments 218
12.3 Parallelization in NUMA Systems of a Volume Integral Equation Technique 222
12.3.1 Experiments 222
12.4 Autotuning Parallel Codes 226
12.4.1 Empirical Autotuning 227
12.4.2 Modeling the Linear Algebra Routines 229
12.5 Conclusions and Future Research 230
Acknowledgments 231
References 232
Part V CPU + GPU Coprocessing 235
13. Design and Optimization of Scientific Applications for Highly Heterogeneous and Hierarchical HPC Platforms Using Functional Computation Performance Models 237
David Clarke, Aleksandar Ilic, Alexey Lastovetsky, Vladimir Rychkov, Leonel Sousa, and Ziming Zhong
13.1 Introduction 238
13.2 Related Work 241
13.3 Data Partitioning Based on Functional Performance Model 243
13.4 Example Application: Heterogeneous Parallel Matrix Multiplication 245
13.5 Performance Measurement on CPUs/GPUs System 247
13.6 Functional Performance Models of Multiple Cores and GPUs 248
13.7 FPM-Based Data Partitioning on CPUs/GPUs System 250
13.8 Efficient Building of Functional Performance Models 251
13.9 FPM-Based Data Partitioning on Hierarchical Platforms 253
13.10 Conclusion 257
Acknowledgments 259
References 259
14. Efficient Multilevel Load Balancing on Heterogeneous CPU + GPU Systems 261
Aleksandar Ilic and Leonel Sousa
14.1 Introduction: Heterogeneous CPU + GPU Systems 262
14.1.1 Open Problems and Specific Contributions 263
14.2 Background and Related Work 265
14.2.1 Divisible Load Scheduling in Distributed CPU-Only Systems 265
14.2.2 Scheduling in Multicore CPU and Multi-GPU Environments 268
14.3 Load Balancing Algorithms for Heterogeneous CPU + GPU Systems 269
14.3.1 Multilevel Simultaneous Load Balancing Algorithm 270
14.3.2 Algorithm for Multi-Installment Processing with Multidistributions 273
14.4 Experimental Results 275
14.4.1 MSLBA Evaluation: Dense Matrix Multiplication Case Study 275
14.4.2 AMPMD Evaluation: 2D FFT Case Study 277
14.5 Conclusions 279
Acknowledgments 280
References 280
15. The All-Pair Shortest-Path Problem in Shared-Memory Heterogeneous Systems 283
Hector Ortega-Arranz, Yuri Torres, Diego R. Llanos, and Arturo Gonzalez-Escribano
15.1 Introduction 283
15.2 Algorithmic Overview 285
15.2.1 Graph Theory Notation 285
15.2.2 Dijkstra’s Algorithm 286
15.2.3 Parallel Version of Dijkstra’s Algorithm 287
15.3 CUDA Overview 287
15.4 Heterogeneous Systems and Load Balancing 288
15.5 Parallel Solutions to the APSP 289
15.5.1 GPU Implementation 289
15.5.2 Heterogeneous Implementation 290
15.6 Experimental Setup 291
15.6.1 Methodology 291
15.6.2 Target Architectures 292
15.6.3 Input Set Characteristics 292
15.6.4 Load-Balancing Techniques Evaluated 292
15.7 Experimental Results 293
15.7.1 Complete APSP 293
15.7.2 512-Source-Node-to-All Shortest Path 295
15.7.3 Experimental Conclusions 296
15.8 Conclusions 297
Acknowledgments 297
References 297
Part VI Efficient Exploitation of Distributed Systems 301
16. Resource Management for HPC on the Cloud 303
Marc E. Frincu and Dana Petcu
16.1 Introduction 303
16.2 On the Type of Applications for HPC and HPC2 305
16.3 HPC on the Cloud 306
16.3.1 General PaaS Solutions 306
16.3.2 On-Demand Platforms for HPC 310
16.4 Scheduling Algorithms for HPC2 311
16.5 Toward an Autonomous Scheduling Framework 312
16.5.1 Autonomous Framework for RMS 313
16.5.2 Self-Management 315
16.5.3 Use Cases 317
16.6 Conclusions 319
Acknowledgment 320
References 320
17. Resource Discovery in Large-Scale Grid Systems 323
Konstantinos Karaoglanoglou and Helen Karatza
17.1 Introduction and Background 323
17.1.1 Introduction 323
17.1.2 Resource Discovery in Grids 324
17.1.3 Background 325
17.2 The Semantic Communities Approach 325
17.2.1 Grid Resource Discovery Using Semantic Communities 325
17.2.2 Grid Resource Discovery Based on Semantically Linked Virtual Organizations 327
17.3 The P2P Approach 329
17.3.1 On Fully Decentralized Resource Discovery in Grid Environments Using a P2P Architecture 329
17.3.2 P2P Protocols for Resource Discovery in the Grid 330
17.4 The Grid-Routing Transferring Approach 333
17.4.1 Resource Discovery Based on Matchmaking Routers 333
17.4.2 Acquiring Knowledge in a Large-Scale Grid System 335
17.5 Conclusions 337
Acknowledgment 338
References 338
Part VII Energy Awareness in High-Performance Computing 341
18. Energy-Aware Approaches for HPC Systems 343
Robert Basmadjian, Georges Da Costa, Ghislain Landry Tsafack Chetsa, Laurent Lefevre, Ariel Oleksiak, and Jean-Marc Pierson
18.1 Introduction 344
18.2 Power Consumption of Servers 345
18.2.1 Server Modeling 346
18.2.2 Power Prediction Models 347
18.3 Classification and Energy Profiles of HPC Applications 354
18.3.1 Phase Detection 356
18.3.2 Phase Identification 358
18.4 Policies and Leverages 359
18.5 Conclusion 360
Acknowledgments 361
References 361
19. Strategies for Increased Energy Awareness in Cloud Federations 365
Gabor Kecskemeti, Attila Kertesz, Attila Cs. Marosi, and Zsolt Nemeth
19.1 Introduction 365
19.2 Related Work 367
19.3 Scenarios 369
19.3.1 Increased Energy Awareness Across Multiple Data Centers within a Single Administrative Domain 369
19.3.2 Energy Considerations in Commercial Cloud Federations 372
19.3.3 Reduced Energy Footprint of Academic Cloud Federations 374
19.4 Energy-Aware Cloud Federations 374
19.4.1 Availability of Energy-Consumption-Related Information 375
19.4.2 Service Call Scheduling at the Meta-Brokering Level of FCM 376
19.4.3 Service Call Scheduling and VM Management at the Cloud-Brokering Level of FCM 377
19.5 Conclusions 379
Acknowledgments 380
References 380
20. Enabling Network Security in HPC Systems Using Heterogeneous CMPs 383
Ozcan Ozturk and Suleyman Tosun
20.1 Introduction 384
20.2 Related Work 386
20.3 Overview of Our Approach 387
20.3.1 Heterogeneous CMP Architecture 387
20.3.2 Network Security Application Behavior 388
20.3.3 High-Level View 389
20.4 Heterogeneous CMP Design for Network Security Processors 390
20.4.1 Task Assignment 390
20.4.2 ILP Formulation 391
20.4.3 Discussion 393
20.5 Experimental Evaluation 394
20.5.1 Setup 394
20.5.2 Results 395
20.6 Concluding Remarks 397
Acknowledgments 397
References 397
Part VIII Applications of Heterogeneous High-Performance Computing 401
21. Toward a High-Performance Distributed CBIR System for Hyperspectral Remote Sensing Data: A Case Study in Jungle Computing 403
Timo van Kessel, Niels Drost, Jason Maassen, Henri E. Bal, Frank J. Seinstra, and Antonio J. Plaza
21.1 Introduction 404
21.2 CBIR for Hyperspectral Imaging Data 407
21.2.1 Spectral Unmixing 407
21.2.2 Proposed CBIR System 409
21.3 Jungle Computing 410
21.3.1 Jungle Computing: Requirements 411
21.4 IBIS and Constellation 412
21.5 System Design and Implementation 415
21.5.1 Endmember Extraction 418
21.5.2 Query Execution 418
21.5.3 Equi-Kernels 419
21.5.4 Matchmaking 420
21.6 Evaluation 420
21.6.1 Performance Evaluation 421
21.7 Conclusions 426
Acknowledgments 426
References 426
22. Taking Advantage of Heterogeneous Platforms in Image and Video Processing 429
Sidi A. Mahmoudi, Erencan Ozkan, Pierre Manneback, and Suleyman Tosun
22.1 Introduction 430
22.2 Related Work 431
22.2.1 Image Processing on GPU 431
22.2.2 Video Processing on GPU 432
22.2.3 Contribution 433
22.3 Parallel Image Processing on GPU 433
22.3.1 Development Scheme for Image Processing on GPU 433
22.3.2 GPU Optimization 434
22.3.3 GPU Implementation of Edge and Corner Detection 434
22.3.4 Performance Analysis and Evaluation 434
22.4 Image Processing on Heterogeneous Architectures 437
22.4.1 Development Scheme for Multiple Image Processing 437
22.4.2 Task Scheduling within Heterogeneous Architectures 438
22.4.3 Optimization Within Heterogeneous Architectures 438
22.5 Video Processing on GPU 438
22.5.1 Development Scheme for Video Processing on GPU 439
22.5.2 GPU Optimizations 440
22.5.3 GPU Implementations 440
22.5.4 GPU-Based Silhouette Extraction 440
22.5.5 GPU-Based Optical Flow Estimation 440
22.5.6 Result Analysis 443
22.6 Experimental Results 444
22.6.1 Heterogeneous Computing for Vertebra Segmentation 444
22.6.2 GPU Computing for Motion Detection Using a Moving Camera 445
22.7 Conclusion 447
Acknowledgment 448
References 448
23. Real-Time Tomographic Reconstruction Through CPU + GPU Coprocessing 451
José Ignacio Agulleiro, Francisco Vazquez, Ester M. Garzon, and Jose J. Fernandez
23.1 Introduction 452
23.2 Tomographic Reconstruction 453
23.3 Optimization of Tomographic Reconstruction for CPUs and for GPUs 455
23.4 Hybrid CPU + GPU Tomographic Reconstruction 457
23.5 Results 459
23.6 Discussion and Conclusion 461
Acknowledgments 463
References 463
Index 467