Kunal Datta (Eindhoven University of Technology): Publications

Eindhoven University of Technology

Graduate student

Eindhoven, North Brabant, Netherlands

Auto-tuning Stencil Computations on Multicore and Accelerators
with S. W. Williams, V. Volkov, J. Carter, L. Oliker, J. Shalf, and K. Yelick

Productivity and performance using partitioned global address space languages
with K. Yelick, D. Bonachea, W. Y. Chen, P. Colella, J. Duell, S. L. Graham, P. Hargrove, P. Hilfinger, P. Husbands, C. Iancu, A. Kamil, R. Nishtala, J. Su, M. Welcome, and T. Wen

Partitioned Global Address Space (PGAS) languages combine the programming convenience of shared memory with the locality and performance control of message passing. One such language, Unified Parallel C (UPC), is an extension of ISO C defined by a consortium that boasts multiple proprietary and open source compilers. Another PGAS language, Titanium, is a dialect of Java designed for high performance scientific computation. In this paper we describe some of the highlights of two related projects, the Titanium project centered at U.C. Berkeley and the UPC project centered at Lawrence Berkeley National Laboratory. Both compilers use a source-to-source strategy that translates the parallel languages to C with calls to a communication layer called GASNet. The result is portable high-performance compilers that run on a large variety of shared and distributed memory multiprocessors. Both projects combine compiler, runtime, and application efforts to demonstrate some of the performance and productivity advantages of these languages. Copyright 2007 ACM.

Parallel languages and compilers: Perspective from the Titanium experience
with K. Yelick, P. Hilfinger, S. Graham, D. Bonachea, J. Su, A. Kamil, P. Colella, and T. Wen

We describe the rationale behind the design of key features of Titanium, an explicitly parallel dialect of Java for high-performance scientific programming, and our experiences in building applications with the language. Specifically, we address Titanium's partitioned global address space model, single program multiple data parallelism support, multi-dimensional arrays and array-index calculus, memory management, immutable classes, operator overloading, and generic programming. We provide an overview of the Titanium compiler implementation, covering various parallel analyses and optimizations, Titanium runtime technology, and the GASNet network communication layer. We summarize results and lessons learned from implementing the NAS parallel benchmarks, elliptic and hyperbolic solvers using adaptive mesh refinement, and several applications of the immersed boundary method. © 2007 SAGE Publications.

Auto-Tuning the 27-point Stencil for Multicore
with S. W. Williams, V. Volkov, J. Carter, L. Oliker, J. Shalf, and K. Yelick

This study focuses on the key numerical technique of stencil computations, used in many different scientific disciplines, and illustrates how auto-tuning can be used to produce very efficient implementations across a diverse set of current multicore architectures.

Auto-Tuning Memory-Intensive Kernels for Multicore
with S. W. Williams, L. Oliker, J. Carter, J. Shalf, and K. Yelick

Implicit and explicit optimizations for stencil computations
with S. Kamil, S. Williams, L. Oliker, J. Shalf, and K. Yelick

Stencil-based kernels constitute the core of many scientific applications on block-structured grids. Unfortunately, these codes achieve a low fraction of peak performance, due primarily to the disparity between processor and main memory speeds. We examine several optimizations on both the conventional cache-based memory systems of the Itanium 2, Opteron, and Power5, as well as the heterogeneous multicore design of the Cell processor. The optimizations target cache reuse across stencil sweeps, including both an implicit cache-oblivious approach and a cache-aware algorithm blocked to match the cache structure. Finally, we consider stencil computations on a machine with an explicitly managed memory hierarchy, the Cell processor. Overall, results show that a cache-aware approach is significantly faster than a cache-oblivious approach and that the explicitly managed memory on Cell is more efficient: relative to the Power5, it has almost 2x more memory bandwidth and is 3.7x faster. Copyright 2006 ACM.
