Runtime and Architecture Support for Efficient Data Exchange in Multi-Accelerator Applications
Heterogeneous parallel computing applications often process large data sets that require multiple GPUs to jointly meet their needs for physical memory capacity and compute throughput. However, the lack of high-level abstractions in previous heterogeneous paral... ...