Request: Low-Overhead Trace Collection and Profiling on GPU Compute Kernels

huikong2013 · posted on 2025-7-3 17:25:24

https://dl.acm.org/doi/10.1145/3649510
While GPUs can bring substantial speedup to compute-intensive tasks, their programming is notoriously hard. From their programming model, to microarchitectural particularities, the programmer may encounter many pitfalls which may hinder performance in obscure ways. Numerous performance analysis tools provide helpful data on the efficiency of the compute kernels, but few allow the programmer to efficiently gather runtime information directly on the device and pinpoint the sections to optimize.
We propose in this article an instrumentation method to collect traces while executing the compute kernel, with a reduced overhead compared with other approaches, by exploiting the inherently parallel behavior of GPUs and compartmentalizing tracing phases. The reference implementation is freely available and induces an average overhead of 1.6× on a popular scientific computing benchmark and 1.5× over the kernel execution time. This represents an improvement of an order of magnitude compared with similar work, and proves useful for timing-guided optimizations. The tool generates insightful execution traces and timestamps which can be analyzed to better understand performance issues in the kernel.

TCKG · posted on 2025-7-3 17:25:25

Last edited by TCKG on 2025-8-11 11:37

Low-Overhead Trace Collection and Profiling on GPU Compute Kernels
