EETOP 创芯网 Forum (formerly: 电子顶级开发网)

[Help] Request: Low-Overhead Trace Collection and Profiling on GPU Compute Kernels

Posted yesterday at 17:25
50 assets
https://dl.acm.org/doi/10.1145/3649510
While GPUs can bring substantial speedup to compute-intensive tasks, their programming is notoriously hard. From their programming model, to microarchitectural particularities, the programmer may encounter many pitfalls which may hinder performance in obscure ways. Numerous performance analysis tools provide helpful data on the efficiency of the compute kernels, but few allow the programmer to efficiently gather runtime information directly on the device and pinpoint the sections to optimize.
We propose in this article an instrumentation method to collect traces while executing the compute kernel, with a reduced overhead compared with other approaches, by exploiting the inherently parallel behavior of GPUs and compartmentalizing tracing phases. The reference implementation is freely available and induces an average overhead of 1.6× on a popular scientific computing benchmark and 1.5× over the kernel execution time. This represents an improvement of an order of magnitude compared with similar work, and proves useful for timing-guided optimizations. The tool generates insightful execution traces and timestamps which can be analyzed to better understand performance issues in the kernel.

GMT+8, 2025-7-4 01:39
