EETOP 创芯网 Forum (formerly: 电子顶级开发网)

Views: 238 | Replies: 6

[Help] Paper request: HyFiSS: A Hybrid Fidelity Stall-Aware Simulator for GPGPUs

Posted the day before yesterday at 10:05
50 assets
https://www.computer.org/csdl/pr ... 700a168/22niuJK8Uj6
Abstract

The widespread adoption of GPUs has driven the development of GPU simulators, which, in turn, lead to advancements in both GPU architectures and software optimization. Trace-driven cycle-accurate simulators, which provide detailed microarchitectural models and clock-level precision, come at the cost of extended simulation times and require high computational resources. Their scalability has become a bottleneck. A growing trend is the adoption of cycle-approximate simulators, which introduce mathematical modeling of partial hardware units and utilize sampling to accelerate simulation. However, this approach faces challenges regarding the accuracy of performance predictions. To address these limitations, we introduce HyFiSS, a hybrid fidelity stall-aware GPU simulator. HyFiSS features fine-grained stall-event tracking and attribution by constructing a detailed execution pipeline model for various stall events on Streaming Multiprocessors (SMs). It accurately emulates thread block scheduler behavior using real-time scheduling logs and utilizes sampling based on thread block sets to minimize the precision loss due to fine-grained sampling points on the microarchitectural state. We achieve a balance between reliability, speed, and the level of simulation detail, especially regarding bottlenecks. By evaluating a diverse set of benchmarks, HyFiSS achieves a mean absolute percentage error in predicting active cycles that is comparable to the state-of-the-art cycle-accurate simulator Accel-Sim. Moreover, HyFiSS achieves a substantial 12.8× speedup in simulation efficiency compared to Accel-Sim. HyFiSS also requires at least 3.2× less disk storage than both Accel-Sim and another state-of-the-art cycle-approximate simulator, PPT-GPU, due to its efficient compression of SASS (Streaming Assembler) traces. With precise, per-cycle stall-event statistics, HyFiSS can provide accurate GPU performance metrics and stall cause reporting. This significantly simplifies performance analysis, bottleneck identification, and performance optimization tasks for researchers, making it easier to enhance GPU performance effectively.
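The abstract reports accuracy as a mean absolute percentage error (MAPE) on active cycles and speed as a ratio of simulation times, but this post does not give the formulas. Below is a minimal sketch of the standard MAPE definition and a simple time ratio. The function names, the choice of reference values, and all sample numbers are hypothetical and are not taken from the paper.

```python
def mean_absolute_percentage_error(reference, predicted):
    """Standard MAPE in percent; assumes every reference value is non-zero."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - r) / abs(r) for r, p in zip(reference, predicted))
    return 100.0 * total / len(reference)


def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate runs than the baseline."""
    return baseline_seconds / candidate_seconds


# Hypothetical numbers for illustration only; they are not from the paper.
reference_active_cycles = [1000, 2000, 4000]   # e.g. one value per kernel
simulated_active_cycles = [1100, 1800, 4000]

mape = mean_absolute_percentage_error(reference_active_cycles,
                                      simulated_active_cycles)
ratio = speedup(baseline_seconds=100.0, candidate_seconds=25.0)
print(f"MAPE: {mape:.2f}%  speedup: {ratio:.1f}x")
```

A lower MAPE means the predicted active cycles track the reference more closely; a ratio above 1 means the candidate simulator finished faster than the baseline.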

Posted the day before yesterday at 10:05

HyFiSS_A_Hybrid_Fidelity_Stall-Aware_Simulator_for_GPGPUs.pdf

1.07 MB, downloads: 13, download credits: assets -2 信元, download cost: 2 信元

Original poster | Posted the day before yesterday at 16:11
Posted the day before yesterday at 21:07
thanks
Posted yesterday at 07:05
Thanks
Posted yesterday at 10:52
Thanks a lot for sharing
Posted 6 hours ago

GMT+8, 2025-6-7 16:27
