Abstract: Graphics Processing Units (GPUs) play a pivotal role as primary devices for executing a diverse range of applications. Effective load balancing of the interconnection network is crucial in distributed computing systems because it ensures efficient resource utilization. While previous studies have addressed interconnection network load balancing, our investigation reveals that GPU cores often exhibit a uniform load pattern due to the nature of their workloads. Memory controllers, however, experience varying loads. This imbalance can lead to stall cycles during which memory requests cannot enter a specific controller’s full queue and therefore remain in the interconnection network. We introduce the concept of “busy” and “relaxed” memory controllers and propose Memory Controller Load Balancing (MCLB), a method that dynamically balances the load on memory controllers by categorizing them according to a predefined threshold. GPU cores temporarily pause sending memory requests to “busy” memory controllers and prioritize requests destined for “relaxed” memory controllers. This strategy reduces unnecessary congestion in the interconnection network and improves resource utilization along the memory request path. To our knowledge, MCLB is the first method specifically designed to balance memory controller loads in GPUs. MCLB significantly reduces the total number of memory controller stalls (eliminating them completely in some cases), which in turn lowers latency. It improves memory request and response round-trip latency by up to 11.8% and interconnection network latency by up to 24.6%. This work presents a novel approach to GPU optimization by addressing memory controller load imbalances.
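The busy/relaxed classification and the resulting request selection can be illustrated with a minimal sketch. It is not the paper's implementation: the use of queue occupancy as the load metric, the `>=` comparison against the threshold, and the names `classify_controllers` and `select_request` are all assumptions made for illustration.

```python
from typing import List, Optional


def classify_controllers(queue_occupancy: List[int], threshold: int) -> List[str]:
    """Label each memory controller as 'busy' or 'relaxed'.

    Assumption: load is measured as request-queue occupancy, and a
    controller is 'busy' once its occupancy reaches the threshold.
    """
    return ["busy" if occ >= threshold else "relaxed" for occ in queue_occupancy]


def select_request(pending_targets: List[int], labels: List[str]) -> Optional[int]:
    """Pick the next request a core should send.

    pending_targets[i] is the memory controller id targeted by the i-th
    pending request (oldest first). Requests to 'busy' controllers are
    held back; the oldest request to a 'relaxed' controller is chosen.
    Returns the index of the chosen request, or None if every pending
    request targets a busy controller (the core pauses this cycle).
    """
    for index, controller in enumerate(pending_targets):
        if labels[controller] == "relaxed":
            return index
    return None


# Hypothetical snapshot: four controllers, threshold of 6 queued requests.
labels = classify_controllers([7, 2, 8, 0], threshold=6)
chosen = select_request([0, 2, 1, 3], labels)
```

In this snapshot controllers 0 and 2 are labeled busy, so the core skips its two oldest requests and sends the one destined for controller 1. Holding requests at the core rather than letting them wait in the network is the step that keeps blocked packets out of the interconnect.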