As AI models grow in size and complexity, the demand for GPU power in datacenters is skyrocketing — along with energy consumption and environmental impact. A team of researchers from the Italian National Research Council (CNR) has developed a new scheduling strategy that intelligently balances performance and sustainability, cutting energy use by up to 20% without sacrificing efficiency. Here’s how it works.
Artificial Intelligence is transforming every aspect of our lives — from healthcare to creativity — but behind its impressive capabilities lies a hidden cost: energy consumption. Training and running large language models like those powering ChatGPT or DALL·E require enormous computing power, which translates into high operational costs and a growing environmental footprint.
A research team from the Institute of Information Science and Technologies “Alessandro Faedo” of the National Research Council of Italy (Cnr-Isti) offers a concrete solution: a new scheduling algorithm for GPU datacenters that significantly cuts power consumption without compromising performance.
The Core Problem: Shared GPUs and Fragmented Resources
Modern AI datacenters rely on thousands of specialized computing units — GPUs. To maximize resource utilization, these GPUs are often shared among multiple tasks — a technique called GPU sharing. However, this sharing can lead to a problem known as GPU fragmentation: if a GPU is only partially used, the leftover capacity may not be usable for other tasks, leaving resources idle.
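To make the idea concrete, here is a purely illustrative toy example (the numbers and the node model are hypothetical, not taken from the study): a node can have plenty of free GPU capacity in total and still be unable to host a new task, because that capacity is scattered across partially used GPUs.

```python
# Illustrative only: a toy picture of GPU fragmentation (hypothetical numbers).
free_fractions = [0.4, 0.3, 0.3]   # free share of each GPU on one node

task_request = 0.5                 # incoming task needs half a GPU

total_free = sum(free_fractions)   # 1.0 GPU worth of free capacity in total
fits = any(free >= task_request for free in free_fractions)

print(f"total free capacity: {total_free:.1f} GPUs")
print(f"a {task_request}-GPU task fits on this node: {fits}")
# Prints 1.0 GPUs of free capacity, yet fits is False: no single GPU has
# room for the task, so that capacity is stranded -- this is fragmentation.
```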
As if that weren’t enough, GPUs are power-hungry. Even when underused, they still consume significant amounts of energy. In this context, striking a balance between full resource utilization and minimal energy waste is a critical challenge for datacenter operators.
A Dual Strategy: Reducing Fragmentation and Power Use
The CNR team focused on a realistic scenario: online scheduling, where tasks arrive in real time and must be assigned immediately, without knowing what’s coming next. In this setting, the researchers propose two complementary strategies (a rough sketch of how each might score a candidate node follows the list):
Fragmentation Gradient Descent (FGD)
An algorithm that minimizes GPU fragmentation by choosing the node that is least likely to waste resources for future tasks.
Power-aware Scheduler (PWR)
A new approach that estimates the energy impact of each scheduling decision and favors the most energy-efficient CPU-GPU combinations.
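The paper’s exact formulas are not reproduced here, but the intuition behind the two scorers can be sketched as a pair of toy node-scoring functions. Everything below, including the node model, the power figures, and the function names, is a hypothetical simplification rather than the authors’ implementation.

```python
# Hypothetical sketch of the two scoring ideas; not the authors' code.
# A "node" here is just a list of free GPU fractions.

IDLE_WATTS = 50.0    # assumed idle power per GPU (illustrative number)
BUSY_WATTS = 300.0   # assumed active power per GPU (illustrative number)

def fragmentation_left(free_fractions, gpu_request):
    """FGD intuition: after placing the request, how much free GPU capacity
    is stranded in pieces too small to host another task of the same size?"""
    remaining = list(free_fractions)
    for i, free in enumerate(remaining):
        if free >= gpu_request:           # place on the first GPU that fits
            remaining[i] = free - gpu_request
            break
    return sum(f for f in remaining if 0.0 < f < gpu_request)

def power_increase(free_fractions, gpu_request):
    """PWR intuition: how many extra watts does this placement add?
    Toy model: a fully idle GPU that starts hosting work jumps from
    idle to busy power; an already-busy GPU adds nothing."""
    for free in free_fractions:
        if free >= gpu_request:
            return (BUSY_WATTS - IDLE_WATTS) if free == 1.0 else 0.0
    return float("inf")                   # the request does not fit at all

# Compare two candidate nodes for a 0.5-GPU task.
node_a = [1.0, 1.0]        # two fully idle GPUs
node_b = [0.6, 0.2]        # two partially used GPUs
task = 0.5

for name, node in [("node-a", node_a), ("node-b", node_b)]:
    print(name, "fragmentation:", fragmentation_left(node, task),
          "extra watts:", power_increase(node, task))
# node-b adds no extra power (its GPUs are already on and busy), while
# node-a keeps fragmentation lower: the two goals can conflict, which is
# exactly why the combined scheduler has to balance them.
```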
The real innovation lies in the smart combination of these two strategies. Thanks to the extensibility of the Kubernetes scheduler, the researchers integrated FGD and PWR into a single system that dynamically balances energy savings against efficient resource usage, as sketched below.
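In the Kubernetes scheduler, several score plugins can each rate the candidate nodes, and the scheduler places a workload on the node with the highest weighted sum of those scores. The plain-Python sketch below mimics that mechanism; the normalization range, the weights, and the node names are assumptions for illustration, not the configuration used in the study.

```python
# Plain-Python sketch of blending two score plugins, in the spirit of the
# Kubernetes scheduler's weighted sum of normalized node scores.
# Weights, the 0..100 range, and node names are illustrative assumptions.

def normalize(raw_scores, lower_is_better=True, max_score=100):
    """Map raw per-node values onto a 0..max_score scale."""
    lo, hi = min(raw_scores.values()), max(raw_scores.values())
    if hi == lo:
        return {node: max_score for node in raw_scores}
    scores = {}
    for node, value in raw_scores.items():
        frac = (value - lo) / (hi - lo)
        if lower_is_better:               # less fragmentation / fewer watts wins
            frac = 1.0 - frac
        scores[node] = max_score * frac
    return scores

def pick_node(frag_raw, power_raw, w_frag=1, w_pwr=1):
    """Return the node with the best weighted sum of the two normalized scores."""
    frag = normalize(frag_raw)            # FGD-style signal
    pwr = normalize(power_raw)            # PWR-style signal
    total = {n: w_frag * frag[n] + w_pwr * pwr[n] for n in frag_raw}
    return max(total, key=total.get), total

# Reusing the (hypothetical) numbers from the previous sketch.
best, totals = pick_node(
    frag_raw={"node-a": 0.0, "node-b": 0.3},
    power_raw={"node-a": 250.0, "node-b": 0.0},
    w_frag=1, w_pwr=2,                    # lean slightly toward energy savings
)
print("chosen:", best, totals)            # node-b wins with these weights
# Tuning w_frag and w_pwr is the knob that trades packing efficiency
# against energy savings, which is what the combined scheduler balances.
```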
The Results: Up to 20% Energy Savings
Using simulations based on real-world data (8,000 tasks from an actual Alibaba datacenter), the researchers showed that their combined strategy can reduce power consumption by up to 20% compared to FGD alone, while maintaining excellent task scheduling performance.
The approach works well across a variety of scenarios: whether dealing with multi-GPU jobs, lightweight shared-GPU tasks, or jobs that request specific GPU models, the algorithm consistently delivers strong results.
Why It Matters (Even If You Don’t Run a Datacenter)
Every time we use an AI-powered service — from music recommendations to image generation — we’re consuming energy. Behind the scenes, a vast infrastructure is running, drawing power and producing emissions.
Finding smarter ways to use those resources — like this research does — is essential to making AI sustainable. And it shows that innovation isn’t just about building more powerful models, but also about running them more responsibly.
Article originally posted on Medium.