Saving Energy Without Wasting Power: The Algorithm That Optimizes AI Datacenters

As AI models grow in size and complexity, the demand for GPU power in datacenters is skyrocketing — along with energy consumption and environmental impact. A team of researchers from the Italian National Research Council (CNR) has developed a new scheduling strategy that intelligently balances performance and sustainability, cutting energy use by up to 20% without sacrificing scheduling performance. Here’s how it works.

Artificial Intelligence is transforming every aspect of our lives — from healthcare to creativity — but behind its impressive capabilities lies a hidden cost: energy consumption. Training and running large language models like those powering ChatGPT or DALL·E require enormous computing power, which translates into high operational costs and a growing environmental footprint.

A research team from the Institute of Information Science and Technologies “Alessandro Faedo” of the National Research Council of Italy (Cnr-Isti) offers a concrete solution: a new scheduling algorithm for GPU datacenters that significantly cuts power consumption without compromising performance.

The Core Problem: Shared GPUs and Fragmented Resources

Modern AI datacenters rely on thousands of specialized computing units — GPUs. To maximize resource utilization, these GPUs are often shared among multiple tasks — a technique called GPU sharing. However, this sharing can lead to a problem known as GPU fragmentation: if a GPU is only partially used, the leftover capacity may not be usable for other tasks, leaving resources idle.
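To make this concrete, here is a tiny Python sketch with invented numbers: the node still has spare GPU capacity, yet an incoming request cannot use any of it.

```python
# Toy illustration of GPU fragmentation (all numbers are made up).
# A node exposes 1.0 GPU; a running task already holds 0.7 of it.
node_gpu_capacity = 1.0
allocated = 0.7
leftover = node_gpu_capacity - allocated  # ~0.3 GPU left on this node

# An incoming task asks for half a GPU. The node still has "free"
# GPU capacity, but none of it is usable for this request: the
# leftover 0.3 is stranded, i.e. fragmented.
incoming_request = 0.5
print(leftover >= incoming_request)  # False -> the 0.3 GPU sits idle
```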

As if that weren’t enough, GPUs are power-hungry. Even when underused, they still consume significant amounts of energy. In this context, striking a balance between full resource utilization and minimal energy waste is a critical challenge for datacenter operators.

A Dual Strategy: Reducing Fragmentation and Power Use

The CNR team focused on a realistic scenario: online scheduling, where tasks arrive in real time and must be assigned immediately, without knowing what’s coming next. In this setting, the researchers propose two complementary strategies:

Fragmentation Gradient Descent (FGD)

An algorithm that minimizes GPU fragmentation by choosing the node that is least likely to waste resources for future tasks.
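The paper defines its fragmentation measure rigorously; the sketch below only captures the intuition, using a simplified measure and an assumed “typical” request of half a GPU.

```python
from dataclasses import dataclass

# Sketch of the intuition behind Fragmentation Gradient Descent (FGD).
# The fragmentation measure here is a simplified stand-in: it counts the
# GPU capacity on a node that is too small to host a "typical" future
# request (assumed to be 0.5 GPU for illustration).

TYPICAL_REQUEST = 0.5  # assumed typical per-task GPU demand

@dataclass
class Node:
    free_gpu: list  # leftover fraction on each GPU of the node

def fragmentation(node):
    """GPU capacity that a typical future task could not use."""
    return sum(f for f in node.free_gpu if f < TYPICAL_REQUEST)

def fgd_choose(task_gpu, nodes):
    """Pick the placement where fragmentation increases the least."""
    best, best_delta = None, float("inf")
    for node in nodes:
        # consider every GPU on the node that can host the request
        for i, free in enumerate(node.free_gpu):
            if free < task_gpu:
                continue
            after = Node(node.free_gpu.copy())
            after.free_gpu[i] -= task_gpu
            delta = fragmentation(after) - fragmentation(node)  # the "gradient"
            if delta < best_delta:
                best, best_delta = node, delta
    return best

# Example: placing on node A would strand 0.3 GPU,
# placing on node B leaves a still-usable 0.5 GPU.
a, b = Node([0.8]), Node([1.0])
print(fgd_choose(0.5, [a, b]) is b)  # True under this toy measure
```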

Power-aware Scheduler (PWR)

A new approach that estimates the energy impact of each scheduling decision and favors the most energy-efficient CPU-GPU combinations.
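A rough sketch of the idea behind such a power-aware score might look like this; the wattage figures are invented placeholders, not measurements from the study.

```python
# Sketch in the spirit of PWR: estimate how much the cluster's power draw
# would grow if the task were placed on a given node, and prefer the
# smallest increase. All power figures below are assumptions.

IDLE_GPU_W = 50.0      # assumed extra draw for waking an unused GPU
ACTIVE_GPU_W = 250.0   # assumed draw of a fully used GPU
CPU_W_PER_CORE = 10.0  # assumed per-core CPU draw

def power_increase(task_gpu, task_cpu_cores, gpu_is_idle):
    """Estimated extra watts if the task lands on this CPU-GPU combination."""
    watts = task_gpu * ACTIVE_GPU_W + task_cpu_cores * CPU_W_PER_CORE
    if gpu_is_idle:
        watts += IDLE_GPU_W  # powering up a currently idle GPU costs extra
    return watts

# Sharing an already-active GPU is cheaper than waking up a new one:
print(power_increase(0.5, 2, gpu_is_idle=False))  # 145.0 W
print(power_increase(0.5, 2, gpu_is_idle=True))   # 195.0 W
```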

The real innovation lies in the smart combination of these two strategies. Thanks to the plugin-based architecture of the Kubernetes scheduler, the researchers integrated FGD and PWR into a single system that dynamically balances energy savings and optimal resource usage.
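Conceptually, the blend resembles the scoring phase of the Kubernetes scheduler: each strategy rates every candidate node, the scores are brought onto a common scale, and a weighted sum decides the placement. The sketch below uses toy scores and weights, not the paper’s actual configuration.

```python
# Blending two scoring strategies, Kubernetes-scheduler style:
# normalize each strategy's scores, then pick the best weighted sum.
# Scores, node names, and weights are illustrative assumptions.

def normalize(scores):
    """Rescale raw scores to 0..100 so different strategies are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {n: 100.0 * (s - lo) / span for n, s in scores.items()}

def combined_choice(fgd_scores, pwr_scores, w_fgd=1.0, w_pwr=1.0):
    """Pick the node with the best weighted sum of the two normalized scores."""
    fgd_n, pwr_n = normalize(fgd_scores), normalize(pwr_scores)
    total = {n: w_fgd * fgd_n[n] + w_pwr * pwr_n[n] for n in fgd_scores}
    return max(total, key=total.get)

# Higher score = better node for that criterion (toy numbers).
fgd = {"node-a": 80, "node-b": 60, "node-c": 75}
pwr = {"node-a": 40, "node-b": 90, "node-c": 85}
print(combined_choice(fgd, pwr))  # "node-c" balances both criteria
```

In a setup like this, the relative weights are what let an operator trade a little extra fragmentation for larger power savings, or the other way around.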

The Results: Up to 20% Energy Savings

Using simulations based on real-world data (8,000 tasks from an actual Alibaba datacenter), the researchers showed that their combined strategy can reduce power consumption by up to 20% compared to FGD alone, while maintaining excellent task scheduling performance.

The approach works well across a variety of scenarios: whether dealing with multi-GPU jobs, lightweight shared-GPU tasks, or jobs requiring specific hardware models, the algorithm consistently delivers strong results.

Why It Matters (Even If You Don’t Run a Datacenter)

Every time we use an AI-powered service — from music recommendations to image generation — we’re consuming energy. Behind the scenes, a vast infrastructure is running, drawing power, and generating emissions.

Finding smarter ways to use those resources — like this research does — is essential to making AI sustainable. And it shows that innovation isn’t just about building more powerful models, but also about running them more responsibly.

Article originally posted on Medium.
