As AI models grow in size and complexity, the demand for GPU power in datacenters is skyrocketing — along with energy consumption and environmental impact. A team of researchers from the Italian National Research Council (CNR) has developed a new scheduling strategy that intelligently balances performance and sustainability, cutting energy use by up to 20% without sacrificing efficiency. Here’s how it works.
Artificial Intelligence is transforming every aspect of our lives — from healthcare to creativity — but behind its impressive capabilities lies a hidden cost: energy consumption. Training and running large language models like those powering ChatGPT or DALL·E require enormous computing power, which translates into high operational costs and a growing environmental footprint.
A research team from the Institute of Information Science and Technologies “Alessandro Faedo” of the National Research Council of Italy (Cnr-Isti) offers a concrete solution: a new scheduling algorithm for GPU datacenters that significantly cuts power consumption without compromising performance.
The Core Problem: Shared GPUs and Fragmented Resources
Modern AI datacenters rely on thousands of specialized computing units — GPUs. To maximize resource utilization, these GPUs are often shared among multiple tasks — a technique called GPU sharing. However, this sharing can lead to a problem known as GPU fragmentation: if a GPU is only partially used, the leftover capacity may not be usable for other tasks, leaving resources idle.
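To make the idea concrete, here is a purely illustrative toy example (the numbers and the node model are hypothetical, not taken from the study): a node can have plenty of free GPU capacity in total and still be unable to host a new task, because that capacity is scattered across partially used GPUs.

```python
# Illustrative only: a toy picture of GPU fragmentation (hypothetical numbers).
free_fractions = [0.4, 0.3, 0.3]   # free share of each GPU on one node

task_request = 0.5                 # incoming task needs half a GPU

total_free = sum(free_fractions)   # 1.0 GPU worth of free capacity in total
fits = any(free >= task_request for free in free_fractions)

print(f"total free capacity: {total_free:.1f} GPUs")
print(f"a {task_request}-GPU task fits on this node: {fits}")
# Prints 1.0 GPUs of free capacity, yet fits is False: no single GPU has
# room for the task, so that capacity is stranded -- this is fragmentation.
```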
As if that weren’t enough, GPUs are power-hungry. Even when underused, they still consume significant amounts of energy. In this context, striking a balance between full resource utilization and minimal energy waste is a critical challenge for datacenter operators.
A Dual Strategy: Reducing Fragmentation and Power Use
The CNR team focused on a realistic scenario: online scheduling, where tasks arrive in real time and must be assigned immediately, without knowing what’s coming next. In this setting, the researchers propose two complementary strategies (a rough sketch of how each might score a candidate node follows the list):
Fragmentation Gradient Descent (FGD)
An algorithm that minimizes GPU fragmentation by choosing the node that is least likely to waste resources for future tasks.
Power-aware Scheduler (PWR)
A new approach that estimates the energy impact of each scheduling decision and favors the most energy-efficient CPU-GPU combinations.
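The paper’s exact formulas are not reproduced here, but the intuition behind the two scorers can be sketched as a pair of toy node-scoring functions. Everything below, including the node model, the power figures, and the function names, is a hypothetical simplification rather than the authors’ implementation.

```python
# Hypothetical sketch of the two scoring ideas; not the authors' code.
# A "node" here is just a list of free GPU fractions.

IDLE_WATTS = 50.0    # assumed idle power per GPU (illustrative number)
BUSY_WATTS = 300.0   # assumed active power per GPU (illustrative number)

def fragmentation_left(free_fractions, gpu_request):
    """FGD intuition: after placing the request, how much free GPU capacity
    is stranded in pieces too small to host another task of the same size?"""
    remaining = list(free_fractions)
    for i, free in enumerate(remaining):
        if free >= gpu_request:           # place on the first GPU that fits
            remaining[i] = free - gpu_request
            break
    return sum(f for f in remaining if 0.0 < f < gpu_request)

def power_increase(free_fractions, gpu_request):
    """PWR intuition: how many extra watts does this placement add?
    Toy model: a fully idle GPU that starts hosting work jumps from
    idle to busy power; an already-busy GPU adds nothing."""
    for free in free_fractions:
        if free >= gpu_request:
            return (BUSY_WATTS - IDLE_WATTS) if free == 1.0 else 0.0
    return float("inf")                   # the request does not fit at all

# Compare two candidate nodes for a 0.5-GPU task.
node_a = [1.0, 1.0]        # two fully idle GPUs
node_b = [0.6, 0.2]        # two partially used GPUs
task = 0.5

for name, node in [("node-a", node_a), ("node-b", node_b)]:
    print(name, "fragmentation:", fragmentation_left(node, task),
          "extra watts:", power_increase(node, task))
# node-b adds no extra power (its GPUs are already on and busy), while
# node-a keeps fragmentation lower: the two goals can conflict, which is
# exactly why the combined scheduler has to balance them.
```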
The real innovation lies in the smart combination of these two strategies. Thanks to the extensibility of the Kubernetes scheduler, the researchers integrated FGD and PWR into a single system that dynamically balances energy savings against efficient resource usage, as sketched below.
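In the Kubernetes scheduler, several score plugins can each rate the candidate nodes, and the scheduler places a workload on the node with the highest weighted sum of those scores. The plain-Python sketch below mimics that mechanism; the normalization range, the weights, and the node names are assumptions for illustration, not the configuration used in the study.

```python
# Plain-Python sketch of blending two score plugins, in the spirit of the
# Kubernetes scheduler's weighted sum of normalized node scores.
# Weights, the 0..100 range, and node names are illustrative assumptions.

def normalize(raw_scores, lower_is_better=True, max_score=100):
    """Map raw per-node values onto a 0..max_score scale."""
    lo, hi = min(raw_scores.values()), max(raw_scores.values())
    if hi == lo:
        return {node: max_score for node in raw_scores}
    scores = {}
    for node, value in raw_scores.items():
        frac = (value - lo) / (hi - lo)
        if lower_is_better:               # less fragmentation / fewer watts wins
            frac = 1.0 - frac
        scores[node] = max_score * frac
    return scores

def pick_node(frag_raw, power_raw, w_frag=1, w_pwr=1):
    """Return the node with the best weighted sum of the two normalized scores."""
    frag = normalize(frag_raw)            # FGD-style signal
    pwr = normalize(power_raw)            # PWR-style signal
    total = {n: w_frag * frag[n] + w_pwr * pwr[n] for n in frag_raw}
    return max(total, key=total.get), total

# Reusing the (hypothetical) numbers from the previous sketch.
best, totals = pick_node(
    frag_raw={"node-a": 0.0, "node-b": 0.3},
    power_raw={"node-a": 250.0, "node-b": 0.0},
    w_frag=1, w_pwr=2,                    # lean slightly toward energy savings
)
print("chosen:", best, totals)            # node-b wins with these weights
# Tuning w_frag and w_pwr is the knob that trades packing efficiency
# against energy savings, which is what the combined scheduler balances.
```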
The Results: Up to 20% Energy Savings
Using simulations based on real-world data (8,000 tasks from an actual Alibaba datacenter), the researchers showed that their combined strategy can reduce power consumption by up to 20% compared to FGD alone, while maintaining excellent task scheduling performance.
The approach works well across a variety of scenarios: whether dealing with multi-GPU jobs, lightweight shared-GPU tasks, or jobs that request specific GPU models, the algorithm consistently delivers strong results.
Why It Matters (Even If You Don’t Run a Datacenter)
Every time we use an AI-powered service — from music recommendations to image generation — we’re consuming energy. Behind the scenes, a vast infrastructure is running, drawing power and producing emissions.
Finding smarter ways to use those resources — like this research does — is essential to making AI sustainable. And it shows that innovation isn’t just about building more powerful models, but also about running them more responsibly.
Article originally posted on Medium.