Google has developed a new methodology to determine the energy, water and greenhouse gas emissions of its Gemini AI inference.
According to Google, the environmental impact of one Gemini text query is the equivalent of watching TV for slightly less than nine seconds.
More specifically, Google estimates that the median Gemini text prompt uses 0.24Wh of energy, consumes 0.26ml (about five drops) of water and emits 0.03g of CO2e – figures lower than past estimates, which have ranged up to almost 7Wh of energy and 50ml of water per prompt.
The challenge, however, lies in the inputs to such estimates, which often rely on differing assumptions, making comparisons between them difficult. Some have also considered only active machine consumption.
Google describes its approach as a ‘full stack’ one – conceptually similar to the full delivery chain approach to data centre energy consumption recently developed in Britain – and takes account of four sets of energy inputs.
These are the full system dynamic power, i.e. the energy used by the primary AI model during active computation at the achieved chip utilisation; the energy consumption of chips kept idle to ensure high availability and reliability; the consumption of the host CPU and RAM; and the data centre overhead, including cooling and other energy consuming infrastructure.
Google believes this methodology offers the most complete view of AI’s overall footprint. Considering only active TPU and GPU consumption, its estimated median Gemini text prompt figures would be 0.10Wh of energy, 0.12ml of water and 0.02g of CO2e emissions, which it regards as a clear underestimate.
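A back-of-the-envelope comparison, sketched here in Python using only the figures Google reports, shows how much of the full-stack footprint the active accelerators alone account for:

```python
# Google's reported median figures per Gemini text prompt (from the article).
FULL_STACK = {"energy_wh": 0.24, "water_ml": 0.26, "co2e_g": 0.03}
ACTIVE_ONLY = {"energy_wh": 0.10, "water_ml": 0.12, "co2e_g": 0.02}

# Share of the full-stack footprint captured by active TPU/GPU consumption alone.
for metric, full in FULL_STACK.items():
    share = ACTIVE_ONLY[metric] / full
    print(f"{metric}: active-only covers {share:.0%} of the full-stack figure")
```

On these numbers, active-only accounting captures roughly 42% of the energy, 46% of the water and 67% of the emissions, which illustrates why Google argues the narrower view understates the footprint.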
Google also notes that its AI systems are continuously becoming more efficient through research innovations and software and hardware efficiency improvements. For example, over a recent 12-month period the energy and total carbon footprint of the median Gemini Apps text prompt dropped by factors of 33 and 44 respectively, while at the same time delivering higher quality responses.
Such improvements have included more efficient model architectures, more efficient algorithms, optimised approaches to inference and idling, and the use of custom-built hardware.
Google also claims its data centres are among the industry’s most efficient, with a fleet-wide average power usage effectiveness (PUE, i.e. the ratio of the total power consumption to that delivered to the computing equipment) of 1.09.
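The PUE figure translates directly into facility-level energy: total facility energy is IT equipment energy multiplied by the PUE. A minimal sketch, using Google's reported 1.09 average and a purely hypothetical IT load:

```python
def facility_energy_kwh(it_energy_kwh: float, pue: float = 1.09) -> float:
    """Total facility energy implied by a given IT load and PUE.

    PUE = total facility energy / IT equipment energy, so at the reported
    fleet-wide average of 1.09, every kWh delivered to computing equipment
    implies 1.09 kWh drawn by the facility overall.
    """
    return it_energy_kwh * pue

# Hypothetical example: a 1,000 kWh IT load implies about 1,090 kWh in total,
# i.e. roughly 90 kWh of cooling and other overhead.
print(f"{facility_energy_kwh(1000):.0f} kWh")
```

A PUE of 1.0 would mean zero overhead; typical enterprise data centres sit well above 1.09, which is the basis of Google's efficiency claim.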
Scaling up for flexibility
While the energy consumption figure for a single prompt is low, scaled across the millions of prompts that Gemini handles daily – Google does not disclose the actual number, though the company is known to have over 4 billion users – the figures take on data centre scale.
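The scaling arithmetic can be sketched as follows. Since Google does not disclose daily prompt volumes, the volume below is a purely hypothetical placeholder to show how the per-prompt medians aggregate:

```python
# Per-prompt medians reported by Google.
ENERGY_WH = 0.24
WATER_ML = 0.26

# HYPOTHETICAL daily prompt volume -- the real figure is not disclosed.
prompts_per_day = 1_000_000_000

daily_energy_mwh = ENERGY_WH * prompts_per_day / 1e6  # Wh -> MWh
daily_water_m3 = WATER_ML * prompts_per_day / 1e6     # ml -> cubic metres

print(f"{daily_energy_mwh:.0f} MWh and {daily_water_m3:.0f} m3 of water per day")
```

At a notional one billion prompts per day, that is 240 MWh of energy and 260 cubic metres of water daily, i.e. the consumption profile of a sizeable facility rather than a trivial per-query cost.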
The growing challenge of integrating data centres to the grid is well known and one approach that has been developed, notably by bitcoin miners, is to use them to deliver flexibility to the grid.
Following a 2024 demonstration with Omaha Public Power District, in which the power demand associated with machine learning workloads was reduced during three grid events, Google has reached agreements to implement these capabilities with two utilities, Indiana Michigan Power and the Tennessee Valley Authority.
The first data centre demand response capabilities Google developed involved shifting non-urgent compute tasks, such as the processing of YouTube videos, away from specific periods when the grid is strained. Google reports this is now being leveraged in partnerships with Centrica Energy and the TSO Elia in Belgium, and with Taiwan Power Company in Taiwan.