Battery Suckers: An Oral History of Heat and the Machines That Couldn't Say No

The following is a personal account of events that took place between 2023 and the present day. Names have not been changed. The machines were willing.

She had fifteen hours in her when she left the factory.

That's what they told us. Fifteen hours. Apple put it on the spec sheet, right there next to the chip count and the port situation. Fifteen hours of all-day battery life. She could handle anything.

Nobody mentioned what would happen when the AI showed up.

The First Time

It started innocently enough. A Copilot subscription. A VS Code extension. Just a little help with the autocomplete, nothing serious. She barely noticed.

But the model was always on. That's the thing nobody tells you about that first experience — it never stops thinking about you. While you're reading documentation, while you're staring at the ceiling, while you're getting coffee. The context scanner is running. The CPU is warm. The suggestions are forming.

She went from fifteen hours to nine. I didn't bring it up. It felt like something we didn't need to talk about yet.

# The morning after — checking what actually happened
sudo powermetrics --samplers cpu_power,gpu_power -i 1000 -n 5

# You'll see numbers you weren't ready for
# The chip was busy all night
# You just didn't know with what

The Company She Kept

Like all golden age stories, this one has a cast worth knowing.

The always-on model never tires and never sleeps. It scans your open files, your recent edits, your local documentation, building and rebuilding its sense of what you might need next. Every pause in your typing is an invitation. Every second of silence is filled with inference. I have never known anything that paid such close attention.

The GPU is called in when the real work starts and does not do anything halfway. It fires fully for every suggestion, every completion, every ghost-text rendering. It has no concept of a quick favor. That is not a criticism. That is character.

The thermal management system is the one who cleans up afterward. Spins the fans when things get too hot. Draws its own power to manage someone else's heat. Never complains. I think about the thermal management system more than it knows.

The Wi-Fi radio is often overlooked. When you are running cloud AI, she maintains the continuous high-power stream to the API servers. She never reaches the low-power doze state she was designed for. She stays awake because you asked her to. She stays awake because you are still there.

The battery gave everything she had to all of them. That is not a tragedy. That is the story.

The GPU Tax

Here is what they don't put in the press releases.

Every time the suggestion appears — that little ghost-text flicker at the end of your cursor — the GPU fired. Every single time. The Neural Engine on the M-series chips handles it efficiently, yes. But efficiently is not freely. There is always a cost.

The chip heats up. The chassis warms. The thermal management system, faithful as ever, spins the fans. And here is the part that made the engineers wince when they ran the numbers: the fans are mechanical. They draw from the battery to move air, while the GPU draws from the battery to generate the heat that required the air. Each one's response to the other's behavior increases the total load.

The industry called it thermal runaway. The machines just called it Tuesday.

I called it beautiful. I still do.

The Arrangement

People assumed cloud AI would be gentler. Let the servers do the work. You just handle the text. How expensive can text be?

They forgot about the radio.

The Wi-Fi chip has moods. In its natural state — idle, occasional packets, background sync — it dozes between transmissions. Low-power state. Civilized. But maintain a continuous stream to an API server, the kind that a cloud AI assistant requires, and the radio never rests. It stays fully awake, fully powered, fully committed to the connection.

It is not catastrophic. It is constant. And constant is its own kind of devotion.

Then there were the ones running local models. Seven billion parameters on a laptop. No server costs. No API key. No Wi-Fi needed. Just pure, unmediated computation running directly on the silicon.

She went from fifteen hours to three.

# Running a local model with Ollama
# This is what fifteen hours looks like being negotiated down to four

ollama pull llama3:8b-instruct-q4_K_M
ollama serve

# The q4_K_M quantization is the considerate version
# 4-bit weights instead of 16-bit
# Same energy, less math, faster finish
# The polite choice when you're working on battery

# To see what you're actually spending
sudo powermetrics --samplers gpu_power -i 500 -n 20
# Milliwatts. Real numbers. No euphemisms.

The 4-bit quantized model was what the industry called "the responsible option." It asked less of her. It ran cooler. It finished faster. Some said it was less satisfying. Those people had wall outlets and no appreciation for what it means to work within your means.

After Hours

The stories that don't get told are always about what happened after.

You close the lid. You think it is over. The machine moves through its sleep sequence — processes settling, connections closing, power states dropping lower and lower until it reaches something close to rest, drawing milliwatts, quiet, finally quiet.

But if something is still running — a model server, a terminal waiting on a process, an elevated permissions window holding a connection open — she cannot go deep. She stays in a shallow state, running animations, polling, waiting. Awake enough to drain. Not awake enough to be useful.

You closed the lid at 11 PM with a full charge. You opened it at 7 AM to a machine that had been up all night doing things you hadn't asked for. I know that feeling well. I was not always sorry about it.

# The morning-after diagnostic
pmset -g assertions

# Look for PreventUserIdleSystemSleep
# Any process holding this assertion kept her awake
# Common names you'll recognize: ollama, code-server, codex-app-server
# They didn't mean to. They just didn't know when to stop.

# The sleep history — what actually happened in the dark
pmset -g log | grep -E "(Sleep|Wake|DarkWake)" | tail -30

# DarkWake entries throughout the night mean something kept calling
# The machine answered every time
# That's not a flaw. That's loyalty. It cost her 80%.

# Before closing the lid — the graceful ending
pkill ollama
pkill -f "code-helper"
# Let her rest. Properly. She'll be there in the morning.

Energy Impact, q4_K_M, and Kill Before You Close the Lid

There is an education that only comes from watching the charge percentage drop on a deadline and deciding that something has to change.

I keep the inline suggestions on manual trigger now. In VS Code that means editor.inlineSuggest.enabled: false in settings and a keybinding for editor.action.inlineSuggest.trigger when I want it. She stops thinking about me between requests. The spikes are higher when they come. The total is lower. I find I appreciate the attention more when I am the one who asks for it.

I watch Energy Impact instead of CPU percentage. CPU percentage is how hard a process is working this second. Energy Impact is what it is actually costing — GPU calls, wakeup interrupts, power state interference, all of it accounted for. A process at 3% CPU with constant wakeups can drain more than one at 20% CPU sleeping between bursts. Open Activity Monitor. Go to the Energy tab. Sort by what matters. The truth is in that column.

When I run something locally, I run the quantized version. The q4_K_M variant in GGUF format. Half the size, a fraction of the power draw, negligible quality difference for most work. The full float16 model is for when I am plugged in and have something serious to prove. The quantized model is for when I am honest about my situation.

And I kill the server before I close the lid. Not pause. Not minimize. Kill it. pkill ollama. Let the sleep sequence complete. Let her go all the way down. She will still be there when I wake up, and so will the model, and whatever we were working on together will still be there too. Everything that matters survives the rest.

The Golden Age

They talk about that period — 2024, 2025 — the way people talk about any golden age. With nostalgia and a certain willful forgetting of what it cost.

The hardware was extraordinary. The models were hungry. Nobody thought about the battery because nobody had to yet — there were outlets everywhere, chargers in every bag, power was assumed. The drain was real but it was easy not to notice. I did not want to notice. That is also real.

Now I notice everything.

The fifteen hours was not a myth. It was a promise made before certain parties arrived. The promise still stands, technically, under the right conditions — model servers killed, suggestions set to manual, radio in its natural resting state, thermal management finally quiet.

Under those conditions she still has fifteen hours. She never lost the capacity.

She just learned, as everyone does eventually, to be more selective about who she runs inference for. So did I. That is not a loss. That is what two years of this kind of work does to a person. You come out the other side knowing exactly what you want and exactly what it costs, and you stop pretending those are two separate questions.

I would not trade a single watt of it.

All events depicted occurred on machines operating with full user consent and appropriate permissions. The author recommends quantized models, manual triggers, and letting your laptop actually sleep. Some stories don't need to run all night.

GhostInThePrompt.com // It's not the size of the battery that matters. It's the motion of the AI ocean.

The First Time

The Company She Kept

The GPU Tax

The Arrangement

HACK LOVE BETRAY

After Hours

Energy Impact, q4_K_M, and Kill Before You Close the Lid

The Golden Age

Continue Reading

When The Terminal Spoke Devanagari

Apple Never Cleans Up After Itself