When AI Efficiency Turns to Waste

When using AI, speed does not always equal efficiency or quality.

AI platforms are very good at looking efficient. You ask a question, and they give you an answer in seconds. A draft appears, a summary lands, perhaps a workflow speeds up. On the surface, that feels like a win. They can do things in seconds that would take a human hours.

But look underneath the output, at how it is actually being used, and time stops being the only factor. There is the time it takes to create a good prompt, because AI is only ever as good as what you give it. And then there is how the platforms themselves are priced.

That’s where the idea of “efficiency” starts to get more complicated.

Did you know that most AI platforms charge by tokens? In simple terms, tokens are the small chunks of text a model reads and generates when it processes a request. They’re not the same as words. Think of them as the units that make up your prompt and the answer that comes back. Google says 100 tokens is roughly equal to 60 to 80 English words, depending on the text.

OpenAI, Anthropic and Google all price model usage around those tokens, separating costs across things like input, output and caching. OpenAI’s current pricing, for example, lists GPT-5.5 at $5 per 1 million input tokens, $0.50 per 1 million cached input tokens and $30 per 1 million output tokens for standard use. Anthropic and Google also separate token costs in similar ways, which means the shape of the workflow matters just as much as the speed of the reply.
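To make the split between input, cached input and output concrete, here is a minimal Python sketch of the sum behind a single request. It assumes the GPT-5.5 standard rates quoted above; the function name and the token counts in the example are made up for illustration, so check your provider's pricing page before relying on any of the figures.

```python
# USD per 1 million tokens, taken from the GPT-5.5 standard rates quoted above.
# Illustrative only: providers change prices, so verify before using these.
RATES = {"input": 5.00, "cached_input": 0.50, "output": 30.00}


def request_cost(input_tokens, output_tokens, cached_tokens=0, rates=RATES):
    """Estimate the cost in USD of one request.

    cached_tokens is the part of input_tokens billed at the cached rate.
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * rates["input"]
        + cached_tokens * rates["cached_input"]
        + output_tokens * rates["output"]
    )
    return total / 1_000_000


# Hypothetical request: a 2,000-token prompt and a 500-token answer.
print(f"${request_cost(2_000, 500):.4f}")  # prints $0.0250
```

Notice that in this example the 500-token answer costs more than the 2,000-token prompt, because output is priced at six times the input rate.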

So yes, AI may feel fast, but fast is not always the same as efficient.

Are You Watching Your AI Waste?

A lot of businesses look at the speed of the first output and assume the process itself is efficient, but they don’t see the waste behind it.

That waste shows up in all sorts of ways: repeated prompting, bloated context, regeneration cycles, and using the wrong model for the job.

We have all been guilty of repeated prompting. You ask ChatGPT for something, get a vague answer, rewrite the prompt, ask again, tweak it again, then spend another ten minutes cleaning up the final version.

On paper, that still looks faster than starting from scratch. In practice, token use is climbing in the background, and with it, the cost of using the platform.
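A rough way to see that climb is to model a chat where every new attempt resends everything said so far. The Python sketch below assumes exactly that (a simplification; some platforms cache or trim history), uses the GPT-5.5 standard rates quoted earlier, and treats the prompt and answer sizes as hypothetical.

```python
# USD per token, from the GPT-5.5 standard rates quoted earlier (illustrative).
INPUT_RATE = 5.00 / 1_000_000
OUTPUT_RATE = 30.00 / 1_000_000


def conversation_cost(prompt_tokens, answer_tokens, attempts):
    """Total cost in USD when each attempt resends the whole chat so far.

    Assumption: no caching or trimming, and every prompt and answer is the
    same size. Real conversations will vary.
    """
    total = 0.0
    history_tokens = 0
    for _ in range(attempts):
        input_tokens = history_tokens + prompt_tokens
        total += input_tokens * INPUT_RATE + answer_tokens * OUTPUT_RATE
        # The next attempt carries this prompt and answer as extra context.
        history_tokens = input_tokens + answer_tokens
    return total


one_try = conversation_cost(prompt_tokens=300, answer_tokens=600, attempts=1)
four_tries = conversation_cost(prompt_tokens=300, answer_tokens=600, attempts=4)
print(f"1 attempt: ${one_try:.4f}, 4 attempts: ${four_tries:.4f}")
```

Under these assumptions, four attempts cost more than four times a single attempt, because each retry pays again for the history that came before it.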

Then there’s bloated context. At roughly 60 to 80 English words per 100 tokens, usage builds quickly once people start pasting in long meeting notes, entire strategy documents, old chat threads or unnecessary background every time they prompt.

Meanwhile, OpenAI’s pricing makes it clear that long-context requests can cost more than short-context ones. For GPT-5.5, prompts above 272,000 input tokens are priced at 2x input and 1.5x output for the full session. Applied to the standard rates of $5 and $30, that works out to $10 per 1 million input tokens and $45 per 1 million output tokens.

So yes, you may want to think twice before pasting in a 20-page strategy document every time you need a summary.
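Here is a back-of-the-envelope Python sketch of that 20-page document. It uses Google's rule of thumb of 60 to 80 words per 100 tokens, the $5 per 1 million input rate and the 2x long-context multiplier above 272,000 tokens mentioned earlier. The 500 words per page and the 50 requests are assumptions for illustration, and output cost is left out to keep the comparison simple.

```python
import math

INPUT_RATE = 5.00  # USD per 1 million input tokens (rate quoted earlier)


def words_to_tokens(words, words_per_100_tokens=(60, 80)):
    """Return a (low, high) token estimate from the 60 to 80 words rule of thumb."""
    fewest_words, most_words = words_per_100_tokens
    low = math.ceil(words * 100 / most_words)     # more words per token: fewer tokens
    high = math.ceil(words * 100 / fewest_words)  # fewer words per token: more tokens
    return low, high


def input_cost(tokens, requests=1, threshold=272_000, multiplier=2.0):
    """Input-only cost in USD, applying the long-context multiplier above the threshold."""
    rate = INPUT_RATE * (multiplier if tokens > threshold else 1.0)
    return tokens * requests * rate / 1_000_000


# Assumption: 20 pages at about 500 words each, pasted into 50 separate requests.
_, document_tokens = words_to_tokens(20 * 500)
_, brief_tokens = words_to_tokens(500)  # a one-page brief instead

print(f"Full document each time: ${input_cost(document_tokens, requests=50):.2f}")
print(f"One-page brief each time: ${input_cost(brief_tokens, requests=50):.2f}")
```

The absolute numbers are small for one person, but the ratio between the two is what scales across a team and a year of daily use.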

Another factor people often miss is model choice. When you open a platform like ChatGPT, it can be tempting to use the most advanced model by default. But the most powerful models are not always the most sensible choice for the task in front of you. Use them for work that does not need that level of reasoning, and you start adding to the same pile of waste.

A lot of businesses default to the highest-spec option because it feels safer. That can distort the idea of efficiency, especially when a cheaper model would have done the job perfectly well. Google’s Gemini pricing, for example, shows clear variation between models, with different rates for input, output and context caching depending on which model is being used.
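To put a rough number on that, the Python sketch below compares sending everything to a premium model with routing the simpler requests to a lighter one. The premium rates are the GPT-5.5 standard figures quoted earlier. The "light" rates, the request volumes and the 70% share of simple tasks are placeholders invented for illustration, not real prices or benchmarks.

```python
# USD per 1 million tokens.
MODELS = {
    "premium": {"input": 5.00, "output": 30.00},  # GPT-5.5 standard rates quoted earlier
    "light": {"input": 0.50, "output": 3.00},     # hypothetical placeholder, not a real price
}


def monthly_cost(model, requests, input_tokens, output_tokens):
    """Cost in USD of `requests` identical requests on the given model."""
    rates = MODELS[model]
    per_request = input_tokens * rates["input"] + output_tokens * rates["output"]
    return requests * per_request / 1_000_000


def savings_from_routing(requests, simple_share, input_tokens, output_tokens):
    """USD saved per month if `simple_share` of requests move to the light model."""
    simple = round(requests * simple_share)
    all_premium = monthly_cost("premium", requests, input_tokens, output_tokens)
    mixed = (
        monthly_cost("light", simple, input_tokens, output_tokens)
        + monthly_cost("premium", requests - simple, input_tokens, output_tokens)
    )
    return all_premium - mixed


# Hypothetical month: 10,000 requests, 70% of them simple enough for the light model.
saved = savings_from_routing(10_000, 0.7, input_tokens=1_000, output_tokens=400)
print(f"Estimated saving: ${saved:.2f}")
```

The sketch only covers price. Whether the lighter model's answers are good enough for those tasks is a separate check, and poor answers would bring the rework cost straight back.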

Then there are regeneration cycles, where AI is constantly churning out different options for you to consider. If the first answer is weak, and the second is not much better, and the third finally gets close, that’s not a clean workflow - it’s trial and error dressed up as speed.

This is where businesses risk falling into a false sense of efficiency. A workflow can feel quick while still being badly designed; it can produce more at speed, while quietly consuming more tokens, more attention and more human correction than anyone expected.

What Does Real AI Efficiency Look Like?

Real AI efficiency comes from everything that happens around the output, not the output alone. Think:

  • Does the model fit the task?
  • Does the prompt get to the point quickly?
  • Does the answer come back usable?
  • Does it reduce rework, or create more of it?
  • Does it actually free up human time for better thinking and higher-value work?

When businesses talk about efficiency, they should be looking for lower waste, clearer processes and better use of human time.

In practice, that might look like:

  • Using a cheaper model for lower-value work.
  • Creating stronger prompt templates and AI use guidelines so teams are not reinventing the wheel every time.
  • Getting clear on when not to use AI at all, because the time spent prompting, regenerating and fixing would outweigh the benefit.
  • Building AI token budgeting into planning, so usage is forecast properly and ownership is clear across teams.
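As a starting point for that last item, token budgeting can begin as a simple per-team forecast. The Python sketch below is one possible shape, not a finished planning tool: the team, the usage figures and the 10% monthly growth are all hypothetical, and the rates are the GPT-5.5 standard figures quoted earlier.

```python
from dataclasses import dataclass

# USD per 1 million tokens, from the GPT-5.5 standard rates quoted earlier.
INPUT_RATE = 5.00
OUTPUT_RATE = 30.00


@dataclass
class TeamUsage:
    team: str
    requests_per_month: int
    avg_input_tokens: int
    avg_output_tokens: int


def monthly_forecast(usages, monthly_growth=0.0, months=1):
    """Return {team: [cost in month 1, cost in month 2, ...]} in USD.

    Assumption: usage grows by the same percentage every month.
    """
    forecast = {}
    for usage in usages:
        per_request = (
            usage.avg_input_tokens * INPUT_RATE
            + usage.avg_output_tokens * OUTPUT_RATE
        )
        base = usage.requests_per_month * per_request / 1_000_000
        forecast[usage.team] = [
            round(base * (1 + monthly_growth) ** month, 2) for month in range(months)
        ]
    return forecast


# Hypothetical team and usage figures.
teams = [TeamUsage("marketing", 2_000, 1_500, 700)]
print(monthly_forecast(teams, monthly_growth=0.10, months=3))
```

Keeping one line per team makes the ownership question visible: each forecast has a name next to it.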

This is where AI efficiency starts to look less like speed and more like discipline. The businesses getting the most value from AI are the ones using it with more intention: they know which workflows genuinely benefit from it, where human judgement still matters, and where the process needs tightening up before more AI gets added into the mix.

Fast Isn’t the Same as Efficient

This is the trap a lot of businesses are falling into. AI gives the impression of efficiency because it is quick: it produces something in seconds, and that speed can make the whole workflow feel smarter than it really is.

However, if the process behind the output is full of repeated prompting, bloated context, unnecessary regeneration and human clean-up, then the workflow is not particularly efficient at all.

That’s where thinking in terms of tokens becomes useful. They show what AI actually costs, and they can reveal just how much waste is sitting behind the work.

When measuring AI efficiency, businesses should be looking at how useful AI really is. Is it improving output quality? Is it making better use of people’s time? Is it reducing waste in the process?

Because if nobody’s looking at the process underneath it, that ‘efficiency’ can start getting expensive very quickly.

There’s also a budgeting question sitting underneath all of this. AI use today will not look like AI use in the future, yet a lot of businesses still do not have a clear view of what it is actually costing them now, never mind how to forecast for it properly.

Is that budget being managed company-wide or by department? Who owns it? Finance may hold the budget, but they are not necessarily the ones shaping output quality, workflow design or day-to-day usage. That’s where things can start drifting.

If businesses want AI to be efficient, they need more than access to the tools. They need a clearer handle on usage, ownership and how token costs are being planned for over time.

If you need help making commercial sense of how AI should be used in your business, get in touch with Marmalade Marketing.