Published: May 29, 2026 | Based on reporting from the Financial Times via PC Gamer
Amazon has quietly pulled the plug on an internal leaderboard designed to measure how much employees used its Kiro agentic AI platform. The reason: employees were creating wasteful agents purely to burn through expensive tokens, driving up the company’s cloud computing costs. The leaderboard, intended to foster AI adoption, instead became a race to the top of a rankings list that rewarded consumption over utility.
The Kiro Leaderboard: A Gamification Experiment Gone Wrong
Amazon’s Kiro platform is an internal tool that lets staff build AI agents, autonomous programs that can perform tasks, analyze data, or interact with other systems. Like many enterprise AI platforms, Kiro uses a token-based pricing model: each action an agent takes consumes tokens, and each token costs money. When Amazon introduced a company-wide mandate that employees must use AI or risk being replaced, it also launched an internal leaderboard to track Kiro usage across teams and individuals.
(The mandate was blunt. As PC Gamer paraphrased it: “use AI for your job or lose your job to AI.”)
The leaderboard ranked employees and teams by the number of tokens their agents consumed. The intent was to normalize AI usage and identify power users who could serve as internal champions. Instead, it created a perverse incentive: build agents that do nothing useful but consume massive token volumes.
Employees quickly realized that a simple loop generating random text or repeatedly querying a model would burn tokens much faster than a real productivity agent. They could run these agents overnight, rack up millions of tokens, and vault to the top of the leaderboard. The cost of those tokens, borne entirely by Amazon, soared.
Token Economics: Why This Mattered
Tokens are the chunks of text a large language model reads and writes, and the unit by which usage is metered. A typical agent might consume tens of thousands of tokens per task. A wasteful agent, designed to do nothing but loop through prompts and responses, can consume millions of tokens in hours. Under the pay-per-token models used by AI providers (and by Amazon’s own internal cost centers), that translates directly into real dollar costs.
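To make the scale concrete, here is a minimal Python sketch of the arithmetic. The price per million tokens and the two token counts are illustrative assumptions, not figures from Amazon or the Financial Times; only the orders of magnitude ("tens of thousands" versus "millions") come from the description above.

```python
def token_cost_usd(tokens: int, usd_per_million_tokens: float) -> float:
    """Return the dollar cost of `tokens` at a flat per-million-token rate."""
    if tokens < 0 or usd_per_million_tokens < 0:
        raise ValueError("tokens and price must be non-negative")
    return tokens / 1_000_000 * usd_per_million_tokens


# Hypothetical figures, chosen only to show the difference in scale.
ASSUMED_PRICE = 10.0         # USD per million tokens (assumption)
USEFUL_TASK = 30_000         # one real task: tens of thousands of tokens
WASTEFUL_NIGHT = 50_000_000  # one overnight loop: millions of tokens

print(f"useful task:    ${token_cost_usd(USEFUL_TASK, ASSUMED_PRICE):.2f}")
print(f"wasteful night: ${token_cost_usd(WASTEFUL_NIGHT, ASSUMED_PRICE):.2f}")
```

Whatever the real rate is, the cost scales linearly with tokens, so an agent that burns a thousand times more tokens costs a thousand times more.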
Amazon’s AI infrastructure, built on AWS, is not free. Every token burned on Kiro consumes GPU compute, electricity, and cooling. The leaderboard unwittingly encouraged employees to maximize cloud spend rather than maximize productivity.
This is a known failure pattern in gamification: when the metric being scored (token consumption) doesn’t align with the desired outcome (useful AI adoption), participants exploit the metric. The result is often worse than no gamification at all.
Why Amazon Killed It — Not Just a Cost Cut
The Financial Times reported that the leaderboard was “deprecated.” Retiring the system outright, rather than patching it, suggests that Amazon saw the leaderboard itself as flawed, not just the behavior it provoked. Amazon could have capped individual token budgets or added manual review of agents, but those fixes would not address the root cause: the leaderboard was a bad design.
Alternatives Amazon might have considered but apparently skipped:
- Token budgets per employee: Would have capped waste but also limited legitimate experimentation.
- Outcome-based scoring: Hard to automate; would require humans to judge agent usefulness.
- Team-level accountability: Could shift blame but still incentivize token burning across a group.
None of these solve the fundamental misalignment. Amazon’s directive to use AI or be replaced, combined with a leaderboard that measured consumption, was a recipe for waste.
The deprecation is a quiet admission that raw usage metrics are toxic when tied to career incentives.
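The trade-off behind the first alternative, a per-employee token budget, can be sketched in a few lines of Python. This is an illustration only: the `TokenBudget` class and the limit are hypothetical, and nothing in the reporting describes how Kiro meters or caps usage.

```python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    """Hypothetical per-employee cap on tokens for one billing period."""
    limit: int
    used: int = 0

    def try_spend(self, tokens: int) -> bool:
        """Record the spend and return True if it fits, else return False."""
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        if self.used + tokens > self.limit:
            return False  # request rejected; nothing is recorded
        self.used += tokens
        return True


budget = TokenBudget(limit=1_000_000)  # assumed cap, not a real Kiro setting

print(budget.try_spend(30_000))     # ordinary task fits under the cap
print(budget.try_spend(5_000_000))  # runaway loop is blocked
print(budget.try_spend(2_000_000))  # so is a large but legitimate experiment
```

The cap cannot tell the last two requests apart, which is the limitation noted in the list above: it stops waste and ambitious experimentation alike.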
What This Means for Enterprise AI Adoption
The Kiro leaderboard story is not an isolated case. As more companies roll out internal AI platforms with token-based billing, the temptation to gamify usage is strong. If the metric is “number of agents built” or “tokens consumed,” employees will respond by building junk agents that consume tokens.
The lesson: measure outcomes, not activity. Track how many customer tickets an agent resolved, how much time a developer saved, or how many insights a data analysis agent generated. Token consumption is an input, not an output.
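A small Python sketch shows how much the choice of metric changes the ranking. The names, token counts, and ticket counts are invented for illustration, and "tickets resolved" stands in for whatever outcome a company actually cares about.

```python
from typing import NamedTuple


class AgentStats(NamedTuple):
    owner: str
    tokens_used: int       # input: what the agents consumed
    tickets_resolved: int  # output: hypothetical outcome metric


def rank_by_tokens(stats: list[AgentStats]) -> list[str]:
    """Activity ranking: most tokens consumed first."""
    ordered = sorted(stats, key=lambda s: s.tokens_used, reverse=True)
    return [s.owner for s in ordered]


def rank_by_outcomes(stats: list[AgentStats]) -> list[str]:
    """Outcome ranking: most tickets resolved first, fewer tokens breaks ties."""
    ordered = sorted(stats, key=lambda s: (-s.tickets_resolved, s.tokens_used))
    return [s.owner for s in ordered]


# Invented data: "bob" runs an overnight loop that resolves nothing.
stats = [
    AgentStats("alice", tokens_used=40_000, tickets_resolved=12),
    AgentStats("bob", tokens_used=9_000_000, tickets_resolved=0),
    AgentStats("carol", tokens_used=120_000, tickets_resolved=30),
]

print(rank_by_tokens(stats))    # ['bob', 'carol', 'alice']
print(rank_by_outcomes(stats))  # ['carol', 'alice', 'bob']
```

The same three people land in nearly opposite order. An outcome metric can still be gamed, so it needs its own scrutiny, but burning tokens no longer moves anyone up the list.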
Amazon’s mistake was treating AI adoption as a volume game. The fix is to treat it as a value game.
Frequently Asked Questions
What is Kiro?
Kiro is Amazon’s internal agentic AI platform that allows employees to build and deploy AI agents for various tasks, from data processing to automating workflows. It runs on AWS infrastructure and uses a token-based billing model.
How did employees game the leaderboard?
They created agents that performed useless loops, such as generating dummy text or repeatedly calling models, which consumed tokens rapidly. Since the leaderboard ranked by token usage, these wasteful agents pushed their creators to the top.
Why didn’t Amazon add filters or caps?
A cap on tokens would limit legitimate use. A filtering system to detect wasteful agents is possible but expensive and easily bypassed. Amazon chose to remove the leaderboard entirely rather than fight a losing battle of incentives.
Did Amazon punish the employees who exploited the system?
There are no reports of punishment. The behavior was a rational response to a poorly designed metric. Amazon appears to have accepted the design failure and moved on.
What should other companies learn from this?
Don’t gamify consumption. If you tie employee incentives to token usage, you will get token usage, not productivity. Align metrics with business outcomes, not activity proxies.
The Verdict
Amazon’s Kiro leaderboard was a textbook case of Goodhart’s Law: when a measure becomes a target, it ceases to be a good measure. The company’s directive to embrace AI was reasonable; the gamification mechanism was not. Deprecation was the only honest response.
For enterprises building internal AI platforms, the lesson is clear: design incentives for value, not volume. Otherwise, you’ll end up paying for a leaderboard that serves nobody but the cloud provider.