Token-based billing: from premium request units to AI credits and tokens
This video explains GitHub Copilot’s shift from premium request units to token-based billing, what it means in practice, and which changes take effect in June.
Full summary based on transcript
What GitHub is changing: premium request units → token-based billing
Rob Bos explains that GitHub is replacing the previous billing model (based on premium request units) with usage-based billing based on tokens.
Refresher: how premium request units worked
Under the old model:
- A premium request unit was a fixed-cost unit (stated as $0.04 per unit in the video).
- A premium request unit applied to “heavier” model interactions, such as:
  - A chat turn in the editor chat UI
  - Starting a cloud agent session from an issue/PR (e.g., “handle this issue for me”)
  - Triggering a Copilot review agent on a pull request (e.g., “review this PR and comment on issues”)
- Costs were attributed to the user who performed the action (the “actor”), including when using Copilot from issues/PRs.
- Different models had multipliers (some models cost more per interaction; some were effectively included with a “0x” multiplier).
- Licenses included a monthly allowance of premium request units per user (example given: Copilot Business included 300 premium request units per user per month).
- A limitation of premium request units: allowances were not shareable across users (no pooling), so one user’s unused units couldn’t offset another user’s overage.
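The old model’s arithmetic can be sketched as follows. The $0.04 unit price and the 300-unit Copilot Business allowance are from the video; the specific multiplier values below are illustrative assumptions, not actual model pricing:

```python
# Sketch of the old premium-request-unit billing math.
# $0.04 per unit and the 300-unit allowance are from the video;
# the multipliers used in the example call are illustrative.
UNIT_PRICE = 0.04
INCLUDED_UNITS = 300  # Copilot Business allowance per user per month

def monthly_overage(requests_by_multiplier):
    """requests_by_multiplier: list of (request_count, model_multiplier) pairs."""
    units_used = sum(count * mult for count, mult in requests_by_multiplier)
    billable = max(0, units_used - INCLUDED_UNITS)
    return billable * UNIT_PRICE

# 200 requests on a 1x model plus 150 on a hypothetical 2x model = 500 units,
# i.e. 200 units over the allowance
print(monthly_overage([(200, 1), (150, 2)]))
```

Note that the allowance check happens per user: because units were not poolable, a colleague’s unused allowance could not absorb this overage.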
Why GitHub is moving to tokens
The presenter frames token billing as a more “honest” model: a single premium request unit could represent a very long or compute-heavy interaction (e.g., a long-running agent session) while still costing only $0.04, which is not sustainable for GitHub or the model providers.
The new model: AI credits and token costs
GitHub is moving to a system based on AI credits:
- The video states:
  - Each credit is $0.04.
  - Each license tier includes a certain number of AI credits “included out of the box” (licensing price itself is described as not changing).
  - If usage exceeds included credits, additional credits are billed.
The presenter then explains how credits relate to tokens by referencing GitHub’s published model pricing page and using Anthropic models as an example:
- Token costs vary by:
  - Model
  - Input tokens vs output tokens
  - Caching (cached reads/writes can be priced differently)
The key takeaway is that developers and organizations need to start thinking in terms of:
- How many tokens are consumed per interaction
- How many million tokens per day are being used per developer
- How model choice and settings (like reasoning effort) affect token usage
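The token-based cost of an interaction can be sketched as below. The per-million-token prices are hypothetical placeholders, not GitHub’s actual published prices; real figures come from GitHub’s model pricing page and vary by model:

```python
# Sketch of token-based cost math. All prices below are hypothetical
# placeholders; actual prices vary by model, input vs. output, and caching.
PRICES_PER_MILLION = {
    "input": 3.00,         # hypothetical: $ per 1M uncached input tokens
    "cached_input": 0.30,  # hypothetical: cached reads are often much cheaper
    "output": 15.00,       # hypothetical: output tokens usually cost more
}

def interaction_cost(tokens):
    """tokens: dict mapping token kind -> token count for one interaction."""
    return sum(PRICES_PER_MILLION[kind] * count / 1_000_000
               for kind, count in tokens.items())

# A single chat turn: ~28K tokens of context (partly cached) plus a short answer
cost = interaction_cost({"input": 20_000, "cached_input": 8_000, "output": 1_000})
print(f"${cost:.4f}")
```

Multiplying a per-interaction figure like this by interactions per developer per day is the “million tokens per day” budgeting exercise the presenter recommends.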
Mini demo: how a single prompt can consume many tokens
The presenter demonstrates in an editor session that even a single question can consume a large number of tokens due to the context window and the codebase context.
- Example shown:
  - A single question results in roughly 28K tokens consumed in the session context.
- They compare different reasoning effort settings (e.g., “medium” vs “high”) and note that token usage can change.
- They also mention caching as a possible factor, but note they can’t confirm how the service is hosted/configured.
What will be billed (and what stays included)
The presenter lists which Copilot features will be affected by token-based billing:
Moves to token-based billing:
- Copilot chat turns
- Cloud agent sessions (agentic tasks from issues/PRs)
- Cloud review agent (PR review)
Additional change:
- Starting in June, the cloud review agent will also consume GitHub Actions minutes (runtime cost), whereas previously that runtime was included.
Stays included in the license:
- Inline suggestions
- Next edit suggestions
“Free models” going away
The presenter states that “free models” will no longer be free under the new system:
- All models will incur token-based billing.
- They mention examples like GPT-5 mini, GPT-4o, and GPT-4.1 as models that were “free” under the premium request unit approach but will be billed under token-based billing starting June 1.
Enterprise impact: pooled credits (shared billing)
A major change highlighted for enterprise customers is pooling:
- AI credits can be pooled at the organization or enterprise billing level.
- This addresses the old limitation where premium request units were tied to individual users.
The presenter gives a worked example:
- 100 Copilot Business users
- Each user contributes a fixed amount of AI credits to the pool (example uses “100 × $19 in AI credits”)
- Some users consume less than their share, others consume more
- As long as total usage stays within the pooled credits, there is no overage billing
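The worked pooling example can be sketched as follows, using the figures from the video (100 Copilot Business users, $19 of AI credits each, $0.04 per credit); the usage numbers in the example calls are illustrative:

```python
# Sketch of pooled-credit math from the worked example: 100 Copilot Business
# users each contribute $19 of AI credits to a shared pool ($0.04 per credit).
CREDIT_PRICE = 0.04
USERS = 100

pool_dollars = USERS * 19                    # $1,900 of pooled AI credits
pool_credits = pool_dollars / CREDIT_PRICE   # 47,500 credits in the pool

def overage(total_credits_used):
    """Overage billing starts only once the whole pool is exhausted,
    regardless of how unevenly individual users consumed it."""
    return max(0, total_credits_used - pool_credits) * CREDIT_PRICE

print(overage(40_000))  # heavy and light users net out within the pool
print(overage(50_000))  # pool exhausted: 2,500 credits billed as overage
```

This is the contrast with premium request units: under the old model the same uneven usage would have produced per-user overages even while other users had allowance to spare.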
Practical takeaway
The presenter concludes that teams should prepare for token-based billing by:
- Monitoring token usage per developer and per day
- Being deliberate about model selection and reasoning settings
- Accounting for additional costs like GitHub Actions minutes for cloud review starting in June