
GitHub Copilot is moving into a different cost model. GitHub announced that Copilot plans will transition from premium request units to GitHub AI Credits on June 1, 2026. Usage will be calculated from token consumption, including input, output and cached tokens, using the listed API rates for each model. This is a more direct mapping between compute usage and billing, but it also changes the economics for teams that have started to use agentic workflows heavily.
That matters because Copilot is no longer just autocomplete and short chat. Modern usage often means repository-wide context, planning, tool calls, multi-file edits and code review. A few long-running agent sessions can consume a very different amount of inference than short inline questions, even if both used to feel similar from a subscription perspective.
One pragmatic response is model separation. Expensive frontier models remain useful for difficult architecture, security and migration work. Cheaper coding models can handle a large share of day-to-day implementation, test generation, refactoring and explanation work. Models such as Kimi K2.5 and DeepSeek V4 are interesting in that category because they are comparatively inexpensive and increasingly competitive for software engineering tasks.
There is an important boundary: many of these models are operated by Chinese providers or may be routed through infrastructure outside the usual enterprise trust perimeter. That makes them a questionable default for proprietary source code, customer data, regulated environments or internal business logic. The setup described here is mainly attractive for open source work, public examples, disposable experiments and repositories where the data classification allows external model processing.
What BYOK changes in VS Code
Bring your own key, usually shortened to BYOK, lets VS Code use model providers outside the models bundled with GitHub Copilot. GitHub’s April 2026 changelog states that BYOK models are available in VS Code Chat, including the built-in plan agent and custom agents, and that usage is billed directly by the selected provider rather than counted against Copilot request quotas. GitHub also states that BYOK does not apply to code completions.
That distinction is essential. BYOK is currently about chat, agent and related language model interactions in VS Code. Inline completions remain a separate surface. A BYOK setup therefore reduces or redirects the cost of chat-heavy and agent-heavy workflows, but it does not replace every Copilot feature.
For known providers, VS Code can add models through the Language Models editor. For arbitrary OpenAI-compatible endpoints, VS Code’s current documentation points to the Custom Endpoint provider. As of the documentation dated May 20, 2026, that Custom Endpoint provider is available in VS Code Insiders and replaces the deprecated github.copilot.chat.customOAIModels setting.
That makes the practical architecture:
- VS Code Insiders with GitHub Copilot enabled.
- BYOK configured through the Language Models editor.
- A Custom Endpoint pointing at OpenCode Go’s OpenAI-compatible API.
- Kimi K2.5 and DeepSeek V4 models exposed in the VS Code model picker.
Why OpenCode Go fits this setup
OpenCode Go is a low-cost OpenCode subscription plan for popular open coding models tested by the OpenCode team. Its current model list includes Kimi K2.5, Kimi K2.6, DeepSeek V4 Pro and DeepSeek V4 Flash. The OpenCode documentation also lists OpenCode Go as an OpenAI-compatible endpoint at:
1https://opencode.ai/zen/go/v1/chat/completions
The model IDs documented by OpenCode use simple identifiers such as:
1kimi-k2.5
2deepseek-v4-pro
3deepseek-v4-flash
Inside OpenCode’s own configuration, those models are referenced with the opencode-go/ prefix. For VS Code’s Custom Endpoint configuration, the relevant part is the model ID sent to the API. The examples below therefore use the OpenCode Go model IDs directly.
Prerequisites
This setup assumes the following baseline:
- A GitHub account with access to GitHub Copilot in VS Code.
- VS Code Insiders, because the Custom Endpoint provider is still documented as Insiders-only.
- The GitHub Copilot extension enabled in VS Code.
- An OpenCode Go account and API key.
- Permission to send the repository context to the selected model provider.
For Copilot Business or Enterprise, the organization policy also has to allow BYOK in VS Code. GitHub documents this as the “Bring Your Own Language Model Key in VS Code” policy in Copilot settings.
Getting an OpenCode Go API key
OpenCode’s documented flow starts in the OpenCode terminal UI:
1/connect
OpenCode Go can then be selected as the provider. The authentication flow opens opencode.ai/auth, where billing details are added and an API key is created. After the key has been pasted back into OpenCode, the available models can be checked with:
1/models
This step is useful even if the final goal is VS Code. It verifies that the OpenCode account, subscription and model access work before the VS Code configuration is added.
Adding OpenCode Go as a VS Code model provider
The configuration starts from the VS Code Chat model picker:
- Open the model picker in the Chat view.
- Select Manage Language Models.
- Select Add Models.
- Select Custom Endpoint.
- Use a group name such as
OpenCode Go. - Use the API type
Chat Completions. - Enter the OpenCode Go API key when prompted.
VS Code then opens chatLanguageModels.json. A minimal configuration for Kimi K2.5 and DeepSeek V4 can look like this:
1[
2 {
3 "name": "OpenCode Go",
4 "vendor": "customendpoint",
5 "apiType": "chat-completions",
6 "apiKey": "YOUR_OPENCODE_GO_API_KEY",
7 "models": [
8 {
9 "id": "kimi-k2.5",
10 "name": "Kimi K2.5",
11 "url": "https://opencode.ai/zen/go/v1/chat/completions",
12 "toolCalling": true,
13 "vision": true,
14 "maxInputTokens": 262144,
15 "maxOutputTokens": 16384
16 },
17 {
18 "id": "deepseek-v4-pro",
19 "name": "DeepSeek V4 Pro",
20 "url": "https://opencode.ai/zen/go/v1/chat/completions",
21 "toolCalling": true,
22 "vision": true,
23 "maxInputTokens": 262144,
24 "maxOutputTokens": 16384
25 },
26 {
27 "id": "deepseek-v4-flash",
28 "name": "DeepSeek V4 Flash",
29 "url": "https://opencode.ai/zen/go/v1/chat/completions",
30 "toolCalling": true,
31 "vision": true,
32 "maxInputTokens": 131072,
33 "maxOutputTokens": 8192
34 }
35 ]
36 }
37]
The toolCalling flag is important. VS Code only shows models in agent scenarios if they support tool calling. Without that flag, a model may appear suitable for simple chat but disappear from agent workflows.
The API key should not be committed into a repository. This configuration belongs in VS Code’s user-level model configuration, not in workspace settings. If VS Code offers secret storage for the key during the provider setup, that path is preferable to storing the key as plain text.
After saving the file, the models should appear in the chat model picker. If they do not appear immediately, a VS Code restart is usually enough. The Language Models editor also allows hiding less relevant models and pinning the preferred ones at the top of the picker.
Choosing between Kimi and DeepSeek
Kimi K2.5 is a good candidate for broad coding work where long context, code explanation, test scaffolding and steady tool use matter more than the absolute strongest reasoning. It is also a useful default for open source repositories where the goal is to keep chat and agent costs low without dropping to a weak model.
DeepSeek V4 Pro is the more natural choice for larger implementation tasks, multi-file refactorings and agent-style work where stronger reasoning and planning are useful. It is still much cheaper than many premium frontier-model workflows, but it should not be treated as free. Long contexts and repeated agent loops can consume meaningful usage on any provider.
DeepSeek V4 Flash fits fast and low-cost work: small code edits, explanations, test suggestions, documentation cleanup and routine transformations. It is the model to try when latency and cost matter more than maximum reasoning depth.
The practical pattern is a three-tier setup:
- Kimi K2.5: default coding assistant for open source work.
- DeepSeek V4 Pro: heavier agent tasks and larger refactorings.
- DeepSeek V4 Flash: cheap utility model for small requests.
VS Code also has chat.utilityModel and chat.utilitySmallModel settings for background tasks such as summaries, title generation, Git review, commit messages, branch names and intent detection. A cheap BYOK model can be a useful fit there, especially for chat.utilitySmallModel.
Cost control is still required
BYOK moves the billing relationship; it does not remove the need for governance. The selected provider bills directly. That can be cheaper, especially with efficient coding models, but the same agentic behavior that pressures Copilot costs can also create unexpected usage elsewhere.
Useful controls include:
- Provider-side budget limits where available.
- Separate API keys for experiments, open source and production-adjacent work.
- Model-specific defaults instead of always using the strongest model.
- Shorter context attachments for routine questions.
- Avoiding agent loops for tasks that only need a small direct edit.
Cost reduction usually comes from using the right model for the task, not from replacing one provider with another and keeping the same prompting habits unchanged.
Data and compliance implications
The biggest risk in this setup is not technical. It is data handling.
When VS Code Chat or an agent uses a remote BYOK model, prompts, selected code context, tool results and parts of the repository can be sent to that provider. GitHub makes the same point for Copilot CLI BYOK: if the configured provider base URL points to a remote endpoint, prompts and code context are sent over the network to that provider.
That has several consequences:
- Proprietary source code may require legal, security and procurement approval before being sent to an external AI provider.
- Secrets, credentials, customer data, private tickets and production logs should not be included in prompts.
- Chinese-hosted or China-operated models may be unacceptable under corporate policy, client contracts or regulatory constraints.
- Open source repositories are the cleaner fit because the code is already public and the residual risk is easier to reason about.
- Provider routing matters. A model name alone does not describe where data is processed.
This point is especially important with aggregator or router services. The model may be DeepSeek or Kimi, but the hosting provider, routing policy, retention policy and region may vary. A serious evaluation should therefore check the provider’s current privacy terms, data retention behavior and enterprise controls, not just the model benchmark.
Troubleshooting
If a model does not appear in VS Code’s agent model picker, the usual causes are an incorrect API type, missing toolCalling, a typo in the model ID or an endpoint URL that does not match the selected API style. For OpenCode Go, the documented endpoint is the Chat Completions endpoint, so chat-completions is the expected API type.
If chat works but tools fail, the model may not be producing tool calls in the format expected by VS Code’s agent runtime. Trying the Pro model instead of the Flash model is a reasonable first check. If the issue only appears through one routing provider, provider-specific message conversion may be involved.
If all requests fail with authentication errors, the OpenCode Go key should be verified in OpenCode itself with /connect and /models. That separates account and subscription issues from VS Code configuration issues.
If inline completions still use Copilot’s normal models, that is expected. VS Code’s BYOK documentation and GitHub’s changelog both separate BYOK chat models from code completions.
Conclusion
GitHub Copilot’s move toward token-based AI Credits makes model selection a cost-management concern, not just a quality preference. BYOK in VS Code gives teams and individual developers a practical way to route chat and agent workloads to cheaper models, while keeping the Copilot editor experience.
OpenCode Go is a useful bridge for this because it exposes coding-focused models such as Kimi K2.5 and DeepSeek V4 through an OpenAI-compatible endpoint. The setup is still early in a few places, especially around VS Code’s Custom Endpoint provider, but the direction is clear: expensive models for the hardest work, cheaper coding models for the large volume of routine engineering tasks.
For public and open source repositories, that trade-off can be attractive. For proprietary code, the data boundary is the deciding factor. Cost savings do not justify sending sensitive source code or business context to a provider that has not passed the same review as every other production dependency.
Related articles

Mar 17, 2026 · 15 min read
GitHub Copilot - Custom Agents for Full-Stack Teams: A Practical Operating Model for .NET, React and Azure
GitHub Copilot custom agents allow teams to define specialized AI assistants, each with its own role, tool access and behavioral boundaries. …

Feb 12, 2026 · 8 min read
Agent Skills Standard: The Quality Contract Behind Reliable AI Agents
Large language model agents can appear intelligent while still producing unstable output across runs, contexts and tasks. In practice, this …

Feb 03, 2026 · 5 min read
Building Custom AI Tooling with the GitHub Copilot SDK for .NET
The landscape of AI development is shifting rapidly from simple chat interfaces to fully integrated, agentic workflows like the Ralph Loop . …
Let's Work Together
Looking for an experienced Platform Architect or Engineer for your next project? Whether it's cloud migration, platform modernization or building new solutions from scratch - I'm here to help you succeed.

Comments