whichllm — Browse and compare AI model specs and pricing

Inference buying guide

API vs subscription vs Ollama Cloud: how should you buy open model inference?

APIs give full control, subscriptions cap cost, and Ollama Cloud removes infrastructure work. A cost and flexibility breakdown for DeepSeek V4 Pro, Kimi K2.6, and GLM-5.1.

Source note

鸭哥 AI 手记 / Superlinear Academy · Open-model inference buying guide: comparing API, subscription, and Ollama Cloud options for GLM-5.1, DeepSeek V4 Pro, and Kimi K2.6

Inspired by a Chinese buying guide from 鸭哥 AI 手记 / Superlinear Academy. This whichllm page is original commentary focused on model procurement decisions, not a translation or republication.

TL;DR

  • Use official APIs when usage is variable, accounting needs to be exact, or you need the provider's native controls.
  • Use developer subscriptions when one model family powers a high-volume coding workflow and the subscription limits actually fit your usage.
  • Use Ollama Cloud when you want one subscription across several open models and care more about workflow simplicity than token-level accounting.
  • Check privacy terms first when prompts include customer data, private code, or internal documents. The cheapest channel may be the wrong channel.

Best buying channel by workload

Workload | First channel to test | Why
Light agent coding | Official API or low-tier subscription | At low volume, simplicity matters more than hunting for the theoretically cheapest plan.
Heavy daily coding | Developer subscription | Repeated run-patch-test loops make per-token billing expensive fast; a subscription wins if its limits hold up under that load.
Cross-model workflow | Ollama Cloud or router | One plan or router can be cleaner when GLM, Kimi, and DeepSeek each do different jobs.
Privacy-sensitive prompts | Channel with the clearest data terms | Data retention and training policy can outweigh price when prompts contain code or customer context.
Production API billing | Official API | Metered APIs are easier to budget, monitor, and map back to product usage.

The model is not the whole product

Open models are no longer a single buying decision. The same model can be bought as a metered API, bundled inside a developer subscription, routed through a third-party service, or run through a cloud inference product. Those channels differ in billing, limits, and data terms even when the model name is the same.

For builders, this changes the real comparison. You are not only comparing GLM-5.1, DeepSeek V4 Pro, and Kimi K2.6. You are comparing billing shape, privacy boundary, rate limits, speed, and how much friction the channel adds to an agent workflow.

Channel notes

Official API

Best when you need explicit usage accounting, provider-native settings, and production billing clarity. It is usually the cleanest way to connect product usage to inference cost.
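
As a rough illustration of how metered billing ties usage to cost, the sketch below multiplies token counts by per-million-token prices. The function name `metered_cost` and the prices are placeholders invented for this example, not real rates for any model; check current pricing before relying on any number.

```python
def metered_cost(input_tokens: int, output_tokens: int,
                 usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Return the USD cost of a request under per-token billing."""
    total = input_tokens * usd_per_m_input + output_tokens * usd_per_m_output
    return total / 1_000_000


# Hypothetical prices: $1.00 per million input tokens, $4.00 per million output tokens.
cost = metered_cost(200_000, 50_000, 1.00, 4.00)
print(f"${cost:.2f}")  # prints $0.40
```

Because the cost is a pure function of token counts, the same calculation can be logged per request and summed per feature or customer.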

Developer subscription

Best when one model family dominates your personal or team workflow. The risk is opacity: request caps and fair-use limits can matter more than the headline monthly price.
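
One way to sanity-check a subscription is to compute the monthly token volume at which its flat fee equals metered spend, then check whether the plan's limits allow that volume. The sketch below assumes a single blended per-token price and models the plan limit as a token cap; real plans often limit requests or apply fair-use rules instead, so treat `subscription_token_cap` and every number here as hypothetical.

```python
from typing import Optional


def break_even_tokens(monthly_fee_usd: float, blended_usd_per_m_tokens: float) -> float:
    """Tokens per month at which a flat fee equals metered spend."""
    if blended_usd_per_m_tokens <= 0:
        raise ValueError("blended price must be positive")
    return monthly_fee_usd / blended_usd_per_m_tokens * 1_000_000


def cheaper_channel(monthly_tokens: int, monthly_fee_usd: float,
                    blended_usd_per_m_tokens: float,
                    subscription_token_cap: Optional[int] = None) -> str:
    """Return "subscription" or "api" for a given monthly volume."""
    # A plan whose cap cannot cover the workload does not fit, whatever its price.
    if subscription_token_cap is not None and monthly_tokens > subscription_token_cap:
        return "api"
    api_cost = monthly_tokens * blended_usd_per_m_tokens / 1_000_000
    return "subscription" if monthly_fee_usd < api_cost else "api"


# Hypothetical plan: $20 per month versus $2.00 per million tokens on the API.
print(f"{break_even_tokens(20.0, 2.0):,.0f}")  # prints 10,000,000
print(cheaper_channel(30_000_000, 20.0, 2.0))  # prints subscription
print(cheaper_channel(30_000_000, 20.0, 2.0, subscription_token_cap=25_000_000))  # prints api
```

The third call shows the opacity risk in practice: the headline price wins on paper, but the cap sends the workload back to the API.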

Ollama Cloud

Best when you want open-model flexibility without running local infrastructure. It is especially interesting for builders who switch between GLM, DeepSeek, and Kimi by task.

Privacy can beat price

A cheap channel is not automatically a good channel. If your prompts include customer records, unreleased code, legal text, or private company context, read the data retention and training terms before you optimize cost.

The practical rule: use price to choose between acceptable channels, not to make an unacceptable channel acceptable. If the privacy boundary is unclear, treat the model as a prototype tool rather than a production dependency.
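
The rule can be sketched as a two-step selection: filter by privacy first, then minimize price among what remains. The `Channel` type, the channel names, and the costs below are all made up for illustration; `privacy_ok` stands for the outcome of your own review of a provider's data terms, not something a script can decide.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Channel:
    name: str
    monthly_cost_usd: float
    privacy_ok: bool  # result of a manual review of retention and training terms


def pick_channel(channels: Iterable[Channel]) -> Optional[Channel]:
    """Pick the cheapest channel among those that pass the privacy review."""
    acceptable = [c for c in channels if c.privacy_ok]
    if not acceptable:
        return None  # nothing acceptable: price cannot fix that
    return min(acceptable, key=lambda c: c.monthly_cost_usd)


# Hypothetical candidates; the cheapest one fails the privacy review.
candidates = [
    Channel("channel-a", 10.0, privacy_ok=False),
    Channel("channel-b", 25.0, privacy_ok=True),
    Channel("channel-c", 40.0, privacy_ok=True),
]
choice = pick_channel(candidates)
print(choice.name if choice else "no acceptable channel")  # prints channel-b
```

Price never enters the filter step, so a low price cannot rescue a channel that failed the review.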

Compare live model specs on whichllm

Use this guide to choose the buying channel, then check the current model IDs, context windows, capabilities, and token prices before you wire the model into a workflow.