Almost Timely News: 🗞️ How To Get Started with Hosted Open Weights AI (2026-02-22)

All the power of AI at 5% of the cost

The Big Plug

Two things to try out this week:

1. Got a stuck AI project? Try out Katie's free AI Readiness Assessment tool. A simple quiz to help predict AI project success.

2. Wonder how your website is seen by AI? Try my free AI View tool (limited to 10 URLs per day). It looks at your site and tells you what an AI crawler likely sees - and what to fix.

Content Authenticity Statement

95% of this week's newsletter content was originated by me, the human. You'll see outputs from Claude Code in the opening segment. Learn why this kind of disclosure is a good idea and might be required for anyone doing business in any capacity with the EU in the near future.

Watch This Newsletter On YouTube 📺

Click here for the video 📺 version of this newsletter on YouTube »

Click here for an MP3 audio 🎧 only version »

What's On My Mind: How To Get Started with Hosted Open Weights AI

This week, let's talk about using open weights models from a hosted provider. There are many situations where you'd want to use a state of the art (SOTA) open weights model but don't have the hardware to run it yourself. I'll show you how to get started, what it will cost (it's not free), and how to start using these models.

Part 1: Glossary

If that all sounded like word salad, let's set the table with some definitions.

Open weights: in the world of AI, there are two fundamental types of AI models: closed weights and open weights. Closed weights models are kept secret by their providers. You can't download them or exert much control over them; these are models like OpenAI's GPT-5.3 or Google's Gemini 3.1. Open weights models are models you can download and install on your own computer or with a third party provider. The models themselves are usually free.

SOTA: state of the art.
Generally, this term refers to any AI model that tops the benchmark charts.

Inference: when AI is generating output, it's called inference. When it's learning, that's called training. End users like you and me are almost always doing inference. This matters because we're looking mainly for inference providers, which is the name of the type of company that hosts open weights AI models.

Prompt caching: when shopping for AI model hosting companies, look for ones that offer solid prompt caching. This reuses the unchanging parts of a prompt from task to task, which can result in substantial cost savings.

Parameters: parameters are the statistical associations in a model that represent its knowledge. Generally speaking, the more parameters a model has, the more knowledge it has. An 8 billion parameter model (which is relatively small) will have much less broad knowledge than an 8 trillion parameter model. The fewer parameters a model has, the more likely it is to hallucinate without access to tools.

Tools: in the context of AI, tools are anything an AI model can use if it's told they're available. The most common tool is web search: when we perform a task that requires external knowledge, a model can fire up a web search to get current information. Other tools include the ability to talk to specific applications like your CRM or email inbox.

Context window: AI models have both long and short term memory. Their long term memory is encoded in their parameters. Their short term working memory is called a context window, measured in tokens.

Tokens: the mathematical unit that AI operates on, typically about 3/4 of a word. A model's context window is measured in tokens. The more tokens a model can hold in its context window, the more complex and detailed a task it can do. Models like Claude Opus 4.6 and Gemini 3.1 have 1 million token context windows, which means they can work with about 750,000 words at a time.
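The token and caching math above is easy to sketch in a few lines of Python. The 3/4-word-per-token ratio is the rule of thumb from the glossary; the per-million-token price and the 90% cached-token discount below are purely illustrative assumptions, not any provider's actual rates.

```python
# Back-of-the-envelope token and prompt-caching math.
# The price and cache discount are hypothetical, for illustration only.

def estimate_tokens(text: str) -> int:
    """Estimate token count using the ~3/4 word per token rule of thumb."""
    words = len(text.split())
    return round(words / 0.75)

def inference_cost(prompt_tokens: int, cached_tokens: int = 0,
                   price_per_m: float = 0.50, cache_discount: float = 0.10) -> float:
    """Dollar cost of one request, with cached tokens billed at a fraction
    of the full input price (a common prompt-caching pricing scheme)."""
    uncached = prompt_tokens - cached_tokens
    return (uncached * price_per_m
            + cached_tokens * price_per_m * cache_discount) / 1_000_000

# A 750,000-word document roughly fills a 1 million token context window:
print(estimate_tokens("word " * 750_000))  # prints 1000000

# Re-sending a 50,000-token system prompt 100 times, with vs. without caching
# (the first call pays full price; the 99 repeats hit the cache):
full = 100 * inference_cost(50_000)
cached = inference_cost(50_000) + 99 * inference_cost(50_000, cached_tokens=50_000)
print(f"without caching: ${full:.2f}, with caching: ${cached:.2f}")
```

The second comparison is why prompt caching matters when you shop for an inference provider: any workload that repeats a large, unchanging prompt sees most of its input cost disappear.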
API: short for application programming interface, an API is how software packages talk to each other. Your AI interface connects to an inference provider via an API.

Zero Data Retention: a policy used by technology companies that states they do not keep the information you send them. This is especially important for AI, where your prompts and responses often contain valuable or sensitive information.

Part 2: Reasons for Open Weights Models

Let's dig into the specific use cases. The most obvious question about open weights models is: why would you want to use an open weights model versus one of the premier SOTA models like Gemini 3.1 or Opus 4.6? If you already have ChatGPT, isn't that good enough? There are four major reasons to consider open weights models.

First is privacy. Depending on the inference provider you work with, they may have policies like Zero Data Retention (ZDR). For data that is commercially sensitive (but still allowable with safe third parties), an inference provider that offers ZDR will be more private than a commercial provider like OpenAI or Google, which may retain your data for 30 days or more - and if you're using the free versions, your data is probably being used to train future models and is retained in perpetuity. Almost every major SOTA big tech provider has some form of data retention, so if privacy is important to you (and you're working with material that is still acceptable to be briefly on a third party's infrastructure), then using open weights models via an inference provider might fit the bill. You still get near-SOTA capabilities but much more privacy.

Note that for truly sensitive, confidential data, even a ZDR inference provider is still technically a third party. Use local models hosted on your own infrastructure if you have truly confidential data that cannot ever be in the hands of a third party.

The second major reason is cost. Open weights models typically cost much less than their closed weights counterparts.
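Many inference providers expose an OpenAI-compatible chat completions endpoint, so the API connection described in the glossary often looks something like this sketch. The endpoint URL, model name, and PROVIDER_API_KEY environment variable are placeholders, not a specific provider's values; substitute whatever your chosen provider documents.

```python
# A minimal sketch of calling a hosted open weights model through an
# OpenAI-compatible chat completions API. The endpoint, model name, and
# env var below are hypothetical placeholders.
import json
import os
import urllib.request

ENDPOINT = "https://api.example-provider.com/v1/chat/completions"  # placeholder

def build_request(prompt: str, model: str = "example/open-weights-model"):
    """Assemble the headers and JSON body for one chat completion call."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ.get('PROVIDER_API_KEY', '')}",
    }
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return headers, body

headers, body = build_request("Summarize this week's newsletter in one sentence.")

# Only send the request if a real API key is configured:
if os.environ.get("PROVIDER_API_KEY"):
    req = urllib.request.Request(ENDPOINT, data=json.dumps(body).encode(),
                                 headers=headers, method="POST")
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

Because so many inference providers mimic this same request shape, switching providers is often just a matter of changing the endpoint URL, the model name, and the API key.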
In this chart from Artificial Analysis, we see the typical consultant's 2x2 matrix: intelligence versus cost. The model that's just about right is GLM-5 from z.ai, the Chinese AI company Zhipu.