Blog posts

Local LLM models: Part 4 - a simple web UI

In this post we will take the command-line chat and tool-calling app described in part 3 of this series, which interacts with a local gpt-oss model, and add a web browser user interface.

This will use HTML + CSS to render the frontend and JavaScript + WebSockets to interact with a local HTTP server built with the Go standard library net/http package and the Gorilla WebSocket package.
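
At its core the server is just a static file handler plus a WebSocket upgrade endpoint. Below is a minimal sketch using Gorilla WebSocket; the /ws route, port 8000 and static directory are illustrative assumptions, not necessarily what the post uses:

    package main

    import (
        "log"
        "net/http"

        "github.com/gorilla/websocket"
    )

    var upgrader = websocket.Upgrader{} // default buffer sizes

    func wsHandler(w http.ResponseWriter, r *http.Request) {
        conn, err := upgrader.Upgrade(w, r, nil)
        if err != nil {
            log.Println("upgrade:", err)
            return
        }
        defer conn.Close()
        for {
            // Read a chat message from the browser; in the real app this is
            // where the prompt would be forwarded to the model.
            _, msg, err := conn.ReadMessage()
            if err != nil {
                return
            }
            // Echo a reply back over the same connection.
            if err := conn.WriteMessage(websocket.TextMessage, msg); err != nil {
                return
            }
        }
    }

    func main() {
        http.Handle("/", http.FileServer(http.Dir("static"))) // HTML, CSS and JS assets
        http.HandleFunc("/ws", wsHandler)
        log.Fatal(http.ListenAndServe(":8000", nil))
    }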

For styling lists and buttons we'll use Pure CSS. To render the Markdown content generated by the model we'll use goldmark, along with chroma for code syntax highlighting and KaTeX to format math equations.
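
As a rough sketch of the Markdown-to-HTML step, goldmark's default converter is enough on its own; the chroma highlighting and KaTeX wiring mentioned above are omitted here:

    package main

    import (
        "bytes"
        "fmt"
        "log"

        "github.com/yuin/goldmark"
    )

    func main() {
        src := []byte("Model output with *emphasis* and `code`.")
        var buf bytes.Buffer
        // Convert Markdown to an HTML fragment using goldmark's defaults.
        if err := goldmark.Convert(src, &buf); err != nil {
            log.Fatal(err)
        }
        fmt.Println(buf.String()) // HTML fragment to send to the browser
    }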

Local LLM models: Part 3 - calling tool functions

For the previous post in this series, which is an intro to the completions API in Go, see part 2. In this post we will extend the command-line chat app to add simple tool calling using the OpenWeather API.

We will need to add the function schema definitions to the request we send to the model so that it knows what functions it can call. Then, if the returned response ends with a tool call instead of a final answer, we extract the parameters, call the function, and add both the call and its result to the message list before resending the request.
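
For reference, a function schema in the OpenAI chat-completions "tools" format looks roughly like the sketch below; the get_weather name and its parameters are illustrative placeholders, not the exact schema used for the OpenWeather call:

    package main

    import (
        "encoding/json"
        "fmt"
    )

    func main() {
        // One entry of the "tools" array sent along with the chat request.
        tools := []map[string]any{{
            "type": "function",
            "function": map[string]any{
                "name":        "get_weather",
                "description": "Get the current weather for a city",
                "parameters": map[string]any{
                    "type": "object",
                    "properties": map[string]any{
                        "city": map[string]any{"type": "string", "description": "City name"},
                    },
                    "required": []string{"city"},
                },
            },
        }}
        b, _ := json.MarshalIndent(tools, "", "  ")
        fmt.Println(string(b)) // goes into the "tools" field of the request body
    }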

Local LLM models: Part 2 - a basic command line app in Go

For the first post in this series, which covers setting up the environment, see part 1.

The llama-server binary provides an OpenAI-compatible Completions API as a REST + JSON interface at the /v1/completions URL. In this post we test this out by writing some simple command line programs in Go.
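
A minimal request looks something like the sketch below, assuming llama-server is listening on its default port 8080; the field names follow the OpenAI completions format:

    package main

    import (
        "bytes"
        "encoding/json"
        "fmt"
        "io"
        "log"
        "net/http"
    )

    func main() {
        // Build the JSON request body with a prompt and a token limit.
        reqBody, _ := json.Marshal(map[string]any{
            "prompt":     "Write a haiku about Go.",
            "max_tokens": 64,
        })
        resp, err := http.Post("http://localhost:8080/v1/completions",
            "application/json", bytes.NewReader(reqBody))
        if err != nil {
            log.Fatal(err)
        }
        defer resp.Body.Close()
        body, _ := io.ReadAll(resp.Body)
        fmt.Println(string(body)) // JSON response containing the generated text
    }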

Local LLM models: Part 1 - getting started

This will be a series of posts about running LLMs locally. By the end we should have our own web-based chat interface which can call tools we have defined, e.g. to use a web search API, retrieve pages and feed them back into the model.

This post covers the first steps:

  • Setting up an inference server with llama.cpp (a quick check that it is responding is sketched after this list),
  • Downloading a model - we'll use gpt-oss from OpenAI.
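
Once the server is up, a quick sanity check is to hit its /health endpoint; this sketch assumes llama-server's default port 8080:

    package main

    import (
        "fmt"
        "log"
        "net/http"
    )

    func main() {
        // GET /health should return 200 once the model has finished loading.
        resp, err := http.Get("http://localhost:8080/health")
        if err != nil {
            log.Fatal("server not reachable: ", err)
        }
        defer resp.Body.Close()
        fmt.Println("llama-server status:", resp.Status)
    }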