Local LLM models: Part 4 - a simple web UI

In this post we’ll take the command line chat / tool calling app from part 3 of this series, which interacts with a local gpt-oss model, and add a web browser user interface.

We’ll use HTML + CSS to render the frontend and JavaScript + WebSockets to talk to a local HTTP server built on the Go standard library net/http package and the Gorilla WebSocket package.

For styling the lists and buttons we’ll use Pure CSS. To render the Markdown content generated by the model we’ll use goldmark, along with chroma for code syntax highlighting and KaTeX to format math equations.

Frontend layout and styling

We’re going to use JavaScript to handle input and update the content dynamically, so we just need two static files for the layout: index.html and chat.css, along with the pure-min.css file downloaded from https://cdn.jsdelivr.net/npm/purecss@3.0.0/build/pure-min.css

[Screenshots: the settings page and the main chat page]

The dynamic parts we’ll need to generate using JavaScript are:

  • #chat-list - main chat message list
      • <li> elements with classes .chat-item and one of .user, .analysis or .final depending on message type
      • each contains a <div> with class .msg
  • #conv-list - left menu with saved conversations
      • <li> elements with class pure-menu-item
      • each contains an <a> link with class .pure-menu-link, plus .selected-link if active
  • #model-name - header div
      • contains the name of the current model file
  • #config-page - configuration options, initially hidden
      • contains the #config-form <form> with identity, system, reasoning and weather_tool fields

Check the HTML and CSS source for the gory details.

Static website using net/http

This is very simple - just stick all of our static files under an assets subdirectory and embed them into the server binary.

├── cmd
│   ├── chat
│   │   └── chat.go
│   ├── hello
│   │   └── hello.go
│   ├── tools
│   │   └── tools.go
│   └── webchat
│       ├── assets
│       │   ├── chat.css
│       │   ├── chat.js
│       │   ├── favicon.ico
│       │   ├── fonts
│       │   │   ├── ...
│       │   ├── index.html
│       │   ├── katex.min.css
│       │   ├── pure-min.css
│       │   └── send.svg
│       └── webchat.go

cmd/webchat/webchat.go

package main

import (
	"embed"
	"io/fs"
	"net/http"

	log "github.com/sirupsen/logrus"
)

//go:embed assets
var assets embed.FS

func main() {
	mux := &http.ServeMux{}
	mux.Handle("/", fsHandler())

	log.Println("Serving website at http://localhost:8000")
	err := http.ListenAndServe(":8000", logRequestHandler(mux))
	if err != nil {
		log.Fatal(err)
	}
}

// handler to serve static embedded files
func fsHandler() http.Handler {
	sub, err := fs.Sub(assets, "assets")
	if err != nil {
		panic(err)
	}
	return http.FileServer(http.FS(sub))
}

// handler to log http requests
func logRequestHandler(h http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		h.ServeHTTP(w, r)
		log.Printf("%s: %s", r.Method, r.URL)
	})
}

JSON message format

Messages sent over the websocket connection will be encoded as JSON. We can define the format of these by adding some Go structs to the api package:

api/messages.go

// Chat API request from frontend to webserver
type Request struct {
	Action  string  `json:"action"`           // add | list | load | delete | config
	ID      string  `json:"id,omitzero"`      // if action=load|delete, UUID format
	Message Message `json:"message,omitzero"` // if action=add
	Config  *Config `json:"config,omitzero"`  // if action=config
}

// Chat API response from webserver back to frontend
type Response struct {
	Action       string       `json:"action"`                // add | list | load | config
	Message      Message      `json:"message,omitzero"`      // if action=add
	Conversation Conversation `json:"conversation,omitzero"` // if action=load
	Model        string       `json:"model,omitzero"`        // if action=list
	List         []Item       `json:"list,omitzero"`         // if action=list
	Config       Config       `json:"config,omitzero"`       // if action=config
}

type Conversation struct {
	ID       string    `json:"id"`
	Config   Config    `json:"config"`
	Messages []Message `json:"messages"`
}

type Message struct {
	Type    string `json:"type"`   // user | analysis | final
	Update  bool   `json:"update"` // true if update to existing message
	Content string `json:"content"`
}

type Item struct {
	ID      string `json:"id"`
	Summary string `json:"summary"`
}

type Config struct {
	SystemPrompt    string       `json:"system_prompt"`
	ModelIdentity   string       `json:"model_identity"`
	ReasoningEffort string       `json:"reasoning_effort"` // low | medium | high
	Tools           []ToolConfig `json:"tools,omitzero"`
}

type ToolConfig struct {
	Name    string `json:"name"`
	Enabled bool   `json:"enabled"`
}
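
For example, a Request adding a new user message, followed by the first streamed Response chunk coming back, might look like this on the wire (illustrative values):

{"action": "add", "message": {"type": "user", "update": false, "content": "hello"}}

{"action": "add", "message": {"type": "final", "update": false, "content": "<p>Hi! How can I help?</p>\n"}}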

There are also a few utility functions in the same file:

  • func DefaultConfig(tools ...ToolFunction) Config - initialises the Config struct with default values
  • func NewConversation(cfg Config) Conversation - returns new Conversation struct with given Config and a new UUID v7 ID
  • func NewRequest(conv Conversation, tools ...ToolFunction) (req openai.ChatCompletionRequest) - creates a new completion request struct from the provided config and list of messages.
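
As a rough sketch, DefaultConfig and NewConversation could look something like this (assuming the github.com/google/uuid package and that ToolFunction exposes a Name() method - check the api package source for the real code):

// DefaultConfig initialises a Config with default values and one enabled
// ToolConfig entry per registered tool function.
func DefaultConfig(tools ...ToolFunction) Config {
	cfg := Config{ReasoningEffort: "medium"}
	for _, t := range tools {
		cfg.Tools = append(cfg.Tools, ToolConfig{Name: t.Name(), Enabled: true})
	}
	return cfg
}

// NewConversation returns a new Conversation with the given Config and a
// fresh time-ordered UUID v7 as its ID.
func NewConversation(cfg Config) Conversation {
	id, err := uuid.NewV7()
	if err != nil {
		panic(err) // only fails if the system random source is unavailable
	}
	return Conversation{ID: id.String(), Config: cfg}
}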

Rendering markdown content

The content generated by the model is usually formatted as Markdown. This may include code blocks and LaTeX math equations. I’ve added a minimal markdown package which wraps the goldmark parser with the extensions we need. We can extend this later on when we come to add a web browser tool.

This pulls in a bunch of dependencies. goldmark-highlighting uses chroma for syntax highlighting of code blocks. goldmark-katex renders LaTeX maths to HTML by embedding the quickjs JavaScript interpreter to execute KaTeX server side. I’ve also added katex.min.css and the KaTeX fonts directory to cmd/webchat/assets so these can be loaded locally.

markdown/markdown.go

package markdown

import (
	"bytes"
	"regexp"

	katex "github.com/FurqanSoftware/goldmark-katex"
	"github.com/yuin/goldmark"
	highlighting "github.com/yuin/goldmark-highlighting/v2"
	"github.com/yuin/goldmark/extension"
	"github.com/yuin/goldmark/renderer/html"
)

var (
	reBlock  = regexp.MustCompile(`(?s)\\\[(.+?)\\\]`)
	reInline = regexp.MustCompile(`\\\((.+?)\\\)`)
)

// Render markdown document to HTML
func Render(doc string) (string, error) {
	// convert \(...\) inline math to $...$ and \[...\] block math to $$...$$
	doc = reBlock.ReplaceAllString(doc, `$$$$$1$$$$`)
	doc = reInline.ReplaceAllString(doc, `$$$1$$`)
	md := goldmark.New(
		goldmark.WithExtensions(
			extension.GFM,
			&katex.Extender{},
			highlighting.NewHighlighting(highlighting.WithStyle("monokai")),
		),
		goldmark.WithRendererOptions(html.WithHardWraps()),
	)

	var buf bytes.Buffer
	err := md.Convert([]byte(doc), &buf)
	return buf.String(), err
}
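
A quick usage sketch (assuming the github.com/jnb666/gpt-go module path):

package main

import (
	"fmt"
	"log"

	"github.com/jnb666/gpt-go/markdown"
)

func main() {
	// \(...\) inline math is converted to $...$ and rendered as KaTeX HTML
	html, err := markdown.Render("Euler's identity: \\(e^{i\\pi} + 1 = 0\\)")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(html)
}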

Interprocess communication with websockets

This is also pretty straightforward. On the server side we need a handler for the /websocket URL. Once a new connection is established we create a new Connection instance which holds the state of the client-server communication, then loop reading incoming messages.

cmd/webchat/webchat.go

var upgrader websocket.Upgrader
.. 
	mux.HandleFunc("/websocket", websocketHandler)
..
// handler for websocket connections
func websocketHandler(w http.ResponseWriter, r *http.Request) {
	conn, err := upgrader.Upgrade(w, r, nil)
	if err != nil {
		log.Error("websocket upgrade: ", err)
		return
	}
	defer conn.Close()

	c := newConnection(conn)

	cfg := api.DefaultConfig(c.tools...)
	err = loadJSON("config.json", &cfg)
	if err != nil {
		log.Error(err)
	}
	log.Debugf("initial config: %#v", cfg)
	conv := api.NewConversation(cfg)

	for {
		var req api.Request
		err = conn.ReadJSON(&req)
		if err != nil {
			log.Error("read message: ", err)
			return
		}
		switch req.Action {
		case "list":
			err = c.listChats(conv.ID)
		case "add":
			conv, err = c.addMessage(conv, req.Message)
		case "load":
			conv, err = c.loadChat(req.ID, cfg)
		case "delete":
			conv, err = c.deleteChat(req.ID, cfg)
		case "config":
			conv, err = c.configOptions(conv, &cfg, req.Config)
		default:
			err = fmt.Errorf("request %q not supported", req.Action)
		}
		if err != nil {
			log.Error(err)
		}
	}
}

// connection state for websocket
type Connection struct {
	conn     *websocket.Conn
	client   *openai.Client
	tools    []api.ToolFunction
	channel  string
	content  string
	analysis string
	sequence int
}

// init connection with openai client and tool functions
func newConnection(conn *websocket.Conn) *Connection {
	config := openai.DefaultConfig("")
	config.BaseURL = "http://localhost:8080/v1"
	c := &Connection{
		conn:   conn,
		client: openai.NewClientWithConfig(config),
	}
	if apiKey := os.Getenv("OWM_API_KEY"); apiKey != "" {
		c.tools = []api.ToolFunction{
			weather.Current{ApiKey: apiKey},
			weather.Forecast{ApiKey: apiKey},
		}
	} else {
		log.Warn("skipping tools support - OWM_API_KEY env variable is not defined")
	}
	return c
}

Similarly on the client side, the App class opens the websocket connection, initialises the event handlers, and sends and receives JSON messages to communicate with the webchat server.

cmd/webchat/assets/chat.js

class App {
	connected = false;
	showReasoning = false;

	constructor() {
		this.socket = this.initWebsocket();
		this.chat = document.getElementById("chat-list");
		initInputTextbox(this);
		initMenuControls(this);
		initFormControls(this);
	}

	initWebsocket() {
		const socket = new WebSocket("/websocket");
		socket.addEventListener("open", e => {
			console.log("websocket connected");
			this.connected = true;
			this.send({ action: "list" });
		});
		socket.addEventListener("close", e => {
			console.log("websocket closed");
			this.connected = false;
		});
		socket.addEventListener("message", e => {
			const resp = JSON.parse(e.data);
			this.recv(resp);
		});
		return socket;		
	}

	recv(resp) {
		switch (resp.action) {
			case "add":
				addMessage(this.chat, resp.message.type, resp.message.content, resp.message.update);
				break;
			case "list":
				const id = (resp.conversation) ? resp.conversation.id : "";
				refreshChatList(resp.model, resp.list, id);
				break;
			case "load":
				loadChat(this.chat, resp.conversation, this.showReasoning);
				break
			case "config":
				showConfig(resp.config);
				break
			default:
				console.error("unknown action", resp.action)
		}
	}

	send(req) {
		if (!this.connected) {
			document.getElementById("input-text").placeholder = "Error: websocket not connected";
			return;
		}
		this.socket.send(JSON.stringify(req));
	}
}

const app = new App();

list action

Called when the page is loaded to get the model name and the list of saved chats. If a conversation is currently in progress then the conversation ID is also set. Conversations are loaded from JSON files under ~/.gpt-go, while the model name comes from calling the llama-server /v1/models endpoint.

// get list of saved conversation ids and current model id
func (c *Connection) listChats(currentID string) error {
	log.Infof("list saved chats: current=%s", currentID)
	var err error
	resp := api.Response{Action: "list", Conversation: api.Conversation{ID: currentID}}
	resp.Model, err = modelName(c.client)
	if err != nil {
		return err
	}
	resp.List, err = getSavedConversations()
	if err != nil {
		return err
	}
	return c.conn.WriteJSON(resp)
}
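
The modelName and getSavedConversations helpers aren’t shown above. A rough sketch of what they might look like - this assumes llama-server returns the model file path as the model ID, and that saved chats are stored as <uuid>.json files under DataDir:

// get the name of the currently loaded model file via the /v1/models endpoint
func modelName(client *openai.Client) (string, error) {
	list, err := client.ListModels(context.Background())
	if err != nil {
		return "", err
	}
	if len(list.Models) == 0 {
		return "", fmt.Errorf("no model loaded")
	}
	return filepath.Base(list.Models[0].ID), nil
}

// build the list of saved conversations from the JSON files under DataDir
func getSavedConversations() ([]api.Item, error) {
	files, err := filepath.Glob(filepath.Join(DataDir, "*.json"))
	if err != nil {
		return nil, err
	}
	var items []api.Item
	for _, file := range files {
		id := strings.TrimSuffix(filepath.Base(file), ".json")
		if id == "config" {
			continue // skip the saved default config
		}
		var conv api.Conversation
		if err := loadJSON(id, &conv); err != nil {
			return nil, err
		}
		summary := "new chat"
		if len(conv.Messages) > 0 {
			summary = conv.Messages[0].Content // first user message as the summary
		}
		items = append(items, api.Item{ID: id, Summary: summary})
	}
	return items, nil
}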

On the client side function refreshChatList(model, list, currentID) updates the HTML DOM with the results.

add action

Called when a new user message is submitted, either via a keypress event for the Enter key on the input-text textarea or by clicking send-button.

The streamMessage callback accumulates the updates sent back from the model and converts them to the api.Message format: Type is either ‘analysis’ or ‘final’, Content holds the HTML encoded message content so far, and Update is set to true unless this is the first message of the given type. Tool call responses are added to the analysis content wrapped in a code block.

Once complete, the generated messages are added to the conversation, which is saved in JSON format. If this is the first response on a new chat then listChats is called to refresh the list of chats on the left nav bar.

// add new message from user to chat, get streaming response, returns updated message list
func (c *Connection) addMessage(conv api.Conversation, msg api.Message) (api.Conversation, error) {
	log.Infof("add message: %q", msg.Content)
	conv.Messages = append(conv.Messages, msg)
	req := api.NewRequest(conv, c.tools...)
	c.channel = ""
	c.content = ""
	c.analysis = ""
	c.sequence = 0
	resp, err := api.CreateChatCompletionStream(context.Background(), c.client, req, c.streamMessage, c.tools...)
	if err != nil {
		return conv, err
	}
	err = c.sendUpdate("final", "\n")
	if err != nil {
		return conv, err
	}
	if c.analysis != "" {
		conv.Messages = append(conv.Messages, api.Message{Type: "analysis", Content: c.analysis})
	}
	conv.Messages = append(conv.Messages, api.Message{Type: "final", Content: resp.Message.Content})
	err = saveJSON(conv.ID, conv)
	if err == nil && len(conv.Messages) <= 3 {
		err = c.listChats(conv.ID)
	}
	return conv, err
}

// chat completion stream callback to send updates to front end
func (c *Connection) streamMessage(delta openai.ChatCompletionStreamChoiceDelta) error {
	if delta.Role == "tool" {
		log.Debug("tool response: ", delta.Content)
		return c.sendUpdate("analysis", "\n```\n"+delta.Content+"\n```\n")
	}
	if delta.ReasoningContent != "" {
		return c.sendUpdate("analysis", delta.ReasoningContent)
	}
	if delta.Content != "" {
		return c.sendUpdate("final", delta.Content)
	}
	return nil
}

func (c *Connection) sendUpdate(channel, text string) error {
	if c.channel != channel {
		c.channel = channel
		c.content = ""
		c.sequence = 0
	}
	c.content += text
	if channel == "analysis" {
		c.analysis += text
	}
	// only render final markdown content when new line is generated
	if channel == "final" && !strings.Contains(text, "\n") {
		return nil
	}
	r := api.Response{Action: "add", Message: api.Message{Type: channel, Content: toHTML(c.content, channel), Update: c.sequence > 0}}
	c.sequence++
	return c.conn.WriteJSON(r)
}
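
The toHTML helper renders the accumulated content using the markdown package from earlier. Something along these lines - treating user input as plain text rather than Markdown is an assumption here (html is the standard library package):

// render message content to HTML for display by the frontend
func toHTML(content, msgType string) string {
	if msgType == "user" {
		// show user input verbatim rather than parsing it as markdown
		return "<p>" + html.EscapeString(content) + "</p>"
	}
	out, err := markdown.Render(content)
	if err != nil {
		log.Error("render markdown: ", err)
		return html.EscapeString(content)
	}
	return out
}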

On the client side function addMessage(chat, type, content, update, hidden) updates the DOM for the chat-list element.

load action

Called when a saved conversation is selected or the new chat button is clicked. Returns the conversation id, messages and saved config options in the case where a saved conversation is loaded.

// load conversation with given id, or new conversation if blank
func (c *Connection) loadChat(id string, cfg api.Config) (conv api.Conversation, err error) {
	log.Infof("load chat: id=%s", id)
	if id != "" {
		if err = loadJSON(id, &conv); err != nil {
			return conv, err
		}
		for _, tool := range cfg.Tools {
			if !slices.ContainsFunc(conv.Config.Tools, func(t api.ToolConfig) bool { return t.Name == tool.Name }) {
				conv.Config.Tools = append(conv.Config.Tools, api.ToolConfig{Name: tool.Name})
			}
		}

	} else {
		conv = api.NewConversation(cfg)
	}
	resp := api.Response{Action: "load", Conversation: api.Conversation{ID: conv.ID}}
	for _, msg := range conv.Messages {
		resp.Conversation.Messages = append(resp.Conversation.Messages, api.Message{Type: msg.Type, Content: toHTML(msg.Content, msg.Type)})
	}
	err = c.conn.WriteJSON(resp)
	return conv, err
}

On the client side function loadChat(chat, conv, showReasoning) updates the DOM for the chat list. The showReasoning flag is set based on the ‘show all reasoning’ checkbox and determines if analysis messages are hidden.

delete action

Called when the ‘delete’ button is clicked and there is a chat currently selected. Deletes the saved JSON from disk and sends back a list response to refresh the front end.

// delete chat with given id and return new conversation
func (c *Connection) deleteChat(id string, cfg api.Config) (conv api.Conversation, err error) {
	log.Infof("delete conversation: id=%s", id)
	err = os.Remove(filepath.Join(DataDir, id+".json"))
	if err != nil {
		return conv, err
	}
	// conv is the zero value here so no chat will be selected in the refreshed list
	err = c.listChats(conv.ID)
	if err != nil {
		return conv, err
	}
	return c.loadChat("", cfg)
}

config action

If the options button is clicked then the current chat is hidden and the config form is shown. This triggers a config request to the backend with a nil Config field, which loads the current configuration and returns it to the client to populate the form values. When the form is submitted the config request is sent again, this time with the Config values set - these update the current state and are saved to disk.

// if update is nil return current config settings, else update with provided values
func (c *Connection) configOptions(conv api.Conversation, cfg, update *api.Config) (api.Conversation, error) {
	var err error
	if update == nil {
		log.Info("get config")
		resp := api.Response{Action: "config", Config: conv.Config}
		err = c.conn.WriteJSON(resp)
	} else {
		if len(conv.Messages) == 0 {
			log.Infof("update default config: %#v", update)
			*cfg = *update
			err = saveJSON("config.json", cfg)
		} else {
			log.Infof("update config for current chat: %#v", update)
			conv.Config = *update
			err = saveJSON(conv.ID, conv)
		}
	}
	return conv, err
}
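
The saveJSON and loadJSON helpers used throughout marshal structs to and from JSON files under the ~/.gpt-go data directory. A minimal sketch - since the name is passed both with and without the .json extension above, the helpers are assumed to normalise it:

var DataDir = filepath.Join(os.Getenv("HOME"), ".gpt-go")

// marshal v to JSON and save as DataDir/<name>.json
func saveJSON(name string, v any) error {
	if !strings.HasSuffix(name, ".json") {
		name += ".json"
	}
	data, err := json.MarshalIndent(v, "", "  ")
	if err != nil {
		return err
	}
	return os.WriteFile(filepath.Join(DataDir, name), data, 0644)
}

// load DataDir/<name>.json and unmarshal into v
func loadJSON(name string, v any) error {
	if !strings.HasSuffix(name, ".json") {
		name += ".json"
	}
	data, err := os.ReadFile(filepath.Join(DataDir, name))
	if err != nil {
		return err
	}
	return json.Unmarshal(data, v)
}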

Putting it all together

All of the source code for this example is under https://github.com/jnb666/gpt-go/tree/main. To run it, start llama-server as described in part 1 (scripts/llama-server.sh has an example startup script), run the web server with go run cmd/webchat/webchat.go and point a browser at http://localhost:8000.

In the final part of this series we’ll add a web browser tool plugin so that the model can search the web and process the results.