# Ollama (Offline)
Run AI completely offline — no internet, no cloud, no cost.
Ollama is free and works fully air-gapped. Your contract data never leaves your Mac. This makes it ideal for highly confidential documents where you cannot send content to a cloud-based AI provider.
## What is Ollama?
Ollama is an open-source tool that lets you run large language models locally on your Mac. It downloads model weights to your machine and runs inference entirely on your hardware. There are no API calls, no cloud servers, and no usage costs. Once set up, you can disconnect from the internet entirely and QuickContract's AI features will still work.
## Setting up Ollama
### Install Ollama
Download Ollama from ollama.ai and run the installer. It installs as a lightweight background service on your Mac.
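If you're not sure the install succeeded, you can check from Terminal (this requires the Ollama CLI to be on your PATH, which the installer sets up):

```shell
# Prints the installed Ollama version if the CLI is available
ollama --version
```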
### Download a model
Open Terminal and pull a model. We recommend starting with llama3 or mistral:
```shell
ollama pull llama3
```
The download is typically 4–8 GB depending on the model. Once complete, the model is stored locally and ready to use.
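You can confirm what's on disk at any time (requires Ollama to be installed):

```shell
# List locally downloaded models, their sizes, and when they were last modified
ollama list
```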
### Verify it works
Test the model in Terminal to make sure it responds:
```shell
ollama run llama3 "Hello, world"
```
You should see a response within a few seconds.
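Ollama also exposes a local HTTP API (on port 11434 by default), which is what desktop apps connect to. As an additional sanity check, you can confirm the server answers (requires Ollama to be running):

```shell
# The root endpoint responds with "Ollama is running" when the server is up
curl http://localhost:11434/
```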
### Select Ollama in QuickContract
Open Settings > AI Provider and select Ollama from the dropdown. No API key is needed. QuickContract detects the locally installed models and lists them in the model selector. Choose the model you downloaded.
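Model detection works through Ollama's local API: the `/api/tags` endpoint returns the installed models as JSON. You can query it yourself to see the same list an app sees (assuming Ollama is running on its default port, 11434):

```shell
# Returns installed models as JSON, e.g. {"models":[{"name":"llama3:latest", ...}]}
curl http://localhost:11434/api/tags
```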
## Recommended models
| Model | Size | Best for |
|---|---|---|
| llama3 | ~4.7 GB | Best overall local model for contracts |
| mistral | ~4.1 GB | Fast, good for shorter contracts and QuickEdit |
| llama3:70b | ~40 GB | Highest quality local option (requires 64 GB+ RAM) |
| gemma2 | ~5.4 GB | Good alternative with strong instruction following |
## Performance expectations
Local models are slower than cloud APIs. On an M1/M2/M3 Mac with 16 GB of RAM, expect contract generation to take 30–90 seconds depending on the model and contract length. Larger models produce higher quality output but are slower and require more memory.
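To measure how a model performs on your own hardware, run it with the `--verbose` flag, which prints timing statistics (including tokens per second) after the response. The prompt here is just an illustrative placeholder:

```shell
# Prints prompt eval rate and eval rate (tokens/s) after the response
ollama run llama3 --verbose "Write one sentence about contract law."
```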
Most 7B–8B parameter models (like llama3 and mistral) require at least 8 GB of RAM. The 70B models require 64 GB or more. If generation is very slow or the app becomes unresponsive, try a smaller model.
Ollama must be running in the background for QuickContract to use it. If you see an error about not being able to connect, make sure Ollama is running: check for the Ollama icon in your menu bar, or run `ollama serve` in Terminal.
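If QuickContract reports a connection error, a quick check like the following (a sketch, assuming Ollama's default port, 11434) tells you whether the server is reachable and starts it if not:

```shell
# Probe the Ollama server on its default port; start it in the background if it's down
if curl -s --max-time 2 http://localhost:11434/ >/dev/null; then
  echo "Ollama is running"
else
  echo "Ollama is not running; starting it..."
  ollama serve &
fi
```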