
How to Install and Run Local LLMs with Ollama

Install Ollama and run open-source language models locally on your machine.

Jay Banlasan

The AI Systems Guy

Running a local LLM with Ollama is the move for anything you cannot send to a cloud API: client contracts with data confidentiality clauses, internal HR documents, financial records, anything where the data cannot leave your machine. The setup takes about 10 minutes, and once it is done you have a local AI server that behaves like an API you control completely.

Ollama is also the right call for volume work where cloud API costs add up. Once the model is running locally, every call is essentially free. The tradeoff is quality and speed, which is why I use Ollama for classification and extraction tasks where Llama 3 or Mistral is good enough, and reserve Claude/GPT-4 for tasks that need top-tier reasoning.

What You Need Before Starting

A Mac, Windows, or Linux machine
At least 8GB of RAM for the 8B models (16GB is more comfortable)
Roughly 5GB of free disk space per model you plan to pull
A terminal and about 10 minutes

Step 1: Install Ollama

On macOS:

Download the macOS app from ollama.com, or install with Homebrew:

brew install ollama

(The curl install script at ollama.com/install.sh targets Linux, so use one of the options above on a Mac.)

On Linux:

curl -fsSL https://ollama.com/install.sh | sh

On Windows: Download the installer from ollama.com. It installs as a background service and adds ollama to your PATH. If you use WSL2, run the Linux install command inside your WSL terminal.

Verify the install:

ollama --version

You should see a version number. Ollama automatically starts a local server at http://localhost:11434.
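If you want a scriptable health check rather than eyeballing a version string, the server's root endpoint answers with "Ollama is running" when it is up. A minimal sketch using only the standard library, assuming the default port; server_up is my own helper name, not part of Ollama:

```python
import urllib.request


def server_up(host: str = "http://localhost:11434") -> bool:
    """Return True if the local Ollama server answers on its root endpoint."""
    try:
        with urllib.request.urlopen(host, timeout=2) as resp:
            # A healthy server returns HTTP 200 with the body "Ollama is running"
            return resp.status == 200
    except OSError:
        return False


print(server_up())
```

This is handy at the top of automation scripts: fail fast with a clear message instead of timing out on the first model call.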

Step 2: Pull and Run Your First Model

Download Llama 3.1 8B (good balance of speed and quality, about 4.7GB):

ollama pull llama3.1

For a smaller, faster model (great for classification on older hardware):

ollama pull llama3.2:3b

For code tasks:

ollama pull codellama

Run a model in interactive chat mode to test it:

ollama run llama3.1

Type a message and press Enter. Type /bye to exit.

Step 3: Use the REST API Directly

Ollama exposes a REST API at http://localhost:11434. You can hit it with any HTTP client:

curl http://localhost:11434/api/generate \
  -d '{
    "model": "llama3.1",
    "prompt": "What are the benefits of running AI models locally?",
    "stream": false
  }'
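With "stream": false, the reply comes back as a single JSON object and the generated text lives in its response field. A small parsing helper, assuming the non-streaming shape; extract_response is my own name and the example body is a truncated illustration:

```python
import json


def extract_response(body: str) -> str:
    """Pull the generated text out of a non-streaming /api/generate reply."""
    return json.loads(body)["response"]


# Truncated illustration of a non-streaming reply body:
example = '{"model": "llama3.1", "response": "Local models keep data on-device.", "done": true}'
print(extract_response(example))  # prints Local models keep data on-device.
```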

For chat-style (with message history):

curl http://localhost:11434/api/chat \
  -d '{
    "model": "llama3.1",
    "messages": [
      {"role": "system", "content": "You are a helpful business analyst."},
      {"role": "user", "content": "Summarize the benefits of local AI for data privacy."}
    ],
    "stream": false
  }'
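The same calls work from any language with an HTTP client, no SDK required. A minimal Python sketch using only the standard library, assuming the server is running on the default port; build_chat_payload and chat_once are my own helper names:

```python
import json
import urllib.request


def build_chat_payload(model: str, messages: list, stream: bool = False) -> dict:
    """Assemble the JSON body that /api/chat expects."""
    return {"model": model, "messages": messages, "stream": stream}


def chat_once(payload: dict, host: str = "http://localhost:11434") -> str:
    """POST a chat payload and return the reply text."""
    req = urllib.request.Request(
        f"{host}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]


payload = build_chat_payload(
    "llama3.1",
    [{"role": "user", "content": "Summarize the benefits of local AI."}],
)
# chat_once(payload) returns the model's reply once the server is up
print(json.dumps(payload, indent=2))
```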

Step 4: Connect Ollama to Python

Install the official Python library:

pip install ollama

Basic usage:

import ollama

response = ollama.chat(
    model="llama3.1",
    messages=[
        {
            "role": "system",
            "content": "You are a document analyst. Be concise and specific."
        },
        {
            "role": "user",
            "content": "What are the key clauses I should look for in an NDA?"
        }
    ]
)

print(response["message"]["content"])
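The library also supports streaming, which matters for anything interactive because local models can take a while to finish. Each streamed chunk carries one fragment of the reply. A sketch, assuming a pulled model and a running server; extract_text and stream_reply are my own helper names:

```python
def extract_text(chunk) -> str:
    """Each streamed chat chunk carries one fragment of the reply."""
    return chunk["message"]["content"]


def stream_reply(model: str, messages: list) -> str:
    """Print the reply as it arrives and return the full text at the end."""
    import ollama  # imported lazily; requires `pip install ollama` and a running server

    parts = []
    for chunk in ollama.chat(model=model, messages=messages, stream=True):
        print(extract_text(chunk), end="", flush=True)
        parts.append(extract_text(chunk))
    return "".join(parts)
```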

Step 5: Build a Local AI Function That Mirrors the Cloud Pattern

I give my local functions the same signature as my cloud AI wrappers. That way I can swap models without changing application code:

import ollama
from typing import Optional


def ask_local(
    prompt: str,
    system_prompt: Optional[str] = None,
    model: str = "llama3.1",
    temperature: float = 0.3
) -> str:
    """
    Send a prompt to a local Ollama model.
    
    Args:
        prompt: User message
        system_prompt: Optional system instructions
        model: Ollama model name (must be pulled first)
        temperature: 0.0 deterministic, 1.0 creative
    
    Returns:
        Response text
    """
    messages = []
    
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    
    messages.append({"role": "user", "content": prompt})
    
    response = ollama.chat(
        model=model,
        messages=messages,
        options={"temperature": temperature}
    )
    
    return response["message"]["content"]


# Example: Classify documents without sending data to the cloud
def classify_document(text: str) -> str:
    system = "Classify this document as one of: CONTRACT, INVOICE, REPORT, EMAIL, OTHER. Return only the category word."
    return ask_local(text, system_prompt=system, temperature=0.0)


if __name__ == "__main__":
    doc_snippet = "This agreement is entered into as of January 1, 2024 between Party A and Party B..."
    result = classify_document(doc_snippet)
    print(f"Document type: {result}")
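Small local models do not always obey the "return only the category word" instruction, so I validate the output before trusting it downstream. A hardening sketch for the classifier above; normalize_label and the fallback logic are my own additions, not part of Ollama:

```python
# Must match the categories named in the classifier's system prompt
VALID_LABELS = ("CONTRACT", "INVOICE", "REPORT", "EMAIL", "OTHER")


def normalize_label(raw: str) -> str:
    """Map a possibly chatty model reply onto one of the allowed categories."""
    cleaned = raw.strip().upper()
    for label in VALID_LABELS:
        # Tolerates replies like "Category: CONTRACT" or trailing punctuation
        if label in cleaned:
            return label
    return "OTHER"


print(normalize_label("  Category: CONTRACT.\n"))  # prints CONTRACT
```

Wrap the classifier's return value in normalize_label and the rest of your pipeline only ever sees the five expected strings.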

Step 6: List Available Models and Manage Storage

See what models you have pulled:

ollama list

Check a model's details:

ollama show llama3.1

Remove a model to free disk space:

ollama rm llama3.2:3b

In Python, list models programmatically:

import ollama

models = ollama.list()
for model in models["models"]:
    size_gb = model["size"] / (1024**3)
    # Recent versions of the ollama library call this field "model"; older releases used "name"
    print(f"{model['model']}: {size_gb:.1f}GB")

Step 7: Run Ollama as a Background Service

On Linux, set it to start automatically:

sudo systemctl enable ollama
sudo systemctl start ollama

Check status:

sudo systemctl status ollama

On macOS, Ollama runs as a menu bar app and starts at login automatically.
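By default the server only listens on localhost. If another machine on your network needs to reach it (say, an automation box calling your workstation), you can change the bind address on Linux with a systemd drop-in; the file below is what `sudo systemctl edit ollama` creates for you:

```ini
# Drop-in override created via: sudo systemctl edit ollama
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
```

Then run sudo systemctl restart ollama. Only do this on a network you trust: the Ollama API has no built-in authentication.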
