How to Install and Run Local LLMs with Ollama
Install Ollama and run open-source language models locally on your machine.
Jay Banlasan
The AI Systems Guy
Running a local LLM with Ollama is the move for anything you cannot send to a cloud API: client contracts with data confidentiality clauses, internal HR documents, financial records, anything where the data cannot leave your machine. This setup takes about 10 minutes, and once it is done you have a local AI server that behaves like an API you control completely.
Ollama is also the right call for volume work where cloud API costs add up. Once the model is running locally, every call is essentially free. The tradeoff is quality and speed, which is why I use Ollama for classification and extraction tasks where Llama 3 or Mistral is good enough, and reserve Claude/GPT-4 for tasks that need top-tier reasoning.
What You Need Before Starting
- A machine with at least 8GB RAM (16GB recommended for 7B models)
- macOS, Linux, or Windows (WSL2 recommended on Windows)
- About 5-10GB of disk space per model
- Basic terminal comfort
- Python 3.9+ for the API integration steps
Step 1: Install Ollama
On macOS: Download the Ollama app from ollama.com, or install with Homebrew:
brew install ollama
(The install.sh script below is Linux-only and will refuse to run on macOS.)
On Linux:
curl -fsSL https://ollama.com/install.sh | sh
On Windows: Download the installer from ollama.com. It installs as a background service and adds ollama to your PATH. If you use WSL2, run the Linux install command inside your WSL terminal.
Verify the install:
ollama --version
You should see a version number. Ollama automatically starts a local server at http://localhost:11434.
Step 2: Pull and Run Your First Model
Download Llama 3.1 8B (good balance of speed and quality, about 4.7GB):
ollama pull llama3.1
For a smaller, faster model (great for classification on older hardware):
ollama pull llama3.2:3b
For code tasks:
ollama pull codellama
Run a model in interactive chat mode to test it:
ollama run llama3.1
Type a message and press Enter. Type /bye to exit.
Step 3: Use the REST API Directly
Ollama exposes a REST API at http://localhost:11434. You can hit it with any HTTP client:
curl http://localhost:11434/api/generate \
  -d '{
    "model": "llama3.1",
    "prompt": "What are the benefits of running AI models locally?",
    "stream": false
  }'
For chat-style (with message history):
curl http://localhost:11434/api/chat \
  -d '{
    "model": "llama3.1",
    "messages": [
      {"role": "system", "content": "You are a helpful business analyst."},
      {"role": "user", "content": "Summarize the benefits of local AI for data privacy."}
    ],
    "stream": false
  }'
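With "stream": true (the API's default), Ollama returns newline-delimited JSON chunks instead of one object, each carrying a piece of the response. A minimal sketch of consuming that stream from Python using only the standard library; the helper names extract_text and stream_generate are my own, not part of Ollama:

```python
import json
import urllib.request

def extract_text(ndjson_lines):
    """Concatenate the 'response' field from each streamed JSON chunk."""
    parts = []
    for line in ndjson_lines:
        if not line.strip():
            continue
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):  # the final chunk is marked done
            break
    return "".join(parts)

def stream_generate(prompt, model="llama3.1", host="http://localhost:11434"):
    """POST to /api/generate and consume the NDJSON stream line by line."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps({"model": model, "prompt": prompt, "stream": True}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return extract_text(line.decode() for line in resp)
```

Streaming matters more locally than in the cloud: a 7B model on modest hardware can take several seconds for a long answer, and printing tokens as they arrive makes the wait feel shorter.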
Step 4: Connect Ollama to Python
Install the official Python library:
pip install ollama
Basic usage:
import ollama

response = ollama.chat(
    model="llama3.1",
    messages=[
        {
            "role": "system",
            "content": "You are a document analyst. Be concise and specific."
        },
        {
            "role": "user",
            "content": "What are the key clauses I should look for in an NDA?"
        }
    ]
)

print(response["message"]["content"])
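The chat endpoint is stateless: the server does not remember earlier turns, so for a multi-turn conversation you resend the full message list each time. A sketch of keeping that history yourself; the helper names new_history, add_turn, and chat_turn are my own:

```python
def new_history(system_prompt):
    """Start a conversation; the caller owns the running message list."""
    return [{"role": "system", "content": system_prompt}]

def add_turn(history, role, content):
    """Append one message dict to the conversation history."""
    history.append({"role": role, "content": content})

def chat_turn(history, user_message, model="llama3.1"):
    """Resend the full history plus the new user message, then record the reply."""
    import ollama  # imported here so the history helpers above work without the package
    add_turn(history, "user", user_message)
    reply = ollama.chat(model=model, messages=history)["message"]["content"]
    add_turn(history, "assistant", reply)
    return reply
```

Keep an eye on history length: every turn resends everything, and once the conversation exceeds the model's context window the oldest turns get truncated.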
Step 5: Build a Local AI Function That Mirrors the Cloud Pattern
I build local functions to match the same signature as my cloud AI wrappers. This way I can swap models without changing application code:
import ollama
from typing import Optional

def ask_local(
    prompt: str,
    system_prompt: Optional[str] = None,
    model: str = "llama3.1",
    temperature: float = 0.3
) -> str:
    """
    Send a prompt to a local Ollama model.

    Args:
        prompt: User message
        system_prompt: Optional system instructions
        model: Ollama model name (must be pulled first)
        temperature: 0.0 deterministic, 1.0 creative

    Returns:
        Response text
    """
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": prompt})

    response = ollama.chat(
        model=model,
        messages=messages,
        options={"temperature": temperature}
    )
    return response["message"]["content"]

# Example: Classify documents without sending data to the cloud
def classify_document(text: str) -> str:
    system = "Classify this document as one of: CONTRACT, INVOICE, REPORT, EMAIL, OTHER. Return only the category word."
    return ask_local(text, system_prompt=system, temperature=0.0)

if __name__ == "__main__":
    doc_snippet = "This agreement is entered into as of January 1, 2024 between Party A and Party B..."
    result = classify_document(doc_snippet)
    print(f"Document type: {result}")
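Local models fail differently from cloud APIs: the server may not be running, or the model may not have been pulled yet. A hedged sketch of a fallback wrapper so a classification pipeline degrades instead of crashing; the helper with_fallback and the "OTHER" default are my choices, not part of Ollama:

```python
def with_fallback(fn, fallback):
    """Wrap a text function so any runtime failure returns a default instead of raising."""
    def wrapped(text):
        try:
            return fn(text)
        except Exception as exc:  # e.g. connection refused when the server is down
            print(f"Local model unavailable ({exc!r}); using fallback")
            return fallback
    return wrapped

# Usage with the classifier above:
# safe_classify = with_fallback(classify_document, "OTHER")
# print(safe_classify(doc_snippet))
```

In a routing setup, the fallback branch is also a natural place to hand the document to a cloud model instead of returning a static category, as long as the document is not in the must-stay-local bucket.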
Step 6: List Available Models and Manage Storage
See what models you have pulled:
ollama list
Check a model's details:
ollama show llama3.1
Remove a model to free disk space:
ollama rm llama3.2:3b
In Python, list models programmatically:
import ollama

# ollama-python 0.4+ returns typed response objects; the model name field is "model"
models = ollama.list()
for model in models.models:
    size_gb = model.size / (1024**3)
    print(f"{model.model}: {size_gb:.1f}GB")
Step 7: Run Ollama as a Background Service
On Linux, set it to start automatically:
sudo systemctl enable ollama
sudo systemctl start ollama
Check status:
sudo systemctl status ollama
On macOS, Ollama runs as a menu bar app and starts at login automatically.
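By default the server binds to 127.0.0.1, so only your own machine can reach it. To let other machines on your network use it, set OLLAMA_HOST via a systemd override. A sketch assuming the Linux systemd install; note the Ollama API has no built-in authentication, so only do this on a trusted network:

```shell
# Open an override file for the ollama service
sudo systemctl edit ollama
# In the editor, add:
#   [Service]
#   Environment="OLLAMA_HOST=0.0.0.0:11434"
# Then apply the change:
sudo systemctl restart ollama
```

Clients on the network then point at http://YOUR_SERVER_IP:11434 instead of localhost.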
What to Build Next
- Create a document processing pipeline that routes sensitive documents to Ollama and everything else to Claude
- Run Ollama on a VPS and expose it to your internal network for team use
- Use Ollama's model library to test different open-source models against your specific tasks before committing to one
Related Reading
- How to Handle AI API Rate Limits Gracefully - Rate limiting patterns apply even to local models under load
- How to Create AI API Keys Securely - Not needed for local models, but relevant when you mix local and cloud
- How to Set Up Groq for Ultra-Fast AI Inference - If local speed is not enough, Groq runs the same open-source models faster
Want this system built for your business?
Get a free assessment. We will map every system your business needs and show you the ROI.
Get Your Free Assessment