February 13, 2026 · Self-Hosted LLM · 4 min read

How to Run Llama 3 Locally with Ollama — Step-by-Step

Running a large language model on your own machine used to require a PhD-level understanding of Python environments, CUDA drivers, and model weight formats. Not anymore. Ollama changed the game by making local LLM deployment as simple as running a single command.

I've been running local models for client projects where data privacy is non-negotiable — insurance underwriting, legal document review, healthcare intake forms. Here's exactly how I set it up, and how you can too.

Step 1: Install Ollama

Head to ollama.com and download the installer for your OS. On macOS, it's a standard .dmg. On Linux, one command does it:

curl -fsSL https://ollama.com/install.sh | sh

Verify the installation:

ollama --version
# ollama version 0.6.2

Step 2: Pull and Run Llama 3

This is the part that surprises most people. No Python. No pip. No conda. Just:

ollama run llama3:8b

Ollama downloads the quantized model (about 4.7 GB for the 8B variant), loads it into memory, and drops you into an interactive chat. The first run takes a few minutes depending on your connection. Every subsequent run starts in seconds.
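A few companion commands are worth knowing at this stage. A minimal sketch of the typical model-management workflow (these are standard Ollama CLI subcommands):

```shell
# Download the model without starting an interactive chat
ollama pull llama3:8b

# List installed models with their sizes
ollama list

# Remove a model you no longer need to free disk space
ollama rm llama3:8b
```

`pull` is handy in provisioning scripts, where you want the download to happen ahead of time rather than on first use.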

Step 3: Use It as an API

The real power is using Ollama as a local API server. It runs on port 11434 by default:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3:8b",
  "prompt": "Summarize this lease agreement clause: ...",
  "stream": false
}'

This is what makes it production-useful. You can plug this into any application — a document processor, a chatbot, an internal knowledge base — without sending a single byte to a third-party API.
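In practice you usually want just the generated text, not the full JSON envelope. A minimal sketch, assuming the server is running and llama3:8b has already been pulled — the python3 one-liner simply extracts the "response" field from the JSON the endpoint returns:

```shell
# Query the local server and print only the generated text
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3:8b",
  "prompt": "Summarize this lease agreement clause: ...",
  "stream": false
}' | python3 -c 'import json, sys; print(json.load(sys.stdin)["response"])'
```

With "stream": false the server returns one complete JSON object; with streaming enabled you would get newline-delimited JSON chunks instead, which need to be parsed line by line.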

When to Use This Setup

  • Regulated industries where data cannot leave your infrastructure (healthcare, finance, legal)
  • Development and testing when you don't want to burn API credits iterating on prompts
  • Internal tools like code review assistants, documentation generators, or email drafters
  • Offline environments — air-gapped networks, field deployments, embedded systems

Hardware Reality Check

Llama 3 8B runs comfortably on a MacBook Pro with 16 GB RAM. For the 70B model, you'll need at least 48 GB of RAM (or a dedicated GPU with 24+ GB VRAM). My recommendation for most teams: start with the 8B model. It handles summarization, classification, and Q&A surprisingly well.
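Before pulling a large model, it's worth checking how much RAM the machine actually has. A quick check (the Linux command reads /proc/meminfo; the macOS equivalent is shown as a comment):

```shell
# Linux: total RAM, reported in kB
grep MemTotal /proc/meminfo

# macOS: total RAM in bytes
# sysctl -n hw.memsize
```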

FAQ

Can I run Ollama on Windows?

Yes. Ollama has native Windows support. Download from ollama.com and it runs as a background service. WSL is not required.

How much disk space do I need?

The 8B model is about 4.7 GB. The 70B model is roughly 40 GB. Ollama stores models in ~/.ollama/models by default.
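To see how much space your downloaded models are actually consuming, point du at the default store location:

```shell
# Total size of all downloaded models (default location)
du -sh ~/.ollama/models
```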

Is Ollama free for commercial use?

Ollama itself is open source (MIT license). The models have their own licenses — Llama 3 requires a Meta community license agreement, which permits most commercial use.

Need help deploying LLMs for your business?

We build production-grade AI agents with self-hosted models. Zero data leakage, full control.
