How to Set Up a Vector Database for AI Search

Deploy a vector database for semantic search across your content.

Jay Banlasan

The AI Systems Guy

A vector database such as Pinecone or ChromaDB is the storage engine behind a retrieval-augmented generation (RAG) system and behind AI search in general. I use these to make any collection of text searchable by meaning instead of keywords. When someone searches "how to get reimbursed," the query matches your document about the "expense report submission process" even though the two share no keywords.
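
To make "searchable by meaning" concrete, here is a minimal sketch of the ranking a vector database performs: each text becomes a vector, and results are ordered by cosine similarity to the query vector. The three-dimensional vectors below are hand-made toy values chosen for illustration, not real model output.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-d "embeddings" (made-up values standing in for model output).
documents = {
    "expense report submission process": [0.9, 0.1, 0.2],
    "office holiday schedule": [0.1, 0.9, 0.3],
}
query_vector = [0.8, 0.2, 0.1]  # stands in for "how to get reimbursed"

def rank(query, docs):
    """Return (title, score) pairs, best match first."""
    scored = [(title, cosine_similarity(query, vec)) for title, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

ranked = rank(query_vector, documents)
```

With these toy values the expense report document ranks first because its vector points in nearly the same direction as the query vector.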

This tutorial covers both local (ChromaDB) and cloud (Pinecone) setups so you can pick what fits.

What You Need Before Starting

Step 1: Set Up ChromaDB (Local)

pip install chromadb sentence-transformers

import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.PersistentClient(path="./vector_store")
collection = client.get_or_create_collection(
    name="my_content",
    metadata={"hnsw:space": "cosine"}
)

Step 2: Set Up Pinecone (Cloud)

pip install pinecone-client

import os

from pinecone import Pinecone

# Read the API key from the environment instead of hard-coding it
pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))

index_name = "business-content"
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=384,  # Match your embedding model dimension
        metric="cosine",
        spec={"serverless": {"cloud": "aws", "region": "us-east-1"}}
    )

index = pc.Index(index_name)
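
The index dimension has to equal the length of the vectors your embedding model produces, otherwise inserts are rejected. A small guard like the one below can catch a mismatch early. This is a sketch: `check_dimension` and `EXPECTED_DIMENSION` are hypothetical names introduced here, and 384 is the value used in the index definition above.

```python
EXPECTED_DIMENSION = 384  # assumed to equal the dimension given to create_index

def check_dimension(vector, expected=EXPECTED_DIMENSION):
    """Return the vector unchanged, or raise ValueError if its length is wrong."""
    if len(vector) != expected:
        raise ValueError(
            f"embedding has {len(vector)} dimensions, index expects {expected}"
        )
    return vector
```

With sentence-transformers you can read the model's output size from `model.get_sentence_embedding_dimension()` and pass that as `expected` rather than hard-coding the number.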

Step 3: Index Your Content

The embedding step is the same for both databases; only the insert call differs:

def embed_and_store(items, use_pinecone=False):
    for item in items:
        text = f"{item['title']} {item['content']}"
        embedding = model.encode(text).tolist()

        if use_pinecone:
            index.upsert(vectors=[{
                "id": item["id"],
                "values": embedding,
                "metadata": {"title": item["title"], "source": item["source"]}
            }])
        else:
            collection.add(
                ids=[item["id"]],
                embeddings=[embedding],
                documents=[text],
                metadatas=[{"title": item["title"], "source": item["source"]}]
            )
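
The function above sends one item per call. For larger collections, grouping items into batches usually cuts the number of round trips. The sketch below shows one way to do that for ChromaDB; `batched`, `build_chroma_payload`, and the batch size of 100 are assumptions for illustration, not part of either library.

```python
def batched(items, batch_size=100):
    """Yield consecutive slices of items, each at most batch_size long."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def build_chroma_payload(batch, embeddings):
    """Arrange one batch as the keyword arguments collection.add() takes."""
    return {
        "ids": [item["id"] for item in batch],
        "embeddings": embeddings,
        "documents": [f"{item['title']} {item['content']}" for item in batch],
        "metadatas": [
            {"title": item["title"], "source": item["source"]} for item in batch
        ],
    }

# Intended use with the model and collection from Step 1 (not run here):
# for batch in batched(items):
#     texts = [f"{item['title']} {item['content']}" for item in batch]
#     embeddings = model.encode(texts).tolist()
#     collection.add(**build_chroma_payload(batch, embeddings))
```

The same `batched` helper works for Pinecone, since `index.upsert` accepts a list of vectors in a single call.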

Step 4: Query the Vector Store

def search(query, top_k=5, use_pinecone=False):
    query_embedding = model.encode(query).tolist()

    if use_pinecone:
        results = index.query(vector=query_embedding, top_k=top_k, include_metadata=True)
        return [{
            "id": match["id"],
            "score": match["score"],
            "title": match["metadata"]["title"],
            "source": match["metadata"]["source"]
        } for match in results["matches"]]
    else:
        results = collection.query(query_embeddings=[query_embedding], n_results=top_k)
        return [{
            "id": results["ids"][0][i],
            "score": round(1 - results["distances"][0][i], 3),
            "title": results["metadatas"][0][i]["title"],
            "text": results["documents"][0][i][:200]
        } for i in range(len(results["ids"][0]))]
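
A top-k query always returns k results, even when none of them is a good match. One common follow-up is to drop results below a minimum score. The sketch below assumes the result dictionaries returned by `search` above; `filter_by_score` and the 0.3 threshold are hypothetical and should be tuned against your own data.

```python
def filter_by_score(results, min_score=0.3):
    """Keep only results whose similarity score reaches min_score."""
    return [result for result in results if result["score"] >= min_score]

# Hand-written sample in the shape search() returns.
sample_results = [
    {"id": "doc-1", "score": 0.82, "title": "Expense report submission process"},
    {"id": "doc-2", "score": 0.12, "title": "Office holiday schedule"},
]
relevant = filter_by_score(sample_results)
```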

Step 5: Maintain the Index

Keep your vector store in sync with your content:

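from datetime import datetime

def update_item(item_id, new_content, use_pinecone=False):
    embedding = model.encode(new_content).tolist()

    if use_pinecone:
        # update() replaces the vector and merges set_metadata into the stored
        # metadata, so the title and source written in Step 3 are kept.
        # An upsert here would overwrite them and break search().
        index.update(
            id=item_id,
            values=embedding,
            set_metadata={"content": new_content[:1000], "updated_at": datetime.now().isoformat()}
        )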
    else:
        collection.update(
            ids=[item_id],
            embeddings=[embedding],
            documents=[new_content]
        )

def delete_item(item_id, use_pinecone=False):
    if use_pinecone:
        index.delete(ids=[item_id])
    else:
        collection.delete(ids=[item_id])
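
To decide which items need `update_item` or `delete_item`, one option is to store a hash of each item's text and compare it on every sync run, so unchanged content is never re-embedded. This is a sketch under that assumption; `content_hash`, `plan_sync`, and the idea of keeping a separate id-to-hash mapping are additions for illustration.

```python
import hashlib

def content_hash(text):
    """Stable fingerprint of a piece of text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def plan_sync(current_items, stored_hashes):
    """Compare current content against hashes saved at the last sync.

    current_items: {item_id: text} as it exists in your content source now
    stored_hashes: {item_id: hash} recorded when each item was last indexed
    Returns (ids to add or update, ids to delete).
    """
    to_upsert = [
        item_id
        for item_id, text in current_items.items()
        if stored_hashes.get(item_id) != content_hash(text)
    ]
    to_delete = [item_id for item_id in stored_hashes if item_id not in current_items]
    return to_upsert, to_delete
```

After a sync run, save the new hashes so the next comparison starts from the current state.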

What to Build Next

Add metadata filtering. Vector search finds semantically similar content, but combining it with metadata filters (date range, category, author) narrows the results to the documents you actually care about. For example, search for "quarterly revenue" restricted to documents from the finance department.
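
A minimal sketch of what that could look like follows. Both ChromaDB (`where=`) and Pinecone (`filter=`) accept MongoDB-style filter dictionaries with operators such as `$eq` and `$and`. The `department` field is hypothetical: the indexing code in this tutorial stores only `title` and `source`, so you would need to add the field to the metadata when indexing. `build_filter` and `filtered_search` are illustrative names.

```python
def build_filter(**conditions):
    """Build an equality filter usable by both ChromaDB and Pinecone."""
    clauses = [{field: {"$eq": value}} for field, value in conditions.items()]
    if not clauses:
        return None
    if len(clauses) == 1:
        return clauses[0]
    return {"$and": clauses}  # several conditions must be combined explicitly

def filtered_search(query_embedding, metadata_filter, top_k=5,
                    use_pinecone=False, index=None, collection=None):
    """Run a vector query restricted to records matching metadata_filter."""
    if use_pinecone:
        return index.query(vector=query_embedding, top_k=top_k,
                           filter=metadata_filter, include_metadata=True)
    return collection.query(query_embeddings=[query_embedding],
                            n_results=top_k, where=metadata_filter)

finance_only = build_filter(department="finance")
```

Pass `finance_only` together with the embedding of "quarterly revenue" to `filtered_search`, along with the `index` or `collection` object created in the setup steps.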
