How to Set Up a Vector Database for AI Search
Deploy a vector database for semantic search across your content.
Jay Banlasan
The AI Systems Guy
A vector database, whether local like ChromaDB or hosted like Pinecone, is the storage engine behind a RAG system. I use one to make any collection of text searchable by meaning instead of keywords. When someone searches "how to get reimbursed," it can match your document about "expense report submission process" even though the two share no words.
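Under the hood, "searchable by meaning" comes down to comparing vectors. The sketch below is a brute-force version of what a vector database does: score every stored vector against the query with cosine similarity and keep the best matches. The three-dimensional vectors are made up for illustration (real embeddings come from a model), and `brute_force_search` is a hypothetical helper, not part of ChromaDB or Pinecone.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def brute_force_search(query_vec, stored, top_k=2):
    """Score every stored vector against the query and return the top_k (id, score) pairs.

    A vector database answers the same question, but uses an approximate index
    (such as HNSW) so it does not have to compare against every stored vector.
    """
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy 3-dimensional "embeddings" invented for illustration; all-MiniLM-L6-v2
# produces 384-dimensional vectors.
stored = {
    "expense-report-process": [0.9, 0.1, 0.0],
    "holiday-schedule": [0.0, 0.2, 0.9],
    "travel-policy": [0.7, 0.4, 0.1],
}
query = [0.8, 0.2, 0.05]  # pretend embedding of "how to get reimbursed"

for doc_id, score in brute_force_search(query, stored):
    print(doc_id, round(score, 3))
```

A full scan like this is fine for small collections; a vector database earns its place by answering the same question quickly over millions of vectors.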
This tutorial covers both local (ChromaDB) and cloud (Pinecone) setups so you can pick what fits.
What You Need Before Starting
- Python 3.8+ with chromadb and/or pinecone (the Pinecone SDK, formerly published as pinecone-client)
- An embedding model (sentence-transformers for local, OpenAI for API)
- Content to index (documents, articles, product data)
- For Pinecone: a free or paid account
Step 1: Set Up ChromaDB (Local)
```bash
pip install chromadb sentence-transformers
```
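```python
import chromadb
from sentence_transformers import SentenceTransformer

# all-MiniLM-L6-v2 is a small local model that outputs 384-dimensional vectors
model = SentenceTransformer("all-MiniLM-L6-v2")

# PersistentClient saves the index to disk under ./vector_store
client = chromadb.PersistentClient(path="./vector_store")
collection = client.get_or_create_collection(
    name="my_content",
    metadata={"hnsw:space": "cosine"},  # cosine distance instead of the default L2
)
```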
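The `"hnsw:space": "cosine"` setting matters because Chroma's default space is squared L2 (`"l2"`). The sketch below computes both in plain Python to show how they relate: for unit-length vectors, squared L2 equals `2 - 2 * cosine_similarity`, so the ranking is the same but the reported numbers differ. That is why the `1 - distance` conversion in Step 4 only yields a cosine similarity when the collection uses the cosine space. The helper names here are illustrative.

```python
import math

def cosine_distance(a, b):
    """Chroma's "cosine" space: distance = 1 - cosine similarity (0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return 1 - dot / norms

def squared_l2(a, b):
    """Chroma's default "l2" space: sum of squared differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

a = normalize([0.9, 0.1, 0.0])
b = normalize([0.7, 0.4, 0.1])

cos_sim = 1 - cosine_distance(a, b)
# For unit-length vectors, squared L2 = 2 - 2 * cosine similarity,
# so both spaces order results the same way but report different values.
print(round(cosine_distance(a, b), 4), round(squared_l2(a, b), 4), round(2 - 2 * cos_sim, 4))
```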
Step 2: Set Up Pinecone (Cloud)
```bash
pip install pinecone
```
```python
import os

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))
index_name = "business-content"

# Create the index only if it does not exist yet
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=384,  # must match the embedding model (all-MiniLM-L6-v2 outputs 384)
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )

index = pc.Index(index_name)
```
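Pinecone fixes an index's dimension when you create it and rejects vectors of any other length, so switching embedding models (for example to OpenAI's `text-embedding-3-small`, which outputs 1536 dimensions by default) means creating a new index. With sentence-transformers you can read the size from `model.get_sentence_embedding_dimension()`. The hypothetical `check_vectors` helper below catches a mismatch before anything is sent; the table lists the default output sizes of a few common models.

```python
# Default output sizes of a few common embedding models (the index dimension must match)
KNOWN_DIMENSIONS = {
    "all-MiniLM-L6-v2": 384,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}

def check_vectors(vectors, index_dimension):
    """Raise before upserting if any vector's length differs from the index dimension.

    `vectors` uses the same shape as the upsert calls in this guide:
    a list of {"id": ..., "values": [...], ...} dicts.
    """
    bad = [v["id"] for v in vectors if len(v["values"]) != index_dimension]
    if bad:
        raise ValueError(f"{len(bad)} vector(s) do not have dimension {index_dimension}: {bad[:5]}")

# A 384-dim index accepts MiniLM-sized vectors; this call passes silently
ok = [{"id": "a", "values": [0.0] * KNOWN_DIMENSIONS["all-MiniLM-L6-v2"]}]
check_vectors(ok, index_dimension=384)
```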
Step 3: Index Your Content
The embedding step is the same for both databases; only the insert call differs:
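```python
def embed_and_store(items, use_pinecone=False, batch_size=100):
    # Embed title + content together so both contribute to the match
    texts = [f"{item['title']} {item['content']}" for item in items]
    embeddings = model.encode(texts).tolist()  # one batched call instead of one per item
    ids = [item["id"] for item in items]
    metadatas = [{"title": item["title"], "source": item["source"]} for item in items]

    if use_pinecone:
        vectors = [
            {"id": i, "values": e, "metadata": m}
            for i, e, m in zip(ids, embeddings, metadatas)
        ]
        # Send in batches to stay under Pinecone's per-request size limit
        for start in range(0, len(vectors), batch_size):
            index.upsert(vectors=vectors[start:start + batch_size])
    else:
        # upsert (rather than add) overwrites existing IDs when you re-run indexing
        collection.upsert(
            ids=ids,
            embeddings=embeddings,
            documents=texts,
            metadatas=metadatas,
        )
```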
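`embed_and_store` assumes every item is a dict with `id`, `title`, `content`, and `source` keys, and both stores key records by ID, so a repeated ID in one batch either overwrites another item or gets rejected. A small pre-flight check like the hypothetical `validate_items` below (run here on made-up sample data) catches both problems before any embeddings are computed.

```python
REQUIRED_KEYS = ("id", "title", "content", "source")

def validate_items(items):
    """Check items before embedding: required keys present and IDs unique."""
    seen = set()
    for position, item in enumerate(items):
        missing = [key for key in REQUIRED_KEYS if key not in item]
        if missing:
            raise ValueError(f"item {position} is missing {missing}")
        if item["id"] in seen:
            raise ValueError(f"duplicate id {item['id']!r}")
        seen.add(item["id"])
    return items

# Hypothetical sample data in the shape embed_and_store expects
items = validate_items([
    {"id": "doc-001", "title": "Expense report submission process",
     "content": "Submit receipts through the finance portal.",
     "source": "handbook/finance"},
    {"id": "doc-002", "title": "Holiday schedule",
     "content": "The office is closed on public holidays.",
     "source": "handbook/hr"},
])
```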
Step 4: Query the Vector Store
```python
def search(query, top_k=5, use_pinecone=False):
    query_embedding = model.encode(query).tolist()
    if use_pinecone:
        results = index.query(vector=query_embedding, top_k=top_k, include_metadata=True)
        # With metric="cosine", Pinecone's score is already a similarity (higher = closer)
        return [{
            "id": match["id"],
            "score": round(match["score"], 3),
            "title": match["metadata"]["title"],
            "source": match["metadata"]["source"],
        } for match in results["matches"]]
    else:
        results = collection.query(query_embeddings=[query_embedding], n_results=top_k)
        # Chroma returns one list per query embedding, hence the [0];
        # with "hnsw:space": "cosine", similarity = 1 - distance
        return [{
            "id": results["ids"][0][i],
            "score": round(1 - results["distances"][0][i], 3),
            "title": results["metadatas"][0][i]["title"],
            "source": results["metadatas"][0][i]["source"],
            "text": results["documents"][0][i][:200],  # first 200 characters as a preview
        } for i in range(len(results["ids"][0]))]
```
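Both branches of `search` always return `top_k` results, even when nothing in the index is actually relevant. One common remedy, sketched here as a hypothetical `filter_by_score` helper, is to drop results below a minimum similarity. The `0.3` cutoff and the sample results are assumptions; scores vary by embedding model, so pick a threshold by looking at the scores your real queries produce.

```python
def filter_by_score(results, min_score=0.3):
    """Drop results below min_score, keeping the input order (best first)."""
    return [r for r in results if r["score"] >= min_score]

# Hypothetical results in the shape search() returns, highest score first
results = [
    {"id": "doc-001", "score": 0.71, "title": "Expense report submission process"},
    {"id": "doc-007", "score": 0.42, "title": "Travel policy"},
    {"id": "doc-002", "score": 0.08, "title": "Holiday schedule"},
]
relevant = filter_by_score(results, min_score=0.3)
```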
Step 5: Maintain the Index
Keep your vector store in sync with your content:
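```python
from datetime import datetime, timezone

def update_item(item_id, new_content, use_pinecone=False):
    # Build new_content the same way as at indexing time (title + content) so the
    # new vector stays comparable with the rest of the index
    embedding = model.encode(new_content).tolist()
    if use_pinecone:
        # update() replaces the vector and merges set_metadata into the existing
        # metadata, so the title and source stored at indexing time are kept.
        # For brand-new items, use embed_and_store instead.
        index.update(
            id=item_id,
            values=embedding,
            set_metadata={
                "content": new_content[:1000],
                "updated_at": datetime.now(timezone.utc).isoformat(),
            },
        )
    else:
        # No metadatas passed, so Chroma leaves the existing metadata untouched
        collection.update(
            ids=[item_id],
            embeddings=[embedding],
            documents=[new_content],
        )

def delete_item(item_id, use_pinecone=False):
    if use_pinecone:
        index.delete(ids=[item_id])
    else:
        collection.delete(ids=[item_id])
```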
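Keeping the index in sync means knowing which items changed since they were last embedded. One approach, assumed here rather than built into either database, is to record a hash of each item's text at indexing time and compare on the next run. `content_hash` and `plan_sync` are hypothetical helpers; where you keep the recorded hashes (a JSON file, a database table) is up to you.

```python
import hashlib

def content_hash(text):
    """Stable fingerprint of an item's text, used to detect edits."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def plan_sync(source_items, indexed_hashes):
    """Compare current source content with what was last indexed.

    source_items:   {item_id: text} for everything that should be searchable
    indexed_hashes: {item_id: hash} recorded the last time each item was embedded
    Returns (to_upsert, to_delete) lists of IDs.
    """
    to_upsert = [item_id for item_id, text in source_items.items()
                 if indexed_hashes.get(item_id) != content_hash(text)]
    to_delete = [item_id for item_id in indexed_hashes if item_id not in source_items]
    return to_upsert, to_delete

# Made-up example: doc-001 was edited, doc-003 was removed, doc-004 is new
indexed = {
    "doc-001": content_hash("old text"),
    "doc-002": content_hash("unchanged"),
    "doc-003": content_hash("gone"),
}
source = {"doc-001": "new text", "doc-002": "unchanged", "doc-004": "brand new"}
to_upsert, to_delete = plan_sync(source, indexed)
```

Send the IDs in `to_upsert` back through your indexing or update code and the IDs in `to_delete` to `delete_item`. Re-embedding only what changed avoids recomputing embeddings for the whole collection on every sync.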
What to Build Next
Add metadata filtering. Vector search finds semantically similar content, but combining it with metadata filters (date range, category, author) gives you precise results. Search for "quarterly revenue" filtered to "finance department" documents only.
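Both ChromaDB (`where=` on `collection.query`) and Pinecone (`filter=` on `index.query`) accept Mongo-style metadata filters with operators such as `$eq`, `$gte`, `$lte`, and `$and`. The hypothetical `build_filter` helper below produces one filter dict usable with either. The `department` and `year` fields are assumptions: the indexing code in Step 3 stores only `title` and `source`, so you would need to add them to each item's metadata first. Store range fields like `year` as numbers, since the range operators compare numeric values.

```python
def build_filter(**conditions):
    """Build a metadata filter in the Mongo-style syntax Chroma and Pinecone both accept.

    Each keyword becomes an equality test, except keys ending in "_min"/"_max",
    which become $gte/$lte range tests on the field name without the suffix
    (so this sketch cannot express equality on a field literally named "x_min").
    """
    clauses = []
    for key, value in conditions.items():
        if key.endswith("_min"):
            clauses.append({key[:-4]: {"$gte": value}})
        elif key.endswith("_max"):
            clauses.append({key[:-4]: {"$lte": value}})
        else:
            clauses.append({key: {"$eq": value}})
    if not clauses:
        return None
    # A single clause is returned bare; Chroma may reject an $and with fewer than two clauses
    return clauses[0] if len(clauses) == 1 else {"$and": clauses}

# Finance documents from 2024 onward (hypothetical fields)
f = build_filter(department="finance", year_min=2024)
# Chroma:   collection.query(query_embeddings=[...], n_results=5, where=f)
# Pinecone: index.query(vector=..., top_k=5, filter=f, include_metadata=True)
```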