
How to Set Up Perplexity API for Research Automation

Connect Perplexity search-augmented AI for automated research tasks.

Jay Banlasan

The AI Systems Guy

Setting up the Perplexity API for research automation gives you something no standard LLM can do: grounded answers with real-time web access and citations. When I run competitive research, market analysis, or prospect intelligence for clients, I route those tasks through Perplexity. The setup is one of the fastest in this series because Perplexity exposes an OpenAI-compatible endpoint, so you can swap it in with minimal code changes.

The business case is straightforward. Standard GPT-4 or Claude models answer from training data that may be 6 to 12 months stale; Perplexity answers from what is live on the web right now. For anything time-sensitive, market-related, or factual, Perplexity is the right tool.
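That routing decision can be captured in a small dispatcher. This is a hypothetical sketch (the function name and keyword list are my own, not from any SDK): tasks that mention live, market, or factual data go to Perplexity, everything else stays on your default model.

```python
def pick_backend(task: str) -> str:
    """Route time-sensitive or factual tasks to Perplexity, the rest to a standard LLM.

    The keyword list is illustrative; tune it to your own workload.
    """
    live_signals = ("market", "news", "price", "pricing", "latest", "current", "competitor")
    if any(word in task.lower() for word in live_signals):
        return "perplexity"
    return "standard"

print(pick_backend("What is the latest market size for AI automation?"))  # perplexity
print(pick_backend("Rewrite this email to sound friendlier"))             # standard
```

Because both backends speak the same chat-completions format, the only thing this switch changes downstream is which client object you call.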

What You Need Before Starting

You need Python 3.10 or later, the openai and python-dotenv packages (pip install openai python-dotenv), and a Perplexity account with API access enabled.

Step 1: Get Your API Key

Go to perplexity.ai. Click your profile icon and navigate to "API." Click "Generate" to create a new API key. Copy it.

Add to your .env file:

PERPLEXITY_API_KEY=pplx-your-key-here

Perplexity bills per token, similar in shape to OpenAI's pricing, and the sonar models are priced competitively for research tasks. Online models may also carry a per-request fee, so check the current pricing page before scaling up.

Step 2: Make Your First Research Request

Perplexity's API is OpenAI-compatible, so you use the openai SDK with a custom base URL:

import os
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

perplexity_client = OpenAI(
    api_key=os.getenv("PERPLEXITY_API_KEY"),
    base_url="https://api.perplexity.ai"
)

response = perplexity_client.chat.completions.create(
    model="llama-3.1-sonar-large-128k-online",  # model names change over time; check Perplexity's docs for current options
    messages=[
        {
            "role": "system",
            "content": "You are a market research analyst. Provide specific, factual information with sources."
        },
        {
            "role": "user",
            "content": "What are the top 3 AI automation tools for small businesses in 2024? Include pricing information."
        }
    ]
)

print(response.choices[0].message.content)

The online suffix in the model name means it has real-time web access; the chat variants (for example, llama-3.1-sonar-large-128k-chat) answer from the model alone, without web search. Perplexity retires and renames models periodically, so check its model documentation for the current list.

Step 3: Extract Citations from Responses

Perplexity responses include source citations. Here is how to extract and display them:

def research_with_citations(query: str, model: str = "llama-3.1-sonar-large-128k-online") -> dict:
    """
    Run a research query and return both the answer and its sources.
    
    Returns:
        Dict with 'answer' and 'citations' keys
    """
    response = perplexity_client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "system",
                "content": "You are a research assistant. Answer based on current information. Cite your sources."
            },
            {
                "role": "user",
                "content": query
            }
        ]
    )
    
    answer = response.choices[0].message.content
    
    # Extract citations if present in the response object
    citations = []
    if hasattr(response, 'citations'):
        citations = response.citations
    
    return {
        "query": query,
        "answer": answer,
        "citations": citations,
        "model": response.model
    }


# Test it
result = research_with_citations("What is the current market size of AI in business automation?")
print(f"Answer:\n{result['answer']}\n")
if result['citations']:
    print("Sources:")
    for i, cite in enumerate(result['citations'], 1):
        print(f"{i}. {cite}")

Step 4: Build a Competitor Research Automation

This is one of the highest-value uses: automate the research you would do manually in a browser:

from datetime import datetime


def research_competitor(company_name: str, industry: str) -> dict:
    """
    Research a competitor and return structured intelligence.
    """
    
    queries = [
        f"What products and services does {company_name} offer? Include pricing if available.",
        f"What are the main customer complaints about {company_name}? Check recent reviews.",
        f"What is {company_name}'s main marketing message and target audience?",
        f"Has {company_name} made any recent announcements, funding rounds, or product launches in 2024?"
    ]
    
    research = {
        "company": company_name,
        "industry": industry,
        "researched_at": datetime.now().isoformat(),
        "findings": {}
    }
    
    sections = ["products_and_pricing", "customer_complaints", "marketing_positioning", "recent_news"]
    
    for query, section in zip(queries, sections):
        print(f"Researching: {section}...")
        result = research_with_citations(query)
        research["findings"][section] = {
            "summary": result["answer"],
            "sources": result.get("citations", [])
        }
    
    return research


def format_competitor_report(research: dict) -> str:
    """Format competitor research into a readable report."""
    
    report = f"""COMPETITOR INTELLIGENCE REPORT
Company: {research['company']}
Industry: {research['industry']}
Generated: {research['researched_at']}
{'='*60}

"""
    
    section_titles = {
        "products_and_pricing": "Products and Pricing",
        "customer_complaints": "Customer Complaints",
        "marketing_positioning": "Marketing Positioning",
        "recent_news": "Recent News"
    }
    
    for key, title in section_titles.items():
        if key in research["findings"]:
            report += f"## {title}\n\n"
            report += research["findings"][key]["summary"]
            
            sources = research["findings"][key].get("sources", [])
            if sources:
                report += f"\n\nSources: {', '.join(sources[:3])}"
            
            report += "\n\n" + "-"*40 + "\n\n"
    
    return report


# Run it
report_data = research_competitor("HubSpot", "CRM and Marketing Software")
report = format_competitor_report(report_data)
print(report)

# Save to file
with open(f"competitor_research_{report_data['company'].lower().replace(' ', '_')}.txt", "w") as f:
    f.write(report)
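Four back-to-back research calls means four chances for a transient API error to kill the whole run. A small retry wrapper (my own sketch, not part of the openai SDK) makes the loop in research_competitor more resilient; wrap each research_with_citations call in it:

```python
import time

def with_retry(fn, attempts: int = 3, base_delay: float = 2.0):
    """Call fn(), retrying with exponential backoff on any exception.

    Raises the last exception if every attempt fails.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Inside the research loop:
# result = with_retry(lambda: research_with_citations(query))
```

Exponential backoff (2s, 4s, 8s...) also keeps you under rate limits when several research jobs run at once.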

Step 5: Automate Prospect Research

def research_prospect(company_name: str, website: str | None = None) -> dict:
    """
    Research a sales prospect before an outreach or meeting.
    """
    
    # Include the website in the queries when we have it, to disambiguate the company
    context = company_name
    if website:
        context += f" (website: {website})"
    
    questions = {
        "business_overview": f"What does {context} do? What market do they serve? Approximate size?",
        "pain_points": f"What common challenges do companies like {context} in their industry face with operations or marketing?",
        "tech_stack": f"What software tools does {context} appear to use? Check job listings and reviews.",
        "recent_activity": f"What has {context} been doing recently? Hiring, launching, funding, partnerships?"
    }
    
    results = {}
    
    for key, question in questions.items():
        resp = perplexity_client.chat.completions.create(
            model="llama-3.1-sonar-small-128k-online",  # Use smaller model for cost efficiency
            messages=[
                {"role": "system", "content": "Answer concisely in 3-5 sentences. Focus on facts."},
                {"role": "user", "content": question}
            ]
        )
        results[key] = resp.choices[0].message.content
    
    return results
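The raw dict from research_prospect is awkward to paste into a CRM note. A small formatter in the same spirit as format_competitor_report turns it into a readable pre-call brief; this helper is my own addition, not from the original setup:

```python
def format_prospect_brief(company_name: str, results: dict[str, str]) -> str:
    """Turn research_prospect output into a readable pre-call brief."""
    titles = {
        "business_overview": "Business Overview",
        "pain_points": "Likely Pain Points",
        "tech_stack": "Tech Stack Signals",
        "recent_activity": "Recent Activity",
    }
    lines = [f"PROSPECT BRIEF: {company_name}", "=" * 40, ""]
    for key, text in results.items():
        lines.append(f"## {titles.get(key, key)}")  # fall back to the raw key for unknown sections
        lines.append(text)
        lines.append("")
    return "\n".join(lines)

# brief = format_prospect_brief("Acme Corp", research_prospect("Acme Corp", "acme.com"))
```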

Step 6: Schedule Research to Run Automatically

import time
from datetime import datetime

def daily_market_brief(topics: list[str], output_file: str = "daily_brief.md") -> None:
    """
    Generate a daily market research brief on specified topics.
    Run this as a scheduled job.
    """
    
    brief = f"# Daily Market Brief\n{datetime.now().strftime('%Y-%m-%d %H:%M')}\n\n"
    
    for topic in topics:
        print(f"Researching: {topic}")
        
        response = perplexity_client.chat.completions.create(
            model="llama-3.1-sonar-large-128k-online",
            messages=[
                {"role": "system", "content": "Provide a 3-4 sentence current summary with specific facts and numbers where available."},
                {"role": "user", "content": f"What are the latest developments in: {topic}?"}
            ]
        )
        
        brief += f"## {topic}\n\n{response.choices[0].message.content}\n\n---\n\n"
        time.sleep(1)  # Be kind to the API
    
    with open(output_file, "w") as f:
        f.write(brief)
    
    print(f"Brief saved to {output_file}")


# Example usage
daily_market_brief([
    "AI automation for small business",
    "Meta Ads performance trends",
    "Marketing agency industry news"
])
