How to Set Up Google Gemini API Access
Configure Google Gemini API credentials and make your first multimodal request.
Jay Banlasan
The AI Systems Guy
Setting up the Google Gemini API takes less time than most people expect. Google built Gemini to be accessible from day one, and the free tier is genuinely useful for testing and low-volume production work. I set this up for clients when they need multimodal capabilities, meaning the model can process text, images, PDFs, and audio in a single request. GPT-4o can do that too, but Gemini 1.5 Pro's context window (up to 1 million tokens) makes it the right call for processing large documents.
For business use, Gemini 1.5 Flash is the workhorse: it is cheaper and faster, which suits high-volume tasks. Gemini 1.5 Pro is what you reach for when you need to process a 300-page PDF or analyze an hour-long video transcript in one shot.
What You Need Before Starting
- A Google account
- Python 3.9 or higher
- Access to Google AI Studio at aistudio.google.com
- pip for installing packages
- A .env file for key storage
Step 1: Get Your Gemini API Key
Go to aistudio.google.com. Sign in with your Google account. Click "Get API key" in the left sidebar, then "Create API key." Select a Google Cloud project (or create one). Copy the key.
Add it to your .env file:
GOOGLE_API_KEY=AIza-your-key-here
The free tier gives you 15 RPM (requests per minute) and 1 million tokens per minute on Gemini 1.5 Flash. That is enough for most automation tasks at low volume.
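If you push past those limits, requests start failing with 429 errors. A simple retry with exponential backoff handles this; the sketch below is a generic helper (the name `with_backoff` is my own, not part of the SDK), and it matches rate-limit errors by message so it stays dependency-free:

```python
import random
import time

def with_backoff(call, max_retries=5, base_delay=1.0):
    """Retry a zero-argument callable on rate-limit errors, doubling the
    wait between attempts (exponential backoff with jitter)."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception as exc:
            # Only retry errors that look like rate limits; re-raise the rest.
            message = str(exc).lower()
            if "429" not in message and "quota" not in message:
                raise
            # Wait base_delay, 2x, 4x, ... plus a little jitter
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
    raise RuntimeError("Still rate limited after all retries")
```

Wrap any SDK call in it, e.g. `with_backoff(lambda: model.generate_content(prompt))`.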
Step 2: Install the Google Generative AI SDK
pip install google-generativeai python-dotenv
For newer projects, Google also ships a unified SDK:
pip install google-genai python-dotenv
This tutorial uses google-generativeai since it is more widely documented. The unified google-genai SDK is the future direction but is still maturing.
Step 3: Make Your First Text Request
import os
import google.generativeai as genai
from dotenv import load_dotenv
load_dotenv()
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))
model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content("What are the top 3 uses of AI in small business operations?")
print(response.text)
Save it as gemini_test.py and run it:
python gemini_test.py
You should get a response within a couple of seconds. Flash is notably faster than Pro for simple text tasks.
Step 4: Make a Multimodal Request (Text + Image)
This is where Gemini separates itself. You can send an image and ask the model questions about it:
import os
import google.generativeai as genai
from dotenv import load_dotenv
import PIL.Image
load_dotenv()
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))
model = genai.GenerativeModel("gemini-1.5-flash")
# Load a local image
image = PIL.Image.open("screenshot.png")
# Ask the model about the image
response = model.generate_content([
    "Describe what this screenshot shows. List any key data points or metrics visible.",
    image
])
print(response.text)
This works with PNG, JPG, WEBP, HEIC, and HEIF formats. For business applications, I use this to process invoices, screenshots of dashboards, and product photos for description generation.
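When files come from clients or a watched folder, it is worth guarding against unsupported formats before you send anything. A tiny helper for that (the function name is my own; the extension set is the list above):

```python
from pathlib import Path

# Image formats Gemini accepts, per the list above
SUPPORTED_IMAGE_FORMATS = {".png", ".jpg", ".jpeg", ".webp", ".heic", ".heif"}

def is_supported_image(path: str) -> bool:
    """Check a file's extension against Gemini's supported image formats."""
    return Path(path).suffix.lower() in SUPPORTED_IMAGE_FORMATS
```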
Step 5: Process a PDF Document
Gemini can handle PDFs directly without extraction preprocessing:
import os
import google.generativeai as genai
from dotenv import load_dotenv
import pathlib
load_dotenv()
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))
model = genai.GenerativeModel("gemini-1.5-pro")
# Upload the file using the Files API
pdf_path = pathlib.Path("contract.pdf")
uploaded_file = genai.upload_file(path=pdf_path, display_name="Contract")
print(f"Uploaded: {uploaded_file.display_name}")
# Ask questions about the document
response = model.generate_content([
    uploaded_file,
    "Summarize the key obligations for both parties in this contract. Use bullet points."
])
print(response.text)
# Clean up uploaded file
genai.delete_file(uploaded_file.name)
Note: Use gemini-1.5-pro for documents longer than 50 pages. Flash handles shorter documents fine.
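If your pipeline handles documents of varying length, that rule of thumb is easy to codify. A sketch (the helper name and threshold default are mine, based on the note above):

```python
def pick_model(page_count: int, long_doc_threshold: int = 50) -> str:
    """Pick a Gemini model based on document length.

    Flash handles shorter documents fine; reach for Pro past ~50 pages.
    """
    if page_count > long_doc_threshold:
        return "gemini-1.5-pro"
    return "gemini-1.5-flash"
```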
Step 6: Add System Instructions and Configure Safety Settings
For consistent business outputs, configure the model with a system prompt and adjust safety thresholds:
import os
import google.generativeai as genai
from dotenv import load_dotenv
load_dotenv()
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))
model = genai.GenerativeModel(
    model_name="gemini-1.5-flash",
    system_instruction="You are a business data analyst. Respond in plain English. Use bullet points. Keep responses under 200 words unless asked to elaborate.",
    generation_config=genai.GenerationConfig(
        temperature=0.2,
        max_output_tokens=1000,
        response_mime_type="text/plain"
    ),
    # Raise the block threshold so borderline business content
    # (e.g. legal or insurance text) is not filtered unnecessarily
    safety_settings=[
        {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_ONLY_HIGH"},
        {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_ONLY_HIGH"},
    ]
)
# For JSON output, change response_mime_type:
model_json = genai.GenerativeModel(
    model_name="gemini-1.5-flash",
    generation_config=genai.GenerationConfig(
        temperature=0.1,
        response_mime_type="application/json"
    )
)
response = model_json.generate_content(
    'Extract the company name, date, and total amount from this invoice text: "Invoice from Acme Corp, dated 2024-05-15, total $1,250.00"'
)
print(response.text)  # Returns a JSON string
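Because response_mime_type is application/json, the response text is a JSON string you can parse directly into a dict. The sample string below stands in for a live response.text:

```python
import json

# Stand-in for response.text from a JSON-mode model
sample = '{"company": "Acme Corp", "date": "2024-05-15", "total": "$1,250.00"}'

invoice = json.loads(sample)
print(invoice["company"])  # Acme Corp
print(invoice["total"])    # $1,250.00
```

From here the extracted fields can go straight into a spreadsheet row or database insert.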
Step 7: Build a Reusable Wrapper
import os
import google.generativeai as genai
from dotenv import load_dotenv
from typing import Optional
import PIL.Image
load_dotenv()
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))
def ask_gemini(
    prompt: str,
    image_path: Optional[str] = None,
    model_name: str = "gemini-1.5-flash",
    system_prompt: Optional[str] = None,
    temperature: float = 0.3
) -> str:
    """
    Send a request to Gemini, optionally with an image.

    Args:
        prompt: Text question or instruction
        image_path: Optional path to an image file
        model_name: gemini-1.5-flash (fast/cheap) or gemini-1.5-pro (powerful)
        system_prompt: Optional behavior instructions
        temperature: 0.0 deterministic, 1.0 creative

    Returns:
        Response text
    """
    model = genai.GenerativeModel(
        model_name=model_name,
        system_instruction=system_prompt,
        generation_config=genai.GenerationConfig(
            temperature=temperature,
            max_output_tokens=2000
        )
    )
    content = [prompt]
    if image_path:
        image = PIL.Image.open(image_path)
        content.append(image)
    response = model.generate_content(content)
    return response.text

if __name__ == "__main__":
    result = ask_gemini(
        "What are 5 ways a law firm could use AI to save time on admin tasks?",
        system_prompt="You are a business consultant. Be specific and practical."
    )
    print(result)
What to Build Next
- Connect Gemini to your Google Drive to process documents as they are uploaded
- Use the 1M context window to analyze entire CRM export files in one call
- Combine Gemini Vision with a file watcher to auto-process incoming invoice scans
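For the file-watcher idea, a simple polling scan is often enough to start with. This sketch only covers the scanning half (the function name and pattern list are mine); each path it returns would be handed to a wrapper like ask_gemini from Step 7:

```python
from pathlib import Path

def find_new_files(folder: str, seen: set, patterns=(".pdf", ".png", ".jpg")) -> list:
    """Return files in `folder` matching `patterns` that haven't been seen,
    recording them in `seen` so each file is processed only once."""
    new_files = []
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() in patterns and path.name not in seen:
            seen.add(path.name)
            new_files.append(path)
    return new_files
```

Run it in a loop with a time.sleep between scans, passing each new file to your Gemini wrapper.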
Related Reading
- How to Set Up Anthropic Claude with System Prompts - Compare system prompt behavior across models
- How to Stream AI Responses in Real-Time - Streaming works with Gemini too, same pattern
- How to Create AI API Keys Securely - Keep your Google API key safe
Want this system built for your business?
Get a free assessment. We will map every system your business needs and show you the ROI.
Get Your Free Assessment