How to Fine-Tune GPT on Your Business Data
Train a custom GPT model on your company writing style and knowledge.
Jay Banlasan
The AI Systems Guy
Fine-tuning GPT on custom business data is the right tool for one specific problem: when a model consistently produces outputs that are almost right but wrong in the same predictable ways, and system prompts have failed to fix it. I have used fine-tuning to train email generators on a founder's specific writing style, to teach a classifier the difference between deal types that a base model consistently confuses, and to reduce system prompt length by baking instructions into the weights. It is not magic, but used correctly it eliminates whole categories of output problems.
The most common mistake is fine-tuning when the real fix is a better system prompt. Fine-tuning costs money and takes time. Always exhaust prompt engineering first.
What You Need Before Starting
- An OpenAI account with fine-tuning access
- openai Python library v1.0+ (pip install openai)
- At least 50 high-quality training examples (100+ for better results)
- Clear before/after examples of the exact behavior you want to change
Step 1: Decide If Fine-Tuning Is the Right Tool
Fine-tuning is the right choice when:
- You need consistent style that is expensive to describe in a system prompt
- Your task requires knowledge not in the base model (internal jargon, proprietary frameworks)
- You need to reduce per-call costs by shortening system prompts (fine-tuned behavior needs less instruction)
- You have 50+ high-quality examples and a clear quality metric
Fine-tuning is NOT the right choice when:
- The base model works but outputs need light correction (fix the prompt)
- You need the model to access new information frequently (use RAG instead)
- You have fewer than 50 examples (not enough signal)
- Your requirement can be met with JSON mode or function calling
Step 2: Build Your Training Dataset
Fine-tuning datasets are JSONL files. Each line is one training example in the messages format.
import json

def create_training_example(system: str, user: str, assistant: str) -> dict:
    return {
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
            {"role": "assistant", "content": assistant},
        ]
    }

# Example: fine-tuning for a specific email style
SYSTEM_PROMPT = "You write follow-up emails for a B2B software company. Casual, direct, under 80 words, end with a question."

training_examples = [
    create_training_example(
        SYSTEM_PROMPT,
        "Follow-up for Sarah who attended our demo on Monday.",
        "Hey Sarah, good to connect on Monday. The workflow automation piece you asked about - we just shipped an update that handles exactly that use case. Worth a quick look? Would a 20-minute call this week work for you?"
    ),
    create_training_example(
        SYSTEM_PROMPT,
        "Follow-up for Marcus who downloaded our pricing guide 3 days ago.",
        "Marcus, saw you grabbed the pricing guide - hope it was useful. Most people have questions about the Enterprise tier after reading it. Anything I can clarify, or does the scope feel right for what you're building?"
    ),
    # Add 48+ more examples following the same pattern
]

def save_training_file(examples: list, filepath: str):
    with open(filepath, "w") as f:
        for example in examples:
            f.write(json.dumps(example) + "\n")
    print(f"Saved {len(examples)} examples to {filepath}")

save_training_file(training_examples, "training_data.jsonl")
Quality rules for training data:
- Every assistant response should be an example of exactly what you want
- No mediocre examples - if you would not send it, do not train on it
- Vary the user inputs - diverse phrasings of similar requests teach the model to generalize instead of memorize
- Match the diversity of real-world inputs your model will see
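One practical extension of those rules: hold out a slice of your examples as a validation set. The fine-tuning API accepts an optional validation_file, and a held-out set lets you watch validation loss for overfitting during training. A minimal sketch (split_dataset and write_jsonl are hypothetical helpers, not part of any library):

```python
import json
import random

def split_dataset(examples: list, holdout_frac: float = 0.2, seed: int = 42):
    """Shuffle deterministically and split into (train, validation) lists."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_holdout = max(1, int(len(shuffled) * holdout_frac))
    return shuffled[n_holdout:], shuffled[:n_holdout]

def write_jsonl(examples: list, filepath: str):
    """Write one JSON object per line, the format fine-tuning expects."""
    with open(filepath, "w") as f:
        for ex in examples:
            f.write(json.dumps(ex) + "\n")

# Stand-in examples for illustration; use your real training_examples list
examples = [{"messages": [{"role": "user", "content": f"input {i}"},
                          {"role": "assistant", "content": f"output {i}"}]}
            for i in range(50)]
train, val = split_dataset(examples)
write_jsonl(train, "training_data.jsonl")
write_jsonl(val, "validation_data.jsonl")
print(len(train), len(val))  # 40 10
```

Pass the second file as validation_file when you create the job and the API reports validation loss alongside training loss.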
Step 3: Validate Your Training Data
Validate the file before uploading - a malformed file fails the job after you have already waited in the queue. The script below checks JSON structure and required roles, flags empty messages, and estimates token count and training cost.
import json

def validate_training_file(filepath: str) -> bool:
    errors = []
    warnings = []
    data = []

    with open(filepath, "r") as f:
        for i, line in enumerate(f, 1):
            try:
                example = json.loads(line.strip())
                data.append(example)
            except json.JSONDecodeError as e:
                errors.append(f"Line {i}: Invalid JSON - {e}")

    if errors:
        for e in errors:
            print(f"ERROR: {e}")
        return False

    print(f"Loaded {len(data)} examples")

    # Check required structure
    for i, example in enumerate(data, 1):
        if "messages" not in example:
            errors.append(f"Example {i}: Missing 'messages' key")
            continue
        messages = example["messages"]
        roles = [m.get("role") for m in messages]
        if "user" not in roles:
            errors.append(f"Example {i}: No 'user' message")
        if "assistant" not in roles:
            errors.append(f"Example {i}: No 'assistant' message")
        for msg in messages:
            if not msg.get("content", "").strip():
                warnings.append(f"Example {i}: Empty content in {msg.get('role')} message")

    # Rough token estimate: ~1.3 tokens per whitespace-separated word
    total_tokens = sum(
        sum(len(m.get("content", "").split()) * 1.3 for m in ex.get("messages", []))
        for ex in data
    )
    # Approximate per-1K-token training rate - check OpenAI's current
    # pricing page for your base model before relying on this number
    estimated_cost = (total_tokens / 1000) * 0.008

    for e in errors:
        print(f"ERROR: {e}")
    for w in warnings:
        print(f"WARNING: {w}")
    print(f"\nEstimated tokens: {int(total_tokens):,}")
    print(f"Estimated training cost (1 epoch): ~${estimated_cost:.2f}")
    return len(errors) == 0

validate_training_file("training_data.jsonl")
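Structural validation catches malformed JSON, but not off-style examples - and training on off-style responses defeats the point of fine-tuning. A sketch of a style gate for the example prompt above (check_style is a hypothetical helper encoding the under-80-words and ends-with-a-question rules; swap in your own constraints):

```python
def check_style(assistant_text: str, max_words: int = 80) -> list:
    """Return a list of style violations for one assistant response.

    Encodes the rules from this article's example system prompt:
    under 80 words, ends with a question.
    """
    problems = []
    if len(assistant_text.split()) >= max_words:
        problems.append("too long")
    if not assistant_text.rstrip().endswith("?"):
        problems.append("does not end with a question")
    return problems

print(check_style("Quick one - does Tuesday work for a 15-minute call?"))  # []
print(check_style("Thanks for your time today."))  # ['does not end with a question']
```

Run every assistant response through a gate like this before saving the JSONL; anything flagged either gets rewritten or dropped.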
Step 4: Upload and Start Fine-Tuning
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def start_fine_tune(
    training_file: str,
    model: str = "gpt-4o-mini-2024-07-18",
    suffix: str = "email-followup",
    n_epochs: int = 3,
) -> str:
    # Upload the training file
    print("Uploading training file...")
    with open(training_file, "rb") as f:
        upload_response = client.files.create(file=f, purpose="fine-tune")
    file_id = upload_response.id
    print(f"File uploaded: {file_id}")

    # Start the fine-tuning job
    print("Starting fine-tune job...")
    job = client.fine_tuning.jobs.create(
        training_file=file_id,
        model=model,
        suffix=suffix,
        hyperparameters={"n_epochs": n_epochs},
    )
    print(f"Fine-tune job started: {job.id}")
    print(f"Status: {job.status}")
    print("Small datasets typically finish in ~15-30 minutes - poll the job status to check.")
    return job.id

job_id = start_fine_tune("training_data.jsonl", suffix="email-v1")
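A note on n_epochs: the default of 3 above is a judgment call, not official guidance. If you leave hyperparameters unset, the API picks a value for you. One rough heuristic (an assumption to check against your own eval results, not a documented rule) is more passes for small datasets and fewer as the dataset grows, since over-trained large sets drift into stilted, memorized phrasing:

```python
def suggest_epochs(n_examples: int) -> int:
    """Rough rule of thumb for n_epochs by dataset size.

    These thresholds are assumptions for illustration - validate against
    your own test cases, or omit hyperparameters and accept the API default.
    """
    if n_examples < 100:
        return 3
    if n_examples < 1000:
        return 2
    return 1

print(suggest_epochs(60))   # 3
print(suggest_epochs(500))  # 2
```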
Step 5: Monitor the Training Job
import time

def wait_for_fine_tune(job_id: str, poll_interval: int = 60) -> str:
    print(f"Monitoring job: {job_id}")
    while True:
        job = client.fine_tuning.jobs.retrieve(job_id)
        status = job.status
        print(f"[{time.strftime('%H:%M:%S')}] Status: {status}")

        if status == "succeeded":
            model_id = job.fine_tuned_model
            print("\nFine-tune complete!")
            print(f"Model ID: {model_id}")
            return model_id
        elif status in ("failed", "cancelled"):
            print(f"\nFine-tune {status}: {job.error}")
            raise RuntimeError(f"Fine-tune {status}: {job.error}")

        # Print recent training metrics if available
        events = client.fine_tuning.jobs.list_events(job_id, limit=5)
        for event in reversed(events.data):
            if "train_loss" in event.message:
                print(f"  {event.message}")
        time.sleep(poll_interval)

# Run this after starting the job
# model_id = wait_for_fine_tune(job_id)
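Fixed-interval polling works fine, but long jobs burn API calls for no benefit. A generic backoff poller, decoupled from the client so you can test it with a fake status function (poll_until_done is a sketch, not part of the OpenAI library - in real use, pass a lambda wrapping jobs.retrieve):

```python
import time

def poll_until_done(get_status, initial_interval: float = 30.0,
                    max_interval: float = 300.0, backoff: float = 1.5) -> str:
    """Poll a status callable with exponential backoff until a terminal state.

    get_status is any zero-argument function returning a fine-tuning job
    status string, e.g. lambda: client.fine_tuning.jobs.retrieve(job_id).status
    """
    interval = initial_interval
    while True:
        status = get_status()
        if status in ("succeeded", "failed", "cancelled"):
            return status
        time.sleep(interval)
        interval = min(interval * backoff, max_interval)

# Demo with a fake status sequence instead of a live API call
statuses = iter(["validating_files", "queued", "running", "succeeded"])
result = poll_until_done(lambda: next(statuses), initial_interval=0.0)
print(result)  # succeeded
```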
Step 6: Test and Evaluate Your Fine-Tuned Model
def test_fine_tuned_model(fine_tuned_model_id: str, test_inputs: list) -> None:
    SYSTEM_PROMPT = "You write follow-up emails for a B2B software company. Casual, direct, under 80 words, end with a question."
    print(f"Testing: {fine_tuned_model_id}\n")
    for user_input in test_inputs:
        response = client.chat.completions.create(
            model=fine_tuned_model_id,
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": user_input},
            ],
            max_tokens=200,
            temperature=0.7,
        )
        print(f"Input: {user_input}")
        print(f"Output: {response.choices[0].message.content}")
        print("-" * 40)

test_inputs = [
    "Follow-up for Janet who requested a demo for her 20-person marketing team.",
    "Follow-up for Tom who went cold after two calls.",
    "Follow-up for a startup founder who said they couldn't afford it.",
]

# test_fine_tuned_model("ft:gpt-4o-mini:your-org:email-v1:abc123", test_inputs)
Compare outputs side by side against the base model on the same inputs. If the fine-tuned model is not clearly better on your test cases, it is not worth the per-call cost premium.
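One way to make that comparison concrete is to score both models' outputs against the measurable parts of the style spec. A minimal sketch, assuming the under-80-words and ends-with-a-question rules from the example system prompt (the sample outputs below are invented for illustration):

```python
def score_outputs(outputs: list, max_words: int = 80) -> float:
    """Fraction of outputs meeting the measurable style constraints."""
    def ok(text: str) -> bool:
        return len(text.split()) < max_words and text.rstrip().endswith("?")
    return sum(ok(o) for o in outputs) / len(outputs)

# Hypothetical outputs collected from each model on the same test inputs
base_outputs = ["Dear Janet, I hope this email finds you well...",
                "Hi Tom, just checking in - any updates on your end?"]
tuned_outputs = ["Janet, want me to set up the demo for your team this week?",
                 "Tom, still worth a quick chat, or should I close the file?"]
print(f"base: {score_outputs(base_outputs):.0%}  tuned: {score_outputs(tuned_outputs):.0%}")
```

Automated scoring only covers the mechanical rules; still read a sample by hand for tone before shipping.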
What to Build Next
- Build a training data collection pipeline that captures high-quality human-edited outputs as training examples over time
- A/B test the fine-tuned model against the base model with a system prompt on real traffic before committing
- Set up a monthly re-training pipeline to incorporate new examples as your style evolves
Related Reading
- GPT-4o for Multimodal Business Tasks
- GPT-4.1 for Business Applications
- Using GPT-4.1 and Claude Together
Want this system built for your business?
Get a free assessment. We will map every system your business needs and show you the ROI.
Get Your Free Assessment