OpenAI is closing managed fine-tuning. Your models need a new home by January 2027.

Your LLM bill has an apprentice.

Same answers, smaller bill. Proven by evals, with instant rollback.

Our public benchmark
88.9 vs 85.6
Fine-tuned 4B beats GPT-4o-mini on held-out evals
Run it yourself → GitHub, ~40 min
Evidence

Proven on your data, not ours.

Apprentice measures every replacement against your own gold set. Promotion gates are yours to set; rollback is always one click away.

Eval score
89.6%
+18.4pp vs baseline
gold-set agreement gate ≥ 88%
Cost per 1k req
$0.42
−89% vs $3.85 frontier
Qwen3.5-4B on your vLLM
p95 latency
412ms
−75% vs frontier 1,680ms
net inference, no gateway hops
Rollback
1-click
traffic back in < 1s
audit log entry on every rollback
Process

Watch. Learn. Take over.

Three stages, zero risk. Your frontier model keeps running until the small model earns the traffic, gate by gate.

01 Shadow

Apprentice watches.

Every request your team sends to GPT or Claude flows through the shadow router. Zero latency added, zero production impact. Just quiet observation.

Shadow mode stats
Requests observed: 142,813
Production impact: None
Eval pairs captured: 10,000
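
To make the shadow stage concrete, here is a minimal sketch of the idea: the caller is always answered by the frontier model, while the same prompt is mirrored to the small model on a background thread and the pair is stored for evaluation. Every name here (`ShadowRouter`, `frontier`, `apprentice`, `pairs`) is hypothetical and is not the Apprentice SDK API; both models are stand-in callables.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Model = Callable[[str], str]  # stand-in for a real model client


class ShadowRouter:
    """Hypothetical sketch: serve from the frontier model, mirror off the request path."""

    def __init__(self, frontier: Model, apprentice: Model) -> None:
        self.frontier = frontier
        self.apprentice = apprentice
        # (prompt, frontier answer, apprentice answer) triples kept for evaluation
        self.pairs: List[Tuple[str, str, str]] = []
        self._pool = ThreadPoolExecutor(max_workers=4)

    def _mirror(self, prompt: str, frontier_answer: str) -> None:
        try:
            shadow_answer = self.apprentice(prompt)
        except Exception:
            return  # a shadow failure must never reach production
        self.pairs.append((prompt, frontier_answer, shadow_answer))

    def complete(self, prompt: str) -> str:
        answer = self.frontier(prompt)  # the caller only ever sees this answer
        self._pool.submit(self._mirror, prompt, answer)  # fire and forget
        return answer

    def drain(self) -> None:
        """Wait for pending mirrors to finish (useful in tests and at shutdown)."""
        self._pool.shutdown(wait=True)
```

Because the mirror runs after the frontier answer is already in hand, a slow or failing small model cannot change what the caller receives.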
02 Canary 25%

It learns your workflows.

Your gold set drives continuous fine-tuning. The model is evaluated on every new run before any traffic shifts. Promotion gates are yours to configure.

Promotion gates
Gold-set eval ≥ 88%: PASS 89.6%
Fallback rate < 2%: PASS 0.8%
p95 latency < 600ms: PASS 412ms
Live agreement ≥ 95%: LIVE 97.2%
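
The four gates above boil down to threshold checks. The sketch below shows one way to express them, using the thresholds from this page; the `Gate` type, the metric keys, and `can_promote` are illustrative names, not the product's configuration format. A missing metric is treated as a failure, so the check fails closed.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class Gate:
    metric: str                              # key in the metrics dict (hypothetical naming)
    compare: Callable[[float, float], bool]  # e.g. operator.ge for "at least"
    threshold: float


# Thresholds taken from the gates listed on this page.
GATES: Sequence[Gate] = (
    Gate("gold_set_eval", operator.ge, 0.88),
    Gate("fallback_rate", operator.lt, 0.02),
    Gate("p95_latency_ms", operator.lt, 600),
    Gate("live_agreement", operator.ge, 0.95),
)


def failed_gates(metrics: Dict[str, float], gates: Sequence[Gate] = GATES) -> List[str]:
    """Return the names of gates that do not pass; a missing metric counts as a failure."""
    failed = []
    for gate in gates:
        value = metrics.get(gate.metric)
        if value is None or not gate.compare(value, gate.threshold):
            failed.append(gate.metric)
    return failed


def can_promote(metrics: Dict[str, float]) -> bool:
    """Promotion is allowed only when every gate passes."""
    return not failed_gates(metrics)
```

With the figures shown above (89.6%, 0.8%, 412ms, 97.2%), `can_promote` returns `True`; drop any one metric and it returns `False`.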
03 Primary

It takes over safely.

Once every gate passes, the small model handles 100% of the traffic. Your frontier model stays on standby. The switch is instantly reversible: rollback is always one click away.

Production state
SLM route: 100%
Frontier on standby: Ready
Monthly saving: $18,420
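
One way to picture the three stages is as a traffic share for the small model: 0% in shadow, 25% in canary, 100% as primary. The sketch below assigns each request to a route with a deterministic hash, so the same request ID always lands on the same side. The stage names, the `route` function, and the hashing scheme are assumptions made for illustration, not a description of how Apprentice routes traffic.

```python
import hashlib

# Share of live traffic served by the small model in each stage (from this page).
STAGE_SHARE = {"shadow": 0.0, "canary": 0.25, "primary": 1.0}


def route(request_id: str, stage: str) -> str:
    """Return "slm" or "frontier" for a request, stable for a given request ID."""
    share = STAGE_SHARE[stage]
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # roughly uniform in [0, 1)
    return "slm" if bucket < share else "frontier"
```

Rolling back is then just a change of stage: setting it back to `"shadow"` sends every request to the frontier model again.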
Integration

One line to connect.

Wrap your existing OpenAI call. Apprentice handles the routing, eval, promotion, and rollback. Your code stays unchanged.

Before: $3.85 / 1k
# direct frontier call
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
)
After: $0.42 / 1k when ready
# one line added, everything else unchanged
from langchain.chat_models import init_chat_model
from apprentice.langchain import ApprenticeCallback

# `client` is assumed to be created earlier in your code (not shown here)
model = init_chat_model(
    "gpt-4o",  # your model, unchanged
    callbacks=[ApprenticeCallback("ticket-triage", client)],  # ← the line
)

Install: uv add "apprentice-sdk[langchain]" · Works with LangChain (all providers); direct OpenAI SDK and LlamaIndex supported too.

Calculator

What does your bill look like?

Drag the slider to your current monthly frontier-LLM spend. Apprentice typically cuts 60–90% on repeatable, structured workflows.

Current monthly GPT / Claude spend: $20k
Conservative (−60%) · estimate
$12k
saved per month
Typical (−80%) · estimate
$16k
saved per month

Estimates only. Actual savings depend on task repeatability. Shadow mode gives a data-backed projection before any traffic shifts.
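
The arithmetic behind these estimates is simple enough to check by hand. The sketch below reproduces it; the function names are illustrative, and the percentages are the page's own estimate bands, not a guarantee.

```python
def monthly_savings(spend: float, reduction_pct: int) -> float:
    """Estimated monthly saving for a given spend and percentage reduction."""
    if not 0 <= reduction_pct <= 100:
        raise ValueError("reduction_pct must be between 0 and 100")
    return spend * reduction_pct / 100


def reduction_from_unit_costs(frontier_per_1k: float, slm_per_1k: float) -> float:
    """Fractional cost reduction implied by two per-1k-request prices."""
    return (frontier_per_1k - slm_per_1k) / frontier_per_1k


conservative = monthly_savings(20_000, 60)  # the page's conservative band
typical = monthly_savings(20_000, 80)       # the page's typical band
unit_cut = reduction_from_unit_costs(3.85, 0.42)  # per-1k prices quoted on this page
```

At $20k of monthly spend, the two bands work out to $12k and $16k, and the per-1k prices quoted on this page imply a cut of about 89%.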

Safety model

Built for the teams that can't afford to be wrong.

Every traffic shift requires your gates to pass. No exceptions.

01

Eval-gated rollout.

Nothing reaches production unless it passes your gold set. You set the thresholds. We fail closed.

Continuous eval on every fine-tuning run
Configurable thresholds per task, per team
Auto-demote on fallback spike, no pages needed
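
As a rough sketch of what auto-demotion on a fallback spike could look like, the monitor below tracks fallbacks over a sliding window and latches into a demoted state once the rate reaches the 2% gate. The class name, window size, and minimum sample count are assumptions chosen for the example, not product defaults.

```python
from collections import deque


class FallbackMonitor:
    """Hypothetical sketch: demote the small model when the fallback rate hits the gate."""

    def __init__(self, max_rate: float = 0.02, window: int = 1000, min_samples: int = 100) -> None:
        self.max_rate = max_rate        # the gate is "fallback rate < 2%"
        self.min_samples = min_samples  # avoid demoting on a handful of requests
        self._events = deque(maxlen=window)  # True means the request fell back
        self.demoted = False

    def record(self, fell_back: bool) -> bool:
        """Record one request outcome and return whether the model is demoted."""
        self._events.append(fell_back)
        if len(self._events) >= self.min_samples:
            rate = sum(self._events) / len(self._events)
            if rate >= self.max_rate:
                self.demoted = True  # latched: stays demoted until someone re-promotes
        return self.demoted
```

The latch is deliberate: once the gate is breached, traffic stays on the frontier model until the gates are passed again, rather than flapping back and forth.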
02

Instant rollback.

One click. Traffic flips back to the frontier model in under a second. Every rollback is permanently recorded in the audit log with a metrics snapshot.

Sub-second traffic revert, zero downtime
Audit log on every promotion and rollback
Frontier model always on hot standby
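
To illustrate the rollback-plus-audit idea, the sketch below flips a task's route back to the frontier model and appends an audit entry carrying a metrics snapshot. The `TaskRoute` class and the entry fields are hypothetical; the real audit log format is not described on this page.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class TaskRoute:
    """Hypothetical sketch of a per-task route with an append-only audit log."""

    task: str
    active: str = "slm"  # which model currently serves traffic
    audit_log: List[Dict[str, Any]] = field(default_factory=list)

    def rollback(self, actor: str, metrics: Dict[str, float]) -> Dict[str, Any]:
        """Send all traffic back to the frontier model and record who did it and why."""
        entry = {
            "event": "rollback",
            "task": self.task,
            "actor": actor,
            "from": self.active,
            "to": "frontier",
            "at": time.time(),
            "metrics": dict(metrics),  # copy, so later updates cannot alter the snapshot
        }
        self.active = "frontier"
        self.audit_log.append(entry)
        return entry
```

A call such as `TaskRoute("ticket-triage").rollback("oncall@example.com", {"fallback_rate": 0.031})` leaves the route on `"frontier"` and one entry in `audit_log`.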
03

Your data, your cloud.

Fine-tuning runs inside your VPC. Model weights never leave your infrastructure. We see traffic shape, not content.

Bring-your-own compute (AWS, GCP, Azure)
No training data leaves your perimeter
SOC 2 Type II · GDPR · HIPAA-ready configs
OpenAI managed fine-tuning closes January 2027. The time to move is now.

Start your first task.
Results in two weeks.

Teams spending $20k+/month on frontier APIs typically see their first task go live within 14 days. The pilot is free. We only win when you save.

Already migrating from OpenAI fine-tuning? Book a 30-min migration call →