OpenAI is closing managed fine-tuning. Your models need a new home by January 2027.

Your LLM bill has an apprentice.

Same answers, smaller bill. Proven by evals, with instant rollback.

Start pilot · How it works →

Our public benchmark

88.9 vs 85.6

Fine-tuned 4B beats GPT-4o-mini on held-out evals

Run it yourself → GitHub, ~40 min

Evidence

Proven on your data, not ours.

Apprentice measures every replacement against your own gold set. Promotion gates are yours to set; rollback is always one click away.
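For intuition, here is a minimal sketch of a gold-set score. It assumes agreement means an exact match after light normalization; the page does not say how Apprentice scores agreement, and `gold_set_agreement` and `normalize` are hypothetical names, not part of the SDK.

```python
from typing import Callable, Iterable, Tuple


def normalize(text: str) -> str:
    """Assumed normalization: trim, collapse whitespace, lowercase."""
    return " ".join(text.split()).lower()


def gold_set_agreement(
    gold: Iterable[Tuple[str, str]],
    predict: Callable[[str], str],
) -> float:
    """Fraction of gold (prompt, expected) pairs the candidate model matches."""
    pairs = list(gold)
    if not pairs:
        raise ValueError("gold set is empty")
    matches = sum(
        normalize(predict(prompt)) == normalize(expected)
        for prompt, expected in pairs
    )
    return matches / len(pairs)
```

A score from a function like this would be compared against your gate, for example 0.88 for the ≥ 88% threshold shown on this page.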

Eval score

89.6%

+18.4pp vs baseline

gold-set agreement gate ≥ 88%

Cost per 1k req

$0.42

−89% vs $3.85 frontier

Qwen3.5-4B on your vLLM

p95 latency

412ms

−75% vs frontier 1,680ms

net inference, no gateway hops

Rollback

1-click

traffic back in < 1s

audit log entry on every rollback

Process

Watch. Learn. Take over.

Three stages, zero risk. Your frontier model keeps running until the small model earns the traffic, gate by gate.

01 Shadow

Apprentice watches.

Every request your team sends to GPT or Claude flows through the shadow router. Zero latency added, zero production impact. Just quiet observation.

Shadow mode stats

Requests observed: 142,813

Production impact: None

Eval pairs captured: 10,000
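A rough sketch of the shadow pattern, not Apprentice's actual router: the frontier answer is returned right away and the pair is recorded on a background thread. `ShadowRouter`, `complete`, and `flush` are hypothetical names, and `frontier` stands in for your existing model call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


class ShadowRouter:
    """Hypothetical: serve from the frontier model, record pairs off the hot path."""

    def __init__(self, frontier: Callable[[str], str], max_pairs: int = 10_000):
        self.frontier = frontier
        self.max_pairs = max_pairs
        self.pairs: List[Tuple[str, str]] = []
        self.observed = 0
        # A single worker keeps recording sequential, so no locking is needed.
        self._pool = ThreadPoolExecutor(max_workers=1)

    def _record(self, prompt: str, answer: str) -> None:
        self.observed += 1
        if len(self.pairs) < self.max_pairs:
            self.pairs.append((prompt, answer))

    def complete(self, prompt: str) -> str:
        answer = self.frontier(prompt)  # production path, unchanged
        self._pool.submit(self._record, prompt, answer)  # does not block the caller
        return answer

    def flush(self) -> None:
        """Wait for pending recordings, e.g. before exporting eval pairs."""
        self._pool.shutdown(wait=True)
```

The caller only waits for the frontier call plus the cost of queueing the recording; capturing the pair happens afterwards.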

02 Canary 25%

It learns your workflows.

Your gold set drives continuous fine-tuning. The model is evaluated on every new run before any traffic shifts. Promotion gates are yours to configure.

Promotion gates

Gold-set eval ≥ 88% · PASS 89.6%

Fallback rate < 2% · PASS 0.8%

p95 latency < 600ms · PASS 412ms

Live agreement ≥ 95% · LIVE 97.2%
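The gate logic can be sketched in a few lines. This is an illustration under assumptions, not the product's config format: `Gate`, `GATES`, `evaluate_gates`, and `can_promote` are hypothetical names, and the thresholds simply mirror the four gates listed above.

```python
import operator
from dataclasses import dataclass
from typing import Dict, List, Optional

OPS = {">=": operator.ge, "<": operator.lt}


@dataclass(frozen=True)
class Gate:
    metric: str
    op: str  # ">=" or "<"
    threshold: float


# Thresholds copied from the gates on this page; metric names are assumptions.
GATES: List[Gate] = [
    Gate("gold_set_eval", ">=", 0.88),
    Gate("fallback_rate", "<", 0.02),
    Gate("p95_latency_ms", "<", 600),
    Gate("live_agreement", ">=", 0.95),
]


def evaluate_gates(metrics: Dict[str, float], gates: Optional[List[Gate]] = None) -> Dict[str, bool]:
    """Return pass/fail per gate. A missing metric fails (fail closed)."""
    results = {}
    for gate in gates if gates is not None else GATES:
        value = metrics.get(gate.metric)
        results[gate.metric] = value is not None and OPS[gate.op](value, gate.threshold)
    return results


def can_promote(metrics: Dict[str, float], gates: Optional[List[Gate]] = None) -> bool:
    """Traffic may shift only when every gate passes."""
    return all(evaluate_gates(metrics, gates).values())
```

With the numbers shown above (89.6%, 0.8%, 412ms, 97.2%), every gate in this sketch passes. Drop any one metric from the input and promotion is blocked.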

03 Primary

It takes over safely.

Once every gate passes, the small model handles 100% of the traffic. Your frontier model stays on standby. Rollback is always one click away and takes effect in under a second.

Production state

SLM route: 100%

Frontier on standby: Ready

Monthly saving: $18,420

Integration

One line to connect.

Wrap your existing OpenAI call. Apprentice handles the routing, eval, promotion, and rollback. Your code stays unchanged.

Before · $3.85 / 1k
# direct frontier call
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
prompt = "..."  # your existing prompt

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
)

After · $0.42 / 1k when ready
# one line added, everything else unchanged
# NOTE: `client` must already be defined; its construction is not shown here
from langchain.chat_models import init_chat_model
from apprentice.langchain import ApprenticeCallback

model = init_chat_model(
    "gpt-4o",  # your model, unchanged
    callbacks=[ApprenticeCallback("ticket-triage", client)],  # ← the line
)

Install: uv add "apprentice-sdk[langchain]" (quoted so zsh does not expand the brackets) · Works with LangChain (all providers); direct OpenAI SDK and LlamaIndex supported too.

Calculator

What does your bill look like?

Drag the slider to your current monthly frontier-LLM spend. Apprentice typically cuts 60–90% on repeatable, structured workflows.

Current monthly GPT / Claude spend: $20k

Conservative (−60%) · estimate

$12k

saved per month

Typical (−80%) · estimate

$16k

saved per month

Estimates only. Actual savings depend on task repeatability. Shadow mode gives a data-backed projection before any traffic shifts.
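The arithmetic behind the slider is simple enough to check by hand. A small sketch, with hypothetical function names: the 60% and 80% cuts are this page's scenarios, not a forecast for your workload.

```python
def estimated_monthly_saving(spend_usd: float, cut: float) -> float:
    """Saving for a given monthly spend and fractional cut (0.60 means -60%)."""
    if not 0.0 <= cut <= 1.0:
        raise ValueError("cut must be between 0 and 1")
    return spend_usd * cut


def percent_reduction(before: float, after: float) -> float:
    """Fractional reduction from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


conservative = estimated_monthly_saving(20_000, 0.60)  # about $12k per month
typical = estimated_monthly_saving(20_000, 0.80)       # about $16k per month
per_1k = percent_reduction(3.85, 0.42)                 # about 0.89, i.e. -89%
```

The last line applies the same arithmetic to the per-1k prices quoted on this page: $3.85 down to $0.42.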

Safety model

Built for the teams that can't afford to be wrong.

Every traffic shift requires your gates to pass. No exceptions.

Eval-gated rollout.

Nothing reaches production unless it passes your gold set. You set the thresholds. We fail closed.

Continuous eval on every fine-tuning run

Configurable thresholds per task, per team

Auto-demote on fallback spike, no pages needed
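One way auto-demotion could work is a rolling window over recent requests. This is a sketch under assumptions: the window size, the minimum sample count, and the `FallbackMonitor` name are all hypothetical, and only the 2% limit comes from the fallback gate shown earlier on this page.

```python
from collections import deque


class FallbackMonitor:
    """Hypothetical rolling-window check for a fallback spike."""

    def __init__(self, window: int = 500, max_rate: float = 0.02, min_samples: int = 100):
        self.events = deque(maxlen=window)  # True means the request fell back
        self.max_rate = max_rate
        self.min_samples = min_samples

    def rate(self) -> float:
        return sum(self.events) / len(self.events) if self.events else 0.0

    def record(self, fell_back: bool) -> bool:
        """Record one request; return True when the route should be demoted."""
        self.events.append(fell_back)
        if len(self.events) < self.min_samples:
            return False  # too little data to call it a spike
        return self.rate() >= self.max_rate
```

When `record` returns True, the caller would shift traffic back to the frontier model without waiting for a human.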

Instant rollback.

One click. Traffic flips back to the frontier model in under a second. Every rollback is recorded in a permanent audit log with a metrics snapshot.

Sub-second traffic revert, zero downtime

Audit log on every promotion and rollback

Frontier model always on hot standby
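To make the rollback model concrete, here is a minimal sketch of a traffic split where every change, rollback included, appends an audit entry with a metrics snapshot. `TrafficRouter` and its fields are hypothetical, and the in-memory list stands in for whatever durable log the product uses.

```python
import random
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class TrafficRouter:
    """Hypothetical traffic split between the small model and the frontier model."""

    slm_share: float = 0.0  # fraction of requests sent to the small model
    audit_log: List[dict] = field(default_factory=list)

    def set_share(self, share: float, reason: str, metrics: Dict[str, float]) -> None:
        if not 0.0 <= share <= 1.0:
            raise ValueError("share must be between 0 and 1")
        # Append-only entry: what changed, why, and the metrics at that moment.
        self.audit_log.append({
            "ts": time.time(),
            "from": self.slm_share,
            "to": share,
            "reason": reason,
            "metrics": dict(metrics),
        })
        self.slm_share = share

    def rollback(self, metrics: Dict[str, float]) -> None:
        """Send all traffic back to the frontier model."""
        self.set_share(0.0, "rollback", metrics)

    def pick(self, rng: Callable[[], float] = random.random) -> str:
        return "slm" if rng() < self.slm_share else "frontier"
```

Because rollback is a single in-memory share change in this sketch, the revert applies to the very next request, while the frontier model never stopped being available.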

Your data, your cloud.

Fine-tuning runs inside your VPC. Model weights never leave your infrastructure. We see traffic shape, not content.

Bring-your-own compute (AWS, GCP, Azure)

No training data leaves your perimeter

SOC 2 Type II · GDPR · HIPAA-ready configs

Start your first task.
Results in two weeks.

Teams spending $20k+/month on frontier APIs typically see their first task go live within 14 days. The pilot is free. We only win when you save.

Already migrating from OpenAI fine-tuning? Book a 30-min migration call →