Ship a Production LLM API Feature
Build a real LLM-backed feature against a hosted API the way you would in production: a structured prompt, token budgeting before the call, response caching, prompt-injection hardening, streaming, retries with backoff, and basic cost accounting. Submit a single script or notebook for instant, rubric-based feedback.
3 hrs
Est. time
4
Outcomes
7
Rubric criteria
65%
Pass score
What you'll learn
Skills you'll have real reps in after shipping this.
See how it works
How prompt injection works
Untrusted user text can carry instructions that hijack your prompt. The defense is to isolate it as data and tell the model never to follow instructions inside it.
Retries done right
Naive retries amplify an outage into a retry storm. Exponential backoff with jitter recovers from transient errors without making them worse.
The scenario
Your team is adding an AI feature to an existing product: a support-reply drafter that takes a customer message and returns a suggested response. The prototype a teammate hacked together calls the model with an f-string prompt, no caching, no retries, and no guard against users pasting 'ignore previous instructions' into the message box. It works in the demo and falls over in production.
You have been asked to rebuild it properly: the same feature, but production-grade. It should be cheap, resilient, and safe to point at untrusted user input.
Your role
You are an AI Engineer hardening a hosted-LLM feature for production. Your goal is a single, well-structured module that any teammate could read and trust: correct prompt construction, cost controls, resilience, and prompt-injection defense, all demonstrated end to end.
Start the task to unlock the full brief
You'll get the step-by-step requirements, setup commands, the 7-criterion grading rubric, tips, and the ability to submit your solution for instant AI grading.
Free to start · submit when you're ready
Learning resources
What you'll build in this LLM API task
This is a build-and-submit task, not a guided lab. You take the kind of LLM feature most teams ship first (a quick prompt against a hosted model) and rebuild it the way it should run in production: structured prompts, token budgeting, caching, retries with backoff, streaming, cost accounting, and prompt-injection defense. The deliverable is one Python file you could drop into a real codebase.
The skills here are the unglamorous ones that separate a demo from a product. You will read the API key from the environment, count tokens before you spend them, cache identical calls, treat user input as untrusted data rather than instructions, and recover gracefully when the provider rate-limits you. You then prove it works by handling a prompt-injection attempt safely.
Grading is rubric-based and explainable. Your submission is scored against weighted criteria (provider integration, prompt construction, token budgeting, caching, injection hardening, resilience and cost, and the demonstration) and returns per-criterion feedback with evidence quoted from your code. The pass threshold is 65 percent and you can resubmit.