Build a Structured-Output Extraction Service
Turn messy text into validated, typed JSON. Build an extraction service that uses an LLM to pull structured fields from unstructured documents, validates the output against a schema, retries when the model returns something invalid, and measures its own accuracy on a small labeled set. Submit a single script or notebook for instant, rubric-based feedback.
Est. time: 3 hrs · Outcomes: 4 · Rubric criteria: 6 · Pass score: 65%
What you'll learn
Skills you'll have real reps in after shipping this.
See how it works
From free text to typed JSON
class Person(BaseModel):
    name: str
    age: int

{ "name": str, "age": int }, strict, no extra keys

{
  "name": "Dana",
  "age": 31
}

Structured outputs constrain the model to your schema so every response is valid, typed, and safe to hand to downstream code.
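The "strict, no extra keys" idea can be sketched without any third-party dependency. The block below is a minimal stand-in that uses a standard-library dataclass instead of the `BaseModel` shown above; `parse_person` is a hypothetical helper name, and a real service would more likely lean on a validation library for this check.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Person:
    name: str
    age: int


def parse_person(raw: str) -> Person:
    """Parse a JSON string into a Person, rejecting anything off-schema."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    expected = {f.name: f.type for f in fields(Person)}
    missing = expected.keys() - data.keys()
    extra = data.keys() - expected.keys()
    if missing or extra:
        raise ValueError(f"missing keys: {sorted(missing)}, extra keys: {sorted(extra)}")

    for key, typ in expected.items():
        value = data[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, typ):
            raise ValueError(f"{key!r} must be {typ.__name__}, got {type(value).__name__}")
    return Person(**data)


person = parse_person('{"name": "Dana", "age": 31}')
print(person)  # Person(name='Dana', age=31)
```

A string such as `'{"name": "Dana", "age": "31"}'` or one carrying an extra key raises `ValueError` instead of silently producing a loosely typed record.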
Designing the schema
{
"type": "object",
"required": ["id", "customer", "items", "total"],
"properties": {
"id": { "type": "string" },
"customer": {
"type": "object",
"required": ["name"],
"properties": { "name": { "type": "string" } }
},
"items": {
"type": "array",
"items": {
"type": "object",
"required": ["sku", "quantity", "unit_price"],
"properties": {
"sku": { "type": "string" },
"quantity": { "type": "integer", "minimum": 1 },
"unit_price": { "type": "number" }
}
}
},
"total": { "type": "number" }
}
}

{
"id": "ord_a1f8",
"customer": { "name": "Maria Chen" },
"items": [
{ "sku": "TS-001", "quantity": 2, "unit_price": 24.00 },
{ "sku": "MG-002", "quantity": 1, "unit_price": 12.00 }
],
"total": 60.00
}

The schema is the contract. Required fields, types, and enums decide what the model is allowed to return.
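To make "the schema is the contract" concrete, here is a rough sketch of a validator that checks a record against the order schema above. It handles only the keywords that schema uses (`type`, `required`, `properties`, `items`, `minimum`) and is a deliberate simplification, not a full JSON Schema implementation; `validate` and `ORDER_SCHEMA` are hypothetical names.

```python
from typing import Any

# JSON Schema type names mapped to Python types (only those the order schema uses).
_TYPES = {"object": dict, "array": list, "string": str,
          "integer": int, "number": (int, float)}

ORDER_SCHEMA = {
    "type": "object",
    "required": ["id", "customer", "items", "total"],
    "properties": {
        "id": {"type": "string"},
        "customer": {
            "type": "object",
            "required": ["name"],
            "properties": {"name": {"type": "string"}},
        },
        "items": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["sku", "quantity", "unit_price"],
                "properties": {
                    "sku": {"type": "string"},
                    "quantity": {"type": "integer", "minimum": 1},
                    "unit_price": {"type": "number"},
                },
            },
        },
        "total": {"type": "number"},
    },
}


def validate(value: Any, schema: dict, path: str = "$") -> list[str]:
    """Return human-readable errors; an empty list means the value is valid."""
    expected = schema.get("type")
    if expected is not None:
        # bool is a subclass of int, so exclude it from every type here.
        if isinstance(value, bool) or not isinstance(value, _TYPES[expected]):
            return [f"{path}: expected {expected}, got {type(value).__name__}"]

    errors: list[str] = []
    if isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}.{key}: missing required field")
        for key, sub_schema in schema.get("properties", {}).items():
            if key in value:
                errors.extend(validate(value[key], sub_schema, f"{path}.{key}"))
    if isinstance(value, list) and "items" in schema:
        for i, item in enumerate(value):
            errors.extend(validate(item, schema["items"], f"{path}[{i}]"))
    if "minimum" in schema and isinstance(value, (int, float)):
        if value < schema["minimum"]:
            errors.append(f"{path}: {value} is below minimum {schema['minimum']}")
    return errors


good = {
    "id": "ord_a1f8",
    "customer": {"name": "Maria Chen"},
    "items": [
        {"sku": "TS-001", "quantity": 2, "unit_price": 24.00},
        {"sku": "MG-002", "quantity": 1, "unit_price": 12.00},
    ],
    "total": 60.00,
}
# Same order with a zero quantity and no total, to show what gets reported.
bad = {
    "id": "ord_a1f8",
    "customer": {"name": "Maria Chen"},
    "items": [{"sku": "TS-001", "quantity": 0, "unit_price": 24.00}],
}

print(validate(good, ORDER_SCHEMA))  # []
for error in validate(bad, ORDER_SCHEMA):
    print(error)
```

Returning a list of path-tagged errors, rather than raising on the first one, gives a later retry step something specific to send back to the model.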
The scenario
Your company receives thousands of unstructured documents: invoices, support emails, resumes, scanned forms. Today a team copies fields into a spreadsheet by hand. Leadership wants an extraction service that reads each document and emits clean, typed JSON your downstream systems can rely on.
The hard part is not getting the model to answer; it is getting it to answer in exactly the shape you need, every time, and knowing when it got it wrong. You have been asked to build a small, trustworthy extractor: structured output, schema validation, retries on bad output, and a measured accuracy number.
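One way the "retries on bad output" piece might look is sketched below. Everything here is an assumption for illustration: `call_model` stands in for whatever provider client you use, `check_person` is a toy validator, and the canned replies simulate a model that needs two corrections before it returns valid output.

```python
import json
from typing import Callable


class ExtractionError(RuntimeError):
    """Raised when no attempt produced valid output."""


def extract_with_retries(
    document: str,
    call_model: Callable[[str], str],      # hypothetical: prompt in, raw text out
    validate: Callable[[dict], list[str]],  # returns a list of error messages
    max_attempts: int = 3,
) -> dict:
    base_prompt = f"Extract the fields as a JSON object.\n\nDocument:\n{document}"
    prompt = base_prompt
    errors: list[str] = []
    for _ in range(max_attempts):
        raw = call_model(prompt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            errors = [f"invalid JSON: {exc.msg}"]
        else:
            if not isinstance(data, dict):
                errors = ["top-level value must be a JSON object"]
            else:
                errors = validate(data)
                if not errors:
                    return data
        # Re-ask, feeding the problems back so the model can correct them.
        prompt = (
            f"{base_prompt}\n\nYour previous reply was rejected:\n"
            + "\n".join(f"- {e}" for e in errors)
            + "\nReturn only a corrected JSON object."
        )
    # Fail loudly instead of passing bad data downstream.
    raise ExtractionError(f"gave up after {max_attempts} attempts: {errors}")


def check_person(data: dict) -> list[str]:
    errors = []
    if not isinstance(data.get("name"), str):
        errors.append("name must be a string")
    if type(data.get("age")) is not int:
        errors.append("age must be an integer")
    return errors


# Canned replies simulating a model: malformed, wrong type, then valid.
replies = iter(['not json', '{"name": "Dana", "age": "31"}', '{"name": "Dana", "age": 31}'])
result = extract_with_retries("Dana is 31.", lambda prompt: next(replies), check_person)
print(result)  # {'name': 'Dana', 'age': 31}
```

Capping attempts and raising a dedicated exception keeps a misbehaving model from looping forever or leaking an unvalidated record into downstream systems.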
Your role
You are an AI Engineer building a structured-extraction service. Your goal is a single module that reliably converts free text into validated, typed records, fails loudly when the model misbehaves, and reports how accurate it is.
Learning resources
What you'll build in this extraction task
This is a build-and-submit task. You build an extraction service that turns unstructured documents into validated, typed JSON, the workhorse pattern behind document processing, data entry automation, and any pipeline that feeds LLM output into real systems. The deliverable is one Python file: schema, extraction, validation, retries, and an accuracy report.
The lesson is reliability. Getting a model to answer is easy. Getting it to answer in exactly your schema, every time, and knowing when it failed, is the actual engineering. You will define a typed schema, constrain the model to it with structured outputs or tool calling, validate every response, re-ask when validation fails, and measure accuracy against a labeled set you build.
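The last step, measuring accuracy against a labeled set, could be sketched roughly as follows. The metric choice (per-field accuracy plus exact-match rate) is one reasonable option rather than the rubric's required definition, `evaluate` is a hypothetical name, and the four records are made-up toy data, not results.

```python
from typing import Optional


def evaluate(predictions: list[Optional[dict]], labels: list[dict]) -> dict:
    """Compare predicted records to gold labels, field by field."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must be the same length")

    hits: dict[str, int] = {}
    totals: dict[str, int] = {}
    exact = 0
    for pred, gold in zip(predictions, labels):
        pred = pred or {}  # a failed extraction counts as wrong on every field
        for key, expected in gold.items():
            totals[key] = totals.get(key, 0) + 1
            if key in pred and pred[key] == expected:
                hits[key] = hits.get(key, 0) + 1
        if pred == gold:
            exact += 1

    return {
        "per_field": {key: hits.get(key, 0) / totals[key] for key in totals},
        "exact_match": exact / len(labels) if labels else 0.0,
    }


# Toy labeled set (invented for illustration).
labels = [
    {"name": "Dana", "age": 31},
    {"name": "Lee", "age": 45},
    {"name": "Sam", "age": 28},
    {"name": "Ravi", "age": 52},
]
# Simulated extractor output: one wrong age, one failed extraction (None).
predictions = [
    {"name": "Dana", "age": 31},
    {"name": "Lee", "age": 54},
    None,
    {"name": "Ravi", "age": 52},
]

report = evaluate(predictions, labels)
print(report)  # {'per_field': {'name': 0.75, 'age': 0.5}, 'exact_match': 0.5}
```

Reporting per-field numbers alongside exact match shows which fields the extractor struggles with, which a single aggregate score would hide.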
Grading is rubric-based and explainable. Your submission is scored against weighted criteria (provider integration, typed schema, structured extraction, validation, retry-on-invalid, and accuracy evaluation) and returns per-criterion feedback with evidence quoted from your code. The pass threshold is 65 percent and you can resubmit.