Build & submit task · Beta · Intermediate

Build a Structured-Output Extraction Service

Turn messy text into validated, typed JSON. Build an extraction service that uses an LLM to pull structured fields from unstructured documents, validates the output against a schema, retries when the model returns something invalid, and measures its own accuracy on a small labeled set. Submit a single script or notebook for instant, rubric-based feedback.

Est. time: 3 hrs · Pass score: 65%

What you'll learn

Skills you'll have real reps in after shipping this.

Structured outputs

Constrain an LLM to emit a typed schema with json_schema, tool calling, or a helper like instructor.

Validation as control flow

Validate against the schema and treat failures as a recoverable state, not an exception to ignore.

Retry-on-invalid loops

Feed validation errors back to the model and re-ask within a bounded attempt budget.

Measuring extraction quality

Build a labeled set and compute field-level or exact-match accuracy so the service is testable.
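As a rough illustration of the two metrics named above, the sketch below scores predictions against a tiny labeled set. The helper names (`exact_match_accuracy`, `field_accuracy`) and the sample records are made up for this example; a real labeled set would come from your own documents.

```python
from typing import Any

Record = dict[str, Any]


def exact_match_accuracy(predicted: list[Record], expected: list[Record]) -> float:
    """Fraction of documents where every field matches the label."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    hits = sum(p == e for p, e in zip(predicted, expected))
    return hits / len(expected)


def field_accuracy(predicted: list[Record], expected: list[Record]) -> dict[str, float]:
    """Per-field fraction of documents where that field matches the label."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    field_names = sorted({key for label in expected for key in label})
    scores: dict[str, float] = {}
    for name in field_names:
        # Only score documents whose label actually contains this field.
        labeled = [(p, e) for p, e in zip(predicted, expected) if name in e]
        correct = sum(p.get(name) == e[name] for p, e in labeled)
        scores[name] = correct / len(labeled)
    return scores


# Illustrative labeled set and predictions (invented for this sketch).
labels = [
    {"name": "Dana", "age": 31},
    {"name": "Maria Chen", "age": 45},
    {"name": "Li Wei", "age": 27},
]
predictions = [
    {"name": "Dana", "age": 31},
    {"name": "Maria Chen", "age": 54},  # digits swapped: one wrong field
    {"name": "Li Wei", "age": 27},
]

exact = exact_match_accuracy(predictions, labels)
per_field = field_accuracy(predictions, labels)
print(f"exact match: {exact:.2f}")
for name, score in per_field.items():
    print(f"{name}: {score:.2f}")
```

Exact match is the stricter number, since one wrong field fails the whole document. Field-level accuracy shows which field is responsible.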

See how it works

From free text to typed JSON

Structured output with Pydantic

Your Pydantic model

class Person(BaseModel):
    name: str
    age: int

→ response_format json_schema: { "name": str, "age": int }, strict, no extra keys

Model returns

{
  "name": "Dana",
  "age": 31
}

Person ✓ parsed, typed, no json.loads

Input: "Dana is 31 years old."

Let the type be the contract. client.beta.chat.completions.parse(response_format=Person) builds the strict schema from your model and parses the reply back into a Person. No "output only JSON" pleading and no hand-parsing: validation is the check.

Structured outputs constrain the model to your schema, so every completed response has the right shape and types. The values inside are still model predictions, so check them before downstream code relies on them.
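To make "the type is the contract" more concrete, here is a stdlib-only sketch of deriving a strict schema from a typed class. It uses a dataclass as a stand-in for the Pydantic model, and `strict_schema` and `JSON_TYPES` are hypothetical names for this example. The real SDK conversion also handles nesting, optionals, and enums, which this sketch does not attempt.

```python
from dataclasses import dataclass, fields
from typing import Any, get_type_hints

# Python type -> JSON Schema type name (only the scalar cases this sketch handles).
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


@dataclass
class Person:
    name: str
    age: int


def strict_schema(model: type) -> dict[str, Any]:
    """Build a flat, strict JSON Schema from a dataclass with scalar fields."""
    hints = get_type_hints(model)
    properties = {f.name: {"type": JSON_TYPES[hints[f.name]]} for f in fields(model)}
    return {
        "type": "object",
        "properties": properties,
        "required": list(properties),   # every field is required
        "additionalProperties": False,  # no extra keys
    }


schema = strict_schema(Person)
print(schema)
```

The useful property is that the schema is generated from the type rather than written separately, so the two cannot drift apart.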

Designing the schema

User message

Order from Maria Chen. Two black t-shirts, SKU TS-001 at $24 each, plus one mug SKU MG-002 at $12. Charge her card.

Order schema (JSON Schema)

{
  "type": "object",
  "required": ["id", "customer", "items", "total"],
  "properties": {
    "id":       { "type": "string" },
    "customer": {
      "type": "object",
      "required": ["name"],
      "properties": { "name": { "type": "string" } }
    },
    "items": {
      "type": "array",
      "items": {
        "type": "object",
        "required": ["sku", "quantity", "unit_price"],
        "properties": {
          "sku":        { "type": "string" },
          "quantity":   { "type": "integer", "minimum": 1 },
          "unit_price": { "type": "number" }
        }
      }
    },
    "total": { "type": "number" }
  }
}

Model output

matches schema

{
  "id": "ord_a1f8",
  "customer": { "name": "Maria Chen" },
  "items": [
    { "sku": "TS-001", "quantity": 2, "unit_price": 24.00 },
    { "sku": "MG-002", "quantity": 1, "unit_price": 12.00 }
  ],
  "total": 60.00
}

Schema-locked mode runs constrained decoding against the JSON Schema. Every token that would violate the schema gets zero probability before sampling. The keys, types, and nesting are guaranteed at the wire format.
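Shape guarantees do not cover cross-field consistency: a schema like the one above cannot say that total must equal the sum of the line items. A minimal sketch of that extra check follows. It assumes the order dictionary shown above, and `total_mismatch` is a hypothetical helper name.

```python
from decimal import Decimal
from typing import Optional

order = {
    "id": "ord_a1f8",
    "customer": {"name": "Maria Chen"},
    "items": [
        {"sku": "TS-001", "quantity": 2, "unit_price": 24.00},
        {"sku": "MG-002", "quantity": 1, "unit_price": 12.00},
    ],
    "total": 60.00,
}


def total_mismatch(order: dict) -> Optional[str]:
    """Return an error message if total != sum(quantity * unit_price), else None."""
    # Decimal avoids float rounding surprises when comparing money amounts.
    expected = sum(
        Decimal(str(item["quantity"])) * Decimal(str(item["unit_price"]))
        for item in order["items"]
    )
    actual = Decimal(str(order["total"]))
    if actual != expected:
        return f"total is {actual} but line items sum to {expected}"
    return None


print(total_mismatch(order))                      # consistent order
print(total_mismatch(dict(order, total=62.00)))   # schema-valid, arithmetically wrong
```

Both orders match the schema, but only the first passes the arithmetic check, so a check like this belongs in the validation step next to the schema.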

Inside one decode step (schema as a token mask)

Candidate tokens for the next position

{   [   "   I

Schema says root must be an object. Only "{" is legal.

Emitted so far

{▮


Schema-locked mode runs constrained decoding inside the provider: at every token, illegal candidates are masked before sampling. The output is guaranteed to parse and match the shape. The contents are still whatever the model predicts.
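The masking step can be pictured with a toy example. The sketch below is not how any provider implements it: the logits are invented, the vocabulary has four tokens, and `mask_and_sample` is a hypothetical name. It only shows the core idea that illegal candidates are removed before sampling.

```python
import math
import random


def mask_and_sample(logits: dict[str, float], allowed: set[str], rng: random.Random) -> str:
    """Apply softmax over only the allowed tokens, then sample one of them."""
    legal = {tok: score for tok, score in logits.items() if tok in allowed}
    if not legal:
        raise ValueError("schema allows no candidate token")
    peak = max(legal.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(score - peak) for tok, score in legal.items()}
    tokens = list(weights)
    return rng.choices(tokens, weights=[weights[t] for t in tokens], k=1)[0]


# Made-up logits: unconstrained, this model would most likely start with prose ("I").
logits = {"{": 1.2, "[": 0.3, '"': 0.1, "I": 2.5}

# The root must be an object, so only "{" is legal at the first position.
first = mask_and_sample(logits, allowed={"{"}, rng=random.Random(0))
print(first)
```

Even though "I" has the highest raw score, it can never be emitted, because it is removed from the candidate set before the sample is drawn.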

The schema is the contract. Required fields, types, and enums decide what the model is allowed to return.

The scenario

Your company receives thousands of unstructured documents: invoices, support emails, resumes, scanned forms. Today a team copies fields into a spreadsheet by hand. Leadership wants an extraction service that reads each document and emits clean, typed JSON your downstream systems can rely on.

The hard part is not getting the model to answer. It is getting it to answer in exactly the shape you need, every time, and knowing when it got it wrong. You have been asked to build a small, trustworthy extractor: structured output, schema validation, retries on bad output, and a measured accuracy number.

Your role

You are an AI Engineer building a structured-extraction service. Your goal is a single module that reliably converts free text into validated, typed records, fails loudly when the model misbehaves, and reports how accurate it is.

Start the task to unlock the full brief

You'll get the step-by-step requirements, setup commands, the 6-criterion grading rubric, tips, and the ability to submit your solution for instant AI grading.

Free to start · submit when you're ready

Learning resources

Pydantic documentation

Typed models, validation, and error handling.

docs.pydantic.dev

OpenAI structured outputs

Constrain responses to a JSON Schema.

platform.openai.com

instructor: structured LLM outputs

Pydantic-validated extraction with automatic retries.

python.useinstructor.com

What you'll build in this extraction task

This is a build-and-submit task. You build an extraction service that turns unstructured documents into validated, typed JSON. This is the workhorse pattern behind document processing, data entry automation, and any pipeline that feeds LLM output into real systems. The deliverable is one Python file: schema, extraction, validation, retries, and an accuracy report.

The lesson is reliability. Getting a model to answer is easy. Getting it to answer in exactly your schema, every time, and knowing when it failed, is the actual engineering. You will define a typed schema, constrain the model to it with structured outputs or tool calling, validate every response, re-ask when validation fails, and measure accuracy against a labeled set you build.

Grading is rubric-based and explainable. Your submission is scored against weighted criteria (provider integration, typed schema, structured extraction, validation, retry-on-invalid, and accuracy evaluation), and you get per-criterion feedback with evidence quoted from your code. The pass threshold is 65 percent and you can resubmit.

Frequently asked questions

Do I have to use Pydantic?

It is the recommended path because it gives you typed validation and clear errors, and pairs with libraries like instructor for automatic retries. An explicit JSON Schema with manual validation also satisfies the rubric. The point is a typed contract, however you express it.
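For the manual-validation route, a sketch of what a hand-rolled check might look like is below. It covers only a small subset of JSON Schema (type, required, properties, items, minimum), and `validate` and `TYPE_CHECKS` are hypothetical names; a maintained validator library would be the safer choice in production.

```python
from typing import Any

TYPE_CHECKS = {
    "object": lambda v: isinstance(v, dict),
    "array": lambda v: isinstance(v, list),
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so exclude it explicitly.
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
}


def validate(value: Any, schema: dict, path: str = "$") -> list[str]:
    """Return human-readable errors; an empty list means the value is valid."""
    expected = schema.get("type")
    if expected and not TYPE_CHECKS[expected](value):
        return [f"{path}: expected {expected}, got {type(value).__name__}"]
    errors: list[str] = []
    if expected == "object":
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}.{key}: missing required field")
        for key, sub in schema.get("properties", {}).items():
            if key in value:
                errors += validate(value[key], sub, f"{path}.{key}")
    elif expected == "array" and "items" in schema:
        for i, item in enumerate(value):
            errors += validate(item, schema["items"], f"{path}[{i}]")
    if "minimum" in schema and isinstance(value, (int, float)) and value < schema["minimum"]:
        errors.append(f"{path}: {value} is below minimum {schema['minimum']}")
    return errors


item_schema = {
    "type": "object",
    "required": ["sku", "quantity", "unit_price"],
    "properties": {
        "sku": {"type": "string"},
        "quantity": {"type": "integer", "minimum": 1},
        "unit_price": {"type": "number"},
    },
}

good = {"sku": "TS-001", "quantity": 2, "unit_price": 24.0}
bad = {"sku": "TS-001", "quantity": "two"}
print(validate(good, item_schema))
print(validate(bad, item_schema))
```

Returning a list of errors instead of raising on the first one gives a retry prompt everything the model needs to fix in a single pass.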

What documents should I extract from?

Anything unstructured with a few fields worth pulling out: invoices, emails, resumes, product reviews. Use a small built-in set of at least three documents so your accuracy report is easy to compute and the run stays fast.

How is the retry loop graded?

The grader checks that an invalid model response does not crash the service and instead triggers a bounded retry that feeds the validation error back to the model. Including one deliberately tricky document makes this path actually execute.
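The behavior described above can be sketched without a real provider. In the example below, `ScriptedModel` is a stand-in that replays canned replies, and `check_person` and `extract_with_retry` are hypothetical names. A real submission would call an actual LLM client where `ask_model` is used.

```python
import json
from typing import Callable, Optional


def check_person(raw: str) -> tuple[Optional[dict], Optional[str]]:
    """Return (record, None) when valid, or (None, error message) when not."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"not valid JSON: {exc.msg}"
    if not isinstance(data, dict):
        return None, "top-level value must be an object"
    if not isinstance(data.get("name"), str):
        return None, "field 'name' must be a string"
    age = data.get("age")
    if not isinstance(age, int) or isinstance(age, bool):
        return None, "field 'age' must be an integer"
    return {"name": data["name"], "age": age}, None


def extract_with_retry(text: str, ask_model: Callable[[str], str], max_attempts: int = 3) -> dict:
    """Ask, validate, and re-ask with the validation error, up to max_attempts."""
    base_prompt = f"Extract name and age as JSON from: {text}"
    prompt = base_prompt
    last_error = "no attempts made"
    for _ in range(max_attempts):
        record, error = check_person(ask_model(prompt))
        if record is not None:
            return record
        last_error = error
        # Feed the validation error back so the next attempt can correct it.
        prompt = f"{base_prompt}\nYour previous reply was invalid ({error}). Return corrected JSON only."
    # The budget is spent: fail loudly instead of returning bad data.
    raise RuntimeError(f"gave up after {max_attempts} attempts: {last_error}")


class ScriptedModel:
    """Stand-in for an LLM client: replays canned replies and records prompts."""

    def __init__(self, replies: list[str]) -> None:
        self.replies = list(replies)
        self.prompts: list[str] = []

    def __call__(self, prompt: str) -> str:
        self.prompts.append(prompt)
        return self.replies.pop(0)


model = ScriptedModel(['{"name": "Dana", "age": "thirty-one"}', '{"name": "Dana", "age": 31}'])
result = extract_with_retry("Dana is 31 years old.", model)
print(result, "after", len(model.prompts), "attempts")
```

The first canned reply plays the role of the deliberately tricky document: it is valid JSON with the wrong type, so the retry path actually runs.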

What counts as a complete submission?

A single .py or .ipynb that defines a typed schema, extracts it from text with a structured-output mechanism, validates and retries on failure, and prints an accuracy report over a small labeled set.