Build & submit taskBetaintermediate

Build a Structured-Output Extraction & Review Pipeline with Claude

Extract validated, typed JSON from messy text using Claude's tool_use with a JSON schema: explicit criteria, few-shot examples, a validation-and-retry loop, nullable and enum fields for uncertainty, a multi-instance review pass for unbiased scoring, and a Batch API path for bulk. Submit a single script or notebook for instant, rubric-based feedback.

3 hrs

Est. time

Outcomes

Rubric criteria

65%

Pass score

What you'll learn

Skills you'll have real reps in after shipping this.

Structured output via tool_use

Forcing a tool call with a JSON schema is the reliable way to get typed output from Claude, far better than parsing free text.

Explicit criteria and few-shot

Precise field definitions plus 2-4 examples cut false positives more than any amount of vague guidance.

Validate and retry

Schema validation with a retry that feeds the error back catches the rare malformed extraction.

Honesty and scale

Nullable and enum fields keep the model from inventing data; multi-instance review measures confidence; the Batch API makes bulk cheap.

See how it works

Why structured output beats parsing

Structured output with Pydantic

Your Pydantic model

class Person(BaseModel):
    name: str
    age: int

→ response_format json_schema

{ "name": str, "age": int },
  strict, no extra keys

Model returns

{
  "name": "Dana",
  "age": 31
}

Person ✓parsed, typed, no json.loads

Input"Dana is 31 years old."

Let the type be the contract. client.beta.chat.completions.parse(response_format=Person) builds the strict schema from your model and parses the reply back into a Person. No "output only JSON" pleading, no hand-parsing, validation is the check.

Forcing a tool call with a schema makes the model return typed fields directly, instead of prose you have to parse and repair.

Schema-shaped extraction

User message

Order from Maria Chen. Two black t-shirts, SKU TS-001 at $24 each, plus one mug SKU MG-002 at $12. Charge her card.

Order schema (JSON Schema)

{
  "type": "object",
  "required": ["id", "customer", "items", "total"],
  "properties": {
    "id":       { "type": "string" },
    "customer": {
      "type": "object",
      "required": ["name"],
      "properties": { "name": { "type": "string" } }
    },
    "items": {
      "type": "array",
      "items": {
        "type": "object",
        "required": ["sku", "quantity", "unit_price"],
        "properties": {
          "sku":        { "type": "string" },
          "quantity":   { "type": "integer", "minimum": 1 },
          "unit_price": { "type": "number" }
        }
      }
    },
    "total": { "type": "number" }
  }
}

Model output

matches schema

{
  "id": "ord_a1f8",
  "customer": { "name": "Maria Chen" },
  "items": [
    { "sku": "TS-001", "quantity": 2, "unit_price": 24.00 },
    { "sku": "MG-002", "quantity": 1, "unit_price": 12.00 }
  ],
  "total": 60.00
}

Schema-locked mode runs constrained decoding against the JSON Schema. Every token that would violate the schema gets zero probability before sampling. The keys, types, and nesting are guaranteed at the wire format.

Inside one decode step (schema as a token mask)

Candidate tokens for the next position

{["I

Schema says root must be an object. Only "{" is legal.

Emitted so far

{▮

1 / 5

Schema-locked mode runs constrained decoding inside the provider: at every token, illegal candidates are masked before sampling. The output is guaranteed to parse and match the shape. The contents are still whatever the model predicts.

The schema is the contract: nullable fields and enums let the model represent uncertainty honestly rather than inventing values.

The scenario

You need to turn a pile of free-text records (support emails, invoices, or resumes) into clean structured rows for a database. A first attempt prompts Claude for JSON and parses the reply, but it returns prose half the time, guesses values it should leave blank, and there is no way to know how accurate it is.

You are going to build it properly: force structured output with a schema, give the model explicit extraction criteria and a couple of examples, validate and retry on bad output, represent uncertainty honestly, and measure accuracy. Then make it cheap at scale with the Batch API.

Your role

You are a Claude solutions architect building a production extraction service. Your deliverable is one module that extracts schema-valid JSON reliably, handles uncertainty honestly, measures its own accuracy, and scales cheaply.

Start the task to unlock the full brief

You'll get the step-by-step requirements, setup commands, the 8-criterion grading rubric, tips, and the ability to submit your solution for instant AI grading.

Free to start · submit when you're ready

Learning resources

Anthropic API: tool use

Forcing a tool call to get JSON-schema-shaped output.

docs.anthropic.com

Anthropic Message Batches API

Bulk processing, cost savings, and constraints.

docs.anthropic.com

Pydantic

Typed models and validation for extracted data.

docs.pydantic.dev

What you'll build in this structured-output task

This is a build-and-submit task, not a guided lab. You build a production extraction service on Claude: structured output forced through tool_use with a JSON schema, guided by explicit criteria and a few examples, validated and retried, with uncertainty represented honestly and accuracy measured. The deliverable is one Python module you could point at a real backlog of records.

The techniques here are the ones that make extraction trustworthy. You stop parsing free-text JSON and force a schema-shaped tool call, you write precise criteria instead of vague guidance, you validate and retry, you use nullable fields and enums so the model does not fabricate values, you run a multi-instance review to estimate confidence, and you add a Batch API path so bulk work is roughly half the cost.

Grading is rubric-based and explainable. Your submission is scored against weighted criteria (SDK integration, structured tool_use, criteria and few-shot, validation and retry, nullable and enum, the Batch API path, multi-instance review, and accuracy) with per-criterion feedback quoted from your code. The pass threshold is 65 percent and you can resubmit. These are the prompt-engineering and structured-output skills the Claude Certified Architect exam tests.

Frequently asked questions

Why tool_use instead of just asking for JSON?

Asking for JSON in prose means you have to parse and repair it, and the model can drift back into prose. Forcing a tool call whose input_schema is your target schema makes Claude return typed fields directly, which is the reliable production pattern.

What is multi-instance review for?

Running the same extraction several times with independent calls and comparing the results gives you an unbiased aggregate and a confidence signal. Where the instances agree, you can trust the value; where they disagree, you flag it for review.

When should I use the Batch API?

For bulk, non-interactive extraction where you can tolerate a turnaround of up to 24 hours. It is roughly 50 percent cheaper, but a batched request has no multi-turn, so it suits one-shot extraction rather than conversations.

What counts as a complete submission?

A single .py or .ipynb on the Anthropic SDK that forces structured output via tool_use with a schema, uses explicit criteria and few-shot, validates and retries, uses nullable and enum fields, runs a multi-instance review, measures accuracy on a labeled set, and includes a Batch API path.