Build a Reliable, Long-Context Claude Agent with Crash Recovery
Build an agent that survives long context and failures: mitigate lost-in-the-middle, use a scratchpad for large work, recover from a crash via a manifest and exported state, propagate structured error context, route a stratified sample to human review, and track provenance to resolve source conflicts. Submit a single script or notebook for instant, rubric-based feedback.
Est. time: 3.5 hrs
Outcomes: 4
Rubric criteria: 7
Pass score: 65%
What you'll learn
Skills you'll have practiced hands-on by the time you ship this.
See how it works
Lost in the middle
Models tend to recall content at the start and end of a long context better than content buried in the middle, so where you place key material is a reliability decision.
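One way to act on this is to keep the instructions at the very start and restate the question at the very end, with the bulk material in between. The sketch below is illustrative only: `build_prompt` and the `<document>` tag layout are assumptions made for this example, not something the task brief prescribes.

```python
def build_prompt(instructions: str, documents: list[str], question: str) -> str:
    """Assemble a prompt with instructions first and the question last.

    The long documents sit in the middle, so the two pieces the model
    must not lose are at the edges of the context.
    """
    doc_blocks = "\n".join(
        f'<document index="{i}">\n{text}\n</document>'
        for i, text in enumerate(documents, start=1)
    )
    return (
        f"{instructions}\n\n"
        f"<documents>\n{doc_blocks}\n</documents>\n\n"
        f"Question (restated after the documents): {question}"
    )


prompt = build_prompt(
    "Answer from the documents only.",
    ["alpha", "beta"],
    "What is beta?",
)
```

The returned string can be passed as the user message of whatever client you use; nothing here depends on a particular SDK.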
Scratchpad over raw history
Writing findings to external memory and re-injecting a summary keeps the working context small, so long jobs do not degrade as the context fills up.
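A minimal sketch of the idea, assuming a JSON Lines file as the external memory. The `Scratchpad` class, its method names, and the "last N findings" summary rule are hypothetical choices for illustration; a real agent might summarize with a model call instead.

```python
import json
from pathlib import Path


class Scratchpad:
    """Append-only notes on disk; only a short summary goes back into context."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.path.touch(exist_ok=True)

    def add(self, chunk_id: str, finding: str) -> None:
        # One JSON object per line, so an interrupted write loses at most one note.
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps({"chunk_id": chunk_id, "finding": finding}) + "\n")

    def summary(self, max_items: int = 5) -> str:
        # Re-inject only the most recent findings, not the raw history.
        lines = self.path.read_text(encoding="utf-8").splitlines()
        notes = [json.loads(line) for line in lines if line.strip()]
        return "\n".join(
            f"- [{n['chunk_id']}] {n['finding']}" for n in notes[-max_items:]
        )
```

Each turn then carries `pad.summary()` plus the next chunk, rather than every earlier exchange.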
The scenario
Your agent processes long documents and large codebases. It works on small inputs and then quietly gets worse on big ones: it misses facts buried in the middle of a long context, it loses its place when the process is killed halfway through, and when two sources disagree it picks one with no record of why. A reviewer has no idea which outputs to spot-check.
You are going to make it reliable. Lay out context so the important parts are not lost, keep a scratchpad for long work, checkpoint state so a crash can resume, propagate errors with enough context to debug, route a representative sample to humans, and track where every fact came from.
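Checkpointing so a crash can resume might look like the sketch below. It assumes a single JSON manifest that records which chunks are finished; the function names, the manifest shape, and the `process` callback are hypothetical, and the rubric may expect a different structure.

```python
import json
import os
from pathlib import Path


def load_manifest(path: Path) -> dict:
    """Return the saved manifest, or an empty one on a first run."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"done": [], "results": {}}


def save_manifest(path: Path, manifest: dict) -> None:
    """Write to a temp file, then rename, so a crash never leaves a half-written manifest."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(manifest), encoding="utf-8")
    os.replace(tmp, path)


def run(chunks: dict[str, str], path: Path, process) -> dict:
    """Process each chunk once, skipping any the manifest already marks as done."""
    manifest = load_manifest(path)
    for chunk_id, text in chunks.items():
        if chunk_id in manifest["done"]:
            continue
        manifest["results"][chunk_id] = process(text)
        manifest["done"].append(chunk_id)
        save_manifest(path, manifest)  # checkpoint after every chunk
    return manifest["results"]
```

Calling `run` again with the same manifest path after a killed process picks up at the first unfinished chunk instead of starting over.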
Your role
You are a Claude solutions architect responsible for reliability. Your deliverable is one module that processes a long input dependably, combining lost-in-the-middle mitigation, a scratchpad, crash-safe state, structured error propagation, stratified human review, and provenance with conflict resolution.
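Structured error propagation can be sketched as an exception that carries a context dictionary up the call stack. `StepError`, `run_step`, the context fields, and the rule that only timeouts are retryable are all assumptions chosen for this example.

```python
class StepError(Exception):
    """An error that carries enough context to debug without re-running the job."""

    def __init__(self, step: str, chunk_id: str, cause: Exception, retryable: bool) -> None:
        super().__init__(f"{step} failed on {chunk_id}: {cause}")
        self.context = {
            "step": step,
            "chunk_id": chunk_id,
            "error_type": type(cause).__name__,
            "message": str(cause),
            "retryable": retryable,
        }


def run_step(step: str, chunk_id: str, fn, *args):
    """Run one step and wrap any failure with where it happened and whether to retry."""
    try:
        return fn(*args)
    except StepError:
        raise  # already structured by an inner step; do not wrap twice
    except Exception as exc:
        # Assumption for this sketch: only timeouts are treated as retryable.
        retryable = isinstance(exc, TimeoutError)
        raise StepError(step, chunk_id, exc, retryable) from exc
```

A caller can log `err.context` as JSON and decide whether to retry from the `retryable` flag, while `raise ... from exc` keeps the original traceback attached.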
Learning resources
What you'll build in this context-and-reliability task
This is a build-and-submit task, not a guided lab. You make a Claude agent reliable on the inputs that break naive ones: long documents, large codebases, and runs that get interrupted. The deliverable is one Python module that lays out context to avoid lost-in-the-middle, keeps a scratchpad, checkpoints state for crash recovery, and tracks where every fact came from.
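Tracking where every fact came from could look like the sketch below, which attaches a source to each fact and resolves a disagreement with an explicit rule. The `Fact` fields, the trust table, and the "highest trust, then most recent" rule are illustrative assumptions; it also assumes all facts passed to `resolve` share one key and use ISO 8601 dates, which sort correctly as strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    key: str
    value: str
    source: str
    retrieved_at: str  # ISO 8601 date, e.g. "2024-05-01"


def resolve(facts: list[Fact], trust: dict[str, int]) -> tuple[Fact, dict]:
    """Pick one fact by trust rank, then recency, and record why it won."""
    ranked = sorted(
        facts,
        key=lambda f: (trust.get(f.source, 0), f.retrieved_at),
        reverse=True,
    )
    winner = ranked[0]
    trail = {
        "key": winner.key,
        "chosen": winner.source,
        "rule": "highest trust, then most recent",
        "rejected": [
            {"source": f.source, "value": f.value}
            for f in ranked[1:]
            if f.value != winner.value
        ],
    }
    return winner, trail


facts = [
    Fact("timeout_s", "30", "wiki", "2024-05-01"),
    Fact("timeout_s", "60", "config.yaml", "2024-01-10"),
]
winner, trail = resolve(facts, {"config.yaml": 2, "wiki": 1})
```

Here the config file wins despite being older because its trust rank is higher, and `trail` records the losing value so a reviewer can see what was overruled.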
The work here is what separates a demo from something you can leave running. You place important context where the model actually uses it, you offload intermediate findings to a scratchpad so the working context stays small, you checkpoint to a manifest so a killed run resumes instead of restarting, you propagate structured error context, you route a representative stratified sample to human review, and you record provenance so source conflicts are resolved with a rule and a trail.
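Routing a stratified sample to human review can be sketched as follows. The function name, the `confidence` field used as the stratum, and the fixed per-stratum count are assumptions for illustration; the idea is only that every stratum gets reviewed, not just the largest one.

```python
import random
from collections import defaultdict


def stratified_sample(
    items: list[dict], key: str, per_stratum: int, seed: int = 0
) -> list[dict]:
    """Draw up to per_stratum items from each group defined by items[i][key]."""
    rng = random.Random(seed)  # seeded so the review queue is reproducible
    strata: dict = defaultdict(list)
    for item in items:
        strata[item[key]].append(item)
    sample = []
    for name in sorted(strata):  # fixed order keeps the draw deterministic
        group = strata[name]
        sample.extend(rng.sample(group, min(per_stratum, len(group))))
    return sample


outputs = [
    {"id": i, "confidence": "low" if i % 5 == 0 else "high"} for i in range(20)
]
review_queue = stratified_sample(outputs, "confidence", per_stratum=2)
```

With 4 low-confidence and 16 high-confidence outputs, a plain random draw of four would often miss the low-confidence group entirely; sampling per stratum guarantees two from each.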
Grading is rubric-based and explainable. Your submission is scored against seven weighted criteria (SDK integration, lost-in-the-middle, scratchpad, crash recovery, error propagation, stratified review, and provenance), and each criterion comes with feedback quoted from your code. The pass threshold is 65% and you can resubmit. These are the context-management and reliability skills the Claude Certified Architect exam tests.