Build & submit taskBetaintermediate

Build a RAG-Powered Support Assistant

Go past a single-shot pipeline and build a real support assistant: ingest a small set of docs, retrieve grounded context, answer questions with inline citations, and say 'I don't have that information' when the answer is not in the docs. A short multi-turn loop. Submit a single script or notebook for instant, rubric-based feedback.

4 hrs

Est. time

Outcomes

Rubric criteria

65%

Pass score

What you'll learn

Skills you'll have real reps in after shipping this.

Grounded generation with citations

Tie every answer to its retrieved source so it can be verified.

Refusal and fallback

Detect when retrieval did not surface an answer and decline to guess.

Multi-document retrieval

Ingest and search across a corpus, not a single document.

Conversational RAG

Run retrieval inside a short multi-turn loop a user could actually use.

See how it works

The two-phase RAG pipeline

the two-phase RAG pipeline

OFFLINE · runs once per document (can be slow)

›››

Load

Pull raw documents (PDFs, pages, tickets) and extract clean text.

index ▸VECTOR STORE◂ queryboth halves meet here

Build the index once; query it forever. Index time is the slow, offline half: every document is loaded, chunked, embedded, and written to the store, and you only redo it when the corpus changes. Query time is the fast, online half that runs on every question: embed the query, find the nearest chunks, and hand them to the LLM as grounding. The vector store is the seam where the two pipelines connect.

Index time builds the store once; query time retrieves and grounds every answer. Your assistant lives on the query-time half.

Grounding and citations

grounding and citations

click a [n] in the answer

context (retrieved chunks)

[1]

The v3 billing API allows 120 requests per minute per API key.

billing-v3.md

[2]

Exceeding a rate limit returns HTTP 429 with a Retry-After header.

errors.md

[3]

API keys are scoped per project and can be rotated in the dashboard.

auth.md

answer

The v3 billing API permits 120 requests per minute per key, and going over returns a 429 with a Retry-After header. Limits reset at midnight UTC [no source].

"Limits reset at midnight UTC" appears in no chunk. It is an ungrounded claim the model invented.

Every claim should trace to a chunk. Retrieval found the evidence; grounding is the prompt discipline that makes the model use it and only it. You number the chunks, instruct the model to answer from the context and cite the number behind each claim, and now every sentence is checkable: click a citation and you land on its source. The sentence with no citation is exactly the failure mode to hunt, a confident assertion the context never supported.

A trustworthy assistant points at the chunk its answer came from, and declines when no chunk supports an answer.

The scenario

Your company's support team answers the same questions over and over from a handful of internal documents: a product FAQ, a returns policy, a setup guide. They want an assistant that answers from those documents and only those documents, with a citation so an agent can verify the source, and an honest 'I don't know' when the answer is not there.

You have already seen a basic RAG pipeline. This is the application: multiple documents, grounded answers with citations, a refusal path when the context does not contain the answer, and a small conversational loop a support agent could actually use.

Your role

You are an AI Engineer building a grounded support assistant. Your goal is a single module that retrieves from a small corpus, answers with citations, refuses to guess when the answer is absent, and runs as a short multi-turn conversation.

Start the task to unlock the full brief

You'll get the step-by-step requirements, setup commands, the 6-criterion grading rubric, tips, and the ability to submit your solution for instant AI grading.

Free to start · submit when you're ready

Learning resources

LangChain: build a RAG app

Loaders, retrievers, and grounded generation.

python.langchain.com

LlamaIndex starter tutorial

Index documents and query them.

docs.llamaindex.ai

LangChain: add citations to RAG

Return the sources behind an answer.

python.langchain.com

What you'll build in this support-assistant task

This is a build-and-submit task that takes RAG from a single-shot pipeline to a real application. You build a support assistant over a small document corpus that answers with inline citations, refuses to guess when the answer is not in the docs, and runs as a short conversation. The deliverable is one Python file you could grow into a production assistant.

The difference between a RAG demo and a RAG product is trust. Your assistant must ground every answer in retrieved text, point at the source so a human can verify it, and say 'I do not have that information' rather than inventing one. You will ingest multiple documents, retrieve across them, thread the source through to the answer, and prove the refusal path works.

Grading is rubric-based and explainable. Your submission is scored against weighted criteria (framework setup, multi-document ingestion, retrieval, grounded citations, honest fallback, and the multi-turn demonstration) and returns per-criterion feedback with evidence quoted from your code. The pass threshold is 65 percent and you can resubmit.

Frequently asked questions

How is this different from the intro RAG pipeline task?

The intro task builds a single-function pipeline that answers questions about one document. This is the application layer on top: multiple documents, inline citations so answers are verifiable, an honest refusal path when retrieval comes up empty, and a short multi-turn loop. It is the difference between a pipeline and an assistant.

Do I need a hosted vector database?

No. An in-memory store such as FAISS or Chroma is ideal for a small corpus and keeps the focus on grounding, citations, and the refusal path rather than database operations.

How is the fallback graded?

The grader checks that you instruct the model to answer only from retrieved context and that your demonstration includes an out-of-corpus question where the assistant declines instead of hallucinating. Showing both an answered and an unanswerable question is the proof.

What counts as a complete submission?

A single .py or .ipynb that ingests at least three documents, retrieves across them, answers with citations, falls back honestly when the answer is absent, and demonstrates both an in-corpus and an out-of-corpus question.