Build & submit task · Beta · Intermediate

Build a LangChain or LlamaIndex RAG Pipeline

Build a proof-of-concept Retrieval-Augmented Generation pipeline over a document of your choice: ingest and chunk the document, embed it into a vector store, wire up a retriever, and have an LLM answer questions grounded in the retrieved context. Submit your script or notebook and we grade it against a rubric.

Est. time: 4 hrs

Outcomes: 5

Rubric criteria: 6

Pass score: 65%

What you'll learn

Skills you'll have real reps in after shipping this.

RAG Pipeline Fundamentals

Understand the architecture and core components of a Retrieval-Augmented Generation system.

LangChain / LlamaIndex Proficiency

Get hands-on experience with either LangChain or LlamaIndex for building LLM applications.

Document Processing & Chunking

Learn techniques for loading, parsing, and splitting text documents for RAG applications.

Vector Stores & Embeddings

Understand the role of vector embeddings and in-memory vector databases in semantic search and context retrieval.

LLM Integration

Learn how to integrate and use a Large Language Model inside a custom application pipeline.

See how it works

The two-phase RAG pipeline


[Interactive diagram: an index-time pipeline (offline, runs once per document, can be slow) and a query-time pipeline meet at the vector store. First index-time stage: Load, which pulls raw documents (PDFs, pages, tickets) and extracts clean text.]

Build the index once; query it forever. Index time is the slow, offline half: every document is loaded, chunked, embedded, and written to the store, and you only redo it when the corpus changes. Query time is the fast, online half that runs on every question: embed the query, find the nearest chunks, and hand them to the LLM as grounding. The vector store is the seam where the two pipelines connect.
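The split between index time and query time can be sketched in plain Python. This is a toy illustration, not LangChain or LlamaIndex code: `embed`, `ToyVectorStore`, `build_index`, and `build_prompt` are hypothetical names, and the word-count "embedding" is an assumed stand-in for a real embedding model.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (assumed stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store: the seam between the two pipelines."""

    def __init__(self):
        self.entries = []  # list of (vector, chunk_text) pairs

    def add(self, chunk):
        self.entries.append((embed(chunk), chunk))

    def search(self, query, k=2):
        query_vec = embed(query)
        ranked = sorted(
            self.entries, key=lambda e: cosine(query_vec, e[0]), reverse=True
        )
        return [chunk for _, chunk in ranked[:k]]


def build_index(chunks):
    """Index time (offline): embed every chunk and write it to the store."""
    store = ToyVectorStore()
    for chunk in chunks:
        store.add(chunk)
    return store


def build_prompt(store, question, k=2):
    """Query time (online): retrieve the nearest chunks and ground the prompt."""
    context = "\n".join(store.search(question, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


chunks = [
    "Reset the router by holding the power button.",
    "The warranty lasts two years.",
    "Firmware updates install overnight.",
]
store = build_index(chunks)  # run once, or whenever the corpus changes
prompt = build_prompt(store, "How long is the warranty?", k=1)  # run per question
print(prompt)
```

In a real submission the framework's embedding model and vector store would replace the toy pieces, and the prompt would be passed to an LLM; the shape of the two phases stays the same.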

RAG is two pipelines that meet at the vector store: index time (offline, once per document) and query time (online, every question).

Why chunking matters

[Interactive figure: chunking with overlap. Example setting: chunk size 6, overlap 2, giving 7 chunks ("balanced"). Legend: chunk; overlap (shared with the previous chunk).]

Overlap is insurance; size is a tradeoff. Each window starts (size − overlap) tokens after the previous one, so it shares a few tokens with the last. That overlap means a short span sitting on a boundary (anything no longer than the overlap) still appears whole in at least one chunk. Size is the real dial: small chunks match precisely but arrive with little surrounding context, while big chunks carry context but dilute what actually matched the query.
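The stepping rule can be sketched as a small sliding-window function. This is a minimal illustration rather than any framework's splitter: `chunk_with_overlap` is a hypothetical helper, and the 30-token input is an assumption chosen so that size 6 and overlap 2 produce 7 chunks.

```python
def chunk_with_overlap(tokens, size, overlap):
    """Split tokens into windows of `size` that share `overlap` tokens with the previous window."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")

    step = size - overlap  # how far each window starts after the previous one
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reached the end of the input
    return chunks


tokens = list(range(30))  # assumed input: 30 integer "tokens"
chunks = chunk_with_overlap(tokens, size=6, overlap=2)
for chunk in chunks:
    print(chunk)
```

Each chunk after the first begins with the last two tokens of the chunk before it. Production splitters usually work on characters, sentences, or model tokens rather than a plain list, but the arithmetic is the same.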

How you split the document decides what the retriever can find: chunks that are too big blur topics together, and chunks that are too small lose the context an answer needs.

The scenario

Your company, InnovateTech Solutions, provides technical support and product information. The team currently answers customer questions by manually searching a growing library of internal documentation, and that lookup is getting slower as the library grows. Leadership wants to use Generative AI to automate answering questions over a fixed set of documents.

You have been asked to build a proof-of-concept RAG pipeline. It should let a support agent ask a natural-language question about the contents of one internal document (a product manual, an FAQ list, or a technical spec sheet) and get back an answer that is generated by an LLM and grounded in that document.

Your role

You are an AI Engineer responsible for the foundational components of this document Q&A system. Your goal is a functional RAG pipeline built with either LangChain or LlamaIndex, integrating an LLM and a retriever over a document you provide.

Start the task to unlock the full brief

You'll get the step-by-step requirements, setup commands, the 6-criterion grading rubric, tips, and the ability to submit your solution for instant AI grading.

Free to start · submit when you're ready

Learning resources

What is retrieval-augmented generation?

Concept overview of RAG and why grounding matters.

ibm.com

LangChain: Build a Retrieval Augmented Generation (RAG) App

Official end-to-end RAG tutorial with loaders, splitters, vector stores, and retrievers.

python.langchain.com

LlamaIndex: Starter Tutorial

Official quickstart: build an index over your documents and query it.

docs.llamaindex.ai

What you'll build in this RAG pipeline task

This is a build-and-submit task, not a guided lab. You build a working Retrieval-Augmented Generation (RAG) pipeline on your own machine or in a notebook, then upload it to be graded against a transparent rubric. RAG is the pattern behind many production document Q&A systems: instead of hoping a model remembers your internal docs, you retrieve the relevant passages at query time and feed them to the model as context, so the answer is grounded in your source material rather than the model's training data.

You choose the document and the framework. Load a product manual, FAQ, or spec sheet, split it into chunks with a sensible strategy, embed those chunks into an in-memory vector store (FAISS, Chroma, or your framework's built-in store), wire up a retriever, and integrate an LLM that answers questions using the retrieved context. You then demonstrate the pipeline by answering at least three questions. The deliverable is a single Python script or Jupyter notebook, the same artifact you would commit to a real proof-of-concept repository.

Grading is rubric-based and explainable. Each criterion (framework setup, ingestion and chunking, vector store and embeddings, retriever, LLM integration, and the demonstration) carries a weight, and you get a per-criterion score with evidence pulled from your own submission plus specific feedback on what to improve. The pass threshold is 65 percent. You can resubmit. The point is not the number: it is the feedback and the portfolio-ready artifact you walk away with.

Frequently asked questions

Do I build this in a sandbox or on my own machine?

On your own machine or in any notebook environment you like (local, Colab, etc.). This is a take-home style task: you build the pipeline wherever you are comfortable, then upload the script or notebook here for grading. There is no provisioned sandbox, which is exactly why it mirrors real project work.

Should I use LangChain or LlamaIndex?

Either. Both are first-class for RAG. Pick the one you want to learn or already know, and stay within its idioms. The rubric rewards using a framework's components correctly, not which framework you chose.

How is my submission graded?

Your submission is checked against a fixed rubric with six weighted criteria. The grader runs deterministic checks first (does the file parse, does it import a supported framework, does it demonstrate at least three questions) and then scores each criterion qualitatively, returning evidence quoted from your submission and concrete feedback. Your score is the sum of the per-criterion points; 65 percent or above passes.
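To sanity-check a script before uploading, two of those deterministic checks can be approximated locally. This is a guess at what such checks might look like, not the grader's actual implementation: `precheck`, `passes`, the set of accepted import names, and the example point values are all assumptions. The three-question check and notebook (.ipynb) parsing are left out.

```python
import ast


def imported_roots(source):
    """Return the top-level package names imported by a Python source string."""
    tree = ast.parse(source)  # raises SyntaxError if the file does not parse
    roots = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            roots.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            roots.add(node.module.split(".")[0])
    return roots


def precheck(source):
    """Approximate two pre-checks: does it parse, and does it import a framework?"""
    try:
        roots = imported_roots(source)
    except SyntaxError:
        return {"parses": False, "uses_supported_framework": False}
    # Assumption: any langchain* package or llama_index counts as supported.
    uses_framework = any(
        root.startswith("langchain") or root == "llama_index" for root in roots
    )
    return {"parses": True, "uses_supported_framework": uses_framework}


def passes(points_earned, points_possible, threshold_percent=65):
    """Sum per-criterion points and compare against the pass threshold."""
    return sum(points_earned) * 100 >= threshold_percent * sum(points_possible)


good = precheck("from langchain_community.vectorstores import FAISS\n")
broken = precheck("def broken(:\n")
print(good, broken)
print(passes([13, 13], [20, 20]))  # hypothetical points: 26 of 40 is exactly 65%
```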

What counts as a complete submission?

A single Python script (.py) or Jupyter notebook (.ipynb) that ingests and chunks a document, embeds it into a vector store, retrieves relevant chunks for a query, and uses an LLM to answer at least three questions grounded in the retrieved context. Printing the retrieved context next to each answer makes your grounding visible and helps your score.
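One way to make the grounding visible is a small demo loop that prints the retrieved chunks next to each answer. The sketch below assumes you already have a retriever and an LLM call; `run_demo`, `fake_retrieve`, and `fake_answer` are hypothetical names, and the two stubs only stand in for your framework's real components.

```python
def run_demo(questions, retrieve, answer, min_questions=3):
    """Answer each question and print the retrieved context beside the answer."""
    if len(questions) < min_questions:
        raise ValueError(f"demonstrate at least {min_questions} questions")

    transcript = []
    for question in questions:
        context = retrieve(question)       # list of retrieved chunk strings
        reply = answer(question, context)  # LLM answer grounded in that context
        transcript.append(
            {"question": question, "context": context, "answer": reply}
        )
        print(f"Q: {question}")
        for i, chunk in enumerate(context, start=1):
            print(f"  [context {i}] {chunk}")
        print(f"A: {reply}\n")
    return transcript


# Hypothetical stand-ins; swap in your retriever and LLM call.
def fake_retrieve(question):
    return ["The warranty lasts two years."]


def fake_answer(question, context):
    return f"(stub answer based on {len(context)} retrieved chunk(s))"


transcript = run_demo(
    [
        "How long is the warranty?",
        "How do I reset the router?",
        "When do firmware updates install?",
    ],
    fake_retrieve,
    fake_answer,
)
```

Returning the transcript as well as printing it keeps the same loop usable in a notebook cell and in a script.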

Can I resubmit if I do not pass?

Yes. The feedback is the point. Read the per-criterion notes, fix the weak spots, and submit again. Submissions are rate-limited to keep grading costs sane, but you have plenty of attempts.