Advanced RAG: Hybrid Search + Cross-Encoder Reranking
Build a production-shape retrieval stack — dense bi-encoder plus from-scratch BM25, fused with Reciprocal Rank Fusion, then re-ordered by a BAAI cross-encoder. The exact architecture behind modern enterprise RAG.
What you'll learn
1. Set up corpus + dense baseline, find a dense-failure case
2. BM25 keyword retrieval from scratch
3. Reciprocal Rank Fusion: combine dense + BM25
4. Cross-encoder reranking
Prerequisites
- Familiarity with transformer embedding models (BGE, sentence-transformers)
- Completed a basic RAG pipeline (dense retrieval + LLM)
- Comfortable with PyTorch tensors on CUDA
What you'll build in this hybrid retrieval + reranking lab
Dense-only retrieval breaks on rare proper nouns, acronyms, and product codes — the exact queries that show up most in enterprise RAG. Hybrid search fused with cross-encoder reranking is the architecture every serious production system converges to, and in 40 minutes you'll build it end-to-end. You'll leave with a working two-stage retrieval stack (dense bi-encoder + BM25 fused via Reciprocal Rank Fusion, then reranked by a cross-encoder), a from-scratch BM25 implementation that makes the Robertson-Sparck-Jones IDF formula no longer feel like a black box, and a clear mental model of why bi-encoder vs cross-encoder is the trade-off that governs every modern RAG system. You'll also see the score gap between the correct doc and its runner-up widen after reranking — a visceral demo of what the cross-encoder is actually buying you over cheaper ranking methods.
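The from-scratch BM25 you'll build can be sketched roughly like this. Everything here is illustrative, not the lab's exact code: the whitespace tokenizer, the k1 = 1.5 / b = 0.75 defaults, and the sample corpus are all assumptions.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each tokenized doc against the query with Okapi BM25."""
    N = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / N
    # Document frequency per term, then the smoothed Robertson-Sparck-Jones
    # IDF (the +1 inside the log keeps scores non-negative, Lucene-style).
    df = Counter()
    for d in docs_tokens:
        for term in set(d):
            df[term] += 1
    idf = {t: math.log((N - n + 0.5) / (n + 0.5) + 1) for t, n in df.items()}
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        s = 0.0
        for t in query_tokens:
            if t not in tf:
                continue
            # Term-frequency saturation (k1) + document-length normalization (b)
            num = tf[t] * (k1 + 1)
            den = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf[t] * num / den
        scores.append(s)
    return scores

# Toy corpus: a rare product code a dense encoder would plausibly fumble
docs = [t.lower().split() for t in [
    "the XJ-900 throttle unit ships in Q3",
    "general overview of engine components",
    "maintenance schedule for all units",
]]
print(bm25_scores("xj-900 throttle".split(), docs))
```

Because "xj-900" appears in exactly one document, its IDF is high and BM25 ranks that document first, which is precisely the rare-term behavior the dense baseline misses.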
The technical substance is the two asymmetries that force the two-stage shape. First asymmetry: bi-encoders like BAAI/bge-small-en-v1.5 encode query and passage independently, so passage vectors precompute offline and every query is one matmul — fast enough for a million-document corpus. Cross-encoders like BAAI/bge-reranker-base concatenate (query, passage) and run full attention across both, so every query token attends to every passage token — far more accurate, completely uncacheable, and linear in candidate count. You can't run a cross-encoder over a million docs; you can't trust a bi-encoder alone on rare-term queries. Second asymmetry: dense cosine scores live in [-1, 1] while BM25 scales with IDF and document length into the 20s. RRF sidesteps this with 1 / (k + rank), k=60 — rank-based fusion that's calibration-free, monotonic, and the default in modern retrieval papers because any normalization you pick (min-max, z-score) is a brittle hyperparameter that drifts with the corpus. You'll also see the practical tuning lever: top-5 reranking gives visible precision gains at essentially zero latency cost, top-100 is where you start hurting the end-user SLA.
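RRF itself is only a few lines. Here is a minimal sketch of the fusion step described above; the doc IDs are made up purely for illustration:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: each ranked list contributes 1/(k + rank)
    per doc, so only positions matter -- raw scores never need calibrating."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d2", "d7", "d1"]   # dense bi-encoder top-3 (illustrative IDs)
bm25  = ["d7", "d3", "d2"]   # BM25 top-3
print(rrf_fuse([dense, bm25]))
```

Note that d7, ranked second by dense and first by BM25, edges out d2 (first and third): consistent mid-list agreement beats one strong placement, which is exactly the behavior you want from a calibration-free fusion.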
You should already be comfortable with transformer embeddings (BGE, sentence-transformers), have built at least one basic RAG pipeline (the rag-pipeline lab is the warm-up if not), and know your way around PyTorch CUDA tensors. The sandbox is a real NVIDIA GPU pod we provision per session, with BGE, the BAAI reranker checkpoint, and dependencies preinstalled. Checks run strictly against real retrieval outcomes: L2-normalized embeddings paired with a failure-case dict, BM25 actually retrieving the rare-keyword doc the dense encoder missed, RRF producing the correct top-1, and the reranker preserving the correct doc at top-1 with a widened score gap over the runner-up.
Frequently asked questions
Why build hybrid + reranking when a strong dense encoder like BGE gets most queries right?
Because the queries that matter most in enterprise RAG are exactly the ones dense encoders miss: rare proper nouns, acronyms, and product codes. BM25 catches those exact-match cases, RRF folds them back into the candidate list, and the cross-encoder restores precision at the top. A strong bi-encoder alone can't be trusted on rare-term queries.
Why Reciprocal Rank Fusion instead of score averaging or min-max normalization?
1 / (k + rank), with k = 60. It is calibration-free, monotonic, and the default in modern retrieval papers, because any score normalization you pick (min-max, z-score) is a brittle hyperparameter that drifts with the corpus.
Why rerank only the top-5 and not the top-100?
The cross-encoder runs full attention over every (query, passage) pair with no caching, so latency scales linearly with the candidate count. Top-5 gives you visible precision gains at essentially zero extra wall-clock budget; top-100 is where you start hurting the end-user. In production you tune K to your SLA: 10-50 is typical, 100+ only when recall matters more than latency and you can batch aggressively on the GPU.
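The top-K trade-off can be seen in the shape of the reranking loop itself: cost is exactly K scoring passes over the fused candidates, nothing amortizes. A minimal sketch, where `toy_score` is a purely illustrative stand-in for a real cross-encoder forward pass (the lab uses BAAI/bge-reranker-base):

```python
def rerank(query, candidates, score_pair, top_k=5):
    """Cross-encoder-style reranking: score each (query, passage) pair.
    Cost is exactly top_k calls to score_pair -- nothing can be cached,
    because the score depends jointly on query and passage."""
    pool = candidates[:top_k]  # only the fused top-K reaches the reranker
    scored = [(score_pair(query, p), p) for p in pool]
    scored.sort(reverse=True)
    return [p for _, p in scored]

# Illustrative stand-in for a cross-encoder: raw token overlap.
def toy_score(query, passage):
    return len(set(query.split()) & set(passage.split()))

cands = ["a b c", "a x", "x y z"]  # pretend this is the RRF-fused list
print(rerank("a b q", cands, toy_score, top_k=3))
```

Swapping `toy_score` for a real model changes nothing structural: K stays the single lever trading precision against the latency budget.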