
NVIDIA NCA-GENL Cheat Sheet: Complete LLM Reference [2026]

Preporato Team · January 8, 2026 · 10 min read · NCA-GENL


This comprehensive cheat sheet covers all five exam domains with formulas, comparison tables, and code snippets, based on the official NVIDIA exam guide domain weighting (30/24/22/14/10%).


Domain 1: Core ML & AI Knowledge (30%)

Transformer Architecture Overview

Self-Attention

Query (Q), Key (K), and Value (V) matrices interact through the attention mechanism step by step: Q/K/V are projected from the inputs, attention scores are computed (Q×Kᵀ), softmax normalization is applied, and the resulting weights are multiplied by the values to produce the final output.

Multi-Head Attention

Multi-head attention processes information in parallel: the input is split across several attention heads, each head computes attention independently over its own representation subspace, and the results are concatenated and passed through a final projection matrix (W^O) to produce the output.

Component Comparison Table

| Component | Purpose | Key Details |
|---|---|---|
| Positional Encoding | Preserve sequence order | Sinusoidal: PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) |
| Self-Attention | Model token relationships | Computes Q, K, V projections |
| Multi-Head Attention | Different representation subspaces | Typically 8-16 heads, d_model/h dims per head |
| Feed-Forward Network | Position-wise transformation | 2 layers: d_model → 4×d_model → d_model |
| Layer Normalization | Stabilize training | Normalizes across features (not batch) |
| Residual Connections | Enable deep networks | x + Sublayer(x) pattern |

Model Architecture Types

| Type | Attention | Use Cases | Examples |
|---|---|---|---|
| Encoder-Only | Bidirectional (no masking) | Classification, NER, embeddings | BERT, RoBERTa |
| Decoder-Only | Causal/masked (autoregressive) | Text generation, completion | GPT-3/4, Llama 2/3 |
| Encoder-Decoder | Both types + cross-attention | Translation, summarization | T5, BART, mT5 |


Attention Mechanism Steps

  1. Compute Scores: QK^T (query-key similarity)
  2. Scale: Divide by √d_k (stabilize gradients)
  3. Apply Mask (decoder only): Set future positions to -∞
  4. Softmax: Convert to probability distribution
  5. Weight Values: Multiply by V to get output

Scaling Factor Worked Example

With d_model = 512 and h = 8 heads, the head dimension is d_k = d_model / h = 512 / 8 = 64. Attention scores are therefore scaled by 1/√d_k = 1/√64 = 1/8 = 0.125.

Causal Masking Matrix (Decoder):

[[0, -∞, -∞, -∞],
 [0,  0, -∞, -∞],
 [0,  0,  0, -∞],
 [0,  0,  0,  0]]
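The five steps above can be sketched in NumPy; this is a minimal single-head illustration with an optional causal mask (array shapes are illustrative, not a production implementation):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, causal=False):
    """softmax(QK^T / sqrt(d_k)) V, with optional causal masking."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # Steps 1-2: score and scale
    if causal:                                      # Step 3: hide future positions
        future = np.triu(np.ones_like(scores), k=1).astype(bool)
        scores = np.where(future, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # Step 4: softmax
    return weights @ V                              # Step 5: weight the values

rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(4, 8))   # 4 tokens, d_k = 8
out = scaled_dot_product_attention(Q, K, V, causal=True)
```

With the causal mask applied, the first token can only attend to itself, so its output row equals V[0] exactly.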

LLM Training & Scaling

Training Stages:

  1. Pre-training: Next-token prediction on massive corpus (billions of tokens)
  2. Supervised Fine-Tuning (SFT): Task-specific adaptation
  3. RLHF (Optional): Reinforcement Learning from Human Feedback for alignment

Model Memory Rule of Thumb

Required GPU memory for inference (weights only) ≈ parameters × bytes per parameter. A 7B-parameter model at FP16 (2 bytes per parameter) needs roughly 14.00 GB.
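That rule of thumb is simple enough to script (a sketch for weights only; real deployments also need memory for the KV cache and activations):

```python
def model_memory_gb(params_billions, bytes_per_param):
    """Weight memory only: parameter count times bytes per parameter."""
    return params_billions * 1e9 * bytes_per_param / 1e9

print(model_memory_gb(7, 2))    # FP16: 14.0 GB
print(model_memory_gb(7, 0.5))  # 4-bit: 3.5 GB
```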

Context Window: the maximum number of tokens the model can attend to in a single forward pass; prompt plus generated output must fit within it.


Domain 2: Experimentation (22%)

Prompt Engineering Techniques

| Technique | Description | Token Overhead | Use Case |
|---|---|---|---|
| Zero-Shot | Direct instruction, no examples | 10-50 | Simple, well-defined tasks |
| Few-Shot | 2-10 examples in prompt | 100-500 | Pattern learning, formatting |
| Chain-of-Thought (CoT) | "Let's think step by step" | 50-200 | Reasoning, math problems |
| Self-Consistency | Sample multiple CoT paths, vote | 500+ | Complex reasoning, verification |
| ReAct | Reasoning + Acting cycles | Variable | Tool use, multi-step tasks |

Prompt Structure Best Practice:

[System Role]
You are an expert data analyst.

[Context]
Dataset: {data_description}

[Task]
Analyze trends and provide insights on {specific_aspect}.

[Constraints]
- Use only provided data
- Cite sources with [Source: X]

[Format]
Respond in JSON: {"trends": [], "insights": []}

Token Estimation Rules: for English text, roughly 4 characters ≈ 1 token and 1 token ≈ 0.75 words (so 100 tokens ≈ 75 words).
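A common heuristic for English text is about 4 characters per token; a quick estimator (a rough sketch — for exact counts, use the model's own tokenizer):

```python
def estimate_tokens(text):
    """Rough English-text heuristic: about 4 characters per token."""
    return max(1, len(text) // 4)

print(estimate_tokens("The quick brown fox jumps over the lazy dog"))  # 10
```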

Fine-Tuning Methods

| Method | Trainable Params | Memory | Speed | Best For |
|---|---|---|---|---|
| Full Fine-Tuning | 100% | Highest | Slowest | Maximum customization, large datasets |
| LoRA | <1% (0.1-1%) | Low | Fast | Most tasks, limited GPU |
| QLoRA | <1% | Lowest | Fast | Large models (65B+) on consumer GPU |
| Adapter Layers | <10% | Medium | Medium | Multi-task, modular approaches |
| Prefix Tuning | <0.1% | Very Low | Very Fast | Prompt-like tuning |

LoRA (Low-Rank Adaptation) Details

LoRA Parameter Efficiency (Worked Example)

For a single 4096 × 4096 weight matrix with rank r = 8: full fine-tuning updates 4096 × 4096 ≈ 16.78M parameters, while LoRA trains only 2 × 4096 × 8 ≈ 0.07M — a 99.6% reduction.
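The arithmetic behind that reduction as a quick script (`d` and `r` are the weight dimension and LoRA rank):

```python
def lora_trainable_params(d, r):
    """LoRA adds B (d x r) and A (r x d) to a frozen d x d weight."""
    return 2 * d * r

full = 4096 * 4096
lora = lora_trainable_params(4096, r=8)
print(f"{full/1e6:.2f}M -> {lora/1e6:.2f}M ({1 - lora/full:.1%} reduction)")
```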

Key Hyperparameters: rank r (typically 4-16), lora_alpha (scaling, often set to 2r), lora_dropout, and target_modules (commonly the attention projections q_proj and v_proj).

Hugging Face Implementation:

from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=8,                          # Rank
    lora_alpha=16,                # Scaling (2r rule)
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(base_model, config)
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable: {trainable_params:,} ({trainable_params/7e9*100:.2f}%)")

QLoRA Optimization

Additional Features: 4-bit NF4 quantization of the frozen base weights, double quantization of the quantization constants, and paged optimizers to absorb memory spikes.

Memory Comparison (65B model):

FP16 full:        130 GB
4-bit quantized:   33 GB
4-bit + QLoRA:     35 GB (with adapters + optimizer states)

BitsAndBytes Implementation:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4"
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf",
    quantization_config=bnb_config,
    device_map="auto"
)

Evaluation Metrics

| Metric | Range | Higher Better? | Formula/Notes |
|---|---|---|---|
| Perplexity | [1, ∞) | ❌ Lower | PPL = exp(-(1/n) × Σ log P(w_i given context)) |
| BLEU | [0, 1] | ✅ Higher | N-gram precision (1-4 grams) with brevity penalty |
| ROUGE-L | [0, 1] | ✅ Higher | Longest Common Subsequence F1 score |
| BERTScore | [0, 1] | ✅ Higher | Cosine similarity of BERT embeddings |
| METEOR | [0, 1] | ✅ Higher | Harmonic mean of precision/recall with synonyms |

LLM Evaluation Metrics Quick Reference

| Metric | Measures | Range | Better | Best For |
|---|---|---|---|---|
| Perplexity | Language modeling quality | 1-∞ | Lower | LM evaluation |
| BLEU | N-gram overlap | 0-1 | Higher | Translation |
| ROUGE-L | Longest common subsequence | 0-1 | Higher | Summarization |
| BERTScore | Semantic similarity | 0-1 | Higher | Semantic eval |
| METEOR | Precision + recall + synonyms | 0-1 | Higher | MT evaluation |

Perplexity (PPL)

Measures how well the model predicts the next token.

Formula: PPL = exp(cross_entropy_loss)

| PPL | Quality | Interpretation |
|---|---|---|
| < 10 | Excellent | Very confident predictions |
| 10-30 | Good | Solid performance |
| 30-50 | Acceptable | Domain-dependent |
| > 50 | Poor | High uncertainty |
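Since PPL is just the exponentiated mean cross-entropy (in nats), the conversion is one line (a sketch; in practice the loss value comes from your evaluation loop):

```python
import math

def perplexity(mean_cross_entropy):
    """PPL = exp(mean token-level cross-entropy loss, in nats)."""
    return math.exp(mean_cross_entropy)

print(perplexity(2.3))  # just under 10 — "excellent" on the guide above
```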

BLEU Score

N-gram overlap for translation quality.

Formula: BLEU = BP × exp(Σ w_n log p_n), where p_n are the 1-4-gram precisions, w_n = 1/4, and BP is the brevity penalty.

| BLEU | Quality | Interpretation |
|---|---|---|
| < 0.2 | Poor | Needs improvement |
| 0.2-0.3 | Acceptable | Understandable |
| 0.3-0.5 | Good | High quality |
| > 0.5 | Very Good | Near-human quality |
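A minimal BLEU-1 (unigram-only) sketch shows the two moving parts, clipped precision and brevity penalty; full BLEU combines 1-4-gram precisions, so use sacrebleu or nltk in practice:

```python
import math
from collections import Counter

def bleu1(reference, candidate):
    """Clipped unigram precision times brevity penalty (BLEU-1 only)."""
    ref, cand = reference.split(), candidate.split()
    ref_counts = Counter(ref)
    clipped = sum(min(c, ref_counts[w]) for w, c in Counter(cand).items())
    precision = clipped / len(cand)
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision

print(bleu1("the cat sat on the mat", "the cat sat on a mat"))  # 5/6 ≈ 0.83
```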

ROUGE-L Score Ranges

Quality interpretation for summarization tasks (F1 over the longest common subsequence).

| ROUGE-L | Quality | Interpretation |
|---|---|---|
| < 0.2 | Poor | Low overlap |
| 0.2-0.3 | Acceptable | Basic coverage |
| 0.3-0.5 | Good | Strong overlap |
| > 0.5 | Excellent | Very high overlap |
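ROUGE-L can be sketched directly from its definition — F1 over the longest common subsequence of tokens (assuming simple whitespace tokenization; libraries like rouge-score add stemming and more):

```python
def rouge_l_f1(reference, candidate):
    """F1 over the longest common subsequence (LCS) of whitespace tokens."""
    ref, cand = reference.split(), candidate.split()
    # Dynamic-programming table for LCS length
    dp = [[0] * (len(cand) + 1) for _ in range(len(ref) + 1)]
    for i, r in enumerate(ref):
        for j, c in enumerate(cand):
            dp[i + 1][j + 1] = dp[i][j] + 1 if r == c else max(dp[i][j + 1], dp[i + 1][j])
    lcs = dp[-1][-1]
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

print(rouge_l_f1("the cat sat on the mat", "the cat lay on the mat"))  # 5/6 ≈ 0.83
```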

Domain 3: Software Development (24%)

NVIDIA Platform Tools

NVIDIA NIM (Inference Microservices):

# Deploy optimized LLM inference
docker run -it --gpus all \
  -e NGC_API_KEY=$NGC_API_KEY \
  -p 8000:8000 \
  nvcr.io/nim/meta/llama-3-8b-instruct:latest

# Test endpoint (OpenAI-compatible)
curl -X POST http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello", "max_tokens": 50}'

Key Features: prebuilt, GPU-optimized inference containers; OpenAI-compatible API endpoints; TensorRT-LLM acceleration under the hood.

Triton Inference Server:

# config.pbtxt
name: "llama-3-8b"
platform: "pytorch_libtorch"
max_batch_size: 8

dynamic_batching {
  preferred_batch_size: [4, 8]
  max_queue_delay_microseconds: 100
}

instance_group [
  { count: 1, kind: KIND_GPU }
]

Use Cases: serving many models on shared GPUs, dynamic batching for throughput, and multi-framework deployments (PyTorch, TensorFlow, ONNX, TensorRT).

LangChain Essentials

Basic Chain:

from langchain import PromptTemplate, LLMChain
from langchain.llms import HuggingFacePipeline

# Build an LLM from a local Hugging Face model
llm = HuggingFacePipeline.from_model_id(model_id="gpt2", task="text-generation")

template = "Summarize in 3 sentences: {text}"
prompt = PromptTemplate(template=template, input_variables=["text"])

chain = LLMChain(llm=llm, prompt=prompt)
result = chain.run(text="Long document...")

Common Chain Types:

| Chain | Purpose | Example |
|---|---|---|
| LLMChain | Single LLM call with prompt | Q&A, classification |
| SequentialChain | Chain multiple LLM calls | Multi-step reasoning |
| RetrievalQA | RAG pipeline | Knowledge-grounded QA |
| ConversationalRetrievalChain | RAG with chat history | Chatbots |
| AgentExecutor | Dynamic tool selection | Task automation |

Agent with Tools:

from langchain.agents import initialize_agent, Tool

# search_fn and calc_fn are your own callables (not defined here)
tools = [
    Tool(name="Search", func=search_fn, description="Search the web"),
    Tool(name="Calculator", func=calc_fn, description="Do math")
]

agent = initialize_agent(
    tools, llm, agent="zero-shot-react-description", verbose=True
)
agent.run("What is 25% of the GDP of France?")

Hugging Face Transformers

Load and Generate:

from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Generate
inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_length=100,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)
text = tokenizer.decode(outputs[0], skip_special_tokens=True)

Generation Parameters:

| Parameter | Effect | Typical Values |
|---|---|---|
| temperature | Randomness (higher = more random) | 0.7-1.0 |
| top_p | Nucleus sampling (cumulative prob) | 0.9-0.95 |
| top_k | Sample from top K tokens | 40-50 |
| max_length | Maximum tokens to generate | Task-dependent |
| num_beams | Beam search width (1 = greedy) | 1-5 |

Domain 4: RAG Architecture & Data (14%)

RAG Pipeline

Query → Embed → Vector Search → Retrieve Top-K → Augment Prompt → LLM → Response

Component Options:

| Component | Options | Best Choice |
|---|---|---|
| Embedding Model | OpenAI Ada-002, sentence-transformers | Ada-002: general, S-T: domain-specific |
| Vector DB | Pinecone, Weaviate, ChromaDB, FAISS | Pinecone: managed, FAISS: local |
| Chunking | Fixed (512), Semantic, Recursive | Semantic: best context |
| Retrieval | Semantic only, Hybrid (semantic + BM25) | Hybrid: best accuracy |

Chunking Strategies

Fixed-Size Chunking:

chunk_size = 512      # tokens
overlap = 50          # 10-20% of chunk_size

# Rule of thumb:
# chunk_size: 256-512 tokens (balance context vs specificity)
# overlap: preserve context across chunks
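The fixed-size strategy is only a few lines over a token list (a sketch; `tokens` would come from your tokenizer):

```python
def fixed_size_chunks(tokens, chunk_size=512, overlap=50):
    """Overlapping windows over a token list; stride = chunk_size - overlap."""
    stride = chunk_size - overlap
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), stride)]

chunks = fixed_size_chunks(list(range(1200)), chunk_size=512, overlap=50)
print(len(chunks))  # 3 chunks; adjacent chunks share 50 tokens
```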

Best Practices: keep overlap at 10-20% of chunk size, split on natural boundaries (sentences, sections) where possible, and store source metadata with each chunk.

Semantic Chunking:

# Split by topic/section using embeddings
# 1. Compute sentence embeddings
# 2. Find semantic breaks (cosine similarity drops)
# 3. Create chunks at break points
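The break-point logic above can be sketched as follows; `toy_embed` is a keyword-indicator stand-in defined here purely for illustration (a real pipeline would use a sentence-transformer encoder):

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def semantic_chunks(sentences, embed, threshold=0.5):
    """Start a new chunk wherever adjacent-sentence similarity drops."""
    vecs = [embed(s) for s in sentences]
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        if cosine(vecs[i - 1], vecs[i]) < threshold:   # semantic break
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i])
    chunks.append(" ".join(current))
    return chunks

def toy_embed(s):  # stand-in for a real sentence encoder
    return np.array([float("gpu" in s.lower()), float("pasta" in s.lower())])

sents = ["GPUs accelerate training.", "A GPU has many cores.", "Pasta needs boiling water."]
result = semantic_chunks(sents, toy_embed)
print(result)  # two chunks: the GPU sentences together, pasta on its own
```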

Retrieval Optimization

Similarity Metrics: cosine similarity (most common; scale-invariant), dot product (fast, equivalent to cosine for normalized vectors), and Euclidean distance (lower = more similar).

Retrieval Parameters:

top_k = 5                        # number of chunks to retrieve (typically 3-5)
similarity_threshold = 0.7       # minimum similarity to include

Reranking:

1. Retrieve top-20 candidates (fast semantic search)
2. Rerank with cross-encoder (more accurate but slower)
3. Return top-3 for LLM context
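The retrieve-then-rerank pattern, sketched with a stand-in scorer (`overlap_score` is illustrative only; a real reranker would be a cross-encoder model such as sentence-transformers' CrossEncoder):

```python
def rerank(query, candidates, score_fn, top_n=3):
    """Score every (query, doc) pair and keep the top_n highest."""
    return sorted(candidates, key=lambda d: score_fn(query, d), reverse=True)[:top_n]

def overlap_score(q, d):  # stand-in: shared-token count, not a real cross-encoder
    return len(set(q.split()) & set(d.split()))

docs = ["gpu memory tips", "pasta recipe", "gpu kernels and memory"]
top = rerank("gpu memory", docs, overlap_score, top_n=2)
print(top)
```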

RAPIDS & cuDF

cuDF (GPU-accelerated pandas):

import cudf

# Load data on GPU
df = cudf.read_csv('data.csv')

# Operations 10-50x faster than pandas
filtered = df[df['score'] > 0.5]
grouped = df.groupby('category')['value'].mean()

When to Use RAPIDS: datasets in the millions of rows, a GPU available, and workloads dominated by DataFrame, ML, or graph operations that map to the tools below.

RAPIDS Tools:

| Tool | Purpose | CPU Equivalent |
|---|---|---|
| cuDF | DataFrame operations | pandas |
| cuML | Machine learning | scikit-learn |
| cuGraph | Graph analytics | NetworkX |
| cuPy | Array operations | NumPy |

Tokenization Methods

| Method | Used By | Vocabulary | Strengths |
|---|---|---|---|
| BPE | GPT-2/3/4 | 50K | Efficient, handles rare words |
| WordPiece | BERT | 30K | Maximizes training likelihood |
| SentencePiece | T5, Llama | Variable | Language-agnostic, no pre-tokenization |
| Unigram | XLNet | Variable | Probabilistic subwords |

BPE Example (illustrative splits; actual merges depend on the trained vocabulary):

"unhappiness" → ["un", "happiness"]
"transformer" → ["transform", "er"]


Domain 5: Trustworthy AI (10%)

Common Risks & Mitigations

| Risk | Mitigation | Implementation |
|---|---|---|
| Hallucinations | RAG, citations, verification | Ground in knowledge base |
| Bias | Diverse data, fairness testing | Test across demographics |
| PII Leakage | Input/output filtering | Regex + NER models |
| Prompt Injection | Input validation, sandboxing | Detect malicious patterns |
| Toxic Content | Content moderation | Perspective API, classifiers |

Content Filtering

PII Detection:

import re

pii_patterns = {
    'email': r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b',
    'phone': r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b',
    'ssn': r'\b\d{3}-\d{2}-\d{4}\b',
    'credit_card': r'\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b'
}

def detect_pii(text):
    found = {}
    for pii_type, pattern in pii_patterns.items():
        matches = re.findall(pattern, text)
        if matches:
            found[pii_type] = matches
    return found

Toxicity Detection:

from transformers import pipeline

classifier = pipeline("text-classification",
                      model="unitary/toxic-bert")

def moderate(text, threshold=0.7):
    result = classifier(text)
    # e.g. [{'label': 'toxic', 'score': 0.02}]
    if result[0]['score'] > threshold:
        return "Content filtered due to toxicity"
    return text

Bias Detection

Fairness Metrics: demographic parity (equal positive-prediction rates across groups), equalized odds (equal true/false positive rates), and per-group accuracy or F1 gaps.

Testing Approach:

# 1. Create test sets for each demographic
test_sets = {
    'male': male_examples,
    'female': female_examples,
    'non_binary': nb_examples
}

# 2. Measure metrics per group
metrics = {}
for group, examples in test_sets.items():
    metrics[group] = {
        'accuracy': calculate_accuracy(model, examples),
        'f1': calculate_f1(model, examples)
    }

# 3. Flag if accuracy disparity > 5%
accuracies = [m['accuracy'] for m in metrics.values()]
max_disparity = max(accuracies) - min(accuracies)
if max_disparity > 0.05:
    print(f"⚠️ Bias detected: {max_disparity:.1%} disparity")

Hallucination Prevention

Strategies:

  1. RAG: Ground responses in retrieved documents
  2. Citations: Require model to cite sources
  3. Confidence Scores: Filter low-confidence outputs
  4. Verification: Cross-check facts with knowledge base
  5. Temperature: Lower temperature (0.3-0.5) reduces creativity/hallucination

Citation Enforcement:

prompt = """
Use ONLY the provided context to answer.
If unsure, say "I don't have enough information."
ALWAYS cite sources: [Source: document_name]

Context:
{retrieved_docs}

Question: {question}
"""

Quick Command Reference

Model Loading & Optimization

Flash Attention 2 (2-4x faster):

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

Gradient Checkpointing (reduce memory):

model.gradient_checkpointing_enable()
# Trades compute for memory (20-30% slower, 30-40% less memory)

8-bit Quantization:

model = AutoModelForCausalLM.from_pretrained(
    "model_name",
    load_in_8bit=True,
    device_map="auto"
)
# Reduces memory by ~50% with minimal accuracy loss

Tokenizer Operations

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Encode
token_ids = tokenizer.encode("Hello world", add_special_tokens=True)
# [15496, 995] (GPT-2 adds no special tokens by default)

# Decode
text = tokenizer.decode(token_ids, skip_special_tokens=True)
# "Hello world"

# Get vocab size
vocab_size = len(tokenizer)  # 50,257 for GPT-2

Training Configuration

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,     # Effective batch: 4×4=16
    learning_rate=2e-5,
    warmup_steps=100,
    logging_steps=10,
    save_steps=500,
    evaluation_strategy="steps",
    eval_steps=500,
    fp16=True,                         # Mixed precision (2x speedup)
    dataloader_num_workers=4
)

Exam Strategy Quick Tips


Common Mistake Patterns

| ❌ Wrong | ✅ Right |
|---|---|
| "Self-attention reduces complexity" | "Self-attention captures long-range dependencies" |
| "LoRA updates all parameters" | "LoRA adds trainable low-rank matrices, freezes base" |
| "BLEU measures semantics" | "BLEU measures n-gram overlap, BERTScore is semantic" |
| "Encoder uses causal masking" | "Decoder uses causal masking, encoder is bidirectional" |
| "Higher perplexity is better" | "Lower perplexity is better (less surprised)" |

Formula Quick Reference

Attention:        softmax(QK^T / √d_k) × V
Perplexity:       exp(-1/n × Σ log P(w_i|context))
LoRA:             W' = W + BA, trainable = 2dr
Memory (GB):      Params × Bytes_per_param / 1e9
Parameter count:  12 × n_layers × d_model²
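The parameter-count approximation can be sanity-checked in a couple of lines (it estimates non-embedding parameters only; embeddings and layer norms fall outside the 12·L·d² rule):

```python
def approx_transformer_params(n_layers, d_model):
    """Approximate non-embedding parameters: 12 * n_layers * d_model^2."""
    return 12 * n_layers * d_model ** 2

print(approx_transformer_params(12, 768) / 1e6)    # GPT-2 small: ~85M non-embedding
print(approx_transformer_params(96, 12288) / 1e9)  # GPT-3 scale: ~174B
```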






Last Updated: January 8, 2026


Based on official NVIDIA NCA-GENL exam guide. All formulas and domain weights verified against official sources.

