Preporato

AIP-C01 Study Guide

Foundation Model Integration, Data Management, and ComplianceGenAI Solution Design and ArchitectureUse Case Analysis and FM Selection Criteria

Key Concepts

  • Matching models to business requirements

  • Performance benchmarks evaluation

  • Cost-capability tradeoff analysis

  • Regulatory and compliance requirements

  • Model availability and support

Use Case Analysis and FM Selection Criteria

Overview

Selecting the right foundation model (FM) for your GenAI application is a strategic decision that impacts cost, performance, accuracy, and user experience. With Amazon Bedrock offering access to dozens of models from multiple providers, you need a systematic approach to narrow candidates and validate choices.

This topic covers the framework for matching models to business requirements, evaluating performance benchmarks, analyzing cost-capability tradeoffs, and considering regulatory requirements. For the AIP-C01 exam, understanding model selection criteria is essential for recommending appropriate solutions.

Key Principle

There is no universally "best" model - the optimal choice depends on your specific use case requirements, latency constraints, budget, and compliance needs. Model selection should be data-driven through evaluation and benchmarking.

Exam Tip

Exam questions often present scenarios with specific requirements (latency, cost, capability) and ask which model to recommend. Know the strengths of each model family and how to match them to use cases.


Architecture Diagram

The following diagram illustrates the model selection decision framework:

Foundation Model Selection Framework
Figure 1: Multi-dimensional framework for evaluating and selecting foundation models

Key Concepts

Matching Models to Business Requirements

Requirements Analysis

Before evaluating models, define your requirements across these dimensions:

Functional Requirements:

  • Primary task type (generation, classification, extraction)
  • Input/output modalities (text, image, audio)
  • Language support requirements
  • Context length needs
  • Output format requirements (structured JSON, prose)

Non-Functional Requirements:

  • Latency targets (real-time vs batch)
  • Throughput requirements (requests per minute)
  • Availability requirements (SLA)
  • Budget constraints (monthly spend)

Compliance Requirements:

  • Data residency needs
  • Industry regulations (HIPAA, PCI, SOC2)
  • Content safety requirements
  • Audit and logging needs

Model Filtering Process

AWS recommends a systematic filtering approach:

  1. Hard Requirements Filter

    • Use Bedrock Model Information API
    • Filter by modality, context length, languages
    • Exclude models below minimum thresholds
    • Typically reduces candidates from dozens to 3-7 models
  2. Capability Assessment

    • Task-specific performance evaluation
    • Benchmark against your actual use cases
    • Consider fine-tuning requirements
  3. Cost-Performance Analysis

    • Calculate theoretical costs at projected scale
    • Factor in provisioned vs on-demand pricing
    • Account for token efficiency differences
  4. Operational Fit

    • Regional availability
    • API compatibility requirements
    • Support for needed features (streaming, tool use)

Model Selection Dimensions

DimensionKey FactorsTrade-off
Task PerformanceAccuracy, relevance, fluency, coherenceBetter performance often means higher cost/latency
ArchitectureContext length, modality, parameter sizeLarger models are more capable but slower
OperationsLatency, throughput, availabilityLower latency often requires provisioned capacity
Responsible AISafety, bias, hallucination rateStricter guardrails may limit creativity

Performance Benchmarks Evaluation

Evaluation Metrics

Key Metrics for Model Evaluation:

Quality Metrics:

  • Accuracy - Correctness of factual information
  • Relevance - How well response addresses the query
  • Fluency - Natural language quality
  • Coherence - Logical consistency within response
  • Hallucination Rate - Frequency of fabricated content

Performance Metrics:

  • Time to First Token (TTFT) - Initial response latency
  • Tokens per Second - Generation throughput
  • P50/P90/P99 Latency - Response time distribution

Safety Metrics:

  • Bias Detection - Fairness across demographics
  • Toxicity Rate - Harmful content generation
  • Refusal Accuracy - Appropriate handling of harmful requests

Amazon Bedrock Evaluations

Bedrock provides built-in evaluation capabilities:

Automatic Evaluation:

  • Compare model outputs for brand voice, friendliness, relevance
  • Assess RAG workflows for context relevance and correctness
  • No additional cost for automated metrics

LLM-as-a-Judge:

  • Use foundation models to evaluate other model outputs
  • Custom evaluation criteria and prompts
  • Evaluate for harmfulness, completeness, accuracy

Human Evaluation:

  • Manual review for nuanced feedback
  • $0.21 per completed human evaluation task
  • Useful for subjective quality assessment

Custom Evaluation Pipelines:

  • Integrate with Step Functions for automation
  • Use Lambda for custom scoring logic
  • Store results in S3/DynamoDB for analysis
Benchmark Best Practice

Never rely solely on public benchmarks. Model performance varies significantly by domain and task. Always evaluate with your actual data and use cases before selecting a model for production.

Cost-Capability Tradeoff Analysis

Cost Optimization Framework

Understanding Bedrock Pricing:

On-Demand Pricing:

  • Pay per input/output token
  • No upfront commitment
  • Subject to shared capacity limits
  • Best for: Variable workloads, experimentation

Batch Inference:

  • Up to 50% cost savings
  • Submit multiple prompts, retrieve from S3
  • Best for: Large-scale processing, overnight jobs

Provisioned Throughput:

  • Reserved capacity (model units)
  • Fixed hourly cost
  • Guaranteed performance
  • Required for: Custom models, consistent latency needs

Cost Optimization Strategies

Strategies to Reduce Costs:

  1. Intelligent Prompt Routing

    • Route simple queries to smaller, cheaper models
    • Reserve capable models for complex tasks
    • Can reduce costs by up to 30% without accuracy loss
  2. Model Distillation

    • Transfer knowledge from large "teacher" to small "student"
    • Student becomes performant for specific use cases
    • Significant cost reduction at inference time
  3. Token Efficiency

    • Optimize prompt design to reduce token usage
    • Use concise system prompts
    • Implement response length limits
  4. Caching

    • Cache common responses
    • Semantic similarity for cache hits
    • Reduce redundant model calls
  5. Right-Sizing

    • Don't use Opus when Haiku suffices
    • Benchmark smaller models first
    • Scale up only when needed

Model Cost-Performance Tiers

TierModelsUse CaseRelative Cost
EconomyClaude 3 Haiku, Llama 3 8B, Nova LiteSimple tasks, high volume$ (lowest)
BalancedClaude 3.5 Sonnet, Llama 3 70B, Nova ProMost production workloads$$ (medium)
PremiumClaude 3 Opus, Llama 3.1 405BComplex reasoning, research$$$ (highest)

Regulatory and Compliance Requirements

Compliance Considerations

Industry-Specific Requirements:

Healthcare (HIPAA):

  • PHI protection requirements
  • Audit logging mandatory
  • BAA required with AWS
  • Consider on-premises options for sensitive data

Financial Services (PCI-DSS, SOX):

  • PCI compliance for payment data
  • Audit trails for regulatory reporting
  • Data encryption requirements
  • Consider Private Link/VPC endpoints

Government (FedRAMP):

  • FedRAMP authorized regions only
  • Data sovereignty requirements
  • Specific model restrictions may apply

EU (GDPR, EU AI Act):

  • Data residency in EU regions
  • Right to explanation for AI decisions
  • High-risk AI system classifications
  • Transparency requirements

AWS Compliance Support

Bedrock Compliance Features:

  • VPC Endpoints - Private connectivity, no internet exposure
  • KMS Encryption - Customer-managed keys for data at rest
  • CloudTrail Logging - Complete API audit trail
  • Model Invocation Logging - Optional input/output logging
  • Data Processing - Models don't store customer data
  • Regional Availability - Deploy in compliant regions
  • AWS Artifact - Compliance reports and certifications

Model Availability and Support

Model Availability

Factors Affecting Availability:

Regional Availability:

  • Not all models available in all regions
  • Check region support before design
  • Plan for multi-region if needed

Model Lifecycle:

  • Models have versioning (v1, v2, etc.)
  • Older versions may be deprecated
  • Plan for migration/testing new versions

Access Requirements:

  • Some models require access request
  • Anthropic models may need use case details
  • Enterprise agreements for some providers

Provisioned Throughput:

  • Must be available for your region
  • Commitment terms (1 month, 6 months)
  • Limited capacity - plan ahead

Model Availability by Region (Sample)

Model FamilyUS East (N. Virginia)US West (Oregon)EU (Frankfurt)AP (Tokyo)
Claude 3 FamilyYesYesYesYes
Llama 3 FamilyYesYesYesYes
Amazon NovaYesYesYesYes
MistralYesYesYesLimited
CohereYesYesLimitedLimited

How It Works

Model Selection Decision Flow

Model Selection Decision Flow
Figure 2: Step-by-step decision process for selecting the optimal foundation model

Cost Analysis Workflow

Cost Analysis Workflow
Figure 3: Workflow for analyzing and comparing model costs for a workload

Use Cases

Use Case 1: Customer Service Chatbot

Scenario: Build a customer support chatbot with real-time responses, FAQ handling, and escalation for complex issues.

Requirements Analysis:

  • Latency: <1 second TTFT
  • Volume: 10,000 queries/day
  • Languages: English, Spanish
  • Integration: Knowledge base for FAQ
  • Budget: Cost-conscious

Model Selection Process:

  1. Filter: Exclude models with >2s latency
  2. Candidates: Claude 3 Haiku, Llama 3 8B, Nova Lite
  3. Evaluate: Test with sample customer queries
  4. Cost analysis: Calculate monthly spend at volume

Recommendation: Claude 3 Haiku

  • Excellent speed for real-time interaction
  • Strong reasoning for query understanding
  • Good empathy in responses
  • Cost-effective for high volume

Scenario: Analyze 200-page contracts for clause extraction and risk identification.

Requirements Analysis:

  • Context: 150K+ tokens per document
  • Accuracy: High (legal implications)
  • Latency: Batch acceptable (not real-time)
  • Compliance: Sensitive legal data

Model Selection Process:

  1. Filter: Exclude models with <200K context
  2. Candidates: Claude 3.5 Sonnet, Nova Pro
  3. Evaluate: Test clause extraction accuracy
  4. Compliance: VPC endpoint, encryption

Recommendation: Claude 3.5 Sonnet

  • 200K context window fits full documents
  • Strong reasoning for complex legal language
  • High accuracy for extraction tasks
  • Use batch inference for cost savings

Use Case 3: Multi-Language Content Generation

Scenario: Generate marketing content in 10+ languages for global campaign.

Requirements Analysis:

  • Languages: 12 languages including Asian
  • Quality: Native-level fluency
  • Volume: 5,000 pieces/month
  • Brand: Consistent voice across languages

Model Selection Process:

  1. Filter: Strong multilingual support required
  2. Candidates: Mistral Large, Claude 3.5, Llama 3
  3. Evaluate: Native speaker review per language
  4. Cost: Balance quality vs volume

Recommendation: Mistral Large

  • Excellent multilingual capabilities
  • Strong non-English language support
  • Competitive pricing for volume
  • Consider Claude for quality-critical content

Best Practices

Model Selection Best Practices
  1. Start with requirements - Define hard constraints before evaluating models
  2. Benchmark with real data - Don't rely on public benchmarks alone
  3. Use Bedrock Evaluations - Leverage built-in tools for comparison
  4. Consider total cost - Include development, optimization, and operational costs
  5. Plan for fallbacks - Design multi-model architectures for resilience
  6. Monitor in production - Continuously evaluate quality and cost metrics
  7. Stay current - New models launch frequently; re-evaluate periodically

Common Exam Scenarios

Exam Scenarios and Solutions

ScenarioKey FactorRecommended Approach
Real-time chat with <500ms latencyLatency constraintClaude 3 Haiku or Llama 3 8B
Process 100-page documentsContext windowClaude 3.5 Sonnet or Nova Pro (200K+)
Cost-sensitive batch processingCost optimizationSmaller model + batch inference (50% savings)
Healthcare application with PHIHIPAA complianceVPC endpoints, encryption, audit logging
Variable complexity queriesMixed workloadIntelligent prompt routing to multiple models
EU data residency requirementComplianceDeploy in EU region with compliant models

Common Pitfalls

Pitfall 1: Choosing Based on Public Benchmarks Only

Mistake: Selecting a model because it ranks highest on public benchmarks.

Why it's wrong: Benchmarks test generic capabilities; your use case may differ significantly.

Correct Approach:

  • Use benchmarks for initial filtering only
  • Always evaluate with your actual data and prompts
  • Conduct A/B testing with real users when possible
  • Use Bedrock Evaluations for systematic comparison
Pitfall 2: Ignoring Total Cost of Ownership

Mistake: Focusing only on per-token pricing when comparing models.

Why it's wrong: Token efficiency, prompt optimization costs, and operational overhead vary.

Correct Approach:

  • Calculate cost at projected scale and volume
  • Factor in prompt engineering effort per model
  • Consider provisioned vs on-demand based on patterns
  • Include monitoring and optimization costs
Pitfall 3: Over-Engineering for Future Requirements

Mistake: Selecting the largest, most capable model "just in case."

Why it's wrong: Wastes budget, increases latency, and delays deployment.

Correct Approach:

  • Start with the smallest model that meets current needs
  • Implement model routing for upgrade path
  • Design architecture to swap models easily
  • Scale up based on measured needs

Test Your Knowledge

Q

A company needs to build a chatbot with sub-second response times and handle 50,000 queries daily. Which approach should they take for model selection?

ASelect Claude 3 Opus for best quality
BUse Bedrock Evaluations to compare fast models like Haiku and Llama 3 8B
CChoose the model with the largest context window
DUse batch inference for all queries
Q

What is the PRIMARY benefit of Amazon Bedrock's Intelligent Prompt Routing feature?

AIncreases model context windows
BReduces costs by up to 30% by routing queries to appropriate models
CEliminates the need for prompt engineering
DProvides automatic model fine-tuning
Q

An organization must process sensitive healthcare data with PHI. What model selection consideration is MOST important?

AChoosing the model with lowest latency
BEnsuring the model has the largest context window
CUsing VPC endpoints, encryption, and ensuring HIPAA compliance
DSelecting the newest model available


Quick Reference

Model Selection Checklist

TEXTModel Selection Checklist
□ Define functional requirements (task, modality, languages)
□ Define non-functional requirements (latency, throughput, budget)
□ Identify compliance requirements (HIPAA, GDPR, etc.)
□ Filter models using Bedrock Model Information API
□ Shortlist 3-7 candidate models
□ Evaluate with actual data using Bedrock Evaluations
□ Calculate costs at projected scale
□ Verify regional availability
□ Test integration and API compatibility
□ Document selection rationale for audit

Cost Optimization Strategies

Cost Reduction Strategies

StrategySavings PotentialBest For
Batch inferenceUp to 50%Non-real-time bulk processing
Intelligent routingUp to 30%Mixed complexity workloads
Model distillation40-70%Specific use cases at scale
Prompt optimization10-30%All workloads
Response cachingVariableRepetitive queries

Further Reading

Related AWS Services

Amazon Bedrock