This course is for ML engineers, solutions architects, and senior developers who build the robust infrastructure powering large language models. It teaches you how to design, deploy, and maintain the complex, interconnected systems required for scalable, resilient, and cost-effective LLM applications in the real world.

Designing Production LLM Architectures
This course is part of LLM Engineering That Works: Prompting, Tuning, and Retrieval Specialization

Instructor: Industry Professionals
What you'll learn
- Compare synchronous and asynchronous architectures and apply 12-factor principles and container orchestration to deploy scalable microservices.
- Analyze multi-region deployments, pinpoint latency bottlenecks, and design resilient architecture improvements via fault analysis.
- Create Airflow DAGs to automate data workflows and analyze the impact of schema evolution on downstream processes and tests.
- Analyze trade-offs between self-hosting models vs. managed APIs and evaluate proposed infrastructure for fault tolerance and cost.
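To give a concrete flavor of the self-host vs. managed-API trade-off analysis covered above, here is a minimal Python sketch. All numbers and names (`gpu_hourly_rate`, `price_per_1k_tokens`, and the example figures in the usage note) are hypothetical illustrations, not real vendor pricing:

```python
def monthly_cost_managed(tokens_per_month: float, price_per_1k_tokens: float) -> float:
    """Managed API: pure usage-based pricing, no fixed cost."""
    return tokens_per_month / 1000 * price_per_1k_tokens

def monthly_cost_self_hosted(gpu_hourly_rate: float, gpu_count: int,
                             ops_overhead: float = 0.0) -> float:
    """Self-hosting: fixed GPU rental plus operational overhead,
    largely independent of request volume (assumes a 30-day month)."""
    return gpu_hourly_rate * gpu_count * 24 * 30 + ops_overhead

def break_even_tokens(gpu_hourly_rate: float, gpu_count: int,
                      price_per_1k_tokens: float, ops_overhead: float = 0.0) -> float:
    """Monthly token volume at which self-hosting starts to undercut the API."""
    fixed = monthly_cost_self_hosted(gpu_hourly_rate, gpu_count, ops_overhead)
    return fixed / price_per_1k_tokens * 1000
```

With hypothetical inputs of $2/hour per GPU, 4 GPUs, and $0.002 per 1K tokens, the break-even point lands at 2.88 billion tokens per month; below that volume the managed API is cheaper on raw cost, before latency and data-privacy considerations enter the picture.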
Details to know

March 2026

Build your subject-matter expertise
- Learn new concepts from industry experts
- Gain a foundational understanding of a subject or tool
- Develop job-relevant skills with hands-on projects
- Earn a shareable career certificate

There are 5 modules in this course
This module empowers engineers and architects to master the "build vs. buy" decision for LLM applications through a structured, strategic lens. You will learn to design complex system architectures using sequence diagrams to evaluate synchronous and asynchronous processing, while comparing the trade-offs of self-hosted open-source models against managed APIs. By focusing on critical metrics like Total Cost of Ownership (TCO), latency, and data privacy, you will develop the expertise to justify architectural choices. Ultimately, you'll gain the confidence to document and defend high-performance, business-aligned AI solutions to any stakeholder.
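To preview what "synchronous versus asynchronous processing" means here, consider this minimal, self-contained Python sketch. It uses `asyncio` with a stubbed LLM call (`fake_llm_call` simulates network latency with a sleep; it is not any provider's real API):

```python
import asyncio
import time

async def fake_llm_call(prompt: str, latency: float = 0.1) -> str:
    # Stand-in for a network round trip to an LLM endpoint.
    await asyncio.sleep(latency)
    return f"response to {prompt!r}"

async def handle_sequential(prompts):
    # Synchronous style: each request waits for the previous one to finish.
    return [await fake_llm_call(p) for p in prompts]

async def handle_concurrent(prompts):
    # Asynchronous style: all requests are in flight at once.
    return await asyncio.gather(*(fake_llm_call(p) for p in prompts))

prompts = [f"q{i}" for i in range(5)]

t0 = time.perf_counter()
asyncio.run(handle_sequential(prompts))
sequential_s = time.perf_counter() - t0

t0 = time.perf_counter()
asyncio.run(handle_concurrent(prompts))
concurrent_s = time.perf_counter() - t0

print(f"sequential: {sequential_s:.2f}s, concurrent: {concurrent_s:.2f}s")
```

With five simulated 100 ms calls, the sequential path takes roughly half a second while the concurrent path finishes in about the latency of a single call, which is exactly the trade-off a sequence diagram makes visible.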
What's included
4 videos, 2 readings, 3 assignments
This module explores building resilient, scalable architectures for LLM applications. You will apply 12-factor app methodology to design portable, cloud-native microservices, mastering stateless design and dependency management. The curriculum bridges theory and practice by evaluating multi-region deployment strategies for fault tolerance and high availability. You'll learn to analyze failover mechanisms and mitigate architectural risks before production. By the end, you’ll be equipped to document reliable, future-proof AI systems. Prerequisites include a foundational understanding of cloud concepts (regions/zones) and microservice basics (containers/APIs).
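As a small taste of 12-factor design, here is a sketch of factor III (store config in the environment) applied to a stateless LLM microservice. The setting names (`MODEL_ENDPOINT`, `REQUEST_TIMEOUT_S`, `MAX_RETRIES`) and defaults are illustrative assumptions:

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    model_endpoint: str
    request_timeout_s: float
    max_retries: int

def load_settings(env=os.environ) -> Settings:
    # Factor III: configuration lives in the environment, not in code,
    # so the same container image runs unchanged in every region.
    return Settings(
        model_endpoint=env.get("MODEL_ENDPOINT", "http://localhost:8000"),
        request_timeout_s=float(env.get("REQUEST_TIMEOUT_S", "30")),
        max_retries=int(env.get("MAX_RETRIES", "3")),
    )
```

Because `load_settings` takes the environment as a parameter, tests can inject a plain dict, and deployments to different regions differ only in environment variables, never in the image.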
What's included
1 video, 1 reading, 3 assignments
This module teaches how to transition LLM prototypes into production-grade services. You will learn to analyze multi-stage architectures like RAG to identify and quantify performance bottlenecks using evidence-based metrics. The curriculum focuses on mastering Kubernetes deployment through declarative Helm charts and implementing Horizontal Pod Autoscaling (HPA) to manage unpredictable traffic. By studying deployment lifecycles, including controlled rollouts and rapid rollbacks, you will gain the skills to transform fragile prototypes into resilient, scalable, and reliable production systems capable of handling real-world loads.
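"Quantifying bottlenecks with evidence-based metrics" can be as simple as timing each stage of a pipeline. Below is a hedged Python sketch of a multi-stage RAG request handler; the stage names and `time.sleep` latencies are stand-ins for real embedding, retrieval, and generation calls:

```python
import time
from collections import defaultdict
from contextlib import contextmanager

stage_timings = defaultdict(list)

@contextmanager
def timed(stage: str):
    # Record wall-clock duration for one stage of the pipeline.
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_timings[stage].append(time.perf_counter() - start)

def handle_request(query: str) -> str:
    with timed("embed"):
        time.sleep(0.005)   # stand-in for embedding the query
    with timed("retrieve"):
        time.sleep(0.020)   # stand-in for the vector-store lookup
    with timed("generate"):
        time.sleep(0.050)   # stand-in for LLM generation
    return "answer"

for _ in range(10):
    handle_request("example query")

# The stage consuming the most total time is the scaling target,
# e.g. the one whose deployment an HPA should watch most closely.
bottleneck = max(stage_timings, key=lambda s: sum(stage_timings[s]))
print(bottleneck)
```

Measurements like these, rather than intuition, are what justify which deployment gets more replicas when Horizontal Pod Autoscaling kicks in.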
What's included
5 videos, 5 readings, 6 assignments
In today's dynamic data landscape, pipelines often break when source data structures change unexpectedly—a problem known as schema drift. This module tackles that challenge head-on, teaching you how to design and automate data pipelines that can gracefully handle schema evolution using Apache Airflow. By the end, you will be equipped to create resilient, scalable, and fully automated data pipelines that are built to withstand the complexities of real-world data environments.
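The heart of handling schema drift is detecting it before data loads. Here is a minimal, framework-free Python sketch of the kind of check an Airflow task might run as a validation step (the schema is modeled as a plain column-to-type dict; real Airflow operators and type systems are out of scope here):

```python
def detect_schema_drift(expected: dict, observed: dict) -> dict:
    """Compare an expected column->type mapping against what a source
    actually delivered, and classify the differences."""
    shared = set(expected) & set(observed)
    return {
        "missing_columns": sorted(set(expected) - set(observed)),
        "new_columns": sorted(set(observed) - set(expected)),
        "type_changes": sorted(c for c in shared if expected[c] != observed[c]),
    }
```

A DAG can branch on the result: tolerate additive changes (`new_columns`), but fail fast and alert on `missing_columns` or `type_changes`, which would break downstream transformations and tests.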
What's included
5 videos, 5 readings, 7 assignments
In this module, you will step into the high-stakes role of a senior systems engineer tasked with diagnosing a failing AI service. A critical Retrieval-Augmented Generation (RAG) system is plagued by high latency and intermittent outages, and you must get to the root of the problem. Using architectural diagrams, system logs, and performance metrics, you will analyze the system’s design to identify the primary performance bottleneck and the most significant single point of failure. Your analysis will culminate in a concise, two-paragraph report for stakeholders, pinpointing the critical issues and recommending targeted fixes to restore stability and performance.
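Diagnosing from logs often starts with aggregating per-component latencies. This Python sketch uses an entirely hypothetical log format (`component=... latency_ms=...`; the `ts=...` timestamps are deliberately elided) to rank components by median latency and surface the likely bottleneck:

```python
import re
import statistics

LOG_LINE = re.compile(r"component=(?P<component>\w+) latency_ms=(?P<latency>\d+)")

def latency_report(log_lines):
    """Group latency samples by component and rank them, worst first."""
    samples = {}
    for line in log_lines:
        m = LOG_LINE.search(line)
        if m:
            samples.setdefault(m["component"], []).append(int(m["latency"]))
    return sorted(
        ((comp, statistics.median(vals)) for comp, vals in samples.items()),
        key=lambda item: item[1],
        reverse=True,
    )

logs = [
    "ts=... component=vector_db latency_ms=820",
    "ts=... component=llm latency_ms=310",
    "ts=... component=vector_db latency_ms=790",
    "ts=... component=llm latency_ms=280",
]
print(latency_report(logs)[0][0])  # component with the highest median latency
```

With these fabricated samples the vector database dominates, which is the kind of evidence a stakeholder report can cite alongside the architectural diagram.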
What's included
2 readings, 1 assignment
Earn a career certificate
Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.
Instructor

Offered by
Frequently asked questions
This course assumes hands-on experience with cloud concepts and containers. If you are new to cloud platforms or Kubernetes, first complete a foundational cloud or container course to gain the most from these modules.
You will work with sequence diagrams, Kubernetes manifests and Helm charts, Airflow DAGs, and cloud deployment patterns. Labs use common cloud and orchestration tooling; no proprietary vendor lock-in is required.
The course provides structured analysis and decision criteria—latency, total cost of ownership, data privacy, and operational complexity—so you can compare options and make informed architecture choices for your use case.
Financial aid available.
¹ Some assignments in this course are AI-graded. For these assignments, your data will be used in accordance with Coursera's Privacy Notice.