Coursera

Data Preparation & Infrastructure

Instructors: John Whitworth

Included with Coursera Plus

Gain insight into a topic and learn the fundamentals.
Beginner level

7 hours to complete
Flexible schedule
Learn at your own pace

What you'll learn

  • Clean and normalize campaign and CRM marketing datasets.

  • Write SQL queries to extract and join marketing data.

  • Validate data quality across analytics and ad platforms.

  • Profile datasets to identify reporting inconsistencies and gaps.

Details to know

Shareable certificate

Add to your LinkedIn profile

Assessments

18 assignments¹

AI-graded (see disclaimer)
Taught in English

There are 8 modules in this course

This module focuses on the cleaning routines required to make marketing datasets reliable for analysis. Learners examine how inconsistent UTM tagging, fragmented channel labels, mixed case, stray whitespace, and inconsistent naming conventions distort attribution and reporting. The module covers string normalization, duplicate detection and deduplication, and industry-standard conventions for the utm_source, utm_medium, and utm_campaign fields. Learners also explore three common sources of duplicates: pipeline duplicates, tracking misfires, and manual-entry duplication. An AI-first workflow demonstrates how analysts can use AI tools to generate cleaning scripts while remaining responsible for validation and quality control. In the guided lab, learners apply TRIM and LOWER functions, create cleaned columns, remove duplicate records, and validate outputs against a reference file.
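The cleaning steps named above (trim, lowercase, deduplicate) can be sketched in a few lines of Python. This is only an illustration, not the course's lab material: the sample rows and the helper names `normalize_utm` and `clean_rows` are hypothetical.

```python
def normalize_utm(value):
    """Trim surrounding whitespace and lowercase a value (the TRIM + LOWER step)."""
    return value.strip().lower()


def clean_rows(rows):
    """Normalize every field, then drop exact duplicates, keeping first-seen order."""
    seen = set()
    cleaned = []
    for row in rows:
        normalized = tuple(normalize_utm(field) for field in row)
        if normalized not in seen:
            seen.add(normalized)
            cleaned.append(normalized)
    return cleaned


# Hypothetical raw export: (utm_source, utm_medium, utm_campaign)
raw_rows = [
    (" Google ", "CPC", "Spring_Sale"),
    ("google", "cpc", "spring_sale"),  # same campaign, different casing
    ("Facebook", " Paid_Social", "spring_sale "),
]

cleaned_rows = clean_rows(raw_rows)
print(cleaned_rows)
```

Normalizing before deduplicating matters here: the first two rows only become recognizable as duplicates once case and whitespace are made consistent.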

What's included

3 videos, 2 readings, 2 assignments

This module teaches learners how to validate and reconcile conversion data across analytics platforms, ad platforms, and systems of record. Learners examine why discrepancies occur between GA4, CRM, order-management systems, and ad platforms, including attribution windows, cookie-consent limitations, client-side pixels, server-side tracking, and modeled conversions. The module emphasizes establishing a source of truth based on reporting objectives and business context. Learners use validation scripts to compare records, standardize dates, calculate variance percentages, flag variances that exceed a threshold, identify outliers, and document discrepancies. AI-assisted workflows support script generation while reinforcing review of join logic, variance calculations, and validation steps. In the hands-on lab, learners build comparison tables, calculate variances, flag inconsistencies, and recommend a source of truth.
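A validation script of the kind described above might look roughly like the following sketch. It assumes the CRM is treated as the source of truth and uses a 10% flagging threshold; both choices, the sample figures, and the helper names are illustrative assumptions rather than anything specified by the course.

```python
from datetime import datetime

THRESHOLD_PCT = 10.0  # assumption: flag anything more than 10% off the reference


def to_iso_date(text, fmt):
    """Standardize a date string to ISO format (YYYY-MM-DD)."""
    return datetime.strptime(text, fmt).date().isoformat()


def variance_pct(reference, observed):
    """Percentage difference of observed relative to reference; None if undefined."""
    if reference == 0:
        return None
    return (observed - reference) * 100 / reference


# Hypothetical daily conversion counts from two systems.
crm = {"2024-03-01": 100, "2024-03-02": 80}           # treated as source of truth
ga4_raw = [("03/01/2024", 95), ("03/02/2024", 60)]    # US-style dates
ga4 = {to_iso_date(day, "%m/%d/%Y"): count for day, count in ga4_raw}

comparison = []
for day in sorted(crm):
    observed = ga4.get(day, 0)
    pct = variance_pct(crm[day], observed)
    flagged = pct is None or abs(pct) > THRESHOLD_PCT
    comparison.append((day, crm[day], observed, pct, flagged))

for row in comparison:
    print(row)
```

Which system serves as the reference depends on the reporting objective, so the direction of the variance calculation should be documented alongside the results.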

What's included

1 video, 2 readings, 3 assignments

Learn how to bridge the gap between "clicks" and "customers." This module teaches you how to write SQL joins that link website session data to CRM revenue records.
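As a rough illustration of linking sessions to CRM revenue, the sketch below runs a SQL join through Python's built-in sqlite3 module. The table names, column names, and the choice of email as the join key are all assumptions made for the example; real session and CRM exports will differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical schemas for website sessions and CRM deals.
    CREATE TABLE sessions (session_id TEXT, user_email TEXT, utm_source TEXT);
    CREATE TABLE crm_deals (deal_id TEXT, contact_email TEXT, revenue REAL);

    INSERT INTO sessions VALUES
        ('s1', 'ana@example.com', 'google'),
        ('s2', 'bo@example.com',  'facebook'),
        ('s3', 'cy@example.com',  'google');
    INSERT INTO crm_deals VALUES
        ('d1', 'Ana@Example.com ', 500.0),
        ('d2', 'bo@example.com',   250.0);
    """
)

# LEFT JOIN keeps sessions that never became customers (revenue counts as 0).
# The join key is normalized on both sides so casing and whitespace do not break matches.
revenue_by_source = conn.execute(
    """
    SELECT s.utm_source,
           COUNT(DISTINCT s.session_id) AS sessions,
           COALESCE(SUM(d.revenue), 0)  AS revenue
    FROM sessions AS s
    LEFT JOIN crm_deals AS d
           ON LOWER(TRIM(s.user_email)) = LOWER(TRIM(d.contact_email))
    GROUP BY s.utm_source
    ORDER BY s.utm_source
    """
).fetchall()

print(revenue_by_source)
```

An inner join would silently drop the session with no matching deal, which would overstate the conversion rate per source.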

What's included

2 videos, 2 readings, 2 assignments

Big data requires smart queries. You will learn to refine complex SQL to run faster and ensure your aggregation logic correctly counts marketing events without double-counting.
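One common way double-counting arises is join fan-out: when an order joins to several click rows, its amount is summed once per click. The sketch below, using sqlite3 with hypothetical `orders` and `clicks` tables, contrasts a naive join with one that aggregates clicks per order before joining. It is an example of the general problem, not necessarily the technique taught in the module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical tables: one order can have many attributed clicks.
    CREATE TABLE orders (order_id TEXT, amount REAL);
    CREATE TABLE clicks (click_id TEXT, order_id TEXT);

    INSERT INTO orders VALUES ('o1', 100.0), ('o2', 50.0);
    INSERT INTO clicks VALUES ('c1', 'o1'), ('c2', 'o1'), ('c3', 'o2');
    """
)

# Naive join: order o1 appears twice (once per click), so its amount is counted twice.
naive_revenue = conn.execute(
    """
    SELECT SUM(o.amount)
    FROM orders AS o
    JOIN clicks AS c ON c.order_id = o.order_id
    """
).fetchone()[0]

# Safer: collapse clicks to one row per order first, then join.
safe_revenue, total_clicks = conn.execute(
    """
    SELECT SUM(o.amount), SUM(pc.click_count)
    FROM orders AS o
    JOIN (
        SELECT order_id, COUNT(*) AS click_count
        FROM clicks
        GROUP BY order_id
    ) AS pc ON pc.order_id = o.order_id
    """
).fetchone()

print(naive_revenue, safe_revenue, total_clicks)
```

Pre-aggregating also tends to shrink the intermediate result the database has to process, which can help query speed on larger tables.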

What's included

1 video, 2 readings, 3 assignments

Learn the math behind data health. You will use profiling techniques to quantify how much of your marketing data is missing or duplicated.
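Two of the simplest profiling measures are the share of rows missing a value in a given column and the share of rows that are exact repeats of an earlier row. The following sketch computes both in plain Python; the sample records and function names are hypothetical, and treating empty strings as missing is an assumption.

```python
from collections import Counter


def missing_rate(rows, column):
    """Fraction of rows where `column` is None or an empty string."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) in (None, ""))
    return missing / len(rows)


def duplicate_rate(rows):
    """Fraction of rows that are exact repeats of an earlier row."""
    if not rows:
        return 0.0
    # Sort items so dicts with the same content map to the same hashable key.
    counts = Counter(tuple(sorted(row.items())) for row in rows)
    extra_copies = sum(n - 1 for n in counts.values())
    return extra_copies / len(rows)


# Hypothetical CRM export.
records = [
    {"email": "ana@example.com", "utm_source": "google"},
    {"email": "ana@example.com", "utm_source": "google"},  # exact duplicate
    {"email": "bo@example.com", "utm_source": ""},         # missing source
    {"email": None, "utm_source": "facebook"},             # missing email
]

print(missing_rate(records, "email"))
print(missing_rate(records, "utm_source"))
print(duplicate_rate(records))
```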

What's included

3 videos, 2 readings, 2 assignments

Move from finding problems to solving them. Learn how to interpret profiling reports to decide which data issues need immediate fixing and which can wait.

What's included

2 videos, 2 readings, 3 assignments

Accelerate your data cleaning and querying workflow using Generative AI. You will learn how to use LLMs to generate complex SQL joins, debug cleaning scripts, and automate the normalization of messy marketing data.

What's included

2 videos, 1 reading, 2 assignments

Put your data infrastructure skills to the test. In this project, you will audit and clean a multi-channel campaign dataset end to end, using SQL and profiling techniques to transform raw exports into a high-quality, analysis-ready dataset.

What's included

3 readings, 1 assignment

Instructors

John Whitworth
30 courses, 3,567 learners
ansrsource instructors
245 courses, 17,473 learners

Offered by

Coursera

¹ Some assignments in this course are AI-graded. For these assignments, your data will be used in accordance with Coursera's Privacy Notice.