Data Pipelines with Apache Airflow
Audiobook & Ebook

By Bas Harenslak and Julian de Ruiter

Narrated by Julie Brierley

🎧 10 hours and 18 minutes 📘 Manning Publications 📅 November 22, 2021 🌐 English
🎧 Listen Free on Audible 📖 Read on Kindle

Free 30-day trial · Cancel anytime

About This Audiobook

Data Pipelines with Apache Airflow teaches you how to build and maintain effective data pipelines.

Summary

A successful pipeline moves data efficiently, minimizing pauses and blockages between tasks, and keeping processes along the way operational. Apache Airflow provides a single customizable environment for building and managing data pipelines, eliminating the need for a hodgepodge collection of tools, snowflake code, and homegrown processes. Using real-world scenarios and examples, this book teaches you how to simplify and automate data pipelines, reduce operational overhead, and smoothly integrate all the technologies in your stack.

About the Technology

Data pipelines manage the flow of data from initial collection through consolidation, cleaning, analysis, visualization, and more. Apache Airflow provides a single platform you can use to design, implement, monitor, and maintain your pipelines. It features an easy-to-use UI, plug-and-play options, and flexible Python scripting.

About the Book

Data Pipelines with Apache Airflow teaches you how to build and maintain effective data pipelines. You’ll explore the most common usage patterns, including aggregating multiple data sources, connecting to and from data lakes, and cloud deployment. Part reference and part tutorial, this practical guide covers every aspect of the directed acyclic graphs (DAGs) that power Airflow, and how to customize them for your pipeline’s needs.

What’s Inside

Build, test, and deploy Airflow pipelines as DAGs
Automate moving and transforming data
Analyze historical datasets using backfilling
Develop custom components
Set up Airflow in production environments

About the Audience

For DevOps, data engineers, machine learning engineers, and sysadmins with intermediate Python skills.

About the Authors

Bas Harenslak and Julian de Ruiter are data engineers with extensive experience using Airflow to develop pipelines for major companies. Bas is also an Airflow committer.

PLEASE NOTE: When you purchase this title, the accompanying PDF will be available in your Audible Library along with the audio.

Quick Take

  • Narration: Julie Brierley delivers a technically precise and clear reading of demanding material. She handles DAG terminology, Python syntax references, and Airflow-specific vocabulary without stumbling, which is no small achievement for a ten-hour engineering book.
  • Themes: Data pipeline orchestration, workflow automation, production data engineering
  • Mood: Dense and methodical, with a practitioner’s confidence that rewards patient listening
  • Verdict: One of the stronger audiobook treatments of a DevOps/data engineering framework. The PDF companion is essential, but Brierley’s narration and the book’s careful build-up make it genuinely usable in audio format.

I was walking a long stretch of the Canal Saint-Martin on a Saturday morning when I first queued up Data Pipelines with Apache Airflow. I chose it partly because I had been meaning to understand Airflow properly for a while and partly because a data engineer friend had described it as the book that finally made DAGs click for him. It is not the kind of book you put on for background company. By the time I reached the canal’s first lock, I had replayed a section on scheduling semantics twice and was genuinely glad I hadn’t tried to listen while doing anything that required divided attention.

Bas Harenslak and Julian de Ruiter are data engineers with extensive production Airflow experience; Harenslak is an Airflow committer. That credential matters more than it might in other technical domains. Airflow has a reputation for counterintuitive behavior in production environments: the catchup parameter, execution date semantics, and XCom limitations have surprised many engineers who understood the documentation but not its practical implications. A book written by someone who has committed code to the project carries a different weight than one written by someone who has used it from the outside.
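
For readers who have not hit the scheduling surprise yet, here is a minimal sketch of it in plain Python. It is not Airflow's scheduler code and not an example from the book. It assumes a simplified daily schedule in which a run labelled with a given date covers the 24 hours that start on that date and can only begin once that window has closed. The function name daily_runs and its arguments are hypothetical; only catchup borrows the name of the real DAG argument.

```python
from datetime import datetime, timedelta


def daily_runs(start_date, now, catchup=True):
    """List (logical_date, earliest_trigger_time) pairs for a daily schedule.

    Simplified model (an assumption, not Airflow's implementation): a run
    labelled with logical date D covers the interval [D, D + 1 day) and can
    only start once that interval has closed.
    """
    interval = timedelta(days=1)
    runs = []
    logical_date = start_date
    while logical_date + interval <= now:
        runs.append((logical_date, logical_date + interval))
        logical_date += interval
    # Without catchup, only the most recent completed interval is kept.
    return runs if catchup else runs[-1:]


# Hypothetical dates: the pipeline is switched on 3.5 days after its start date.
start = datetime(2021, 1, 1)
now = datetime(2021, 1, 4, 12, 0)

for logical_date, trigger_time in daily_runs(start, now):
    print(logical_date.date(), "can start at", trigger_time)

print("without catchup:", daily_runs(start, now, catchup=False))
```

Under these assumptions, the run labelled with the first of January cannot start before the second, and switching the pipeline on late produces a queue of past runs unless catchup is turned off. That gap between the date on the label and the moment of execution is, as far as I can tell, what trips people up.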

The Build-Up Structure That Makes Complex Material Accessible

Reviewers consistently describe the book as building up piece by piece with clear explanation at every step, and that framing is accurate. Harenslak and de Ruiter begin with the simplest possible pipeline (a single operator, a single task, a single DAG) and add complexity incrementally. By the time you reach the chapters on dynamic DAG generation, custom operators, and cloud deployment, you have accumulated the conceptual vocabulary to process what’s being described without losing the thread. This is harder to execute than it sounds in a technical book covering a framework with many moving parts.

The directed acyclic graph concept, which is the structural foundation of everything in Airflow, gets careful attention early. Understanding why Airflow represents workflows as DAGs rather than allowing cycles, and what constraints that imposes on how you design your pipelines, is prerequisite knowledge for everything that follows. The authors take the time to ground this properly rather than assuming it, and that decision pays dividends in the later chapters.
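
The reason cycles are ruled out can be shown in a few lines. The sketch below uses Python's standard graphlib module rather than Airflow itself, and the task names are hypothetical. It illustrates the general idea only: a set of dependencies has a valid run order exactly when it contains no cycle.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(dependencies):
    """Return one valid run order, or raise CycleError if none exists.

    `dependencies` maps each task name to the set of tasks it depends on.
    """
    return list(TopologicalSorter(dependencies).static_order())


# A hypothetical three-step pipeline: fetch, then clean, then report.
acyclic = {"fetch": set(), "clean": {"fetch"}, "report": {"clean"}}
print(execution_order(acyclic))

# Making fetch depend on report closes the loop, so no task can go first.
cyclic = {"fetch": {"report"}, "clean": {"fetch"}, "report": {"clean"}}
try:
    execution_order(cyclic)
except CycleError as err:
    print("cycle detected:", err.args[1])
```

In the cyclic case every task is waiting on another one, so a scheduler would have nothing it could start. Requiring the graph to be acyclic is what guarantees that some task is always free to run first.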

Narration That Earns Its Keep

Julie Brierley’s performance on this material deserves specific mention. Narrating a data engineering textbook is a genuinely difficult assignment: the vocabulary includes terms like directed acyclic graph, backfilling, Kubernetes executor, and XCom, none of which has an obvious pronunciation for a narrator who isn’t already embedded in the community. Brierley handles this material with a clarity and confidence that suggests thorough preparation. She also manages the transition between explanatory prose and more technical sections without losing the instructional tone that keeps the listening accessible.

One reviewer described being able to read the first two chapters and be ready to go for fundamentals, with the rest of the book building from that base. That characterization is fair, and it suggests an effective listening strategy: the early chapters can function as an orientation layer for the full runtime that follows.

The PDF Companion and Production Realities

The PDF companion is included with this Audible title and is genuinely indispensable. Airflow DAG code benefits enormously from visual representation, and the book includes numerous code examples that illustrate specific operator configurations, custom sensor logic, and deployment patterns. Following these purely in audio is possible but significantly more taxing than having the code visible. The book has a good structural balance between conceptual explanation and code illustration, which means the audio component carries the conceptual load effectively even when the code requires the PDF.

Who Should Listen, Who Should Skip

The audience specification in the book is precise: DevOps engineers, data engineers, machine learning engineers, and sysadmins with intermediate Python skills. That’s the correct audience. Someone coming to Airflow without Python familiarity will struggle with the later chapters. Someone already expert in Airflow will know most of the material but may still find value in the production and deployment sections. The ten-hour runtime is appropriate for the depth of coverage, and the 4.4 rating across seventy-four reviews reflects a technical audience that found it genuinely useful rather than just accessible.

Frequently Asked Questions

Do I need prior Airflow experience to follow this book in audio format?

No prior Airflow experience is required, but intermediate Python familiarity is assumed. The book is explicitly designed as an introduction to Airflow as well as a practical reference, and it builds from the simplest pipeline concepts through advanced production topics. The step-by-step structure means motivated beginners can follow it, though the production environment chapters will be more useful once you’ve worked with Airflow hands-on.

How important is the PDF companion for following the audio?

Very important, particularly for the code examples. The conceptual material (DAG structure, scheduling semantics, backfilling, operator types) translates to audio reasonably well. Code samples for custom operators, cloud integrations, and production configurations are significantly harder to follow without visual reference. Keep the PDF accessible during listening sessions that cover technical implementation chapters.

Is the Airflow version covered in this book still current?

The edition covers Airflow 2.x and addresses the provider model introduced in Airflow 2. Some implementation details and provider package names have evolved since publication, and the Airflow ecosystem changes relatively quickly. The conceptual foundations, DAG design patterns, and workflow orchestration principles remain valid. For specific operator APIs and cloud provider integrations, verify against the current Airflow documentation.

How does this compare to the official Apache Airflow documentation for learning purposes?

The official documentation is comprehensive but reference-oriented: it tells you what each component does rather than building a pedagogical progression through them. This book provides something the documentation doesn’t: a curated learning path that builds understanding sequentially, covers common pitfalls, and explains the why behind design decisions that the docs assume you already understand. For someone new to Airflow, the book provides better conceptual grounding even if the docs are ultimately a more complete reference.

What Listeners Are Saying

★★★★★

An excellent resource for learning and using Airflow

This book is great. It builds up piece by piece and explains what is going on every step of the way. It shows you best practices and goes into great detail on relatively advanced topics, in addition to covering all the basics. The code examples can easily be adapted for…

– Evan Volgas

★★★★★

To the Point

This is the type of book where you can read the first two chapters and be good to go for fundamentals. The rest of the book is basically building up on what you learned. Such great instruction packed into a few papers. Probably one of the better written manuals for…

– Gino

★★★★★

A well written and thorough book on Airflow

A great book on Airflow, how operate it, configure it, interface with 3rd party systems (particularly cloud or db related). I particularly liked the emphasis on some counter-intuitive features to prevent beginners from wasting time on figuring a couple of tweaks for themselves.

– Daniel V.

★★★★★

A great guide to Airflow

This is a great guide to Airflow, covering the basics and advanced topics such as how to test dags and running tasks in containers. Highly recommended!

– CWC_NY

★★★★★

Great book

I’ve read a lot of CS books, this is in the top 5. It’s well written and full of domain knowledge.

– Chris Novitsky

Written by Alexandra Reed

Founder & Literary Critic