Microsoft Fabric includes three powerful tools for data processing: Dataflow Gen2, Data Pipeline, and Notebook. The goal of this post is to explain what each one does and where it shines, in plain language.
Dataflow Gen2: visual, easy data preparation
Dataflow Gen2 works like the Fabric version of Power Query. Data can be pulled from hundreds of sources and transformed through a drag-and-drop interface. It fits well for teams that prefer not to write code, or when a quick first-pass analysis is needed.
With Copilot integration, data preparation can now be done using natural language, for example: “bring only European customers.” On the performance side, staging and Fabric’s compute engines keep things smooth even with larger datasets.
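For contrast, here is roughly what such a “European customers only” step looks like when written by hand in a Notebook. This is a minimal sketch: the Lakehouse table name `customers` and the `region` column are illustrative assumptions, not something defined in this post.

```python
# Minimal sketch: the same "European customers only" filter, expressed in PySpark.
# The Lakehouse table "customers" and its "region" column are assumed for illustration.
from pyspark.sql import functions as F

customers = spark.read.table("customers")                     # read the Lakehouse table
europe_only = customers.filter(F.col("region") == "Europe")   # keep only European customers
europe_only.write.mode("overwrite").saveAsTable("customers_europe")  # persist the result
```

Dataflow Gen2 builds the equivalent of this step through clicks (or a Copilot prompt) instead of code, which is exactly its appeal for non-developers.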
Data Pipeline: the process orchestrator
Data Pipeline is Fabric’s orchestration tool. It is used to move data from different sources, run steps in sequence, and handle error scenarios.
Its approach is partly low-code and partly script-friendly: drag-and-drop users and those who prefer to work directly with the pipeline’s JSON definition or custom code can both be comfortable.
It is typically used to trigger Dataflows or Notebooks in a specific order.
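To make the hand-off concrete, the sketch below shows a Notebook written to be triggered from a pipeline’s Notebook activity. The table names, the `order_id` column, and the parameter values are assumptions for illustration; the pipeline would override the defaults in the parameter cell at run time.

```python
# Minimal sketch: a Notebook meant to be triggered by a Data Pipeline's Notebook activity.
# Table names, columns, and parameter values are illustrative assumptions.

# Parameter cell (marked as a "parameter cell" in the notebook UI); the pipeline's
# base parameters override these defaults when the activity runs.
source_table = "bronze_orders"
target_table = "silver_orders"

df = spark.read.table(source_table)
clean = df.dropDuplicates().na.drop(subset=["order_id"])      # basic cleanup
clean.write.mode("overwrite").format("delta").saveAsTable(target_table)

# mssparkutils is typically available in Fabric notebooks; the exit value can be
# read from the activity output and used in later pipeline steps (e.g., a condition or alert).
from notebookutils import mssparkutils
mssparkutils.notebook.exit(str(clean.count()))
```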
Notebook: code-first data processing
Notebooks are the most flexible tool for data engineers and data scientists. With a Spark-based foundation, they are ideal for working with large data and performing advanced transformations.
Python, SQL, Scala, or R can be used. For data wrangling, ML model preparation, or complex join logic, they offer more control than the other tools.
However, more technical knowledge is required. For those comfortable writing code, there are practically no limits.
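As an example of the kind of logic that is easier in code than in a visual tool, the sketch below joins two tables and aggregates the result. The table and column names (`orders`, `customers`, `customer_id`, `amount`, and so on) are assumptions made for this illustration.

```python
# Minimal sketch of a typical Notebook transformation: a join plus an aggregation
# that would be awkward to express visually. Table and column names are assumed.
from pyspark.sql import functions as F

orders = spark.read.table("orders")        # hypothetical Lakehouse tables
customers = spark.read.table("customers")

revenue_by_country = (
    orders.join(customers, on="customer_id", how="inner")
          .groupBy("country")
          .agg(F.sum("amount").alias("total_revenue"),
               F.countDistinct("order_id").alias("order_count"))
)

# Write the result back to the Lakehouse as a Delta table for reporting.
revenue_by_country.write.mode("overwrite").format("delta").saveAsTable("revenue_by_country")
```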
Which tool when?
These three are really different links in the same chain. Dataflow Gen2 is a good fit for basic data cleansing, Pipeline for managing workflows, and Notebook for complex transformations or modeling. There is no single “correct” choice; the scenario determines the decision.
| Feature | Dataflow Gen2 | Data Pipeline | Notebook |
|---|---|---|---|
| Code Requirement | Low-code/no-code, visual, Power Query based | Low-code plus code-based activities possible | Requires code (Python, SQL, Scala, R) |
| Transformation Capability | Built-in transforms, cleansing, enrichment, denormalization | Complex ETL, multi-step workflows, conditional activities | Any level, custom algorithms, ML, advanced analytics |
| Automation/Orchestration | Limited (mostly source-to-target, basic scheduling) | Rich orchestration, scheduler, error handling, triggers | Can be integrated into pipelines, code-driven automation |
| Performance | Scalable batch processing with staging for larger datasets | Large data movement, strong fault tolerance | Big data processing, advanced statistics and ML |
| Targets/Sources | Lakehouse, warehouse, broad connector support | Multiple sources/targets (files, APIs, databases, etc.) | Lakehouse, Parquet, Delta, external data sources |
| Primary Use | Data preparation, cleansing, pre-analytics setup | Data movement, workflow management, automation | Data exploration, advanced transformation, ML and analysis |
| Monitoring and Error Handling | Basic, lineage and dataflow tracking | Detailed, step-by-step error handling and alerting | Manual monitoring and logging in code |
Important
Dataflow Gen2, Data Pipeline, and Notebook are not isolated tools; they work together as different parts of the same solution. The best outcome typically comes from an end-to-end data flow where these tools are used in sequence.
In the ELT approach commonly used for data lakehouses, the Pipeline forms the backbone. It orchestrates multi-step workflows with scheduling, error handling, and retry mechanisms.
In this setup, ingestion is typically handled by Copy activities that write data from the various sources into the Bronze layer. The data landed in Bronze is then transformed and promoted to Silver and Gold using Notebooks.
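A Bronze-to-Silver promotion step in such a Notebook might look like the sketch below. The layer and column names (`bronze_sales`, `sale_id`, `amount`, and so on) are assumptions for illustration; a later Notebook or pipeline step would aggregate Silver into business-ready Gold tables.

```python
# Minimal sketch of a Bronze-to-Silver promotion step, as it might appear in a
# Notebook orchestrated by the Pipeline described above. Names are assumed.
from pyspark.sql import functions as F

bronze = spark.read.table("bronze_sales")                       # raw data landed by Copy activities

silver = (
    bronze.dropDuplicates(["sale_id"])                          # basic de-duplication
          .withColumn("sale_date", F.to_date("sale_date"))      # type normalization
          .withColumn("processed_at", F.current_timestamp())    # add processing metadata
          .filter(F.col("amount") > 0)                          # drop obviously invalid rows
)

# Promote to the Silver layer as a Delta table for downstream Gold aggregation.
silver.write.mode("overwrite").format("delta").saveAsTable("silver_sales")
```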