Pipeline Orchestration¶
Orchestrating data pipelines can be complex, but pipeline-flow makes it easy. In a simple configuration file,
you define the sequence of steps and their dependencies, which lets you focus on building reliable, efficient workflows.
Pipeline phases: From Raw Data to Insights¶
Every data pipeline moves data through a series of processing stages.
In pipeline-flow, a pipeline consists of multiple phases, each representing one stage of that workflow, so that data flows in order from source to destination.
The typical phases in a pipeline are:
Extract: Retrieve data from various sources (e.g., databases, APIs, files).
Transform: Process, clean, and enrich the extracted data.
Load: Deliver the processed data to its target destination (e.g., data warehouses, files).
Transform at Load: Perform additional transformations within the external target system after the data has been loaded (e.g., Type 2 SCD).
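The phase list above can be pictured as a small data model. The following is only an illustrative sketch: the `Phase` enum, the `Pipeline` dataclass, and the step names are hypothetical and are not part of the pipeline-flow API. It shows how phases keep a fixed order regardless of the order in which steps are declared.

```python
from dataclasses import dataclass, field
from enum import Enum


class Phase(Enum):
    """The four phases, declared in execution order (illustrative names)."""
    EXTRACT = "extract"
    TRANSFORM = "transform"
    LOAD = "load"
    TRANSFORM_AT_LOAD = "transform_at_load"


@dataclass
class Pipeline:
    """Hypothetical model: a named pipeline with a list of step names per phase."""
    name: str
    steps: dict[Phase, list[str]] = field(default_factory=dict)

    def ordered_phases(self) -> list[Phase]:
        # Enum iteration follows declaration order, so phases come back as
        # Extract, Transform, Load, Transform at Load. Empty phases are skipped.
        return [phase for phase in Phase if self.steps.get(phase)]


# Steps are declared out of order on purpose; the phase order is still fixed.
pipeline = Pipeline(
    name="sales",
    steps={
        Phase.LOAD: ["write_to_warehouse"],
        Phase.EXTRACT: ["read_orders_api", "read_customers_db"],
        Phase.TRANSFORM: ["drop_duplicates", "add_totals"],
    },
)
print([phase.value for phase in pipeline.ordered_phases()])
# ['extract', 'transform', 'load']
```

Because this example defines no Transform at Load steps, that phase is simply left out of the result.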
Execution Flow¶
pipeline-flow balances speed and data integrity through well-defined execution rules.
Phases execute in a defined sequence
Each phase runs in order, ensuring the output of one phase seamlessly feeds into the next.
This preserves data integrity and ensures that all dependencies between phases are respected.
Pipelines can run concurrently or sequentially
Multiple pipelines can execute asynchronously for faster processing, or sequentially if there are dependencies between them that require ordered execution.
Phases contain multiple steps
A single phase can include multiple steps, with each step representing a distinct task or operation in the workflow.
Execution mode depends on the phase type
Asynchronous execution is used where speed is critical (e.g., Extract and Load phases).
Synchronous execution ensures accuracy and consistency (e.g., Transform and Transform at Load phases).
| Pipeline Phase | Type | Execution Mode |
|---|---|---|
| Extract | Async | Runs concurrently. Plugins must be asynchronous. |
| Transform | Sync | Runs sequentially. Plugins must be synchronous. |
| Load | Async | Runs concurrently. Plugins must be asynchronous. |
| Transform at Load | Sync | Runs sequentially. Plugins must be synchronous. |
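The execution modes in the table can be sketched with plain `asyncio`. This is a minimal illustration under stated assumptions, not pipeline-flow's implementation: every function name here (`extract_orders`, `add_tax`, `load_to_sink`, `run_pipeline`, and so on) is hypothetical, and the `asyncio.sleep` calls stand in for real I/O.

```python
import asyncio


async def extract_orders() -> list[dict]:
    await asyncio.sleep(0.01)  # stands in for an API call
    return [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 5.0}]


async def extract_refunds() -> list[dict]:
    await asyncio.sleep(0.01)  # stands in for a database query
    return [{"id": 3, "amount": -2.0}]


def add_tax(rows: list[dict]) -> list[dict]:
    # Synchronous transform: returns new rows with 10% added to the amount.
    return [{**row, "amount": round(row["amount"] * 1.1, 2)} for row in rows]


def keep_positive(rows: list[dict]) -> list[dict]:
    # Synchronous transform: drops rows whose amount is not positive.
    return [row for row in rows if row["amount"] > 0]


async def load_to_sink(sink: list, rows: list[dict]) -> None:
    await asyncio.sleep(0.01)  # stands in for a write to a warehouse or file
    sink.extend(rows)


async def run_pipeline() -> tuple[list[dict], list[dict]]:
    # Extract: async steps run concurrently; gather keeps the argument order.
    batches = await asyncio.gather(extract_orders(), extract_refunds())
    rows = [row for batch in batches for row in batch]

    # Transform: sync steps run one after another, each on the previous output.
    for transform in (add_tax, keep_positive):
        rows = transform(rows)

    # Load: async steps run concurrently.
    warehouse: list[dict] = []
    archive: list[dict] = []
    await asyncio.gather(load_to_sink(warehouse, rows), load_to_sink(archive, rows))
    return warehouse, archive


warehouse, archive = asyncio.run(run_pipeline())
print(warehouse)
```

In the same spirit, independent pipelines could be started together with `asyncio.gather`, while pipelines that depend on each other would be awaited one at a time.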
- Why are some phases asynchronous?
Extract and Load involve I/O operations (database queries, API calls, etc.).
Using async execution prevents blocking the pipeline.
- Why are some phases synchronous?
Each transformation depends on the previous step.
Running them out of order would lead to incorrect results.
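To make the ordering argument concrete, here is a small sketch with two hypothetical transform steps (`fill_missing` and `drop_zeros` are invented for illustration and do not come from pipeline-flow). Applying the same steps in a different order yields a different result, which is why such steps are run strictly in sequence.

```python
from functools import reduce
from typing import Callable, Optional

Values = list[Optional[float]]


def fill_missing(values: Values) -> Values:
    # Replace missing readings with 0.0.
    return [0.0 if v is None else v for v in values]


def drop_zeros(values: Values) -> Values:
    # Remove readings equal to 0.0 (None is left untouched).
    return [v for v in values if v != 0.0]


def run_in_order(values: Values, steps: list[Callable[[Values], Values]]) -> Values:
    # Feed the output of each step into the next one, left to right.
    return reduce(lambda data, step: step(data), steps, values)


raw: Values = [10.0, None, 0.0]
intended = run_in_order(raw, [fill_missing, drop_zeros])
swapped = run_in_order(raw, [drop_zeros, fill_missing])
print(intended)  # [10.0]
print(swapped)   # [10.0, 0.0]
```

With the steps swapped, the zero introduced by `fill_missing` is never removed, so the downstream data differs even though both steps ran.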
Next Steps¶
Explore the User Guide to learn more about the Plugin Development process.