.. _quick_start:

Quick Start
===========

This guide walks you through setting up ``pipeline-flow``, running your first pipeline, and understanding the basic workflow in less than 5 minutes! |:rocket:|

Prerequisites
-------------

Before installing ``pipeline-flow``, make sure you have the following:

- Python 3.12 or later installed on your machine.
- A Python package manager (e.g., pip or Poetry).
- Basic knowledge of YAML (used for pipeline configuration).
- Access to the required data sources and data sinks (e.g., S3 buckets, databases, APIs).

To check which version of Python is installed, run:

.. code:: bash

    python --version

Installation
------------

``pipeline-flow`` is available on `PyPI `_ and can be installed using pip or, preferably, Poetry:

.. code:: bash

    pip install pipeline-flow

    # Or, preferably, add it to your project with Poetry:
    poetry add pipeline-flow

Setup
-----

After installation, add the following two imports to your Python script:

.. code:: python

    >>> import asyncio  # Required for running asynchronous coroutines
    >>> from pipeline_flow.entrypoint import start_orchestration  # Required for running the pipeline

If you are new to asynchronous programming, there is no need to worry. Refer to the :ref:`Asynchronous Workflows Basics ` section for more information.

Next, define your pipeline configuration in a YAML file, and then you can run your first pipeline! See the Configuration Template section below to help you get started.

Running Your First Pipeline
---------------------------

Once your pipeline configuration is ready, you can run your first pipeline using the ``start_orchestration`` function imported in the previous step.

``start_orchestration`` accepts a ``stream`` argument, which can be one of the following:

- A string containing the YAML configuration.
- An object containing the YAML configuration.
- A file object containing the YAML configuration.

``start_orchestration`` reads the YAML configuration from the provided stream and then runs the pipeline it describes.

.. note::

    Remember to replace placeholders such as ``YOUR_FILE_PATH_TO_PIPELINE.YAML`` with the actual path to your YAML configuration file.

Because ``start_orchestration`` is a coroutine function, you can run it with ``asyncio.run``. Opening the file in a ``with`` block ensures it is closed once the pipeline finishes:

.. code:: python

    >>> import asyncio
    >>> with open('YOUR_FILE_PATH_TO_PIPELINE.YAML') as pipeline_file:
    ...     asyncio.run(start_orchestration(stream=pipeline_file))

Alternatively, you can start the pipeline from within your own asynchronous function. In that case, ``await`` the ``start_orchestration`` call:

.. code:: python

    >>> import asyncio
    >>> from typing import TextIO
    >>> async def some_async_func(stream: str | TextIO) -> None:
    ...     await start_orchestration(stream=stream)
    ...     # Add any other code you need here.
    ...
    >>> with open('YOUR_FILE_PATH_TO_PIPELINE.YAML') as pipeline_file:
    ...     asyncio.run(some_async_func(stream=pipeline_file))

Passing a string or an object as ``stream`` works the same way as passing a file object. For example, here is how to pass the YAML configuration as a string with ``asyncio.run``:

.. code:: python

    >>> import asyncio
    >>> yaml_str_body = '''
    ... pipelines:
    ...   pipeline1:
    ...     type: ETL
    ...     phases:
    ...       ...  # Rest of the pipeline configuration here
    ... '''
    >>> asyncio.run(start_orchestration(stream=yaml_str_body))

Configuration Template
----------------------

Set up a configuration file for your pipeline. Create a new YAML file (e.g., ``pipeline.yaml``) and define your pipeline in the following order:

#. Define your custom or community plugins in the ``plugins`` section.
#. Define your pipeline type (ETL, ELT or ETLT) in the ``pipelines`` section.
#. Define the extract phase in the ``extract`` section.
#. Define the transform phase in the ``transform`` section (if the pipeline type is ETL or ETLT).
#. Define the load phase in the ``load`` section.
#. Define the transform-at-load phase in the ``transform_at_load`` section (if the pipeline type is ETLT).

YAML configuration example:

.. code:: yaml

    plugins:  # Step 1. Define your plugins here (custom or community)
      custom:
        dirs:
          - /path/to/custom/plugins  # Directory where the custom plugins are located
                                     # (enables importing multiple plugins at once)
        files:
          - /path/to/custom/plugins/custom_plugin.py  # Or the file where the custom plugin is defined
      community:  # Or use community plugins (if available)
        - plugin_name1
        - plugin_name2

    pipelines:
      pipeline1:
        type: ...  # Step 2. Define your pipeline type (ETL, ELT or ETLT)
        phases:
          extract:
            steps:
              - plugin:  # Step 3. Define your extract phase
          transform:
            steps:
              - plugin:  # Step 4. Define your transform phase (if ETL or ETLT)
          load:
            steps:
              - plugin:  # Step 5. Define your load phase
          transform_at_load:
            steps:
              - plugin:  # Step 6. Define your transform-at-load phase (if ETLT)

Next Steps
----------

- Explore the full documentation to learn more about pipeline configuration and advanced features.
- Read :ref:`Core Concepts ` to understand the core concepts behind ``pipeline-flow``.
- Learn more about :ref:`Building Custom Plugins `.

Happy orchestrating! |:rocket:|