How to Install Apache Airflow on Ubuntu

Update the Packages

Refresh the package index before installing the dependencies:

sudo apt update

Install Dependencies

Install Python 3, pip, and the venv module:

sudo apt install -y python3 python3-pip python3-venv

Create a Virtual Environment (Optional but recommended):

Creating a virtual environment is optional but recommended: it isolates Airflow's dependencies from the system Python packages and avoids version conflicts with other projects.

python3 -m venv ~/airflow_venv
source ~/airflow_venv/bin/activate
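
To confirm that the activation took effect, you can ask Python itself. The snippet below is a minimal sketch (the helper name in_virtualenv is illustrative, not part of Airflow): it compares sys.prefix with sys.base_prefix, which differ only when the interpreter is running inside a virtual environment.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the system Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("interpreter prefix:", sys.prefix)
```

If the first line reports False, run the source command again in the current shell before installing anything.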

Install Apache Airflow

Install Apache Airflow with pip:

pip install apache-airflow
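
A plain pip install picks the newest release of Airflow and of every dependency, which can occasionally produce an inconsistent set. The Airflow project publishes per-release constraints files for this reason. As a hedged sketch, the helper below builds a pinned install command; the function name is hypothetical, and the version 2.10.5 in the usage line is only an example value, so substitute the release you actually want.

```python
import sys
from typing import Optional

# URL layout of the constraints files published in the apache/airflow repository.
CONSTRAINTS_TEMPLATE = (
    "https://raw.githubusercontent.com/apache/airflow/"
    "constraints-{airflow}/constraints-{python}.txt"
)


def pinned_install_command(airflow_version: str, python_version: Optional[str] = None) -> str:
    """Build a pip command that installs one Airflow release with its constraints file."""
    if python_version is None:
        # Default to the major.minor version of the interpreter running this script.
        python_version = f"{sys.version_info.major}.{sys.version_info.minor}"
    url = CONSTRAINTS_TEMPLATE.format(airflow=airflow_version, python=python_version)
    return f'pip install "apache-airflow=={airflow_version}" --constraint "{url}"'


# Example value only (assumption): replace 2.10.5 with the release you want.
print(pinned_install_command("2.10.5"))
```

Run the printed command inside the activated virtual environment. A constraints file only exists for Python versions that the chosen Airflow release supports.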

If you see an error such as "pip: command not found", install pip for Python 3 with the following command:

sudo apt install python3-pip

Once pip is installed, verify it by checking the pip version:

pip3 --version

Initialize Airflow Database:

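Initialize the Airflow metadata database. On first run this creates the Airflow home directory (~/airflow by default) with a default configuration file and a SQLite database. On Airflow 2.7 and later, airflow db init is deprecated in favor of airflow db migrate.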

airflow db init

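Start Airflow Web Server and Scheduler:

Both commands run in the foreground, so start each one in its own terminal, with the virtual environment activated in both. These are the Airflow 2.x commands; Airflow 3 replaces airflow webserver with airflow api-server.

airflow webserver --port 8080
airflow scheduler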

How to Access Apache Airflow from a Browser:

Open a web browser such as Google Chrome or Mozilla Firefox and go to http://localhost:8080 to access the Airflow web interface. If you installed Airflow on an Ubuntu VM, use the VM's external IP address instead, for example http://<ip-address>:8080 (plain HTTP unless you have configured TLS).
The image below is for reference.

[Image: How to Install Apache Airflow on Ubuntu VM]
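
If the page does not load, it helps to first check whether anything is listening on the port at all. The following is a small sketch using only the Python standard library; the helper name is_listening is illustrative, and the check only confirms that a TCP connection can be opened, not that Airflow itself is healthy.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling is_listening("localhost", 8080) on the machine running Airflow should return True once the web server has started. When calling it from another machine with the VM's IP address, a False result usually points to a firewall rule or to the server not running.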

Create a DAG (Directed Acyclic Graph):

Create a dags directory inside the Airflow home directory (~/airflow), then define your workflow as a DAG by creating a Python script in ~/airflow/dags/.
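
As an illustration of this step, the sketch below creates the dags directory and writes a minimal DAG file into it. Everything named here is an assumption for the example: the file name hello_airflow.py, the dag_id, the task_id, and the echo command. The schedule keyword is the one used by Airflow 2.4 and later, and newer releases may expect a different import path for BashOperator, so adjust it to your installed version.

```python
import textwrap
from pathlib import Path

# Source of a minimal DAG file with one task that echoes a message once a day.
EXAMPLE_DAG = textwrap.dedent('''\
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="hello_airflow",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        BashOperator(task_id="say_hello", bash_command="echo hello")
''')


def write_example_dag(dags_dir: Path) -> Path:
    """Create dags_dir if it is missing and write the example DAG file into it."""
    dags_dir.mkdir(parents=True, exist_ok=True)
    dag_file = dags_dir / "hello_airflow.py"
    dag_file.write_text(EXAMPLE_DAG)
    return dag_file
```

Calling write_example_dag(Path.home() / "airflow" / "dags") places the file where the scheduler looks by default. You can equally create the directory by hand and paste the DAG source into a new .py file.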

Run Your DAG:

Once a DAG is defined and unpaused in the web interface (new DAGs are paused by default), it runs automatically according to its schedule. You can also trigger it manually from the Airflow web interface.

Following the steps above installs Apache Airflow on Ubuntu and lets you create and schedule DAGs. To run jobs in parallel, however, you need to change the executor in the Airflow configuration.
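
Before changing anything, it can be useful to see which executor is currently configured. The sketch below assumes the default layout, where the setting lives under the [core] section of ~/airflow/airflow.cfg, and it also honors the AIRFLOW__CORE__EXECUTOR environment variable, which takes precedence over the file. The function name configured_executor is hypothetical.

```python
import configparser
import os
from pathlib import Path
from typing import Optional


def configured_executor(cfg_path: Path) -> Optional[str]:
    """Return the executor name from the environment or airflow.cfg, else None."""
    # Variables of the form AIRFLOW__SECTION__KEY override values in airflow.cfg.
    env_value = os.environ.get("AIRFLOW__CORE__EXECUTOR")
    if env_value:
        return env_value

    # Disable interpolation so '%' characters elsewhere in the file are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(cfg_path)  # a missing file is skipped without raising
    return parser.get("core", "executor", fallback=None)


print(configured_executor(Path.home() / "airflow" / "airflow.cfg"))
```

A result of None means the file does not set an executor explicitly, in which case Airflow falls back to its built-in default for the installed version.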

Refer to the following article for setting up parallel job execution with a MySQL database:

How to Configure MySQL with Apache Airflow for Parallel Jobs Execution

For the most accurate instructions, refer to the latest documentation. See the official Apache Airflow documentation for more advanced features and setup options.

