How to Install Apache Airflow on Ubuntu VM
Update the Packages
Refresh the package index before installing the dependencies:
sudo apt update
Install Dependencies
Install the required dependency packages (Python 3, pip, and the venv module):
sudo apt install -y python3 python3-pip python3-venv
Create a Virtual Environment (Optional but Recommended):
Creating a virtual environment is optional but recommended. It isolates Airflow's dependencies from the system Python packages and avoids version conflicts with other projects.
python3 -m venv ~/airflow_venv
source ~/airflow_venv/bin/activate
Install Apache Airflow
Use the command below to install Apache Airflow with pip:
pip install apache-airflow
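A plain pip install pulls the latest Airflow release together with whatever dependency versions pip resolves, and newer major releases change some of the commands used later in this guide. As a hedged alternative, the sketch below pins a version and uses the constraints file that the Airflow project publishes for each release. The version number 2.9.3 is only an assumption chosen for illustration, and install_airflow is a hypothetical helper name; substitute the 2.x release you actually want.

```shell
# Assumed Airflow version for illustration; change it to the release you want.
AIRFLOW_VERSION="2.9.3"

# Detect the major.minor version of the active Python interpreter, e.g. 3.10.
PYTHON_VERSION="$(python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')"

# Constraints file published by the Airflow project for this version pair.
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"

# Hypothetical helper: installs the pinned Airflow version with its constraints.
install_airflow() {
    pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
}

echo "Constraints file: ${CONSTRAINT_URL}"
```

With the virtual environment activated, run install_airflow to perform the installation. Check first that the chosen Airflow version supports your Python version, otherwise the constraints file will not exist.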
If you see an error such as "pip: command not found", install pip for Python 3 with the following command:
sudo apt install python3-pip
Once pip is installed, verify it by checking its version:
pip3 --version
Initialize Airflow Database:
Initialize the Airflow metadata database (on Airflow 2.7 and later, this command is deprecated in favor of airflow db migrate):
airflow db init
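Start Airflow Web Server and Scheduler:
Start the web server on port 8080 and then the scheduler. Both commands run in the foreground, so run each one in its own terminal with the virtual environment activated: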
airflow webserver --port 8080
airflow scheduler
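On Airflow 2.x the web interface asks for a login, so an account usually has to exist before you can sign in. The following is a minimal sketch using the airflow users create command. Every value here (user name, names, e-mail address, password) is a placeholder assumption, not something this guide prescribes.

```shell
# Placeholder credentials for illustration only; choose your own values.
ADMIN_USER="admin"
ADMIN_EMAIL="admin@example.com"
ADMIN_PASSWORD="change-me"

if command -v airflow >/dev/null 2>&1; then
    # Create an account with the Admin role in the Airflow metadata database.
    airflow users create \
        --username "$ADMIN_USER" \
        --firstname Admin \
        --lastname User \
        --role Admin \
        --email "$ADMIN_EMAIL" \
        --password "$ADMIN_PASSWORD"
else
    echo "airflow not found; activate the virtual environment first" >&2
fi
```

Run this after the metadata database has been initialized, then sign in to the web interface with the user name and password you chose.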
How to Access Apache Airflow from a Browser:
Open a web browser such as Google Chrome or Mozilla Firefox and go to http://localhost:8080 to access the Airflow web interface.
If Airflow is installed on an Ubuntu VM, use the VM's external IP address instead, for example http://<external-ip-address>:8080. The web server serves plain HTTP by default, so use http rather than https unless you have configured SSL.
The image below is for reference.
Create a DAG (Directed Acyclic Graph):
Create a directory named dags inside the Airflow home directory (~/airflow by default), then define your workflow as a DAG by creating a Python script in ~/airflow/dags/.
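As an illustration of this step, the sketch below creates the dags directory and writes a small example DAG into it. The file name hello_airflow.py, the DAG id hello_airflow, the task, and the daily schedule are all assumptions made up for this example, and the DAG code assumes Airflow 2.4 or later, where the schedule argument is available.

```shell
# Use AIRFLOW_HOME if it is set, otherwise the default ~/airflow.
DAG_DIR="${AIRFLOW_HOME:-$HOME/airflow}/dags"
mkdir -p "$DAG_DIR"

# Write a hypothetical example DAG with a single Bash task.
cat > "$DAG_DIR/hello_airflow.py" <<'EOF'
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Runs once a day; catchup=False skips runs for dates before today.
with DAG(
    dag_id="hello_airflow",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    say_hello = BashOperator(
        task_id="say_hello",
        bash_command="echo 'Hello from Airflow'",
    )
EOF

echo "Wrote $DAG_DIR/hello_airflow.py"
```

The scheduler scans the dags directory periodically, so the new DAG may take a short while to appear in the web interface.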
Run Your DAG:
If you have defined a schedule in your DAG, it will run automatically according to that schedule. You can also trigger the DAG manually from the Apache Airflow web interface.
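A DAG can also be triggered from the command line instead of the web interface. The sketch below assumes Airflow 2.x and a DAG whose id is hello_airflow, which is a hypothetical name; replace it with the dag_id from your own script. New DAGs are paused by default, so the sketch unpauses the DAG before triggering it.

```shell
# Hypothetical DAG id; replace it with the dag_id defined in your script.
DAG_ID="hello_airflow"

if command -v airflow >/dev/null 2>&1; then
    # List the DAGs that Airflow has discovered in the dags directory.
    airflow dags list
    # Unpause the DAG so that the scheduler is allowed to run it.
    airflow dags unpause "$DAG_ID"
    # Queue one manual run of the DAG.
    airflow dags trigger "$DAG_ID"
else
    echo "airflow not found; activate the virtual environment first" >&2
fi
```

The scheduler must be running for the triggered run to actually execute.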
By following the steps above, you can install Apache Airflow on Ubuntu, create a DAG, and schedule it. To run jobs in parallel, however, you need to change the executor in the Apache Airflow configuration, because the default setup with SQLite runs one task at a time.
Refer to the link below for setting up parallel job execution with a MySQL database.
How to Configure MySQL with Apache Airflow for Parallel Jobs Execution
For the most accurate and up-to-date instructions, and for more advanced features and setup options, refer to the official Apache Airflow documentation.