How to Install Apache Spark on Ubuntu VM

Before we proceed with the installation, make sure Java is installed on your Ubuntu system, because Apache Spark requires it. Check whether Java is installed by running the following command:

java -version
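
If Java is present, you should see output similar to the following (the exact build string depends on your JDK and Ubuntu release):

openjdk version "1.8.0_292"
OpenJDK Runtime Environment (build 1.8.0_292-8u292-b10-0ubuntu1~20.04-b10)
OpenJDK 64-Bit Server VM (build 25.292-b10, mixed mode)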

If Java is not installed, use the following commands to install it:

sudo apt-get update
sudo apt-get install openjdk-8-jdk
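
If openjdk-8-jdk is not packaged for your Ubuntu release, a newer LTS JDK should also work; Spark 3.x supports Java 8 and 11, and Java 17 from Spark 3.3 onward:

sudo apt-get install openjdk-11-jdk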

Once Java is installed, you can proceed with installing Apache Spark:

Download Apache Spark:

Visit the Apache Spark download page and select the version you need. Copy the download link for the pre-built Spark package that matches your Hadoop version, for example “Pre-built for Apache Hadoop 2.7 and later.”

Use wget to download Spark, replacing <URL> with the download link you copied from the Apache Spark page:

wget <URL>
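
For example, to fetch Spark 3.1.3 pre-built for Hadoop 2.7 from the Apache archive (an illustrative release; use whichever version you selected on the download page):

wget https://archive.apache.org/dist/spark/spark-3.1.3/spark-3.1.3-bin-hadoop2.7.tgz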

Extract the Spark Archive:

Once the download is complete, extract the Spark archive with the following command, replacing <version> with the version you downloaded:

tar -xvf spark-<version>-bin-hadoop2.7.tgz
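
Continuing the illustrative example above, for Spark 3.1.3 this would be:

tar -xvf spark-3.1.3-bin-hadoop2.7.tgz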

Move Spark to a Desired Location:

You can move the extracted Spark directory to a location of your choice, for example the /opt directory:

sudo mv spark-<version>-bin-hadoop2.7 /opt/spark
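
Optionally, make your user the owner of the directory so that Spark can write logs and work files under /opt/spark without sudo (a convenience, not a Spark requirement):

sudo chown -R $USER:$USER /opt/spark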

Set Environment Variables:

You need to set the SPARK_HOME environment variable and add Spark’s bin and sbin directories to your PATH (sbin holds the standalone cluster scripts used in the optional step below). To do this, open your .bashrc or .zshrc file in a text editor:

nano ~/.bashrc

Add the following lines at the end of the file:

export SPARK_HOME=/opt/spark
export PATH=$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH

Save the file and exit the text editor. Then, run the following command to apply the changes:

source ~/.bashrc

Verify the Installation:

You can verify the installation by running the following command, which should display Spark’s version information:

spark-shell --version
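
As a further smoke test, you can run the bundled SparkPi example with spark-submit. The examples jar ships under $SPARK_HOME/examples/jars; the wildcard is used because the exact jar name varies with the Spark and Scala versions:

spark-submit --class org.apache.spark.examples.SparkPi $SPARK_HOME/examples/jars/spark-examples_*.jar 10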

Optional: Start a Spark Cluster (Standalone Mode): If you want to run Spark in standalone mode, start the master first, then start a worker and register it with the master’s URL, which defaults to spark://<hostname>:7077:

start-master.sh
start-worker.sh spark://$(hostname):7077

You can access the Spark master’s web UI by opening a web browser and navigating to http://localhost:8080 (replace localhost with your VM’s IP address if you are browsing from another machine). The page shows the master’s status and any registered workers.
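
When you are finished, stop the worker and master with the matching scripts, which also live in $SPARK_HOME/sbin:

stop-worker.sh
stop-master.sh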

You can now use Spark for distributed data processing and analytics.

Also see: How to Install Apache Airflow on Ubuntu VM
