How to Install Apache Spark on Ubuntu VM
Before proceeding with the installation, make sure Java is installed on your Ubuntu system, because Apache Spark requires it. Check whether Java is installed by running the following command:
java -version
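If Java is installed, the output will look something like the following (the exact version and build strings depend on your JDK and Ubuntu release):
openjdk version "1.8.0_382"
OpenJDK Runtime Environment (build 1.8.0_382-8u382-ga-1~22.04.1-b05)
OpenJDK 64-Bit Server VM (build 25.382-b05, mixed mode)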
If Java is not installed, use the following commands to install it:
sudo apt-get update
sudo apt-get install openjdk-8-jdk
Once Java is installed, you can proceed with installing Apache Spark:
Download Apache Spark:
Visit the Apache Spark Download page and select your required version, then copy the download link for the pre-built Spark package that matches your Hadoop version. For example, you can choose “Pre-built for Apache Hadoop 2.7 and later.”
Use wget to download Spark, replacing <URL> with the actual download link you copied from the Apache Spark page:
wget <URL>
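For example, for a hypothetical Spark 2.4.8 download the completed command might look like the line below (the version and mirror here are placeholders for illustration; always use the link you copied from the download page):
wget https://archive.apache.org/dist/spark/spark-2.4.8/spark-2.4.8-bin-hadoop2.7.tgz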
Extract the Spark Archive:
Once the download is complete, you can extract the Spark archive using the following command:
Replace <version> with the version you downloaded.
tar -xvf spark-<version>-bin-hadoop2.7.tgz
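Continuing the hypothetical 2.4.8 example from the download step, the command would be:
tar -xvf spark-2.4.8-bin-hadoop2.7.tgz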
Move Spark to a Desired Location:
You can move the extracted Spark directory to a location of your choice. For example, you can move it to the /opt directory:
sudo mv spark-<version>-bin-hadoop2.7 /opt/spark
Set Environment Variables:
You need to add Spark’s bin directory to your PATH and set the SPARK_HOME environment variable. To do this, open your .bashrc or .zshrc file in a text editor:
nano ~/.bashrc
Add the following lines at the end of the file:
export SPARK_HOME=/opt/spark
export PATH=$SPARK_HOME/bin:$PATH
Save the file and exit the text editor. Then, run the following command to apply the changes:
source ~/.bashrc
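You can quickly confirm the variables took effect (the expected values below assume you moved Spark to /opt/spark as in the earlier step):
echo $SPARK_HOME      # should print /opt/spark
which spark-shell     # should print /opt/spark/bin/spark-shell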
Verify the Installation:
You can verify the installation by running the following command, which should display Spark’s version information:
spark-shell --version
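To go one step further, you can run one of the example jobs bundled with Spark. The SparkPi example estimates π and prints a line like “Pi is roughly 3.14...” near the end of its output; the argument is the number of partitions to use:
run-example SparkPi 10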
Optional: Start Spark Cluster (Standalone Mode): If you want to run Spark in standalone mode, you can start the master and worker processes with the scripts in Spark’s sbin directory (note these live in sbin, not the bin directory added to your PATH above; in older Spark 2.x releases the worker script is named start-slave.sh). The worker needs the master’s URL, which defaults to spark://<master-host>:7077 and is printed in the master’s startup log:
$SPARK_HOME/sbin/start-master.sh
$SPARK_HOME/sbin/start-worker.sh spark://<master-host>:7077
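When you are finished, you can stop the processes with the matching scripts (stop-slave.sh in place of stop-worker.sh on Spark 2.x):
$SPARK_HOME/sbin/stop-worker.sh
$SPARK_HOME/sbin/stop-master.sh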
You can access the Spark web UI by opening a web browser and navigating to http://localhost:8080 (replace localhost with your VM’s IP address if you are browsing from another machine). The page shows the master’s URL along with the workers and applications registered with it.
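To run a job against this standalone cluster instead of Spark’s default local mode, point spark-shell at the master URL shown at the top of the web UI, for example:
spark-shell --master spark://<master-host>:7077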
You can now use Spark for distributed data processing and analytics.
Also see: How to Install Apache Airflow on Ubuntu VM