Scalable Spark Deployment using Kubernetes - Part 6 : Building Spark 2.0 Two Node Cluster
In the last post, we built a Spark 2.0 docker image. As a next step, we will build a two node Spark standalone cluster using that image. In the context of Kubernetes, a node of this cluster corresponds to a container. So in this sixth blog of the series, we will build a two node cluster containing a single master and a single worker. You can access all the posts in the series here.
TL;DR you can access all the source code on github.
Spark Master Deployment
To start with, we define our master using the Kubernetes deployment abstraction. As you may recall from an earlier post, the deployment abstraction is used for defining one or more pods. Even though we need only a single master in our cluster, we use a deployment rather than a bare pod as it gives us more flexibility.
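Below is a minimal sketch of what the deployment could look like. The image name spark-2.0 and the start-master script path inside the container are assumptions based on the image we built in the last post; the exact values are in the configuration on github.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spark-master
spec:
  replicas: 1
  selector:
    matchLabels:
      component: spark-master
  template:
    metadata:
      labels:
        component: spark-master
    spec:
      containers:
        - name: spark-master
          image: spark-2.0                # image built in the last post; name assumed here
          imagePullPolicy: IfNotPresent   # use the local image, do not pull from a remote registry
          ports:
            - containerPort: 7077         # port on which the spark master listens
          command: ["/start-master.sh"]   # start-master script baked into the image; path assumed
```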
The above yaml configuration defines the master. The noteworthy pieces are:
- image - We are using the image we built in our last post. It is available in the local docker images.
- imagePullPolicy - By default, Kubernetes tries to pull the image from remote registries like dockerhub. But as our image is only available locally, we need to tell Kubernetes not to pull it from a remote registry. The imagePullPolicy property of the configuration allows us to control that. In our example, we set IfNotPresent, which means pull only if there is no local copy. As we have already built the image, it will be available locally and Kubernetes will not try to pull it from a remote registry.
- ports - We are exposing port 7077, on which the spark master will listen.
- command - Command tells Kubernetes what to run when the container bootstraps. Here we specify it to run the start-master script.
You can access the complete configuration on github.
Starting Spark Master
Once we have our configuration ready, we can start the spark master pod using the below command.
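Assuming the above configuration is saved as spark-master.yaml (the actual file name on github may differ):

```bash
kubectl create -f spark-master.yaml
```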
Spark Master Service
Once we have defined and started the spark master, the next step is to define a service for it. This service exposes the spark master on the network so that workers can connect to it.
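A minimal sketch of such a service, assuming the master pods carry the component: spark-master label used in the deployment above:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: spark-master
spec:
  selector:
    component: spark-master
  ports:
    - name: spark
      port: 7077
      targetPort: 7077
    - name: webui
      port: 8080
      targetPort: 8080
```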
The above is the yaml configuration for the spark master service. We also name the service spark-master, which lets workers resolve the master by that hostname on the cluster.
We also expose the additional port 8080 for accessing the spark web ui.
You can access the complete configuration on github.
Starting Spark Master Service
Once we have defined the master service, we can start it using the below command.
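Assuming the service configuration is saved as spark-master-service.yaml (the actual file name may differ):

```bash
kubectl create -f spark-master-service.yaml
```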
Spark Worker Configuration
Once we have the spark master and its service started, we can define the worker configuration.
As we are building a two node cluster, we will run only a single worker for now. Most of the configuration is the same as the master's, other than the command, which starts the worker, as shown in the sketch below.
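A sketch of the worker deployment, under the same assumptions as the master; the start-worker script path and the master URL argument are assumptions about the image from the last post:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spark-worker
spec:
  replicas: 1
  selector:
    matchLabels:
      component: spark-worker
  template:
    metadata:
      labels:
        component: spark-worker
    spec:
      containers:
        - name: spark-worker
          image: spark-2.0              # same locally built image as the master; name assumed
          imagePullPolicy: IfNotPresent
          command: ["/start-worker.sh", "spark://spark-master:7077"]  # script path assumed; connects to the master service
```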
You can access the complete configuration on github.
Starting Worker
You can start the worker deployment using the below command.
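Assuming the worker configuration is saved as spark-worker.yaml:

```bash
kubectl create -f spark-worker.yaml
```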
Now all the pieces of our cluster are running.
Verifying the Setup
Run the below command to verify that both the spark master and spark worker deployments have started.
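```bash
kubectl get pods
```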
The above command should show two pods in the Running state, one for the master and one for the worker.
Please note that the exact names of the pods will differ from machine to machine.
Once we have verified the pods, verify the service using the below command.
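```bash
kubectl get services
```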
The above command should list the spark-master service, exposing ports 7077 and 8080.
If both commands ran successfully, then our spark cluster is up and running.
Testing our spark cluster
We can test our spark deployment by observing the web ui and running some commands from the spark shell.
Accessing Web UI
In our spark master configuration, we exposed the UI port 8080. Normally it is only accessible within the cluster, but using port forwarding we can access it from our local machine.
First, let's see the pods running on the cluster using the below command.
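```bash
kubectl get pods
```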
It should list both the master and worker pods; note the full name of the master pod.
We port forward from the master pod using the below command. The exact name of the pod will differ from machine to machine.
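```bash
# replace spark-master-xxxx with the master pod name shown by kubectl get pods
kubectl port-forward spark-master-xxxx 8080:8080
```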
Port-forward takes two arguments: the pod name and a port pair. In the port pair, the first port is the local port and the second is the container port; here both are 8080.
Once the port is forwarded, go to http://localhost:8080.
You should see the spark master web ui, with our single worker listed.
Spark Shell
Once we have verified the web ui, we can test spark from the shell. Let's run the spark shell from the master container.
First we need to log in to our master pod. Run the below command.
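A sketch, assuming the image provides bash (if not, use sh); replace the pod name with the one from your cluster:

```bash
kubectl exec -it spark-master-xxxx -- /bin/bash
```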
Start the spark shell using the below command.
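Assuming Spark is installed under /opt/spark inside the image (the actual path depends on how the image was built in the last post):

```bash
/opt/spark/bin/spark-shell --master spark://spark-master:7077
```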
Run the below code to make sure the cluster can execute a job.
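For example, a small job that distributes a collection across the cluster and counts it:

```scala
// sc is the SparkContext pre-created by spark-shell
sc.parallelize(1 to 1000).count()
```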
If the code runs successfully, then our cluster setup is working.
Conclusion
In this blog, we have successfully built a two node spark cluster using kubernetes abstractions.
What’s Next?
Now we have defined our barebones cluster. In the next blog, we will see how to scale the cluster using kubernetes tools. We will also discuss how to do resource management in the cluster.