Scalable Spark Deployment using Kubernetes - Part 7 : Dynamic Scaling and Namespaces
In our last post, we created a two node Spark cluster using Kubernetes. Once the cluster is defined and created, we can easily scale it up or down through Kubernetes. This elastic nature of Kubernetes makes it easy to grow the infrastructure as demand increases, rather than provisioning everything upfront.
In this seventh blog of the series, we will discuss how to scale a Spark cluster on Kubernetes. You can access all the posts in the series here.
Dynamic Scaling
When we discussed the deployment abstraction in our previous blog, we talked about the replica factor. In the deployment configuration, we can specify the number of replicas we need for a given pod. This number is set to 1 in our current spark-worker deployment.
One of the nice things about the deployment abstraction is that we can change the replica count dynamically without changing the configuration. This allows us to scale our Spark cluster on the fly.
Scale Up
Run the below command to scale the workers from 1 to 2.
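A minimal form of the command, assuming the worker deployment is named spark-worker as in the earlier posts:

```bash
kubectl scale deployment spark-worker --replicas=2
```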
The above command takes the deployment name and the desired number of replicas as parameters. You can check the result using the below command.
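Listing the pods should now show a second spark-worker pod:

```bash
kubectl get pods
```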
When you run the scale command, Kubernetes creates more pods using the template specified in the spark-worker deployment. Whenever these pods come up, they automatically connect to spark-master and scale the cluster.
Scale Down
Not only can we increase the number of workers, we can also scale down by setting a lower replica count.
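Again assuming the deployment is named spark-worker:

```bash
kubectl scale deployment spark-worker --replicas=1
```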
When the above command executes, Kubernetes kills one of the workers to bring the replica count back down to 1.
Kubernetes automatically manages all the service related changes, so whenever we scale the workers, Spark scales with them.
Multiple Clusters
Till now, we have run a single cluster. But sometimes we may want to run multiple Spark clusters on the same Kubernetes cluster. Suppose we try to run the same configurations twice, like below.
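For example, re-running the create step from the previous post (the file names here are placeholders for whatever your configuration files are called):

```bash
# placeholder file names; use the configuration files
# you created the first cluster from
kubectl create -f spark-master.yaml
kubectl create -f spark-worker.yaml
```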
You will get the below error.
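The exact wording depends on your Kubernetes version, but it will look roughly like this:

```
Error from server (AlreadyExists): error when creating "spark-master.yaml": deployments "spark-master" already exists
```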
Kubernetes rejects the request because a deployment named spark-master already exists. One way to solve this issue is to duplicate the configurations under different names, but that is tedious and difficult to maintain.
A better way to solve this issue is to use the namespace abstraction of Kubernetes.
Namespace Abstraction
Kubernetes allows users to create multiple virtual clusters on a single physical cluster. These virtual clusters are called namespaces.
The namespace abstraction allows multiple users to share the same physical cluster. It provides a scope for names, which lets us run services with the same name in different namespaces.
By default, our cluster is running in a namespace called default. In the next section, we will create another namespace in which we can run one more single node cluster.
Creating a Namespace
In order to create the new cluster, we first need to create a new namespace. Run the below command to create a namespace called cluster2.
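The standard kubectl command for this is:

```bash
kubectl create namespace cluster2
```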
You can list all the namespaces using the below command.
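```bash
kubectl get namespaces
```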
You should see a result like below.
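The STATUS and AGE values here are illustrative and will differ on your cluster:

```
NAME          STATUS    AGE
cluster2      Active    1m
default       Active    2d
kube-system   Active    2d
```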
kube-system is the namespace in which all the Kubernetes system pods run.
Setting Context
By default, kubectl points to the default namespace. To create pods in our new namespace, we need to point it at cluster2 instead. We can do that by changing the context.
Run the below commands to change the context.
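A sketch of the two steps; the CONTEXT variable name is just a local convenience:

```bash
# capture the name of the current context
CONTEXT=$(kubectl config current-context)

# point that context at the cluster2 namespace
kubectl config set-context $CONTEXT --namespace=cluster2
```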
In the first step, we capture the current context in the CONTEXT variable. In the next command, we set its namespace to cluster2.
Creating a Cluster in the Namespace
Once we set the context, we can use the same commands to create the cluster. Let's run the below commands.
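Using the same placeholder file names as before, these are the same configurations used for the first cluster:

```bash
kubectl create -f spark-master.yaml
kubectl create -f spark-worker.yaml
```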
Now our second cluster is started. We can see all the pods across the namespaces using the below command.
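kubectl's --all-namespaces flag lists pods from every namespace:

```bash
kubectl get pods --all-namespaces
```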
You should see a result something like below.
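The pod name suffixes and ages are illustrative, and the kube-system pods are omitted:

```
NAMESPACE   NAME                           READY     STATUS    RESTARTS   AGE
cluster2    spark-master-xxxxxxxxx-xxxxx   1/1       Running   0          2m
cluster2    spark-worker-xxxxxxxxx-xxxxx   1/1       Running   0          2m
default     spark-master-xxxxxxxxx-xxxxx   1/1       Running   0          1d
default     spark-worker-xxxxxxxxx-xxxxx   1/1       Running   0          1d
```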
As you can observe from the result, there are spark-master pods running in different namespaces.
So, using the namespace abstraction of Kubernetes, we can create multiple Spark clusters on the same Kubernetes cluster.
Conclusion
In this blog we discussed how to scale our clusters using the Kubernetes deployment abstraction. We also discussed how to use the namespace abstraction to create multiple clusters.
What’s Next?
Whenever we run services on Kubernetes, we may want to restrict their resource usage. This allows for better infrastructure planning and monitoring. In the next blog, we will discuss resource management on Kubernetes.