Barrier Execution Mode in Spark 3.0 - Part 1 : Introduction

Barrier execution mode is a new execution mode added to spark in 3.0 version. This marks a significant change to platform which had only supported Map/Reduce based execution till now. This will allow spark to diversify the kind of workloads it can support on it’s platform.

In this series posts we will discuss about this execution mode in detail.This is the first post in the series. In this post we will discuss about what is barrier execution mode and why it is needed. You can access all the posts in this series here.

Execution Mode

An execution mode in spark is way of executing the jobs in platform. The mode will dictate how jobs are divided into multiple parallel tasks and how they are scheduled. The execution mode defines what kind of processing can be handled in platform.

Map/Reduce been a popular execution mode in majority of big data frameworks including spark. This execution mode is flexible enough to handle wide variety workloads like ETL, SQL and ML etc.

Map/Reduce Execution Mode

In this section of the post we will look into the Map/Reduce from a execution point of view. Understanding this will help us how its different from the barrier execution mode.

In Map/ Reduce

A job is collection of stages. Each stage can be Map or Reduce. Between these stages there will usually be shuffling.
Each stage is collection of tasks. These tasks are independent of each other. This approach is called shared nothing. This allows system to scale as more resources are available.
As the tasks are independent of each other, when one of the tasks is failed only that task is retried.
Number of tasks in Map task is determined by amount of data and number of tasks in reduce phase is determined by developer

The above points summarises the Map/Reduce approach in very high level. Even though there are many implementation details, this information is enough for our discussion.

Need for New Execution Mode

Map/Reduce execution mode has served well for many years for different workloads. Why we need different execution mode now?

One of the reasons is to support deep learning frameworks on spark. Deep learning frameworks don’t lend themselves to Map/Reduce model. They work well with other kind of execution model called MPI ( Message Passing Interface). For example, Horovod, an open source framework to do deep learning on scale by Uber, uses the MPI to implement the distributed deep learning for variety of DL frameworks. You can learn more here.

In order to support the deep learning natively, spark need to support an execution model that is different than Map/Reduce. The new execution Model is modeled after MPI