Azkaban is a scheduler for big data workloads such as Hadoop and Spark. One of the differentiators of Azkaban compared to other schedulers like Oozie and Airflow is its good support for a REST API to interact with the scheduler programmatically. This programmatic access is important for interactive applications.

In this series of blog posts, I will discuss setting up Azkaban and using the Azkaban AJAX (REST) API.

This is the first post in the series, where we discuss setting up Azkaban. In this post, we will set up Azkaban 3.0.

Building Azkaban

Though Azkaban provides binary downloads, they are not up to date. So we will get the latest code from GitHub in order to build Azkaban 3.0.

The following are the steps to get the code and build it.

  • Clone code

git clone https://github.com/azkaban/azkaban.git
cd azkaban
  • Build

./gradlew distZip
  • Copy from build

cp build/distributions/azkaban-solo-server-3.0.0.zip ~
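
Because the build runs against the latest code, the version in the zip name may not be 3.0.0 on your machine. As a minimal sketch, assuming the zip still lands in build/distributions and keeps the azkaban-solo-server- prefix, the copy step could look the file up rather than hardcode the name. The helper name find_solo_zip is hypothetical and not part of Azkaban.

```shell
# Sketch only: find_solo_zip is a hypothetical helper, not something Azkaban ships.
# It prints the first azkaban-solo-server-*.zip found in the given directory.

find_solo_zip() {
  # $1: directory holding the build output (build/distributions in this post)
  local dist_dir="$1"
  local zip
  for zip in "$dist_dir"/azkaban-solo-server-*.zip; do
    # When nothing matches, the pattern is left as-is, so confirm the file exists.
    if [ -f "$zip" ]; then
      printf '%s\n' "$zip"
      return 0
    fi
  done
  echo "no azkaban-solo-server zip found in $dist_dir" >&2
  return 1
}
```

From the azkaban checkout, it could be used as cp "$(find_solo_zip build/distributions)" ~ in place of the fixed file name.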

Installing solo server

Azkaban supports different execution modes: solo server, two server mode and multiple executor mode. Solo server is used for initial development, whereas the other modes are geared towards production scenarios. In this post, we discuss setting up the solo server. For the other modes, refer to the Azkaban documentation.

The steps for installing are below.

  • Unzip

unzip ~/azkaban-solo-server-3.0.0.zip -d ~
cd ~/azkaban-solo-server-3.0.0
  • Start solo server

bin/azkaban-solo-start.sh
  • Follow the log

tail -f logs/azkaban-execserver.log

Accessing web UI

Once the Azkaban solo server has started, you can access the web UI at http://localhost:8081. By default, the username is azkaban and the password is azkaban. You can change them in conf/azkaban-users.xml.
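
Since the server takes a little while to come up, a script that starts it may want to wait until the web UI responds before going further. Below is a rough sketch of that, assuming curl is installed and the server uses the URL and default credentials mentioned above. The function names wait_for_azkaban and azkaban_login are made up for illustration. The action=login request follows the authenticate call described in the Azkaban AJAX API documentation, which should answer with a JSON body containing a session.id.

```shell
# Sketch only: wait_for_azkaban and azkaban_login are hypothetical helper names.
# Assumes curl is installed and the solo server uses the defaults from this post.

wait_for_azkaban() {
  # $1: base URL of the web UI, $2: number of attempts, one second apart (default 30)
  local url="$1"
  local attempts="${2:-30}"
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    # --fail makes curl exit non-zero on HTTP errors as well as refused connections.
    if curl --silent --fail --output /dev/null --max-time 2 "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "Azkaban did not respond at $url" >&2
  return 1
}

azkaban_login() {
  # $1: base URL, $2: username, $3: password
  # --data-urlencode sends a POST and escapes special characters in the values.
  curl --silent \
    --data-urlencode "action=login" \
    --data-urlencode "username=$2" \
    --data-urlencode "password=$3" \
    "$1"
}
```

With the server running, wait_for_azkaban http://localhost:8081 && azkaban_login http://localhost:8081 azkaban azkaban would wait for the UI and then attempt a login with the default account.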

Now you have successfully installed the Azkaban server. In the next set of posts, we will explore how to use this installation for scheduling.