Azkaban is a scheduler for big data workloads such as Hadoop and Spark jobs. One thing that differentiates Azkaban from other schedulers such as Oozie and Airflow is its good support for a REST API, which lets you interact with the scheduler programmatically. This programmatic access is important for interactive applications.
In this series of blog posts, I will discuss setting up Azkaban and using the Azkaban AJAX (REST) API.
This is the first post in the series, and it covers setting up Azkaban. In this post, we will set up Azkaban 3.0.
Although Azkaban provides binary downloads, they are not up to date, so we will build Azkaban 3.0 from the latest code on GitHub.
The following steps get the code and build the distribution archives:
git clone https://github.com/azkaban/azkaban.git
cd azkaban
./gradlew distZip
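Copying the build output
The build places the solo server archive under build/distributions. Copy it to your home directory:
cp build/distributions/azkaban-solo-server-3.0.0.zip ~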
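Because we build from the latest code, the version in the archive name may not be exactly 3.0.0. The following is a minimal sketch of a helper that finds the archive instead of hard-coding its name. It assumes the archive keeps the azkaban-solo-server-<version>.zip naming under build/distributions; the function name find_solo_zip is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the path of the solo server archive produced by the build.
# Assumes the archive is named azkaban-solo-server-<version>.zip under build/distributions.
find_solo_zip() {
  local repo_dir="${1:-.}"
  local matches=("$repo_dir"/build/distributions/azkaban-solo-server-*.zip)
  # Without nullglob, an unmatched pattern stays literal, so check for a real file.
  if [ ! -f "${matches[0]}" ]; then
    echo "no solo server archive under $repo_dir/build/distributions" >&2
    return 1
  fi
  # Refuse to guess when archives for several versions are present.
  if [ "${#matches[@]}" -gt 1 ]; then
    echo "several archives found, pick one explicitly:" >&2
    printf '  %s\n' "${matches[@]}" >&2
    return 1
  fi
  echo "${matches[0]}"
}
```

From the repository root, zip=$(find_solo_zip .) && cp "$zip" ~ copies whichever single archive the build produced.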
Installing solo server
Azkaban supports different execution modes: solo server, two-server mode and multiple-executor mode. The solo server is meant for initial development, whereas the other modes are geared towards production. In this post we set up the solo server; for the other modes, refer to the Azkaban documentation.
The installation steps are as follows, starting from your home directory, where the archive was copied:
cd ~
unzip azkaban-solo-server-3.0.0.zip
cd azkaban-solo-server-3.0.0
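Before starting anything, a quick sanity check can confirm the archive extracted where you expect. This is a minimal sketch with a hypothetical function name, check_solo_layout. It only looks for conf/azkaban-users.xml, which this post uses for logins, and a bin directory, which is assumed to hold the start scripts.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check for an extracted solo server directory.
# Checks only conf/azkaban-users.xml (used for logins) and a bin directory
# (assumed to hold the start scripts); returns non-zero if any are missing.
check_solo_layout() {
  local dir="${1:-.}"
  local missing=0
  local path
  for path in bin conf conf/azkaban-users.xml; do
    if [ ! -e "$dir/$path" ]; then
      echo "missing: $dir/$path" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, check_solo_layout ~/azkaban-solo-server-3.0.0 prints any missing paths and fails if the extraction went somewhere else.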
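Starting solo server
From inside the extracted directory, start the server and then follow its log:
bin/azkaban-solo-start.sh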
tail -f logs/azkaban-execserver.log
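The server can take a little while to come up. Instead of watching the log, a script can poll the web UI until it answers. This is a minimal sketch, assuming curl is installed; the function name wait_for_azkaban is hypothetical, and its default URL is the solo server address used in this post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll the web UI until it answers or the tries run out.
# Assumes curl is installed; tries are spaced one second apart.
wait_for_azkaban() {
  local url="${1:-http://localhost:8081}"
  local max_tries="${2:-60}"
  local try=1
  # Without -f, curl succeeds on any HTTP response, so any answer counts as "up".
  # --max-time keeps a single attempt from hanging indefinitely.
  until curl -s -o /dev/null --max-time 5 "$url"; do
    if [ "$try" -ge "$max_tries" ]; then
      echo "no response from $url after $max_tries tries" >&2
      return 1
    fi
    try=$((try + 1))
    sleep 1
  done
  echo "Azkaban is responding at $url"
}
```

Running wait_for_azkaban http://localhost:8081 120 right after the start script blocks until the UI responds or about two minutes pass.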
Accessing web UI
Once the Azkaban solo server has started, you can access the web UI at http://localhost:8081. The default username and password are both azkaban; you can change them in conf/azkaban-users.xml.
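Since the point of this series is programmatic access, you can also check that the default account works through the AJAX API's login action. The call POSTs action, username and password and returns JSON containing a session.id. The sketch below assumes the solo server at http://localhost:8081 with the default azkaban/azkaban account. The function names are hypothetical, and the sed-based extraction is only for illustration; a real JSON parser such as jq is more robust.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around the Azkaban AJAX login call (action=login).
# Assumes the solo server at http://localhost:8081 and the default account.

# Print the "session.id" value from a login response read on stdin.
# Illustrative sed parsing only; it prints nothing if the field is absent.
extract_session_id() {
  sed -n 's/.*"session\.id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Log in and print the session id, or fail with the raw response.
azkaban_login() {
  local url="${1:-http://localhost:8081}"
  local user="${2:-azkaban}"
  local password="${3:-azkaban}"
  local response session_id
  # --data-urlencode sends a form-encoded POST body and escapes each value.
  response=$(curl -s --max-time 10 \
    --data-urlencode "action=login" \
    --data-urlencode "username=$user" \
    --data-urlencode "password=$password" \
    "$url") || return 1
  session_id=$(printf '%s\n' "$response" | extract_session_id)
  if [ -z "$session_id" ]; then
    echo "login failed: $response" >&2
    return 1
  fi
  echo "$session_id"
}
```

For example, SESSION_ID=$(azkaban_login) && echo "$SESSION_ID" prints the session id; later AJAX calls pass it back as the session.id parameter.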
You have now successfully installed the Azkaban server. In the next posts, we will explore how to use this installation for scheduling.