Show Menu
Cheatography

Druid Architecture Cheat Sheet (DRAFT) by

This is a draft cheat sheet. It is a work in progress and is not finished yet.

Coordi­nator

Overview
The Druid Coordi­nator process is primarily respon­sible for segment management and distri­bution. More specif­ically, the Druid Coordi­nator process commun­icates to Historical processes to load or drop segments based on config­ura­tions. The Druid Coordi­nator is respon­sible for loading new segments, dropping outdated segments, managing segment replic­ation, and balancing segment load.

Router

Overview
The Apache Druid (incub­ating) Router process can be used to route queries to different Broker processes. By default, the broker routes queries based on how Rules are set up. For example, if 1 month of recent data is loaded into a hot cluster, queries that fall within the recent month can be routed to a dedicated set of brokers. Queries outside this range are routed to another set of brokers. This set up provides query isolation such that queries for more important data are not impacted by queries for less important data.
 

Overlord

Overview
The Overlord process is respon­sible for accepting tasks, coordi­nating task distri­bution, creating locks around tasks, and returning statuses to callers. Overlord can be configured to run in one of two modes - local or remote (local being default). In local mode Overlord is also respon­sible for creating Peons for executing tasks. When running the Overlord in local mode, all Middle­Manager and Peon config­ura­tions must be provided as well. Local mode is typically used for simple workflows. In remote mode, the Overlord and Middle­Manager are run in separate processes and you can run each on a different server. This mode is recomm­ended if you intend to use the indexing service as the single endpoint for all Druid indexing.

Historical

 
 

Broker

Overview
The Broker is the process to route queries to if you want to run a distri­buted cluster. It unders­tands the metadata published to ZooKeeper about what segments exist on what processes and routes queries such that they hit the right processes. This process also merges the result sets from all of the individual processes together. On start up, Historical processes announce themselves and the segments they are serving in Zookeeper.

Middle Manager

Overview
The Middle­Manager process is a worker process that executes submitted tasks. Middle Managers forward tasks to Peons that run in separate JVMs. The reason we have separate JVMs for tasks is for resource and log isolation. Each Peon is capable of running only one task at a time, however, a Middle­Manager may have multiple Peons.