HashiCorp Nomad
Overview
❖ Docker support
❖ Operationally simple: one binary, multi-datacenter
❖ Built for scale
❖ Microservices
❖ Hybrid cloud deployment (AWS, Azure, GCE, Bare Metal, VMWare, …)
Concepts
❖ Task -> Task Group -> Job
❖ Driver
❖ Client / Server
❖ Allocation
❖ Evaluation
❖ Regions and Datacenters
Architecture
❖ Consensus Protocol: Raft
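❖ Read consistency, default mode: reads are served by the leader; stale results are possible only in rare cases during a network partition
❖ Read consistency, stale mode: faster reads from any server, at the cost of possibly stale results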
❖ Gossip Protocol to manage membership
❖ Single global WAN gossip pool for cross-region requests
Scheduling
❖ Design inspired by Google's Omega and Borg papers
❖ Allocation: set of tasks in a job to be run on some node
❖ Scheduling: process of determining the appropriate allocations
❖ Evaluation: process of handling state change
Scheduling
❖ State is changed -> create evaluation
❖ Evaluation Broker (Leader): «at least once», priority order, manage queued pending evaluations
❖ Scheduler types: batch, service, core
❖ Schedulers (all Servers): process evaluation, create evaluation plan
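The broker behaviour described above (priority order, at-least-once delivery) can be sketched roughly as follows. This is a toy illustration and not Nomad's implementation: the class `EvalBroker` and its `enqueue`, `dequeue`, `ack` and `nack` methods are hypothetical names chosen for this sketch, and the evaluation ids and priorities are invented.

```python
import heapq
import itertools


class EvalBroker:
    """Toy priority queue with at-least-once delivery (hypothetical API)."""

    def __init__(self):
        self._heap = []                 # entries: (-priority, seq, eval_id)
        self._seq = itertools.count()   # tie-breaker keeps FIFO order per priority
        self._unacked = {}              # eval_id -> priority, handed out but not confirmed

    def enqueue(self, eval_id, priority):
        # heapq is a min-heap, so the priority is negated to pop the highest first.
        heapq.heappush(self._heap, (-priority, next(self._seq), eval_id))

    def dequeue(self):
        """Hand the highest-priority evaluation to a scheduler, or None if empty."""
        if not self._heap:
            return None
        neg_priority, _, eval_id = heapq.heappop(self._heap)
        self._unacked[eval_id] = -neg_priority
        return eval_id

    def ack(self, eval_id):
        """The scheduler finished processing: forget the evaluation."""
        del self._unacked[eval_id]

    def nack(self, eval_id):
        """The scheduler failed: put the evaluation back so it is delivered again."""
        self.enqueue(eval_id, self._unacked.pop(eval_id))


broker = EvalBroker()
broker.enqueue("eval-low", priority=20)
broker.enqueue("eval-high", priority=80)

first = broker.dequeue()    # highest priority comes out first
broker.nack(first)          # simulated scheduler failure
second = broker.dequeue()   # the same evaluation is delivered again
broker.ack(second)
third = broker.dequeue()    # only now is the lower-priority evaluation served
```

An evaluation stays in the unacknowledged set until a scheduler confirms it, which is what makes delivery "at least once": a failed scheduler causes a redelivery rather than a lost evaluation.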
Scheduler
❖ Generate an allocation plan from the desired state and the real state
❖ Plan: set of allocations to evict, update, or create and place
❖ Place allocation:
  ❖ feasibility checking: filter out unhealthy nodes, nodes missing the required driver, etc.
  ❖ ranking: score each node (bin packing + affinity/anti-affinity rules); the highest-scoring node wins
❖ A failed allocation is rescheduled, taking the previous result into account
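The two placement phases above, feasibility checking followed by ranking, can be sketched as below. This is a minimal sketch under stated assumptions, not Nomad's actual scoring: the `Node` and `Task` classes, the `feasible`, `score` and `place` functions, and the "mean utilization after placement" formula are all hypothetical, and affinity/anti-affinity rules are left out.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    healthy: bool
    drivers: set
    cpu_total: int      # MHz
    mem_total: int      # MB
    cpu_used: int = 0
    mem_used: int = 0


@dataclass
class Task:
    driver: str
    cpu: int            # MHz
    mem: int            # MB


def feasible(node, task):
    """Feasibility check: drop unhealthy nodes, nodes without the driver,
    and nodes without enough free CPU or memory."""
    return (
        node.healthy
        and task.driver in node.drivers
        and node.cpu_used + task.cpu <= node.cpu_total
        and node.mem_used + task.mem <= node.mem_total
    )


def score(node, task):
    """Toy bin-packing score (an assumption of this sketch): mean utilization
    after placement, in [0, 1]. Fuller nodes score higher."""
    cpu = (node.cpu_used + task.cpu) / node.cpu_total
    mem = (node.mem_used + task.mem) / node.mem_total
    return (cpu + mem) / 2


def place(nodes, task):
    """Return the highest-scoring feasible node, or None if nothing fits."""
    candidates = [n for n in nodes if feasible(n, task)]
    if not candidates:
        return None
    return max(candidates, key=lambda n: score(n, task))


nodes = [
    Node("a", True, {"docker"}, 4000, 8192, cpu_used=3000, mem_used=4096),
    Node("b", True, {"docker"}, 4000, 8192),
    Node("c", False, {"docker"}, 4000, 8192, cpu_used=3000, mem_used=7000),  # unhealthy
    Node("d", True, {"exec"}, 4000, 8192, cpu_used=3000, mem_used=7000),     # no docker driver
]
chosen = place(nodes, Task(driver="docker", cpu=500, mem=128))
print(chosen.name)  # prints "a": the fuller of the two feasible nodes
```

Nodes "c" and "d" are removed by the feasibility filter before any scoring happens, so only "a" and "b" are ranked. Packing onto the fuller node keeps emptier nodes free for larger tasks.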
Job Specification
❖ HCL or JSON
❖ Job -> [Task Group]
❖ Task Group -> [Task]
❖ Job: datacenters, region, type (service/batch), update strategy, priority, meta
❖ Task Group: count, meta
❖ Task: driver, config, resources (cpu, memory, ..), meta
❖ Resources: cpu, disk, iops, memory, network (ports, mbits)
job "my-service" {
  # Job should run in the US region
  region = "us"

  # Spread tasks between us-west-1 and us-east-1
  datacenters = ["us-west-1", "us-east-1"]

  # Rolling updates should be sequential
  update {
    max_parallel = 1
  }

  group "webs" {
    # We want 5 web servers
    count = 5

    task "frontend" {
      driver = "docker"

      config {
        image = "hashicorp/web-frontend"
      }

      resources {
        cpu    = 500
        memory = 128

        network {
          dynamic_ports = ["http", "https"]
        }
      }
    }
  }
}
Runtime Environment
❖ Environment variables: set from the job specification and by the runtime at allocation time
❖ NOMAD_META_{key} = {value} from job spec
❖ NOMAD_CPU_LIMIT: int, unit = 1MHz
❖ NOMAD_MEMORY_LIMIT: int, unit = 1MB
❖ NOMAD_IP, NOMAD_PORT_{LABEL} («http», …)
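A task can pick these variables up at start-up. The helper below is a minimal sketch: the function name `read_nomad_env` and the shape of the returned dictionary are assumptions of this example, and the sample values are invented; only the variable names come from the list above.

```python
import os


def read_nomad_env(environ=None):
    """Collect the Nomad-provided variables listed above into one dict.

    Pass a mapping for testing; by default the real process environment is used.
    """
    env = os.environ if environ is None else environ
    port_prefix = "NOMAD_PORT_"
    meta_prefix = "NOMAD_META_"

    def as_int(name):
        # The limits are plain integers (MHz and MB); absent variables become None.
        return int(env[name]) if name in env else None

    return {
        "ip": env.get("NOMAD_IP"),
        "cpu_mhz": as_int("NOMAD_CPU_LIMIT"),
        "memory_mb": as_int("NOMAD_MEMORY_LIMIT"),
        # NOMAD_PORT_{LABEL} -> {label: port}
        "ports": {
            key[len(port_prefix):]: int(value)
            for key, value in env.items()
            if key.startswith(port_prefix)
        },
        # NOMAD_META_{key} -> {key: value}
        "meta": {
            key[len(meta_prefix):]: value
            for key, value in env.items()
            if key.startswith(meta_prefix)
        },
    }


# Invented sample environment, for illustration only.
sample = {
    "NOMAD_IP": "10.0.0.5",
    "NOMAD_CPU_LIMIT": "500",
    "NOMAD_MEMORY_LIMIT": "128",
    "NOMAD_PORT_http": "20001",
    "NOMAD_META_owner": "web-team",
    "PATH": "/usr/bin",
}
settings = read_nomad_env(sample)
```

With dynamic ports, the task should bind to the port it finds in `settings["ports"]` rather than a hard-coded one, since the value is only known once the allocation is placed.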
Task Drivers
❖ To execute a task, isolate resources, mask details, provide abstraction
❖ Docker
❖ Fork/Exec
❖ Java
❖ Qemu
❖ Custom
HTTP API
❖ /v1/jobs
❖ /v1/nodes
❖ /v1/allocations
❖ /v1/evaluations
❖ /v1/agent/{self,join,members,force-leave,servers}
❖ /v1/status/{leader,peers}
❖ CLI invokes HTTP API
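Because the CLI is a thin layer over HTTP, the same endpoints can be called from any client. The sketch below uses only the Python standard library. The helper names `api_url` and `get` are hypothetical, the base address `http://127.0.0.1:4646` is an assumption about a local agent on its default port, and the `stale` query flag is assumed to select the stale read mode mentioned under Architecture.

```python
import json
import urllib.request


def api_url(path, base="http://127.0.0.1:4646", stale=False):
    """Build a URL for a /v1 endpoint; the base address is an assumption."""
    url = base.rstrip("/") + "/v1/" + path.strip("/")
    # Assumed: the "stale" query flag allows any server to answer the read.
    return url + "?stale" if stale else url


def get(path, **kwargs):
    """GET an endpoint and decode the JSON body (needs a reachable agent)."""
    with urllib.request.urlopen(api_url(path, **kwargs)) as response:
        return json.load(response)


jobs_url = api_url("jobs")
leader_url = api_url("/status/leader", stale=True)

# Against a running agent, for example:
#   leader = get("status/leader")
#   jobs = get("jobs", stale=True)
```

Only the URL construction runs without an agent; the `get` calls are left commented out because they need a live server to respond.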