Site Reliability Engineer (f/m/x)
- Employment type: Full-time
- €45,500 – €62,000 (estimated by XING)
- On-site
About this job
Location: Hybrid – Cologne (Rheinauhafen) — 3 days in the office, 2 remote (Tue + Thu)
Team: Engineering · Reports to CTO
Keep the world awake — build reliability at scale
ilert helps thousands of DevOps & IT teams detect, fix, and communicate incidents faster.
Our platform is mission-critical: customers rely on us 24/7 to keep their always-on businesses running.
As a Site Reliability Engineer at ilert, you’ll own the reliability, performance, and scalability of our core platform across AWS, Kubernetes, Kafka, and more.
Tasks
Build & operate a highly available platform
- Run and evolve our AWS-based infrastructure
- Operate and optimize our self-managed Kafka and ClickHouse clusters and our observability stack
- Ensure resilience, disaster recovery, and capacity planning across the stack
Improve reliability & performance
- Build and maintain SLOs, SLIs, error budgets, and observability dashboards
- Debug production issues across layers (networking, Kubernetes, application, DB)
- Improve performance of our ingestion pipeline
Automation & tooling
- Automate operations with Terraform, Helm, Kubernetes operators, and internal tooling
- Build tooling for safer deploys, blue/green rollouts, and automated verification
- Strengthen incident response workflows through deep collaboration with our AI SRE agent team
Security & compliance
- Implement best practices for workload isolation, secrets management, IAM, and auditability
- Support our ISO 27001 posture by automating controls and hardening our infrastructure
Cross-functional impact
- Partner with Backend, AI, and Product teams to design reliable services
- Participate in on-call rotation
- Lead post-incident reviews and drive long-term reliability improvements
Requirements
- 3+ years of experience as an SRE, Platform Engineer, DevOps Engineer, or Infrastructure Engineer
- Strong hands-on experience with AWS, Kubernetes, Linux internals, networking, and performance tuning
- Experience operating self-managed distributed systems, ideally Kafka or ClickHouse
- Strong understanding of observability
- Experience automating infrastructure with Terraform and CI/CD systems
- Fluent English (our working language); German optional
Benefits
- 🚀 Product-centric - 100% focused on solving a mission-critical pain felt by every always-on business
- 🏡 Hybrid freedom - 2 days remote by default; gorgeous Rheinauhafen roof terrace when you’re in town
- 🕒 Focus > meetings - We time-box syncs, favour async docs and protect maker time
- 🌴 28 days off - …plus public holidays
- 🚲 Commute perks - subsidised public transport