Private AI Runtime & Infrastructure Engineer (Local Data Residency)
Omnilex
Information services
Zürich
- Employment type: Full-time
- CHF 8,000 – 12,000 (company-provided figure)
- On-site
- Be an early applicant
About this job
🌟 About You
You’re the person teams trust when the workload is messy, privacy-sensitive, and “it must not go down.” You like building systems that stay boring in production: hardened VMs, predictable deploys, clean rollbacks, useful alerts, and security controls that work by default.
You can talk comfortably about trade-offs: latency vs cost, isolation vs operational complexity, GPU vs CPU, shared vs per-tenant. You've probably earned that comfort by running something real in production and being on the hook when it breaks.
🚀 About Omnilex
Omnilex is a fast-moving AI legal tech startup born out of ETH Zurich. Our interdisciplinary team of 14+ people builds an AI product that helps lawyers and in-house legal teams research faster and answer complex legal questions with confidence. We combine external/public sources, customer-internal knowledge, and our own AI-first legal commentaries to tackle real-world legal complexity, often under strict data residency and privacy expectations.
Tasks
🧭 The mission
Own the private AI runtime that powers our product: the infrastructure, security posture, and operational reliability of running LLM + agent workloads on Swiss-based VMs (optionally with GPUs). You’ll make it safe, observable, scalable, and easy for the team to ship changes without fear.
🛠️ What you’ll build and run
- LLM + agent runtime in production (Swiss/local VMs)
  - Design and operate a VM-centric architecture for agent execution and model serving (single-node vs multi-node, concurrency, streaming).
  - Manage model artifacts: downloads, storage, integrity checks, versioning, and safe rollback paths.
  - Implement pragmatic controls for runaway agents (timeouts, token limits, tool permission boundaries, sandboxing patterns).
  - Make capacity predictable: tokens/sec, queueing/backpressure, p95 latency targets, peak-load behavior, and graceful degradation modes.
- Linux / VM / network fundamentals (done right)
  - Establish hardened VM baselines: users/sudo, SSH posture, patching approach, sensible defaults.
  - Apply resource controls (cgroups/ulimits), disk and IO tuning, and repeatable "why is this VM slow?" investigation playbooks.
  - Own TLS decisions end-to-end (termination, cert lifecycle, internal mTLS where it matters), plus egress controls and private networking.
  - Build debugging muscle for real failure modes (DNS, certs, MTU, packet loss, noisy neighbors / CPU steal, memory pressure, IO wait).
- Observability and incident readiness for LLM services
  - Instrument what actually matters for LLM workloads: tokens/sec, queue depth, context-length distribution, timeouts, error classes, saturation signals.
  - Turn logs/metrics/traces into action: dashboards that get used, alerts that don't spam, and runbooks that work at 3 a.m.
  - Drive incident hygiene: triage patterns, mitigation tools, postmortems that result in concrete fixes.
- Security & privacy as engineering, not paperwork
  - Build a real secrets lifecycle (creation → distribution → runtime access → rotation → revocation).
  - Enforce least-privilege access on VM-centric infra (service identities, scoped credentials, audit trails).
  - Prevent sensitive-data leakage via prompts/logs/traces (redaction, sampling discipline, "never log this" guardrails).
  - Reduce supply-chain risk: container/dependency hygiene, provenance checks for model artifacts, scanning and patch workflows.
  - Prepare for uncomfortable scenarios (suspicious outbound traffic, prompt injection leading to attempted exfiltration) with detection and response playbooks.
- Safe delivery and change management
  - Implement deploy strategies appropriate for AI runtime infra: canaries, rollback triggers, maintenance windows, and fast revert paths.
  - Keep systems from drifting: config-drift detection, baseline enforcement, and reproducible deployments (IaC/automation).
  - Support database/environment changes that intersect with runtime reliability (migrations, customer splits, environment promotion).
What success looks like (first months)
- A clear map of the biggest reliability + security risks in the runtime, with a prioritized plan and measurable improvements.
- Repeatable deployments with rollback confidence and a sane patching cadence (OS + drivers + runtime).
- Dashboards/alerts that catch real incidents early (and stop waking people up for noise).
- Practical privacy boundaries: everyone on the team knows what data can/cannot leave the VM, and the system enforces it.
Requirements
📌 What we’re looking for
✅ Minimum qualifications
- Hands-on experience running production infrastructure (VMs, networking, Linux) for a SaaS or platform product.
- Strong operational skill: debugging, incident response, log/metric-based investigation, and making recurring problems disappear.
- Solid security fundamentals applied pragmatically: secrets, least privilege, egress control, dependency hygiene, auditability.
- Comfort automating everything that should not require humans (provisioning, deploys, checks, drift detection, runbooks).
- Clear communication and an ownership mindset: you can partner with product/dev without becoming a blocker.
- Proficiency in English.
- Full-time availability; Zurich-based with at least two on-site days per week (hybrid).
🎯 Preferred qualifications
- Experience operating LLM serving/agent systems (or similarly spiky, latency-sensitive workloads).
- GPU operations familiarity (VRAM sizing intuition, quantization trade-offs, fallback modes).
- Azure experience (identity, networking, observability, IaC).
- Familiarity with our ecosystem: TypeScript, Node.js, NestJS, Next.js.
- Exposure to ISO 27001-style environments or supporting security audits.
- Swiss work permit or EU/EFTA citizenship.
- Working proficiency in German.
Benefits
- High-impact ownership over the most sensitive part of the stack: private AI runtime + reliability + security.
- A sharp interdisciplinary team working at the intersection of AI and law.
- Autonomy: you define the guardrails that let everyone ship faster with fewer surprises.
- Compensation: CHF 8’000–12’000 per month + ESOP, depending on experience and skills.
If you want to own the infrastructure that keeps privacy guarantees real and production predictable, even under LLM chaos, hit the Apply button.