AI Engineer - Synthetic Data Generation

Omnilex

Information services

Zürich

  • Employment type: Full-time
  • CHF 8,000 – 13,000 (as stated by the company)
  • On-site
  • Be among the first applicants

About this job

🌟 About You

Do you get joy from turning messy legal texts into clean, structured, high-quality datasets that actually improve model behavior? Do you like building pipelines where every step is measurable: extraction quality, citation correctness, dedup rate, cost per item, throughput, and regression stability? Are you comfortable shipping pragmatic tooling (CLIs, validators, tests) around LLMs without hand-waving away edge cases? If so, we’d love to hear from you.

🚀 About Omnilex

Omnilex is a young, dynamic AI legal tech startup with its roots at ETH Zurich. Our passionate interdisciplinary team of 10+ people is dedicated to empowering legal professionals in law firms and legal teams by using AI for legal research and for answering complex legal questions. We already stand out by tackling distinctive challenges, including combining external data, customer-internal data, and our own innovative AI-first legal commentaries.

🧬 Your Mission: Synthetic Data for Legal AI

As an AI Engineer – Synthetic Data Generation, you will build and own pipelines that generate retrieval-ready and evaluation-grade synthetic datasets from real legal sources (court decisions, statutes, commentaries) across languages and jurisdictions, while keeping quality high and costs controlled.

Tasks

🛠️ Your Responsibilities

  • Build multi-step generation pipelines (10+ steps): from DB selection → pseudonymization → extraction → translation → normalization → deduplication → validation → classification → rating → export.
  • LLM integration, production-grade: Design robust prompt suites for extraction, translation, classification, and rating; enforce structured JSON outputs; handle retries, partial failures, and weird model behavior.
  • Quality assurance & filtering: Implement scoring systems (multi-criteria, consistent rubrics), dedup/near-dup suppression, and deterministic validators (especially for citations).
  • Citation processing at legal-grade precision: Extract, normalize, and validate citations across languages and formats (e.g., Art. 336c Abs. 1 OR, BGE 137 III 266 E. 3.2), including abbreviation mapping and normalization rules.
  • Cost & throughput optimization: Use batch APIs where appropriate, tune reasoning effort, control concurrency, count tokens, and keep runs cost-efficient (without sacrificing quality).
  • Developer tooling & CLI workflows: Build CLIs with progress tracking, configurable concurrency, and solid ergonomics for long-running jobs.
  • Testing across levels: Write unit/smoke/integration tests for pipelines and validators (including mocked LLMs where sensible and real API runs where needed).
  • Cross-team collaboration: Work closely with legal experts to define what “good” looks like for exam questions/commentaries, and translate that into measurable QA checks.
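
To make the citation-processing responsibility above more concrete, here is a minimal sketch of a deterministic parser and normalizer for the two citation shapes named in the list. Everything in it is an illustrative assumption rather than Omnilex's actual implementation: the type names, the `parseCitation` and `formatCitation` helpers, the choice of German forms as the canonical output, and the tiny abbreviation map (which covers only OR/CO and ZGB/CC).

```typescript
// Hypothetical types for illustration; not an actual Omnilex schema.
type StatuteCitation = { kind: "statute"; article: string; paragraph?: string; law: string };
type DecisionCitation = { kind: "decision"; volume: number; part: string; page: number; consideration?: string };
type Citation = StatuteCitation | DecisionCitation;

// Illustrative abbreviation mapping (assumption: German abbreviations are canonical).
const LAW_ALIASES = new Map<string, string>([
  ["OR", "OR"], ["CO", "OR"],   // Code of Obligations (de / fr, it)
  ["ZGB", "ZGB"], ["CC", "ZGB"], // Civil Code (de / fr, it)
]);

// "Art. 336c Abs. 1 OR", "art. 336c al. 1 CO", "art. 336c cpv. 1 CO"
const STATUTE_RE = /^Art\.\s*(\d+[a-z]*)(?:\s+(?:Abs\.|al\.|cpv\.)\s*(\d+[a-z]*))?\s+([A-Za-z]+)$/i;
// "BGE 137 III 266 E. 3.2", "ATF 137 III 266 consid. 3.2", "DTF 137 III 266"
const DECISION_RE = /^(?:BGE|ATF|DTF)\s+(\d+)\s+([IV]+[a-z]?)\s+(\d+)(?:\s+(?:E\.|consid\.)\s*(\d+(?:\.\d+)*[a-z]?))?$/;

// Returns a structured citation, or null when the input does not match a known shape.
function parseCitation(raw: string): Citation | null {
  const text = raw.trim().replace(/\s+/g, " ");

  const s = STATUTE_RE.exec(text);
  if (s) {
    const [, article = "", paragraph, lawRaw = ""] = s;
    const law = LAW_ALIASES.get(lawRaw.toUpperCase());
    if (!law) return null; // unknown abbreviation: reject rather than guess
    return { kind: "statute", article: article.toLowerCase(), law, ...(paragraph ? { paragraph } : {}) };
  }

  const d = DECISION_RE.exec(text);
  if (d) {
    const [, volume = "", part = "", page = "", consideration] = d;
    return {
      kind: "decision",
      volume: Number(volume),
      part,
      page: Number(page),
      ...(consideration ? { consideration } : {}),
    };
  }
  return null;
}

// Renders a citation in the canonical (German) form assumed by this sketch.
function formatCitation(c: Citation): string {
  if (c.kind === "statute") {
    return `Art. ${c.article}${c.paragraph ? ` Abs. ${c.paragraph}` : ""} ${c.law}`;
  }
  return `BGE ${c.volume} ${c.part} ${c.page}${c.consideration ? ` E. ${c.consideration}` : ""}`;
}
```

With these helpers, `formatCitation(parseCitation("art. 336c al. 1 CO")!)` yields `"Art. 336c Abs. 1 OR"`, and an input with an unmapped abbreviation returns `null` instead of a best guess. A production validator would need a much larger abbreviation table and more citation shapes than this sketch covers.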

Requirements

Minimum qualifications

  • Experience building backend/data tooling with TypeScript/Node.js (strict typing, generics, async patterns).
  • Hands-on experience integrating LLM APIs (OpenAI/Anthropic or similar), including structured outputs (JSON), prompt iteration, and failure handling.
  • Strong data pipeline mindset: ETL workflows, transformation steps, validation, and reproducibility.
  • Solid SQL/PostgreSQL skills and experience with an ORM (bonus if Drizzle).
  • Experience writing reliable tests (e.g., Jest) and maintaining CI-friendly pipelines.
  • Fluent English; willing to work hybrid in Zurich (on-site at least two days/week), full-time.

🎯 Preferred qualifications

  • Familiarity with the Swiss legal system (court structure, citation norms, multilingual legal terminology).
  • Working proficiency in German; French and/or Italian is a strong advantage.
  • Experience with batch processing and cost-aware LLM operations (token budgeting, batching strategy, caching, early-exit).
  • Practical text processing skills: regex-heavy parsing, dedup/near-dup detection, similarity search (e.g., BM25 / MiniSearch).
  • Familiarity with our environment: Yarn workspaces/monorepos, NestJS, and pragmatic CLI tooling.

🤝 Benefits

  • Direct impact: Your datasets will directly shape model quality and evaluation reliability in legal research and reasoning.
  • Autonomy & ownership: Own the synthetic data pipeline end-to-end: prompts, validators, QA, exports, and cost controls.
  • Team: Work with a sharp interdisciplinary group at the intersection of AI, engineering, and law.
  • Compensation: CHF 7’000–11’000 per month + ESOP, depending on experience and skills.

We’re excited to hear from candidates who love building robust, cost-aware LLM pipelines and care about precision (especially when citations and multilingual legal nuance matter). Apply today by pressing the Apply button.
