Applied LLM / Data Scientist
Synagen GmbH
Computer Software
Berlin
- Employment type: Full-time
- €53,500 – €79,500 (estimated by XING)
- On-site
About this job
Synagen builds specialized AI agents for healthcare and oncology, designed to support complex clinical decisions and biomedical workflows with actionable, high-precision outputs. We combine modern AI with clinical expertise to create software that integrates into real provider environments and delivers value in practice.
Responsibilities
Synagen builds AI agents for healthcare and oncology that integrate into real clinical workflows. As our Applied LLM / Data Scientist, you will help turn high-volume patient and clinical data into scientific, research, and clinical insights by building the data and model operations layer that makes this reliable, scalable, and compliant. A core part of this role is applied research and analytics delivery with pharma partners: you will work closely with customers on real projects, translate scientific questions into data products and agent workflows, and ship outcomes that can be used in practice.
This role bridges two modes (the split may vary over time):
- Customer project work: deliver concrete analyses, data products, and insight pipelines for partner hospitals and projects.
- Internal platform work: build the reusable foundations (datalake/lakehouse, ontology/terminology layer, evaluation/monitoring) that make those projects fast, reproducible, and production-grade.
What you will do
- Lead applied research / analytics projects with pharma and clinical partners: independently scope questions, define datasets and success criteria, and deliver end-to-end outputs with medical stakeholders.
- Build and operate scalable pipelines that transform raw clinical/patient data into structured, queryable, analysis-ready datasets.
- Design and evolve a datalake / lakehouse approach on Azure (storage, compute patterns, governance, access controls).
- Develop and maintain ontologies / terminology mappings and a consistent internal data model to enable reliable downstream analytics and agent reasoning.
- Build “SynInsight”-style data products for partners (e.g., cohorts, endpoints, phenotypes, evidence-ready exports and reports) that are robust, reproducible, and measurable.
- Implement LLM/agent operations: prompt/workflow versioning, evaluation harnesses, monitoring, regression testing, and cost/performance controls—using AI-assisted development tools (e.g., Claude Code, Codex) where helpful.
- Build agents that automate R&D workflows (e.g., data-to-cohort pipelines, evidence synthesis, structured insight generation), and operationalize them with proper evaluation and monitoring.
- Drive privacy-preserving data capabilities, including synthetic data generation for development, evaluation, and safer sharing/testing in projects (including Azure-based implementations).
- Ensure security, privacy, and compliance expectations are met when processing sensitive healthcare data in Germany/EU and the US (e.g., GDPR, ISO 27001, SOC 2, BSI C5; US healthcare compliance alignment).
Qualifications
- Strong experience in applied data science / ML engineering / MLOps, ideally in pharma, R&D, or healthcare-adjacent environments.
- Proven ability to build production-grade pipelines for messy real-world data (ETL/ELT, data quality, lineage, reproducibility).
- Experience building and operating LLM/agent systems in production (workflows, evaluation, monitoring, reliability).
- Strong coding skills (Python + SQL) and comfort with engineering best practices (tests, CI/CD, documentation).
- Practical experience structuring data with ontologies/terminologies and making it usable for analytics and downstream systems.
- Experience with AI-assisted programming tools (e.g., Claude Code, Codex).
- Fluent in English (written and spoken).
Good to have
- Experience with clinical terminologies and standards (e.g., ICD-10, SNOMED CT, LOINC, RxNorm/ATC).
- Experience with modern data stack components (lakehouse patterns, columnar formats, distributed compute) on Azure.
- Familiarity with privacy-preserving data processing (pseudonymization/de-identification, access partitioning, audit trails).
- Experience delivering customer-facing data/ML projects end-to-end.
- Real-world impact in oncology: build integrations that bring AI into clinical workflows where accuracy and trust matter.
- High ownership: you will shape our interoperability layer end-to-end and define how we integrate at scale.
