Master Thesis on Uncertainty-Aware Goal Selection for Reinforcement Learning Agents

Technische Universität München

Higher education

Munich

  • Employment type: Full-time
  • On-site

About this job

26.02.2026, Student assistants, internships, student research projects

DarkAlley: Uncertainty-Aware Goal Selection for Stochastic Sparse-Reward Environments

Olivia Garland, Tim Walter

1 Motivation

Exploration in sparse-reward goal-reaching tasks has seen significant recent progress through directed goal selection methods. Approaches such as DISCOVER [2] and MEGA [4] guide exploration by selecting sub-goals from a frontier of achieved states, balancing achievability, novelty, and relevance to the target goal. These methods represent the state of the art on challenging long-horizon tasks, substantially outperforming undirected curiosity-based and count-based alternatives.

However, they have been developed and evaluated almost exclusively in deterministic or near-deterministic environments. Realistic deployment settings generally involve some degree of stochasticity—sensor noise, stochastic contact dynamics, or task-irrelevant dynamic elements in visual observations—that these benchmarks do not capture.

2 Background: Goal-Conditioned Reinforcement Learning

In goal-conditioned reinforcement learning, we consider a Markov decision process extended with a goal space G ⊆ S. The agent learns a policy π(a|s,g) conditioned on both the current state s and a desired goal g. The sparse reward is defined as r(s,a;g) = -1 if s ∉ Sg and r(s,a;g) = 0 if s ∈ Sg, where Sg ⊆ S is the set of states in which goal g is achieved.
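
As a minimal sketch of this reward, assuming goal achievement means entering an L2 ball of radius eps around g (the tolerance and the interface are illustrative assumptions, not part of the posting):

    import numpy as np

    def sparse_reward(s: np.ndarray, g: np.ndarray, eps: float = 0.05) -> float:
        """Sparse goal-conditioned reward: -1 until the goal is achieved, 0 afterwards.

        Goal achievement is modelled as reaching the L2 ball
        Sg = {s : ||s - g|| <= eps}; the tolerance eps is an illustrative
        assumption, not specified in the problem statement above.
        """
        achieved = np.linalg.norm(s - g) <= eps
        return 0.0 if achieved else -1.0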

The key challenge in this setting is goal selection: which intermediate goals should the agent pursue during training to eventually reach a difficult target goal g*? Recent work has shown that strategic goal selection—choosing goals that are achievable yet novel and relevant to the target—dramatically outperforms random exploration or always pursuing g* directly.

3 Problem Statement

Current directed exploration methods typically estimate which sub-goals are worth exploring by measuring disagreement across an ensemble of learned value functions—high disagreement suggests the agent is uncertain about a region and should explore it. In deterministic environments, this works well: disagreement reflects genuine knowledge gaps (epistemic uncertainty) that close with more data.
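
A minimal sketch of this disagreement signal, assuming an ensemble of independently trained goal-conditioned value functions with a hypothetical (state, goal) -> value interface:

    import numpy as np

    def ensemble_disagreement(q_fns, s, g):
        """Epistemic-uncertainty proxy used by directed exploration methods:
        the spread of value estimates across ensemble members.

        q_fns: list of callables, each mapping (state, goal) to a scalar
        value estimate; a hypothetical interface, not a specific library API.
        """
        values = np.array([q(s, g) for q in q_fns])
        # High spread reads as "not yet learned" -- valid in the deterministic case.
        return float(values.std())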

In stochastic environments, however, inherent randomness prevents ensemble members from agreeing, even after many visits. The agent cannot distinguish between:

  • Epistemic uncertainty: "uncertain because unexplored" (novel conditions not yet encountered during training)
  • Aleatoric uncertainty: "uncertain because unpredictable" (inherent stochasticity)

This conflation can cause the agent to waste its exploration budget repeatedly visiting inherently noisy regions where no learning progress is possible. This is structurally analogous to the "noisy TV problem" identified in curiosity-driven exploration [1], but has not been studied in the goal-conditioned setting.

This suggests that effective goal selection in stochastic environments should prioritize sub-goals where (see the sketch after this list):

  1. The agent is genuinely uncertain (high epistemic uncertainty)
  2. The environment is predictable enough to learn from (low aleatoric uncertainty)
  3. Achieving the goal would help reach the target (high relevance)
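
One hypothetical way to combine these three criteria into a single score, assuming each ensemble member outputs a mean-variance pair so that the spread of means approximates epistemic uncertainty and the average predicted variance approximates aleatoric uncertainty; the interface and the weighting are illustrative assumptions, not the method to be developed:

    import numpy as np

    def goal_score(q_ensemble, s0, g, g_target, w_rel=1.0):
        """Score one candidate sub-goal g from start state s0.

        q_ensemble: list of callables (state, goal) -> (mean, variance),
        i.e. distributional value heads; a hypothetical interface.
        """
        preds = [q(s0, g) for q in q_ensemble]
        means = np.array([m for m, _ in preds])
        variances = np.array([v for _, v in preds])
        epistemic = means.std()        # criterion 1: closes with more data
        aleatoric = variances.mean()   # criterion 2: persists under noise, so penalized
        relevance = -np.linalg.norm(np.asarray(g) - np.asarray(g_target))  # criterion 3
        return epistemic - aleatoric + w_rel * relevance

Taking the argmax of this score over the frontier would, under these assumptions, steer exploration away from noisy-TV regions where aleatoric variance dominates. Whether a mean-variance decomposition is the right learnability estimate is precisely the open question posed in Section 4.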

4 Proposed Work

We propose to first benchmark existing directed exploration methods on stochastic variants of standard goal-reaching environments to characterize how they degrade, and then to augment goal selection with a notion of learnability—an estimate of whether visiting a candidate sub-goal will actually reduce the agent's uncertainty, rather than merely appearing novel. How best to quantify learnability in this setting is itself a core question of the proposed work.
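
As one simple way to construct such stochastic variants, a sketch of a Gaussian observation-noise wrapper using the Gymnasium ObservationWrapper API; the noise model, scale, and flat-array assumption are illustrative choices, not the benchmark design:

    import gymnasium as gym
    import numpy as np

    class NoisyObservationWrapper(gym.ObservationWrapper):
        """Adds i.i.d. Gaussian sensor noise to every observation.

        Sweeping sigma upwards from 0 lets one characterize how directed
        exploration methods degrade as aleatoric noise grows.
        """
        def __init__(self, env, sigma=0.1):
            super().__init__(env)
            self.sigma = sigma

        def observation(self, obs):
            # Assumes a flat array observation; goal-conditioned dict
            # observations would need per-key handling.
            return obs + self.sigma * np.random.standard_normal(obs.shape)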

Algorithms of interest include:

  • SIERL [5]
  • DISCOVER [2]
  • MEGA [4]
  • SLOPE [3]

5 Desirable Skills

  • Strong Python programming skills
  • Familiarity with deep reinforcement learning (value functions, policy gradients)
  • Experience with JAX or PyTorch
  • Interest in exploration strategies and the underlying mathematics

To Apply:

Please send your CV and academic transcript with the subject "Master’s Thesis: DarkAlley RL" to:

Tim Walter (tim.walter@tum.de)
Olivia Garland (olivia.garland@tum.de)

[1] Yuri Burda, Harrison Edwards, Amos Storkey, and Oleg Klimov. Exploration by Random Network Distillation, October 2018.

[2] Leander Diaz-Bone, Marco Bagatella, Jonas Hübotter, and Andreas Krause. DISCOVER: Automated Curricula for Sparse-Reward Reinforcement Learning, October 2025.

[3] Yao-Hui Li et al. From Scalar Rewards to Potential Trends: Shaping Potential Landscapes for Model-Based Reinforcement Learning, February 2026.

[4] Silviu Pitis, Harris Chan, Stephen Zhao, Bradly Stadie, and Jimmy Ba. Maximum Entropy Gain Exploration for Long Horizon Multi-goal Reinforcement Learning, July 2020.

[5] Georgios Sotirchos, Zlatan Ajanović, and Jens Kober. Search Inspired Exploration, 2026.

Contact: tim.walter@tum.de; olivia.garland@tum.de

Company details

Technische Universität München

Higher education

5,001-10,000 employees

Munich, Germany

Similar jobs

  • AI Working Student - Computer Vision (Helsing, Munich)
  • Intern (f/m/d) for Edge AI Model Zoo Optimization Research Engineering (DE63 NXP Semiconductors Germany GmbH, Munich)
  • Machine Learning Working Student/Internship (m/f/d) (Gini GmbH, Munich)
  • Master's thesis on machine learning for connectivity in Non-Terrestrial Networks (w/m/d) (Deutsches Zentrum für Luft- und Raumfahrt e.V. (DLR), Oberpfaffenhofen)
  • Software Engineering Intern (Machine Learning & AI Workflows) (Apple Inc, Munich)
  • Intern / working student with option for a final thesis - Development of AI agents for software comprehension and test automation (m/w/d) (Silver Atena GmbH, Munich)
  • AI Ecosystem Working Student (m/w/d) (TRUSTEQ, Munich)
  • Master's Thesis: Spiking Neural Networks in the Physical Layer of Wireless Communication Systems (Technische Universität München, Munich)
  • Master's thesis in virtual training simulation (Jobriver HR Service, Munich)