
On the Hidden Objective Biases of Group-based Reinforcement Learning

Group-based reinforcement learning methods, such as Group Relative Policy Optimization (GRPO), are now widely used to post-train large language models. Despite their empirical …

aleksandar-fontana

Unmasking model behavior: How LLMs reason on vulnerability detection

Understanding and controlling the behavior of Large Language Models (LLMs) is crucial for their reliable use in software vulnerability detection. While LLMs show promising …

aleksandar-fontana