Paper-Conference

On the Hidden Objective Biases of Group-based Reinforcement Learning

Group-based reinforcement learning methods, like Group Relative Policy Optimization (GRPO), are widely used nowadays to post-train large language models. Despite their empirical …

aleksandar-fontana

• Jan 1, 2026 • 1 min read

Unmasking model behavior: How LLMs reason on vulnerability detection

Understanding and controlling the behavior of Large Language Models (LLMs) is crucial for their reliable use in software vulnerability detection. While LLMs show promising …

aleksandar-fontana

• Jan 1, 2025 • 1 min read
