GRPO Vulnerability detection

Sep 5, 2023 · 1 min read
projects

We adapt Group Relative Policy Optimization (GRPO) for software vulnerability detection, using rule-based rewards (no value function, no learned reward model).

Aleksandar Fontana
Authors
PhD Student in AI and Cybersecurity
I am a PhD Student in AI and Cybersecurity at Scuola Superiore Sant’Anna. My research focuses on stabilizing Large Language Model alignment through Reinforcement Learning (e.g., GTPO) and detecting software vulnerabilities.