GRPO Vulnerability detection

Sep 5, 2023 · 1 min read

We adapt Group Relative Policy Optimization (GRPO) for software vulnerability detection, using rule-based rewards (no value function, no learned reward model).

Last updated on Apr 11, 2026

GRPO VULNERABILITY DETECTION LLM RL QWEN LLAMA

Authors

Aleksandar Fontana

PhD Student in AI and Cybersecurity

I am a PhD Student in AI and Cybersecurity at Scuola Superiore Sant’Anna. My research focuses on stabilizing Large Language Model alignment through Reinforcement Learning (e.g., GTPO) and detecting software vulnerabilities.

GTPO Sep 5, 2023 →

No results found

GRPO Vulnerability detection