GRPO Vulnerability detection
We adapt Group Relative Policy Optimization (GRPO) for software vulnerability detection, using rule-based rewards (no value function, no learned reward model).
•
1 min read
We adapt Group Relative Policy Optimization (GRPO) for software vulnerability detection, using rule-based rewards (no value function, no learned reward model).