RL | Aleksandar Fontana

GRPO Vulnerability detection

Tue, 05 Sep 2023 00:00:00 +0000

We adapt Group Relative Policy Optimization (GRPO) for software vulnerability detection, using rule-based rewards (no value function, no learned reward model).
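As a rough illustration of the idea, the sketch below scores a group of sampled completions with a rule-based reward and normalizes each reward against its own group, which is how GRPO avoids needing a value function. The reward rule (exact match on a "vulnerable"/"safe" verdict), the function names, and the use of the population standard deviation are assumptions made for this example, not details of the project.

```python
from statistics import mean, pstdev


def rule_based_reward(prediction: str, label: str) -> float:
    """Hypothetical rule: 1.0 if the predicted verdict matches the label, else 0.0."""
    return 1.0 if prediction.strip().lower() == label.strip().lower() else 0.0


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward by the mean and std of its own group.

    No value function is involved: the group mean acts as the baseline.
    Population std is an assumption here; implementations may differ.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt (a code snippet labeled "vulnerable"), four sampled verdicts.
completions = ["vulnerable", "safe", "vulnerable", "safe"]
rewards = [rule_based_reward(c, "vulnerable") for c in completions]
advantages = group_relative_advantages(rewards)
```

Correct verdicts receive positive advantages and incorrect ones negative; if every completion in a group earns the same reward, all advantages are zero and that group contributes no gradient signal.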

GTPO

Tue, 05 Sep 2023 00:00:00 +0000

The official implementation of GTPO (Group-relative Trajectory-based Policy Optimization), a novel method for stable and effective policy optimization in Large Language Models (LLMs).