RL

GTPO

The official implementation of GTPO (Group-relative Trajectory-based Policy Optimization), a novel method for stable and effective policy optimization in Large Language Models …

Sep 5, 2023 • 1 min read

GRPO

GRPO Vulnerability detection

We adapt Group Relative Policy Optimization (GRPO) for software vulnerability detection, using rule-based rewards (no value function, no learned reward model).

Sep 5, 2023 • 1 min read
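
As a rough illustration of the GRPO excerpt above, the sketch below scores a group of sampled completions with a rule-based reward and normalizes each reward against the group's own mean and standard deviation, which is how GRPO works without a value function. The label strings, the function names `rule_based_reward` and `group_relative_advantages`, and the choice of population standard deviation are assumptions made for this example. They are not taken from the post's implementation.

```python
from statistics import mean, pstdev


def rule_based_reward(predicted: str, actual: str) -> float:
    """Hypothetical rule: 1.0 if the predicted label matches, else 0.0."""
    return 1.0 if predicted.strip().lower() == actual.strip().lower() else 0.0


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group (no learned critic).

    Assumption: population standard deviation; implementations differ here.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division defined when every reward in the group is equal
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled completions for one prompt (hypothetical labels).
completions = ["vulnerable", "safe", "vulnerable", "safe"]
rewards = [rule_based_reward(c, "vulnerable") for c in completions]
advantages = group_relative_advantages(rewards)
```

Completions that beat the group average get a positive advantage and the rest get a negative one. When every completion in a group earns the same reward, all advantages are zero and that group contributes no learning signal.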
