GTPO

Sep 5, 2023
projects

The official implementation of GTPO (Group-relative Trajectory-based Policy Optimization), a novel method for stable and effective policy optimization in Large Language Models (LLMs).

Authors
Aleksandar Fontana
PhD Student in AI and Cybersecurity
I am a PhD student in AI and Cybersecurity at Scuola Superiore Sant’Anna. My research focuses on two areas: stabilizing the alignment of Large Language Models through reinforcement learning (e.g., GTPO), and detecting software vulnerabilities.