GTPO

Sep 5, 2023

The official implementation of GTPO (Group-relative Trajectory-based Policy Optimization), a novel method for stable and effective policy optimization of Large Language Models (LLMs).

Last updated on Apr 11, 2026

GRPO GTPO LLM RL QWEN LLAMA

Authors

Aleksandar Fontana

PhD Student in AI and Cybersecurity

I am a PhD student in AI and Cybersecurity at Scuola Superiore Sant’Anna. My research focuses on stabilizing the alignment of Large Language Models through Reinforcement Learning (e.g., GTPO) and on detecting software vulnerabilities.
