<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>RL | Aleksandar Fontana</title><link>http://gildarts777.github.io/tags/rl/</link><atom:link href="http://gildarts777.github.io/tags/rl/index.xml" rel="self" type="application/rss+xml"/><description>RL</description><generator>HugoBlox Kit (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Tue, 05 Sep 2023 00:00:00 +0000</lastBuildDate><image><url>http://gildarts777.github.io/media/icon_hu_da05098ef60dc2e7.png</url><title>RL</title><link>http://gildarts777.github.io/tags/rl/</link></image><item><title>GRPO Vulnerability detection</title><link>http://gildarts777.github.io/projects/grpo-vulnerability-detection/</link><pubDate>Tue, 05 Sep 2023 00:00:00 +0000</pubDate><guid>http://gildarts777.github.io/projects/grpo-vulnerability-detection/</guid><description>&lt;p&gt;We adapt Group Relative Policy Optimization (GRPO) for software vulnerability detection, using rule-based rewards (no value function, no learned reward model).&lt;/p&gt;</description></item><item><title>GTPO</title><link>http://gildarts777.github.io/projects/1-gtpo/</link><pubDate>Tue, 05 Sep 2023 00:00:00 +0000</pubDate><guid>http://gildarts777.github.io/projects/1-gtpo/</guid><description>&lt;p&gt;The official implementation of GTPO (Group-relative Trajectory-based Policy Optimization), a novel method for stable and effective policy optimization in Large Language Models (LLMs).&lt;/p&gt;</description></item></channel></rss>