Organizations

@polyunlp

battam1111/README.md

Yanjun Chen

PhD candidate in the Department of Computing at The Hong Kong Polytechnic University, advised by Prof. Wenjie Li (Maggie) and Prof. Wei Zhang.

I want to make the environment trainable, the way models are, and in doing so lift the ceiling of what AI can become. Today the environment is not even a single thing: a reward model here, a verifier there, a curriculum somewhere else, each built and judged on its own. My work begins with measurement: what does each piece actually contribute to the model it trains?

More at battam1111.github.io.

Selected work

  • Exact Is Easier: Credit Assignment for Cooperative LLM Agents (in submission, arXiv:2603.06859). One shared outcome hides each decision's share. Because the transcript makes every decision replayable, per-decision credit can be measured exactly instead of estimated. The result is a learning algorithm that outperforms every approximate multi-agent RL alternative, plus a method-agnostic audit of credit quality.

  • The Accuracy Paradox in RLHF (EMNLP 2024). A reward model's benchmark accuracy fails to predict the quality of the policy it trains: varying only accuracy yields an interior optimum, with the real signal in the training dynamics.

  • battam1111.github.io. Source for my homepage: al-folio with custom SCSS, trilingual (EN/中/日).

Pinned

  1. Myco

    The living armor an AI agent inhabits: eternal devouring, eternal evolution, eternal amplification.

    Rust 63 7

  2. EIT-EAST-Lab/C3

    Official implementation of the paper "Contextual Counterfactual Credit Assignment for Multi-Agent Reinforcement Learning in LLM Collaboration". (by Yanjun Chen)

    Python 35

  3. AccuracyParadox-RLHF

    [EMNLP 2024 Main] Official implementation of the paper "The Accuracy Paradox in RLHF: When Better Reward Models Don't Yield Better Language Models".

    Python 8