Binary file added app/projects/finrate/assets/fig-eg-DR_01.png
Binary file added app/projects/finrate/assets/fig-eg-EC_01.png
Binary file added app/projects/finrate/assets/fig-eg-LT_01.png
46 changes: 46 additions & 0 deletions app/projects/finrate/page.mdx
@@ -0,0 +1,46 @@
# Fin-RATE: Financial Analytics and Tracking Evaluation Benchmark for LLMs on SEC Filings

![Overview](./assets/image-20260121002058463.png)

**Fin-RATE** is a real-world benchmark to evaluate large language models (LLMs) on professional-grade reasoning over **U.S. SEC filings**.
It targets financial analyst workflows that demand:

- 📄 **Long-context understanding**
- ⏱️ **Cross-year tracking**
- 🏢 **Cross-company comparison**
- 📊 **Structured diagnosis of model failures**

> 📘 [Paper](https://arxiv.org/abs/2602.07294) | 🤗 [Dataset](https://huggingface.co/datasets/JunrongChen2004/Fin-RATE)
> ⬇️ SEC-based QA benchmark with 7,500 instances + interpretable evaluation.

---

## 🔍 Overview

Fin-RATE includes **three core QA tasks**, modeling real-world financial reasoning:

![Fin-RATE Tasks|scale=0.9](./assets/fig-dataset-overview_01.png)

| Task | Description |
| --------- | ------------------------------------------------------------ |
| **DR-QA** | Detail & Reasoning: fine-grained reasoning within one SEC section |
| **EC-QA** | Enterprise Comparison: reasoning across peer firms in the same industry/year |
| **LT-QA** | Longitudinal Tracking: analyzing trends across years for the same firm |

### DR-QA Example

![DR-QA Example|scale=0.6](./assets/fig-eg-DR_01.png)


### EC-QA Example

![EC-QA Example|scale=0.6](./assets/fig-eg-EC_01.png)


### LT-QA Example

![LT-QA Example|scale=0.6](./assets/fig-eg-LT_01.png)


---

13 changes: 13 additions & 0 deletions config/publications.ts
@@ -20,6 +20,19 @@ export interface Publication {
 }

 export const publications: Publication[] = [
+  {
+    title: "Fin-RATE: A Real-world Financial Analytics and Tracking Evaluation Benchmark for LLMs on SEC Filings",
+    authors: "Yidong Jiang, Junrong Chen, Eftychia Makri, Jialin Chen, Peiwen Li, Ali Maatouk, Leandros Tassiulas, Eliot Brenner, Bing Xiang, Rex Ying",
+    venue: "KDD 2026",
+    page: "finrate",
+    code: "https://github.com/jyd777/Fin-RATE",
+    paper: "https://arxiv.org/abs/2602.07294",
+    abstract:
+      "Fin-RATE is a benchmark for evaluating LLMs on U.S. Securities and Exchange Commission (SEC) filings, designed to mirror financial analyst workflows. It covers detail-oriented reasoning within individual disclosures, cross-entity comparison under shared financial topics, and longitudinal tracking of the same firm across reporting periods. Experiments on 17 leading LLMs show substantial performance degradation as tasks move beyond single-document reasoning, with accuracy dropping by 18.60% and 14.35% on longitudinal and cross-entity analysis, respectively. These results reveal comparison hallucinations, temporal/entity mismatches, and weaknesses in reasoning quality and factual consistency.",
+    impact:
+      "Fin-RATE provides a diagnostic framework for evaluating LLMs in realistic financial analysis workflows. It reveals that current models struggle with cross-document reasoning, long-context financial tracking, and distinguishing retrieval, generation, reasoning, and context-interpretation errors.",
+    tags: [Tag.Benchmark],
+  },
   {
     title: "Cache What Lasts: Token Retention for Memory-Bounded KV Cache in LLMs",
     authors: "Ngoc Bui, Shubham Sharma, Simran Lamba, Saumitra Mishra, Rex Ying",
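The new entry sets `page: "finrate"`, and the same PR adds `app/projects/finrate/page.mdx`, so the slug and the project directory appear to be linked. The sketch below illustrates that relationship under stated assumptions: `PublicationSketch` and `projectPagePath` are hypothetical names, the field list is inferred only from the fields the entry sets (the real `Publication` interface and `Tag` enum lie outside this hunk), and the slug-to-path mapping is a guess, not confirmed routing logic.

```typescript
// Hypothetical shape inferred from the fields used in the new entry.
// The real Publication interface is not visible in this diff.
type PublicationSketch = {
  title: string
  authors: string
  venue: string
  page?: string
  code?: string
  paper?: string
  tags: string[]
}

// Assumption: a `page` slug corresponds to app/projects/<slug>/page.mdx.
function projectPagePath(pub: PublicationSketch): string | null {
  return pub.page ? `app/projects/${pub.page}/page.mdx` : null
}

const finrate: PublicationSketch = {
  title: "Fin-RATE: A Real-world Financial Analytics and Tracking Evaluation Benchmark for LLMs on SEC Filings",
  authors: "Yidong Jiang, Junrong Chen, Eftychia Makri, Jialin Chen, Peiwen Li, Ali Maatouk, Leandros Tassiulas, Eliot Brenner, Bing Xiang, Rex Ying",
  venue: "KDD 2026",
  page: "finrate",
  code: "https://github.com/jyd777/Fin-RATE",
  paper: "https://arxiv.org/abs/2602.07294",
  tags: ["Benchmark"], // stand-in for Tag.Benchmark
}

const finratePath = projectPagePath(finrate)
```

A check like this could flag an entry whose `page` slug has no matching project directory, if the site does resolve pages this way.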
7 changes: 4 additions & 3 deletions mdx-components.tsx
@@ -7,13 +7,14 @@ interface ImageOption {
 }

 function MarkDownImage(props: any) {
-  const [title, optionPart] = props.alt.split('|')
+  const { alt = '', style: _style, ...rest } = props
+  const [title, optionPart] = alt.split('|')
   const option: ImageOption = optionPart ? optionPart.split(",").reduce((acc: any, cur: string) => {
     const [key, value] = cur.split("=")
     acc[key] = value
     return acc
   }, {}) : { scale: 1 }
-  const width_scale = 100 * option.scale
+  const width_scale = 100 * Number(option.scale)
   const style = {
     width: `${width_scale}%`,
     height: 'auto',
@@ -25,8 +26,8 @@ function MarkDownImage(props: any) {
       width={0}
       height={0}
       sizes="100vw"
+      {...rest}
       style={style}
-      {...(props as ImageProps)}
     />
     <span className="block mx-auto my-2 text-sm text-slate-600 text-center">{title}</span>
   </>
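The `mdx-components.tsx` change defaults `alt` to an empty string and coerces `option.scale` with `Number(...)`. One case it may still leave open: an alt such as `Title|foo=1` produces an option object without `scale`, and `Number(undefined)` is `NaN`, which would give a width of `NaN%`. The sketch below shows one way to parse the `Title|key=value` convention used in `page.mdx` with a safe fallback. `parseImageAlt` and `widthStyle` are hypothetical helpers, not part of this PR, and the positive-finite check is an assumption about which values are acceptable.

```typescript
// Hypothetical helper, not part of the PR: parses "Title|key=value,key=value".
interface ImageOption {
  scale: number
}

function parseImageAlt(alt: string = ''): { title: string; option: ImageOption } {
  // Split on the first '|' only; everything after it is the option string.
  const sep = alt.indexOf('|')
  const title = sep === -1 ? alt : alt.slice(0, sep)
  const optionPart = sep === -1 ? '' : alt.slice(sep + 1)

  let scale = 1 // default: full width
  for (const pair of optionPart.split(',')) {
    const [key, value] = pair.split('=')
    if (key.trim() === 'scale') {
      const parsedScale = Number(value)
      // Ignore missing, non-numeric, or non-positive values.
      if (Number.isFinite(parsedScale) && parsedScale > 0) scale = parsedScale
    }
  }
  return { title, option: { scale } }
}

// Builds the CSS width string the component would apply.
function widthStyle(alt?: string): string {
  const { option } = parseImageAlt(alt)
  return `${100 * option.scale}%`
}
```

With this shape, `parseImageAlt('DR-QA Example|scale=0.6')` yields the caption `DR-QA Example` and a numeric scale, while an alt with no options or an unrecognized key falls back to a scale of 1.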
22 changes: 3 additions & 19 deletions package-lock.json

Some generated files are not rendered by default.