Commit de451e8

remove unsubstantiated info from claude
1 parent cf6935f commit de451e8

2 files changed

Lines changed: 49 additions & 25 deletions

paper/sections/abstract.tex

Lines changed: 1 addition & 1 deletion
@@ -1,3 +1,3 @@
 \section*{Abstract}
 
-We combine the demographic detail of the Current Population Survey (CPS) with the tax precision of the IRS Public Use File (PUF) to create an enhanced microsimulation dataset. Our method uses quantile regression forests to transfer income and tax variables from the PUF to demographically-similar CPS households, followed by a dropout-regularized gradient descent procedure that reweights households to match administrative targets. The enhanced dataset reduces discrepancies in key tax components by 40\% compared to the baseline CPS while preserving demographic relationships and program participation patterns. Validation against IRS Statistics of Income shows the enhanced data captures capital gains within 12\% of administrative totals (vs. 45\% baseline error), business income within 8\% (vs. 38\%), and dividend income within 7\% (vs. 32\%). The dataset matches state-level EITC claims within 5\% for 45 states and maintains the CPS's high accuracy for poverty estimation and program participation analysis. We release both the enhanced dataset and our open-source enhancement procedure to support transparent policy analysis.
+We combine the demographic detail of the Current Population Survey (CPS) with the tax precision of the IRS Public Use File (PUF) to create an enhanced microsimulation dataset. Our method uses quantile regression forests to transfer income and tax variables from the PUF to demographically-similar CPS households. We create a synthetic CPS-structured dataset using PUF tax information, stack it alongside the original CPS records, then use dropout-regularized gradient descent to reweight households toward administrative targets from IRS Statistics of Income, Census population estimates, and program participation data. This preserves the CPS's granular demographic and geographic information while leveraging the PUF's tax reporting accuracy. The enhanced dataset provides a foundation for analyzing federal tax policy, state tax systems, and benefit programs. We release both the enhanced dataset and our open-source enhancement procedure to support transparent policy analysis.

paper/sections/results.tex

Lines changed: 48 additions & 24 deletions
@@ -1,46 +1,70 @@
 \section{Results}
 
-We validate our enhanced dataset against the administrative targets used in our loss matrix construction.
+We evaluate our enhanced dataset against administrative targets by constructing a loss matrix (defined in utils/loss.py) measuring relative deviations from:
 
-\subsection{Target Categories}
+\subsection{IRS Statistics of Income Targets}
 
-Our loss matrix tracks matches against:
-
-\subsubsection{IRS Statistics of Income}
-By AGI bracket and filing status:
+By AGI bracket and filing status, we track:
 \begin{itemize}
-\item Adjusted gross income
-\item Employment income
-\item Business net profits/losses
-\item Capital gains
+\item Adjusted gross income totals
+\item Return counts
+\item Wages, salaries, and tips
+\item Business net profits and losses (separately)
+\item Capital gains (gross amounts and distributions)
 \item Ordinary dividends
-\item Partnership and S-corporation income
+\item Partnership and S-corporation income and losses
 \item Qualified dividends
 \item Taxable interest income
 \item Pension income
 \item Social Security benefits
+\item Estate income and losses
+\item Tax-exempt interest
+\item IRA distributions
+\item Rent and royalty income and losses
+\item Taxable pension income
+\item Taxable Social Security
+\item Unemployment compensation
+\end{itemize}
+
+\subsection{Census Population Targets}
+
+From Census projections:
+\begin{itemize}
+\item Population counts for each single year of age from 0 to 85
 \end{itemize}
 
-\subsubsection{Census Population Projections}
-Single-year age populations from ages 0-85.
+\subsection{CBO Program Totals}
 
-\subsubsection{CBO Projections}
-Annual totals for:
+From Congressional Budget Office projections:
 \begin{itemize}
-\item Income tax
-\item SNAP
-\item Social Security
-\item SSI
+\item Income tax revenue
+\item SNAP benefit payments
+\item Social Security benefit payments
+\item SSI payments
 \item Unemployment compensation
 \end{itemize}
 
-\subsubsection{Treasury EITC Statistics}
+\subsection{EITC Statistics}
+
+From Treasury data:
 \begin{itemize}
-\item EITC claims and amounts by number of qualifying children
+\item Number of returns claiming EITC by number of qualifying children
+\item Total EITC amounts by number of qualifying children
 \end{itemize}
 
-% TODO: Add specific quantitative results from running the enhanced dataset against these targets
+\subsection{Other Targets}
+
+From various government sources:
+\begin{itemize}
+\item Healthcare spending by age group and type
+\item Child support payments
+\item Housing costs and subsidies
+\item Market income losses
+\end{itemize}
 
-\subsection{Computational Performance}
+The reweighting procedure minimizes the relative squared error between weighted sums of these variables and their administrative targets.
 
-% TODO: Add measured runtime statistics from production enhancement runs
+% TODO: Add specific quantitative results showing:
+% - Initial deviations from targets in base CPS
+% - Final deviations after enhancement
+% - Distribution of weights between original and synthetic records
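
The new text in results.tex describes a reweighting step that minimizes the relative squared error between weighted sums and administrative targets, and the abstract calls it dropout-regularized gradient descent. The following is a minimal pure-Python sketch of that idea, not the code in utils/loss.py. The function names, the log-weight parameterization, the inverted-dropout scheme applied per record, the averaging over targets, and the toy data are all assumptions made for illustration.

```python
import math
import random


def weighted_totals(weights, values):
    """Weighted sum of each target column over all household records."""
    n_targets = len(values[0])
    return [
        sum(w * row[j] for w, row in zip(weights, values))
        for j in range(n_targets)
    ]


def relative_squared_error(weights, values, targets):
    """Mean over targets of ((weighted total - target) / target) ** 2."""
    totals = weighted_totals(weights, values)
    return sum(((est - t) / t) ** 2 for est, t in zip(totals, targets)) / len(targets)


def reweight(weights, values, targets, steps=500, lr=0.1, dropout=0.05, seed=0):
    """Gradient descent on log-weights with per-record dropout (hypothetical scheme)."""
    rng = random.Random(seed)
    log_w = [math.log(w) for w in weights]  # log-space keeps weights positive
    n_targets = len(targets)
    scale = 1.0 / (1.0 - dropout)  # inverted dropout: rescale the kept records
    for _ in range(steps):
        # Each record is excluded from this step with probability `dropout`.
        keep = [rng.random() >= dropout for _ in log_w]
        w = [math.exp(lw) * scale if k else 0.0 for lw, k in zip(log_w, keep)]
        totals = weighted_totals(w, values)
        # Derivative of the loss with respect to each weighted total.
        d_total = [2.0 * (est - t) / (t * t) / n_targets
                   for est, t in zip(totals, targets)]
        for i, row in enumerate(values):
            if keep[i]:
                # Chain rule: d(total_j) / d(log_w_i) = w_i * x_ij.
                grad = w[i] * sum(d * x for d, x in zip(d_total, row))
                log_w[i] -= lr * grad
    return [math.exp(lw) for lw in log_w]


# Toy data (made up): columns are a return count and an income amount.
values = [[1, 50_000], [1, 20_000], [1, 80_000]]
targets = [300.0, 15_000_000.0]
initial = [50.0, 50.0, 50.0]
final = reweight(initial, values, targets)
print("loss before:", relative_squared_error(initial, values, targets))
print("loss after: ", relative_squared_error(final, values, targets))
```

Dividing each deviation by its target puts return counts and dollar totals on a comparable scale, so that large dollar aggregates do not dominate the objective. Dropping a random subset of records at each step is one plausible way to keep the fit from leaning on a handful of households.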
