paper/sections/data.tex (8 additions, 10 deletions)
@@ -2,28 +2,27 @@ \section{Data}\label{sec:data}

 \subsection{Current Population Survey}

-The Current Population Survey Annual Social and Economic Supplement (CPS ASEC) provides comprehensive demographic and economic information for a nationally representative sample of U.S. households. For tax year 2024, our base dataset contains approximately 150,000 households representing the U.S. civilian non-institutional population.
+The Census Bureau administers the Current Population Survey Annual Social and Economic Supplement (CPS ASEC, or hereafter the CPS) each March. In March 2024, they surveyed 89,473 households representing the U.S. civilian non-institutional population about their activities in the 2023 calendar year.

 The CPS's key strengths include:
 \begin{itemize}
 \item Rich demographic detail including age, sex, race, ethnicity, and education
 \item Complete household relationship matrices
 \item Program participation indicators
-\item State and sub-state geographic identifiers
-\item Monthly employment and labor force status
+\item State identifiers, and partial county identifiers
 \end{itemize}

 However, the CPS has known limitations for tax modeling:
 \begin{itemize}
-\item Underreporting of income, particularly at the top of the distribution
+\item Underreporting of income, particularly at the top of the distribution due to top-coding
 \item Limited tax-relevant information (e.g., itemized deductions)
 \item No direct observation of tax units within households
 \item Imprecise measurement of certain income types (e.g., capital gains)
 \end{itemize}

 \subsection{IRS Public Use File}

-The Internal Revenue Service Public Use File (PUF) is a national sample of individual income tax returns, representing the 151.2 million Form 1040, Form 1040A, and Form 1040EZ Federal Individual Income Tax Returns filed for Tax Year 2015. The file contains 119,675 records sampled at varying rates across strata, with 0.07 percent sampling for strata 7 through 13 \cite{bryant2022}. The data are extensively transformed to protect taxpayer privacy while preserving statistical properties.
+The Internal Revenue Service Public Use File (PUF) is a national sample of individual income tax returns, representing the 151.2 million Form 1040, Form 1040A, and Form 1040EZ Federal Individual Income Tax Returns filed for Tax Year 2015. The file contains 119,675 records sampled at varying rates across strata, with 0.07 percent sampling for strata 7 through 13 \cite{bryant2023b}. The data are extensively transformed to protect taxpayer privacy while preserving statistical properties.

 The Public Use Tax Demographic File supplements the PUF with:
 \begin{itemize}
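The sampling figures quoted in the PUF paragraph above can be sanity-checked with a little arithmetic. The sketch below is illustrative only: `design_weight` is a hypothetical helper, not part of the paper's code, and it assumes a record's weight is simply the inverse of its stratum's sampling rate. The PUF ships its own weights, which should be preferred in practice.

```python
def design_weight(sampling_rate_percent: float) -> float:
    """Inverse-probability weight for a stratum sampled at the given rate (in percent).

    Hypothetical helper for illustration; assumes weight = 1 / sampling rate.
    """
    if not 0.0 < sampling_rate_percent <= 100.0:
        raise ValueError("sampling rate must be in (0, 100] percent")
    return 100.0 / sampling_rate_percent


# A stratum sampled at 0.07 percent: each record stands in for about 1/0.0007 returns.
stratum_weight = design_weight(0.07)

# Average weight implied by the totals quoted in the text:
# 151.2 million returns represented by 119,675 records.
average_weight = 151_200_000 / 119_675

print(f"0.07% stratum weight: {stratum_weight:.1f}")
print(f"average weight:       {average_weight:.1f}")
```

Because sampling rates vary across strata, individual weights can differ substantially from the file-wide average.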
@@ -53,14 +52,15 @@ \subsection{IRS Public Use File}
 \begin{itemize}
 \item Limited demographic information
 \item No household structure beyond the tax unit
-\item Geographic detail limited to state
+\item No geographic information such as state
 \item No program participation information
 \item Privacy protections that mask extreme values
+\item Lag; the latest version as of November 2024 is for the 2015 tax year
 \end{itemize}

 \subsection{External Validation Sources}

-We validate our enhanced dataset against several external sources:
+We validate our enhanced dataset against 570 targets from several external sources:

 \subsubsection{IRS Statistics of Income}

@@ -81,12 +81,11 @@ \subsubsection{CPS ASEC Public Tables}
 \item Age distribution by state
 \item Household size distribution
 \item Program participation rates
-\item Employment status
 \end{itemize}

 \subsubsection{Administrative Program Totals}

-We incorporate official totals from various agencies:
+We incorporate official totals from various agencies, including but not limited to:
 \begin{itemize}
 \item Social Security Administration beneficiary counts and benefit amounts
 A crucial preparatory step is harmonizing variables across datasets. We develop a detailed crosswalk between CPS and PUF variables, accounting for definitional differences. Key considerations include:
 \begin{itemize}
-\item Income timing (calendar year vs. tax year)
 \item Income classification (e.g., business vs. wage income)
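The External Validation Sources subsection in this file describes checking the enhanced dataset against external targets. One plausible way to express such a check is a relative error between each weighted total and its official value. The sketch below assumes that metric; the excerpt does not say which metric the paper uses, and the function name, variable names, and toy numbers are all invented for illustration.

```python
def relative_errors(values, weights, targets):
    """Compare weighted totals against external targets.

    values:  dict mapping a target name to a list of per-record values
    weights: list of per-record weights
    targets: dict mapping the same names to official totals
    Returns a dict of (estimate - target) / target for each target.
    """
    errors = {}
    for name, target in targets.items():
        # Weighted total of this variable across all records.
        estimate = sum(w * v for w, v in zip(weights, values[name]))
        errors[name] = (estimate - target) / target
    return errors


# Toy data, invented purely for illustration.
weights = [100.0, 200.0, 300.0]
values = {
    "households": [1, 1, 1],
    "benefit_amount": [0.0, 500.0, 1000.0],
}
targets = {"households": 600.0, "benefit_amount": 500_000.0}

errors = relative_errors(values, weights, targets)
for name, err in errors.items():
    print(f"{name}: {err:+.1%}")
```

A signed relative error keeps the direction of the miss visible, which helps distinguish systematic underreporting from noise.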
paper/sections/methodology/overview.tex (2 additions, 2 deletions)
@@ -1,6 +1,6 @@
 \section{Methodology}\label{sec:methodology}

-Following \cite{bryant2022}, our procedure enhances the Current Population Survey (CPS) with tax information from the Public Use File (PUF) through four key steps:
+Following \cite{bryant2023a}, our procedure enhances the Current Population Survey (CPS) with tax information from the Public Use File (PUF) through four key steps:
 \begin{enumerate}
 \item Project both CPS and PUF data to the target year
 \item Transfer tax variable distributions from PUF to CPS records
@@ -26,7 +26,7 @@ \subsection{Data Projection}

 \subsection{Demographic Variable Construction}

-Following \cite{bryant2022}, we construct several key demographic variables:
+Following \cite{bryant2023b}, we construct several key demographic variables:

 \subsubsection{Dependent Ages}
 We create three dependent age variables (AGEDP1/2/3) capturing: