Channi Studies
Data Science case study - Systematic Racial Discrimination in Tax Audit
Chan Lee 2025. 8. 30. 07:52
Let's go through a data science case study to understand what data science is about.
Black Americans Are Much More Likely to Face Tax Audits, Study Finds (The New York Times, published 2023; www.nytimes.com)
A new report documents systemic discrimination in how the I.R.S. selects taxpayers to be audited, with implications for a debate on the agency’s funding.
In January 2023, The New York Times reported that the IRS appears to exhibit systemic discrimination in how it selects taxpayers for audit.
The mystery, however, is that taxpayers never report their race to the IRS in any way: race does not appear anywhere on a tax return.
Then how can this happen?
Initial Research Question
What are audit rates of Black and non-Black taxpayers?
X% of tax returns of Black taxpayers were audited.
Y% of tax returns of non-Black taxpayers were audited.
Q. Human context: Why does this comparison matter? What would X ≠ Y imply?
Q. Core component of DS: calculations across groups (simple in this example, but larger datasets may require multiple analyses).
Q. How do we know who was audited?
A. The audit data came from the IRS, through a partnership between the researchers and the IRS.
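The group comparison in the initial research question comes down to computing one audit rate per group. Below is a minimal sketch, assuming each record carries a group label and an audited flag; the records are made-up illustration data, not the IRS data, and the function name `audit_rates` is hypothetical.

```python
from collections import Counter

def audit_rates(records):
    """Return {group: share of that group's returns that were audited}."""
    totals = Counter()   # number of returns per group
    audited = Counter()  # number of audited returns per group
    for group, was_audited in records:
        totals[group] += 1
        if was_audited:
            audited[group] += 1
    return {group: audited[group] / totals[group] for group in totals}

# Hypothetical toy records: (group label, was the return audited?)
records = [
    ("Black", True), ("Black", False), ("Black", False), ("Black", True),
    ("non-Black", False), ("non-Black", True), ("non-Black", False), ("non-Black", False),
]
print(audit_rates(records))  # {'Black': 0.5, 'non-Black': 0.25}
```

In this toy data X = 50% and Y = 25%. In the real study the hard part is the group label itself, which the IRS data does not contain.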
How do we determine the race of each taxpayer? We can't.
Could we make an informed prediction of a taxpayer's race?

Baby-name preferences differ by race, and demographics differ from city to city (more than 30% in SF, less than 20% in NY).
Drawing on external datasets, researchers estimated the probability that a taxpayer with a particular location + name identifies as Black.
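One standard way to combine a name-based estimate with location data is Bayesian Improved Surname Geocoding (BISG). It applies Bayes' rule, P(race | name, location) ∝ P(race | name) × P(location | race), under the assumption that name and location are independent given race. The sketch below assumes that approach, and the researchers' exact method may differ. Every probability in it is a made-up placeholder, not a real census or name-table value, and `combine_name_and_location` is a hypothetical helper.

```python
def combine_name_and_location(p_race_given_name, p_location_given_race):
    """Bayes' rule: P(race | name, location) is proportional to
    P(race | name) * P(location | race), assuming name and location
    are conditionally independent given race."""
    unnormalized = {
        race: p_race_given_name[race] * p_location_given_race[race]
        for race in p_race_given_name
    }
    total = sum(unnormalized.values())
    # Normalize so the posterior probabilities sum to 1.
    return {race: value / total for race, value in unnormalized.items()}

# Hypothetical inputs for one taxpayer.
p_race_given_name = {"Black": 0.6, "non-Black": 0.4}          # from a name table
p_location_given_race = {"Black": 0.002, "non-Black": 0.001}  # share of each group living in this area

posterior = combine_name_and_location(p_race_given_name, p_location_given_race)
print(posterior)  # roughly {'Black': 0.75, 'non-Black': 0.25}
```

The name alone suggests 60%. Living in an area that holds a relatively larger share of the Black population raises the estimate to about 75%.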

The resulting graph of audit rate against the estimated probability of identifying as Black shows a strong pattern.
Taxpayers with a higher estimated probability of identifying as Black were more likely to be audited.
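A plot like this can be built by grouping taxpayers into bins of estimated probability and computing the audit rate within each bin. Below is a minimal sketch with hypothetical toy records. The bin count and the helper name `audit_rate_by_bin` are assumptions for illustration.

```python
from collections import defaultdict

def audit_rate_by_bin(records, n_bins=5):
    """records: iterable of (p_black, audited) pairs, with p_black in [0, 1].
    Returns {bin_index: audit rate in that bin} for non-empty bins."""
    counts = defaultdict(lambda: [0, 0])  # bin -> [audited, total]
    for p_black, audited in records:
        # Equal-width bins; p_black == 1.0 is placed in the last bin.
        b = min(int(p_black * n_bins), n_bins - 1)
        counts[b][0] += int(audited)
        counts[b][1] += 1
    return {b: audited / total for b, (audited, total) in sorted(counts.items())}

# Hypothetical toy records: (estimated probability of identifying as Black, audited?)
records = [
    (0.05, False), (0.10, True), (0.15, False), (0.15, False),
    (0.90, True), (0.95, True), (1.00, False),
]
print(audit_rate_by_bin(records))  # {0: 0.25, 4: 0.6666666666666666}
```

Plotting each bin's midpoint against its rate gives the kind of graph described above.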

Plotting audit rates against reported income, with taxpayers split into two groups (identifying as Black and non-Black), shows a stark difference.
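Because race is only estimated, one common way to get a group's audit rate is to weight each taxpayer by their estimated probability of belonging to that group. The Black audit rate is sum(p_i × a_i) / sum(p_i), and the non-Black rate uses 1 − p_i as the weight. The sketch below applies this within income brackets. It is an assumption about how such a comparison can be computed, not necessarily the study's estimator. The records, bracket cutoffs, and the name `weighted_audit_rates` are hypothetical.

```python
import bisect
from collections import defaultdict

def weighted_audit_rates(records, bracket_edges):
    """records: (income, p_black, audited) tuples; bracket_edges: sorted cutoffs.
    Returns {bracket_index: (black_rate, non_black_rate)}, where each taxpayer
    counts toward the Black group with weight p_black and toward the
    non-Black group with weight 1 - p_black."""
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0.0])  # [w_b*a, w_b, w_n*a, w_n]
    for income, p_black, audited in records:
        bracket = bisect.bisect_right(bracket_edges, income)  # edges <= income
        a = 1.0 if audited else 0.0
        s = sums[bracket]
        s[0] += p_black * a
        s[1] += p_black
        s[2] += (1 - p_black) * a
        s[3] += 1 - p_black
    return {
        b: (s[0] / s[1] if s[1] else float("nan"),
            s[2] / s[3] if s[3] else float("nan"))
        for b, s in sorted(sums.items())
    }

# Hypothetical toy records: (reported income, P(identifies as Black), audited?)
records = [
    (20_000, 0.8, True), (20_000, 0.2, False),
    (60_000, 0.5, False), (60_000, 0.5, True),
]
print(weighted_audit_rates(records, bracket_edges=[25_000, 75_000]))
```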

Given that the IRS doesn't collect racial data, what could explain this disparity?
Neighborhood? Address? Geography?
Identifying a Potential Root Cause

Tax returns go through a "black-box" algorithm that predicts the likelihood that a return contains an error.
This algorithm seems to have prioritized catching errors in claimed tax credits over catching errors that, if addressed, would recover the most money.
Black taxpayers were more likely to file the kinds of returns targeted by the algorithm. Thus, audit rates of Black taxpayers were higher.
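The effect of the algorithm's objective can be illustrated with a toy ranking. This is a hypothetical sketch, not the IRS's actual (black-box) model. The fields `p_error` and `adjustment`, the helper `select_for_audit`, and all values are assumptions chosen to show how the same returns get selected differently depending on what the ranking optimizes.

```python
# Hypothetical returns: p_error is a model's predicted probability that the
# return contains an error; adjustment is how much tax an audit would recover
# if the error is real. All values are made up for illustration.
returns = [
    {"id": "A", "p_error": 0.30, "adjustment": 2_000},   # e.g., a small credit claim
    {"id": "B", "p_error": 0.10, "adjustment": 50_000},  # e.g., a complex high-income return
    {"id": "C", "p_error": 0.20, "adjustment": 10_000},
]

def select_for_audit(returns, k, score):
    """Pick the ids of the k returns with the highest score."""
    ranked = sorted(returns, key=score, reverse=True)
    return [r["id"] for r in ranked[:k]]

# Objective 1: catch the most likely errors.
print(select_for_audit(returns, 2, score=lambda r: r["p_error"]))                    # ['A', 'C']
# Objective 2: maximize expected dollars recovered.
print(select_for_audit(returns, 2, score=lambda r: r["p_error"] * r["adjustment"]))  # ['B', 'C']
```

Ranking purely by error probability favors many small, error-prone credit claims. Weighting by the amount at stake shifts audits toward returns where more money can be recovered.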
Nine months after the original report was released, the IRS began working to reduce racial discrimination in its tax audits.
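This systemic racial discrimination arises because Earned Income Tax Credit (EITC) claims make up a large share of audit targets, and the EITC provides the biggest benefit to low- and middle-income households. The low-to-middle income range (around $20,000) has markedly more EITC claims than other income ranges, and because it is also the range eligible for the maximum benefit, its audit rate is presumably even higher.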
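This explains the higher audit rate for Black taxpayers because income levels in American society differ by race. Average income for Black Americans is among the lowest of any racial group, so their rate of EITC claims is also among the highest. This is presumably how the algorithm ended up producing systemic racial discrimination.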
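The mechanism above does not require the algorithm to see race at all. If the audit rule depends only on whether a return claims the EITC, any group with a larger share of EITC claims ends up with a higher overall audit rate. Below is a minimal sketch with made-up rates and shares (not the study's figures). The function `group_audit_rate` and both constants are hypothetical.

```python
def group_audit_rate(eitc_share, rate_eitc, rate_other):
    """Overall audit rate of a group under a race-blind rule that audits
    EITC returns at rate_eitc and all other returns at rate_other."""
    return eitc_share * rate_eitc + (1 - eitc_share) * rate_other

# Hypothetical, race-blind audit rates by return type.
RATE_EITC, RATE_OTHER = 0.02, 0.005

# Hypothetical shares of returns that claim the EITC in two groups.
group_a = group_audit_rate(0.40, RATE_EITC, RATE_OTHER)
group_b = group_audit_rate(0.10, RATE_EITC, RATE_OTHER)
print(f"{group_a:.4f} vs {group_b:.4f}")  # 0.0110 vs 0.0065
```

Both groups face identical rules, yet the group with more EITC claims is audited almost twice as often.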