The main goal of data science is learning about the world from data using computational methods.
Data science has three key parts:
- Exploration
Identifying patterns in data
Uses visualizations
- Inference
Quantifying whether those patterns are reliable
Uses randomization
- Prediction
Making informed guesses about unobserved data
Uses machine learning
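As a rough illustration of how inference "uses randomization", here is a minimal sketch of a permutation test. The function name `permutation_test`, the two groups, and all the numbers are hypothetical and made up for this example. The idea: if the group labels did not matter, shuffling them should often produce a difference as large as the one observed.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_shuffles=5000, seed=0):
    """Return the observed difference in means and the share of random
    relabelings whose difference is at least as large."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        # Shuffle the pooled values, then split them into two fake groups.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1

    return observed, at_least_as_extreme / n_shuffles


# Hypothetical measurements for two groups (made-up numbers).
group_a = [12.1, 14.3, 13.8, 15.2, 14.9, 13.5, 15.8, 14.1]
group_b = [11.0, 12.2, 11.8, 12.9, 11.5, 12.4, 10.9, 12.0]

observed_diff, p_value = permutation_test(group_a, group_b)
print(f"observed difference in means: {observed_diff:.3f}")
print(f"share of shuffles at least as extreme: {p_value:.4f}")
```

A small share suggests the pattern is unlikely to be a product of chance alone. It does not by itself say anything about what caused the difference.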
Two concepts are important when describing relationships in data.
1. Association
An association is any relation or link between two variables.
ex) People who drink coffee regularly have a higher chance of developing lung cancer than those who do not.
2. Causality
Causality means that a change in one variable causes a change in another variable.
Questions about causality are relatively harder to answer.
If there are any systematic differences between the groups other than the treatment, then it may be difficult to identify causality.
When such differences lead researchers astray, they are called confounding factors.
ex) There was an association between coffee drinking and cigarette smoking, so smoking, rather than coffee, may explain the higher chance of lung cancer.
Randomly assigning subjects to the treatment group and the control group often helps prevent confounding, because it makes the two groups similar in everything except the treatment. This design is called a randomized controlled experiment.
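The coffee example can be sketched as a small simulation. Everything here is an assumption made for illustration, not real data: the probabilities are invented, the names `simulate` and `lung_cancer_rate` are hypothetical, and the simulated world is built so that only smoking raises cancer risk while coffee has no effect at all.

```python
import random


def lung_cancer_rate(people, drinks_coffee):
    """Share of people with cancer among coffee drinkers or non-drinkers."""
    group = [p for p in people if p["coffee"] == drinks_coffee]
    return sum(p["cancer"] for p in group) / len(group)


def simulate(randomized, n=20000, seed=1):
    """Return the cancer rate of coffee drinkers minus that of non-drinkers."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        smoker = rng.random() < 0.3
        if randomized:
            # Randomized experiment: a coin flip decides who drinks coffee,
            # so coffee is independent of smoking.
            coffee = rng.random() < 0.5
        else:
            # Observational data (assumed): smokers choose coffee more often.
            coffee = rng.random() < (0.8 if smoker else 0.3)
        # Assumed world: only smoking changes the cancer risk.
        cancer = rng.random() < (0.15 if smoker else 0.01)
        people.append({"coffee": coffee, "cancer": cancer})
    return lung_cancer_rate(people, True) - lung_cancer_rate(people, False)


observational_gap = simulate(randomized=False)
randomized_gap = simulate(randomized=True)
print(f"gap in observational data: {observational_gap:.3f}")
print(f"gap under random assignment: {randomized_gap:.3f}")
```

In the observational setting, coffee drinkers show a clearly higher cancer rate even though coffee does nothing in this simulation, because smoking is a confounding factor. Under random assignment the gap should shrink to roughly zero, since smokers end up spread evenly across both groups.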