
What is Data Science?

Chan Lee 2024. 9. 9. 13:48

The main goal of data science is learning about the world from data using computational methods.

Data science has three key parts.

 

- Exploration

Identifying patterns in data

Uses visualizations

 

- Inference

Quantifying whether those patterns are reliable

Uses randomization

 

- Prediction

Making informed guesses about unobserved data

Uses machine learning
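As a rough illustration of how inference "uses randomization", here is a minimal sketch of a permutation test. The function name `permutation_test` and the measurements are made up for this example; the idea is to shuffle the group labels many times and see how often chance alone produces a difference as large as the observed one.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_shuffles=5000, seed=0):
    """Return the observed difference in means and an approximate p-value."""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_shuffles):
        # Shuffling breaks any real link between group label and value.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_shuffles


# Made-up measurements, purely for illustration.
group_a = [12.1, 11.8, 13.0, 12.6, 12.9, 13.4]
group_b = [10.2, 11.1, 10.8, 11.5, 10.9, 11.3]

observed, p_value = permutation_test(group_a, group_b)
print(f"observed difference: {observed:.2f}, approximate p-value: {p_value:.4f}")
```

A small p-value suggests the pattern is unlikely to be an accident of how the data happened to fall into groups.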

 

 

Two concepts are important when describing relationships in data.

1. Association (상관관계, "correlation")

Association means any relationship or link between two variables.

ex) People who drink coffee regularly have a higher chance of developing lung cancer than those who do not. 

 

2. Causality (인과관계, "causal relationship")

Causality means that a change in one variable produces a change in another variable.

Questions about causality are harder to answer than questions about association.

 

If there are any systematic differences between the groups other than the treatment, then it may be difficult to identify causality.

When such differences lead researchers astray, they are called confounding factors.

ex) There is an association between coffee drinking and cigarette smoking, so smoking, rather than coffee, may explain the higher lung cancer rate among coffee drinkers.
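The coffee example can be sketched as a small simulation. All probabilities below are invented for illustration (they are not real epidemiological figures), and `simulate` and `rate` are hypothetical helper names. In this toy world coffee has no effect on cancer at all; only smoking does, and smokers are simply more likely to drink coffee.

```python
import random


def simulate(n=200_000, seed=0):
    """Simulate people; return counts[(smoker, coffee)] = [people, cancer cases]."""
    rng = random.Random(seed)
    counts = {(s, c): [0, 0] for s in (True, False) for c in (True, False)}
    for _ in range(n):
        smoker = rng.random() < 0.30
        # Assumption: smokers are more likely to drink coffee.
        coffee = rng.random() < (0.80 if smoker else 0.40)
        # Assumption: cancer risk depends on smoking only, never on coffee.
        cancer = rng.random() < (0.15 if smoker else 0.01)
        cell = counts[(smoker, coffee)]
        cell[0] += 1
        cell[1] += cancer
    return counts


def rate(counts, coffee, smoker=None):
    """Cancer rate for a coffee group, optionally restricted to one smoking group."""
    cells = [v for (s, c), v in counts.items()
             if c == coffee and (smoker is None or s == smoker)]
    return sum(v[1] for v in cells) / sum(v[0] for v in cells)


counts = simulate()
overall_coffee = rate(counts, coffee=True)
overall_no_coffee = rate(counts, coffee=False)
smokers_gap = rate(counts, True, smoker=True) - rate(counts, False, smoker=True)
nonsmokers_gap = rate(counts, True, smoker=False) - rate(counts, False, smoker=False)

print(f"overall: coffee {overall_coffee:.3f} vs no coffee {overall_no_coffee:.3f}")
print(f"gap among smokers: {smokers_gap:+.3f}, among non-smokers: {nonsmokers_gap:+.3f}")
```

Under these made-up probabilities, the overall cancer rate should come out noticeably higher for coffee drinkers, while within each smoking group the two rates should be nearly equal. The association is real, but it is produced by the confounding factor.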

 

Randomly assigning individuals to the treatment group and the control group often helps guard against confounding factors, because it makes the two groups similar in everything except the treatment. This is called a randomized controlled experiment.
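A minimal sketch of why random assignment helps, using simulated people with a made-up 30% smoking rate. The `randomize` and `smoker_share` helpers are hypothetical names for this example. Because the split ignores every characteristic of the individuals, a potential confounder such as smoking should end up in roughly equal proportion in both groups.

```python
import random


def randomize(people, seed=0):
    """Shuffle a copy of the list and split it into treatment and control halves."""
    rng = random.Random(seed)
    shuffled = people[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def smoker_share(group):
    """Fraction of smokers in a group."""
    return sum(p["smoker"] for p in group) / len(group)


# Simulated population; the 30% smoking rate is an assumption for illustration.
rng = random.Random(1)
people = [{"id": i, "smoker": rng.random() < 0.30} for i in range(10_000)]

treatment, control = randomize(people)
share_treatment = smoker_share(treatment)
share_control = smoker_share(control)

print(f"smokers in treatment: {share_treatment:.3f}, in control: {share_control:.3f}")
```

If people chose their own group instead, smokers could cluster in one of them, and any difference in outcomes could no longer be attributed to the treatment alone.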