Data Science

There are two types of prediction in data science. Regression is used to predict numerical data, and classification is used to predict categorical data. For example, consider the spam folder in the email we use: based on the text of a message, it predicts a categorical variable with a Yes or No value, that is, whether the message is spam. Input = Text / Output = Yes or No (Spam, Not Spam). Before looking at classification in more detail, let's take a brief look at machine learning. Machine Learning Algorit..
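To make the Input = Text / Output = Yes or No idea concrete, here is a minimal Python sketch of a classifier's interface. It is a hand-written keyword rule, not a machine learning model. The word list `SPAM_WORDS` and the `threshold` parameter are hypothetical and chosen only for illustration; a real spam filter would learn its decision rule from labeled emails.

```python
# Hypothetical keyword list, for illustration only; a real filter would learn
# which features indicate spam from labeled training emails.
SPAM_WORDS = {"free", "winner", "prize", "click"}

def classify_email(text: str, threshold: int = 2) -> str:
    """Classification: map an email's text (input) to a category (output)."""
    # Lowercase, split on whitespace, and strip trailing punctuation from each word.
    words = [w.strip(".,!?") for w in text.lower().split()]
    hits = sum(1 for w in words if w in SPAM_WORDS)
    return "Spam" if hits >= threshold else "Not Spam"

print(classify_email("Click here to claim your FREE prize!"))  # Spam
print(classify_email("Meeting moved to 3pm tomorrow"))         # Not Spam
```

A regression model would have the same shape but return a number instead of one of a fixed set of labels.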
In simple linear regression (LMS), the regression line can be expressed with the following equation: y = ax + b, where y is the variable you want to predict (the dependent variable), x is the variable you use to make the prediction (the independent variable), a is the slope, and b is the y-intercept. So how do we find the slope (a) and the y-intercept (b) in y = ax + b? Recall, r ..
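The excerpt is cut off at "Recall, r", so this sketch assumes the standard least-squares results for the regression line: a = r * SD(y) / SD(x) and b = mean(y) - a * mean(x), with standard deviations computed over the whole dataset (population SD). It uses only the Python standard library; `fit_line` is a hypothetical helper name.

```python
from statistics import mean, pstdev

def fit_line(x, y):
    """Return (a, b) for the regression line y = ax + b."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)  # population standard deviations
    # r: the average product of x and y in standard units
    r = mean(((xi - mx) / sx) * ((yi - my) / sy) for xi, yi in zip(x, y))
    a = r * sy / sx   # slope
    b = my - a * mx   # intercept: the line passes through (mean x, mean y)
    return a, b

x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]  # toy data lying exactly on y = 2x + 1
a, b = fit_line(x, y)
print(f"y = {a:.2f}x + {b:.2f}")  # y = 2.00x + 1.00
```

Because the toy data lie exactly on a line, r is 1 and the fit recovers slope 2 and intercept 1, up to floating-point rounding.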
When there are two numerical variables, look for a trend (positive or negative association) and a pattern (any discernible "shape" in the scatter plot, either linear or non-linear). Visualize, then quantify. The correlation coefficient r measures linear association and is based on standard units. r is defined as the average of the product of (x in standard units) and (y in standard units). In P..
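A minimal Python sketch of that definition, assuming standard units are computed with the population SD (`statistics.pstdev`). The three toy datasets are made up to show a positive association, a negative association, and a non-linear pattern.

```python
from statistics import mean, pstdev

def standard_units(values):
    """Convert each value to standard units: (value - mean) / SD."""
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd for v in values]

def correlation(x, y):
    """r = average of (x in standard units) * (y in standard units)."""
    return mean(a * b for a, b in zip(standard_units(x), standard_units(y)))

x = [-2, -1, 0, 1, 2]
print(correlation(x, [2 * v + 1 for v in x]))  # positive linear association
print(correlation(x, [-3 * v for v in x]))     # negative linear association
print(correlation(x, [v ** 2 for v in x]))     # clear non-linear shape
```

The first two give r close to 1 and -1. The parabola has an obvious shape, yet its r is about 0, because r only measures linear association.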
A confidence interval is an interval of estimates for a parameter. It is based on random sampling. The most widely used is the 95% confidence interval; here, '95%' is called the confidence level. It could be any percentage between 0 and 100, and a higher confidence level means a wider interval. A confidence interval can be considered 'good' if it contains the parameter. The confidence is in the process that creates the inter..
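The excerpt does not say how the interval is built, so this sketch assumes the bootstrap percentile method: resample the sample with replacement, compute the mean of each resample, and keep the middle `level`% of those means. The sample, the population, and the helper names (`bootstrap_ci`, `percentile`) are hypothetical. The second part repeats the whole process on fresh samples from a known population to illustrate that the confidence refers to the process: close to 95% of the intervals it produces should contain the parameter.

```python
import math
import random

def mean(values):
    return sum(values) / len(values)

def percentile(sorted_values, p):
    """Nearest-rank p-th percentile (0 <= p <= 100) of an already sorted list."""
    k = max(0, math.ceil(p * len(sorted_values) / 100) - 1)
    return sorted_values[k]

def bootstrap_ci(sample, level=95, repetitions=2000, rng=None):
    """Bootstrap percentile confidence interval for the population mean."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so results are reproducible
    n = len(sample)
    # Resample the original sample with replacement; record each resample's mean.
    stats = sorted(mean(rng.choices(sample, k=n)) for _ in range(repetitions))
    tail = (100 - level) / 2
    return percentile(stats, tail), percentile(stats, 100 - tail)

# Hypothetical sample of 10 observations.
sample = [12, 15, 9, 20, 17, 14, 11, 16, 13, 18]
print(bootstrap_ci(sample, level=95))
print(bootstrap_ci(sample, level=99))  # wider: higher confidence level

# Repeat the whole process on fresh samples from a known (hypothetical)
# population and count how often the interval contains the true parameter.
population = list(range(1, 101))
parameter = mean(population)  # true mean = 50.5
rng = random.Random(42)
trials = 100
hits = 0
for _ in range(trials):
    fresh = rng.choices(population, k=30)
    low, high = bootstrap_ci(fresh, level=95, repetitions=500, rng=rng)
    hits += low <= parameter <= high
print(hits / trials)  # fraction of "good" intervals
```

Both calls on `sample` reuse the same seed and therefore the same resampled means, so the 99% interval is guaranteed to be at least as wide as the 95% one. With only 100 trials, the coverage fraction will wobble around 95% rather than hit it exactly.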
Chan Lee
List of posts in the 'Data Science' category