In simple linear regression (LMS), the regression line can be expressed by the following equation:
y = ax + b
where
- y = The variable that you want to predict | Dependent variable
- x = The variable that you are using to predict | Independent variable
- a = Slope
- b = y-intercept
So, how do we find the slope (a) and the y-intercept (b) in y = ax + b? Let's take a look.
Recall,
r (correlation coefficient) = average of the products of (x in standard units) and (y in standard units)
or in Python (datascience):
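A minimal sketch of this definition, assuming plain NumPy arrays rather than a datascience Table. The helper names standard_units and correlation are hypothetical, chosen for illustration, and the sample data is made up:

```python
import numpy as np

def standard_units(values):
    """Convert values to standard units: (value - average) / SD.

    Hypothetical helper; np.std uses the population SD by default.
    """
    values = np.asarray(values, dtype=float)
    return (values - np.mean(values)) / np.std(values)

def correlation(x, y):
    """r = average of the products of x and y in standard units."""
    return np.mean(standard_units(x) * standard_units(y))

# Made-up sample data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = correlation(x, y)
print(r)
```

The result should agree with np.corrcoef(x, y)[0, 1], which computes the same coefficient.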
Then, the slope of the best fit line is:
Slope (a) = r * (standard deviation of y) / (standard deviation of x)
That is,
a = r * SD_y / SD_x (*)
where
- r = correlation coefficient
- SD_y = standard deviation of y
- SD_x = standard deviation of x
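As a rough illustration of equation (*), the sketch below computes the slope from r and the two standard deviations. It assumes NumPy, the function name slope is hypothetical, and the data is made up:

```python
import numpy as np

def slope(x, y):
    """Slope of the regression line: a = r * SD_y / SD_x.

    Hypothetical helper name, not part of any library.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.corrcoef(x, y)[0, 1]        # correlation coefficient
    return r * np.std(y) / np.std(x)   # population SDs (NumPy default)

# Made-up sample data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a = slope(x, y)
print(a)
```

Because only the ratio SD_y / SD_x appears, using the sample SD (ddof=1) for both gives the same slope as using the population SD for both.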
And, the y-intercept of the regression line is:
y-intercept (b) = (average of y) - (slope)(average of x)
That is,
b = y_average - a * x_average
where
- y_average = average of y
- x_average = average of x
- a = slope = r * SD_y / SD_x (*)
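Putting the two formulas together, a sketch like the following fits the line and uses it to predict. The name fit_line is hypothetical, the data is made up, and np.polyfit is used only as an independent check of the result:

```python
import numpy as np

def fit_line(x, y):
    """Return (a, b) for the regression line y = ax + b.

    Hypothetical helper that applies the formulas from the text:
    a = r * SD_y / SD_x and b = y_average - a * x_average.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.corrcoef(x, y)[0, 1]
    a = r * np.std(y) / np.std(x)
    b = np.mean(y) - a * np.mean(x)
    return a, b

# Made-up sample data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = fit_line(x, y)

# Degree-1 least-squares fit from NumPy; returns [slope, intercept].
a_ref, b_ref = np.polyfit(x, y, 1)

# Predict y for a new x using y = ax + b.
prediction = a * 6 + b
print(a, b, prediction)
```

For a degree-1 fit, np.polyfit minimizes the same squared error, so a_ref and b_ref should match a and b up to floating-point rounding.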
In summary, the slope and y-intercept of the regression line are:
a = r * SD_y / SD_x
b = y_average - a * x_average