All Posts

A study blog on Python, C++, and Data Science.
·Data Science/Python
In this post, we look at the packages needed to implement a basic linear regression model in Python. The very basic concepts of linear regression are covered in this post: Regression Analysis - Linear Regression (code-studies.tistory.com). Package installation and import: First, for designing a linear regression model, the main Python..
The difference between Correlation Analysis and Regression Analysis can be summed up in one sentence: "Correlation does not imply causation." In more detail: 1. Correlation measures the degree of the relationship between two variables, whereas regression measures how one variable affects another. 2. Correlation does not measure cause and effect between two variables; it measures how strongly they are related (move together). Regression, by contrast, measures not the degree of association between two variables but the causal relationship directly (cause..
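As a rough illustration of point 1, the sketch below computes a correlation coefficient and a least-squares slope by hand. The helper names `pearson_r` and `ols_slope` and the hours/score numbers are made up for this example and do not come from the original post. The point is that correlation is symmetric in the two variables, while a regression slope depends on which variable is treated as the one being explained.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ols_slope(xs, ys):
    """Least-squares slope when regressing ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical sample (made-up numbers): hours studied vs. exam score.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
score = [52.0, 55.0, 61.0, 64.0, 70.0]

# Correlation: swapping the arguments gives the same value.
r_xy = pearson_r(hours, score)
r_yx = pearson_r(score, hours)

# Regression: the slope changes with the direction of the model.
slope_score_on_hours = ols_slope(hours, score)
slope_hours_on_score = ols_slope(score, hours)

print("r(hours, score) =", r_xy)
print("r(score, hours) =", r_yx)
print("slope of score on hours =", slope_score_on_hours)
print("slope of hours on score =", slope_hours_on_score)
```

Note that the second slope is not simply the reciprocal of the first; the product of the two slopes equals r squared.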
Linear Regression: a linear approximation of a causal relationship between two or more variables. The linear regression process: 1. Collect sample data. 2. Design a model that fits the sample. 3. Use the model to make predictions about the whole population. In linear regression, the predicted value y is called the dependent variable, and (x1, x2, ..., xk) are called the independent variables. y is expressed as a function of the x's: y = f(x1, x2, ..., xk). First, Simple L..
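The three steps above can be sketched for the single-variable case, assuming an ordinary least-squares fit of y = b0 + b1*x. The function names `fit_simple_linear` and `predict` and the sample values are hypothetical, chosen only for illustration.

```python
def fit_simple_linear(xs, ys):
    """Return (b0, b1) minimizing the squared error of y = b0 + b1 * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    b0 = my - b1 * mx
    return b0, b1


def predict(b0, b1, x):
    """Evaluate the fitted line at a new x."""
    return b0 + b1 * x


# Step 1: collect sample data (made-up numbers for this sketch).
sample_x = [1.0, 2.0, 3.0, 4.0, 5.0]
sample_y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Step 2: design (fit) a model on the sample.
b0, b1 = fit_simple_linear(sample_x, sample_y)

# Step 3: use the model to predict a value not in the sample.
y_new = predict(b0, b1, 6.0)

print("intercept b0 =", b0)
print("slope b1 =", b1)
print("prediction at x = 6 =", y_new)
```

Here y is the dependent variable and x is the only independent variable; with k independent variables the same idea applies, but the fit is usually done with a linear algebra routine rather than by hand.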
The Binomial Distribution is best understood as a Bernoulli Distribution with multiple trials. For a random variable X, if an event has only two possible outcomes, we call its distribution a Bernoulli distribution. For probability p, we write X~Bern(p), which is the same as X~B(1,p). A little more on the Bernoulli distribution: E(X) = 1*p + 0*(1-p) = p, Variance = p(1-p), STDEV = sqrt(p(1-p)). By convention, we denote the more probable of the two outcomes by p and the other by 1-p, or q. Also, in situations where we want to apply the Bernoulli distribution, for each event, 1 and 0..
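A minimal sketch of these formulas, assuming p = 0.6 and n = 10 trials purely for illustration. The helper names `bernoulli_moments` and `binomial_draw` are not from the original post. Each binomial draw is built as a sum of independent Bernoulli trials coded as 1 and 0, so its sample mean should land near n*p.

```python
import math
import random


def bernoulli_moments(p):
    """Mean, variance, and standard deviation of X ~ Bern(p)."""
    mean = 1 * p + 0 * (1 - p)
    variance = p * (1 - p)
    return mean, variance, math.sqrt(variance)


def binomial_draw(n, p, rng):
    """One draw from B(n, p), built as the sum of n Bernoulli(p) trials."""
    return sum(1 if rng.random() < p else 0 for _ in range(n))


p = 0.6          # assumed success probability
n_trials = 10    # assumed number of Bernoulli trials per binomial draw
rng = random.Random(0)  # fixed seed so the run is repeatable

mean, variance, stdev = bernoulli_moments(p)

# Simulate many binomial draws and compare the sample mean with n * p.
draws = [binomial_draw(n_trials, p, rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)

print("Bernoulli mean =", mean)
print("Bernoulli variance =", variance)
print("Bernoulli stdev =", stdev)
print("binomial sample mean =", sample_mean, "vs n*p =", n_trials * p)
```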
Chan Lee
Chan Code & DS 🧑‍💻📊