Data Science/Concepts and Terminology

A/B testing is a type of experiment in data science that compares the values of sampled individuals in Group A with the values of sampled individuals in Group B. Q. Do the two sets of values come from the same underlying distribution? For example, suppose the mean height measured by sampling in region A of South Korea is 165 cm, and in region B it is 170 cm (the observed statistic). Here, judging that the difference in mean height between regions A and B arises from the same underlying distribution (the height distribution of South Korea as a whole)..
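One common way to approach the question in this excerpt is a permutation test: pool both groups, shuffle the labels many times, and see how often a shuffled difference in means is at least as large as the observed one. The sketch below is only an illustration under that assumption; the excerpt is cut off before it names a method, and the function name `permutation_test`, the sample sizes, and the simulated heights are all hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Return (observed difference in means, two-sided permutation p-value)."""
    rng = random.Random(seed)
    observed = mean(group_b) - mean(group_a)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Under "same underlying distribution", the group labels are arbitrary,
        # so shuffling the pooled values simulates that hypothesis.
        rng.shuffle(pooled)
        shuffled_diff = mean(pooled[n_a:]) - mean(pooled[:n_a])
        if abs(shuffled_diff) >= abs(observed):
            at_least_as_extreme += 1
    return observed, at_least_as_extreme / n_permutations


# Hypothetical height samples in cm, simulated for illustration only.
data_rng = random.Random(42)
region_a = [data_rng.gauss(165, 6) for _ in range(50)]
region_b = [data_rng.gauss(170, 6) for _ in range(50)]

observed_diff, p_value = permutation_test(region_a, region_b)
print(f"observed difference: {observed_diff:.2f} cm, p-value: {p_value:.4f}")
```

A small p-value suggests the observed difference would be unusual if both regions shared one distribution; a large one means the data are consistent with that hypothesis.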
In data science, the goal is often to estimate an unknown parameter of a population. For example, suppose we want to estimate the income of the entire nation, and we plan to compute the median income and use it as our indicator. 1. If you have a census: Just calculate the parameter from the census, and you're done. If the population data is ready, all you have to do is compute it. But of course, that is rarely the case. 2. If you don't have a census: Take a random sample from the population. Use a statistic as..
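The two cases in this excerpt can be sketched as follows. This is a minimal illustration, assuming a simulated log-normal "population" of incomes; the helper names `census_median` and `estimate_median`, the population size, and the sample size are hypothetical and not taken from the post.

```python
import random
import statistics


def census_median(population):
    """Case 1: with a census, the parameter is computed directly."""
    return statistics.median(population)


def estimate_median(population, sample_size, rng):
    """Case 2: without a census, draw a random sample (without replacement)
    and use the sample median as the statistic."""
    sample = rng.sample(population, sample_size)
    return statistics.median(sample)


rng = random.Random(0)
# Hypothetical incomes, simulated for illustration only.
population = [rng.lognormvariate(10, 0.5) for _ in range(100_000)]

parameter = census_median(population)
estimate = estimate_median(population, 1_000, rng)
print(f"population median (parameter): {parameter:,.0f}")
print(f"sample median (estimate):      {estimate:,.0f}")
```

In practice only the second function is usable, since the full population is usually unavailable; the simulation includes both so the estimate can be compared with the value it targets.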
The main goal of data science is learning about the world from data using computational methods. There are three key parts of data science:
- Exploration: identifying patterns in data; uses visualizations
- Inference: quantifying whether those patterns are reliable; uses randomization
- Prediction: making informed guesses about unobserved data; uses machine learning
There are two important concepts about the re..
In this post, we look at a few key assumptions behind applying a linear regression model. If these assumptions do not hold, a model designed with the least squares method will produce meaningless and inaccurate results, so it is worth keeping them in mind. 1. Linearity: It is called linear regression, after all. Each independent variable is multiplied by its own coefficient, and these terms are summed to produce the dependent variable. What is an easy way to check linearity? Pick one of the independent variables (x1) and draw a scatter plot of it against the dependent variable (y). A rough trend should appear: if it looks like a first-degree (straight-line) function, the data are linear, and if you see a curve, the data lack linearity. In that case, design the prediction model using a method other than linear regression..
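The post suggests eyeballing a scatter plot of x1 against y. As a rough numeric stand-in for that visual check (a substitute technique, not the post's own method), the sketch below fits a straight line by least squares and reports R², the share of the variance in y that the line explains. The function names and both datasets are hypothetical, and a high R² alone does not prove linearity, so a plot is still worth drawing.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(xs, ys):
    """Fraction of the variance in ys explained by the fitted straight line."""
    intercept, slope = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data, simulated for illustration only.
rng = random.Random(1)
x1 = [i / 10 for i in range(-30, 31)]
y_linear = [2.0 * x + 1.0 + rng.gauss(0, 0.5) for x in x1]
y_curved = [x ** 2 + rng.gauss(0, 0.5) for x in x1]

print(f"R^2, straight-line trend: {r_squared(x1, y_linear):.3f}")
print(f"R^2, curved trend:        {r_squared(x1, y_curved):.3f}")
```

The curved dataset yields an R² near zero because a straight line cannot follow a symmetric parabola, which is the same signal a curved scatter plot would give.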
Chan Lee