When there are two numerical variables, there are TrendPositive associationNegative association PatternAny discernible "shape" in the scatterLinear Non-linear Visualize, then quantify The Correlation Coefficient rMeasures linear association. It is based on the standard units. r is defined as:The average of product of (x in standard units) and (y in standard units) ํ์ค ๋จ์ x์ ํ์ค ๋จ์ y์ ๊ณฑ์ ํ๊ท In P..
์ ์ฒด ๊ธ
Python, C++, Data Science ๊ณต๋ถ ๋ธ๋ก๊ทธ ์ ๋๋ค.Confidence Interval is the interval of estimates of a parameter.It's based on random sampling. ๊ฐ์ฅ ๋๋ฆฌ ์ฌ์ฉ๋๋ ๊ฒ์ 95% Confidence Interval ์
๋๋ค. Here, '95%' is called the confidence level.it could be any percent between 0 - 100.Higher confidence level means wider intervals. Confidence interval can be considered 'Good' if it contains the parameter.The confidence is in the process that creates the inter..
A/B testing is a type of experiment in Data Science that compare values of sampled individuals in Group A with values of sampled individuals in Group B.Q. Do the two sets of values come from the same underlying distribution? ์๋ฅผ ๋ค์ด, ๋ํ๋ฏผ๊ตญ์ A ์ง์ญ์์ ์ํ๋ง์ ํตํด ์ธก์ ํ ํ๊ท ์ ์ฅ์ด 165cm, B ์ง์ญ์์๋ 170cm๋ผ๊ณ ํด๋ณด๊ฒ ์ต๋๋ค. (observed statistic)์ฌ๊ธฐ์, A์ B ์ง์ญ์ ํ๊ท ์ ์ฅ ์ฐจ์ด๊ฐ same underlying distribution (๋ํ๋ฏผ๊ตญ ์ ์ฒด ์ ์ฅ ๋ถํฌ) ์์ ๋น๋กฏ๋ ๊ฒ์ผ๋ก ํ๋จ..
Data science์์๋ population์ unknown parameter๋ฅผ estimate ํ๋ ๊ฒ์ด ๋ชฉํ์ผ ๋๊ฐ ๋ง์ต๋๋ค. ์๋ฅผ ๋ค์ด ์ ๊ตญ๋ฏผ์ ์๋์ estimate ํ๊ณ ์ถ๋ค๊ณ ํด๋ณด๊ฒ ์ต๋๋ค. ์ค์ ์๋์ ๊ตฌํด์ ์ด๋ฅผ ์งํ๋ก ์ฌ์ฉํ๋ ค๊ณ ํ๋ค๊ณ ํ๊ฒ ์ต๋๋ค. 1. If you have a census: Just calculate the parameter from the census, and you're done. Population ๋ฐ์ดํฐ๊ฐ ์ค๋น ์๋ฃ ๋์๋ค๋ฉด, ๋ฐ๋ก ๊ณ์ฐ๋ง ํ๋ฉด ๋ฉ๋๋ค. ํ์ง๋ง, ์ด๋ฐ ๊ฒฝ์ฐ๊ฐ ๋น์ฐํ ํํ์ง ์๊ฒ ์ฃ ? 2. If you don't have a census: Take a random sample from the population. Usa a statistic as..