Chan Lee 2024. 10. 29. 05:43

A/B testing is a type of experiment in Data Science that compare values of sampled individuals in Group A with values of sampled individuals in Group B.

Q. Do the two sets of values come from the same underlying distribution? 

 

์˜ˆ๋ฅผ ๋“ค์–ด, ๋Œ€ํ•œ๋ฏผ๊ตญ์˜ A ์ง€์—ญ์—์„œ ์ƒ˜ํ”Œ๋ง์„ ํ†ตํ•ด ์ธก์ •ํ•œ ํ‰๊ท  ์‹ ์žฅ์ด 165cm, B ์ง€์—ญ์—์„œ๋Š” 170cm๋ผ๊ณ  ํ•ด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค. (observed statistic)

์—ฌ๊ธฐ์„œ, A์™€ B ์ง€์—ญ์˜ ํ‰๊ท  ์‹ ์žฅ ์ฐจ์ด๊ฐ€ same underlying distribution (๋Œ€ํ•œ๋ฏผ๊ตญ ์ „์ฒด ์‹ ์žฅ ๋ถ„ํฌ) ์—์„œ ๋น„๋กฏ๋œ ๊ฒƒ์œผ๋กœ ํŒ๋‹จ์ด ๋˜๋Š”์ง€๋ฅผ ๊ณ ๋ คํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. 

If the result is not statistically significant (i.e. less than p-value cutoff), then we reject the null hypothesis that the average height of A is equal to B. 

์ฆ‰, A์™€ B์—์„œ ๊ด€์ธก๋œ ํ‰๊ท  ์‹ ์žฅ์˜ ์ฐจ์ด๊ฐ€ ํ™•๋ฅ ์ ์œผ๋กœ ๊ด€์ธก๋  ์ˆ˜ ์žˆ๋Š”์ง€ (์šฐ์—ฐ์œผ๋กœ ๊ด€์ธก๋œ ๊ฒƒ์ธ์ง€), ์•„๋‹ˆ๋ฉด ์‚ฌ์‹ค์ƒ ํ™•๋ฅ ์ ์œผ๋กœ ๋ถˆ๊ฐ€๋Šฅํ•œ์ง€ (p-value ๊ฐ€ cutoff๋ณด๋‹ค ๋‚ฎ์€์ง€) ๋ฅผ ํŒ๋‹จํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

 

 

Sometimes, random permutation technique is applied on A/B testing. 

์šฐ๋ฆฌ๋Š” Categorical Value์˜ ์ฐจ์ด๊ฐ€ ์›ํ•˜๋Š” test statistic์— ์˜ํ–ฅ์„ ์ฃผ๋Š”์ง€๋ฅผ ํ™•์ธํ•˜๊ณ  ์‹ถ์„ ๋•Œ๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค.

 

์˜ˆ๋ฅผ ๋“ค์–ด, ์ž„์‹  ๊ธฐ๊ฐ„ ์ค‘ ์–ด๋จธ๋‹ˆ์˜ ํก์—ฐ ์—ฌ๋ถ€์— ๋”ฐ๋ฅธ ์‹ ์ƒ์•„์˜ ์‹ ์žฅ ์ฐจ์ด๋ฅผ ํ™•์ธํ•˜๋ ค๊ณ  ํ•˜๋Š” ์ƒํ™ฉ์„ ์ƒ๊ฐํ•ด๋ด…์‹œ๋‹ค. 

ํก์—ฐ ์—ฌ๋ถ€ (True or False) ์— ๋”ฐ๋ฅธ ์‹ ์žฅ์˜ ํ‰๊ท ์„ ๊ตฌํ•˜๊ณ , ๋‘ ๊ฐ’ ์‚ฌ์ด์˜ ๊ฑฐ๋ฆฌ๋ฅผ ์ธก์ •ํ•˜์—ฌ ์ด๋ฅผ test statistic์œผ๋กœ ์‚ผ๊ฒ ์Šต๋‹ˆ๋‹ค. 

 

๋งŒ์•ฝ, ํก์—ฐ ์—ฌ๋ถ€๊ฐ€ ์‹ ์ƒ์•„์˜ ์‹ ์žฅ์— ์˜ํ–ฅ์ด ์—†๋‹ค๋ฉด, ํก์—ฐ ์—ฌ๋ถ€๋Š” ์‹ ์ƒ์•„์˜ ์‹ ์žฅ์— ์˜ํ–ฅ์„ ์ฃผ๋Š” ์š”์†Œ๊ฐ€ ์•„๋‹ˆ๊ธฐ ๋•Œ๋ฌธ์— True์ธ ์ง‘๋‹จ๊ณผ Falsed์ธ ์ง‘๋‹จ์˜ ํ‰๊ท  ์‹ ์žฅ ๋ถ„ํฌ์— ์œ ์˜๋ฏธํ•œ ์ฐจ์ด๊ฐ€ ์กด์žฌํ•˜๋ฉด ์•ˆ๋ฉ๋‹ˆ๋‹ค. 

์ด๋Ÿด ๋•Œ, ์šฐ๋ฆฌ๋Š” categorical label, ์ด ๊ฒฝ์šฐ์—๋Š” ํก์—ฐ ์—ฌ๋ถ€์— ํ•ด๋‹นํ•˜๋Š” ๊ฐ’๋“ค์„ ๋žœ๋คํ•˜๊ฒŒ ์„ž์–ด์ค๋‹ˆ๋‹ค (without replacement) 

์ด๋ฅผ ์šฐ๋ฆฌ๊ฐ€ random permutation ์ด๋ผ๊ณ  ๋ถ€๋ฆ…๋‹ˆ๋‹ค.

 

์–ด์ฐจํ”ผ ํ•ด๋‹น ๋ ˆ์ด๋ธ”์€ ์˜ํ–ฅ์ด ์กด์žฌํ•˜์ง€ ์•Š๊ธฐ ๋•Œ๋ฌธ์—, permutation์„ ํ•˜๋“  ์•ˆํ•˜๋“  test statistic์€ ๋™์ผ (์œ ์‚ฌ)ํ•˜๊ฒŒ ์ธก์ •๋˜์–ด์•ผ ํ•˜๊ฒ ์ฃ ? 

์ฆ‰, random permutation์„ ๋‹คํšŒ ๋ฐ˜๋ณตํ•˜์—ฌ ์ธก์ •๋œ distribution์—์„œ ์ดˆ๊ธฐ observed statistic์˜ ๊ฐ’์˜ p-value๋ฅผ ํ™•์ธํ•จ์œผ๋กœ์จ statistical signifcance์— ๋Œ€ํ•œ ํŒ๋‹จ์„ ๋‚ด๋ฆด ์ˆ˜ ์žˆ๊ฒ ์Šต๋‹ˆ๋‹ค. 

 

 

๋” ์ž์„ธํ•œ ๋‚ด์šฉ์€ UC Berkeley Data 8 Course์˜ ๋ฌด๋ฃŒ ์ œ๊ณต๋˜๋Š” ์ธํ„ฐ๋„ท ์ž๋ฃŒ์—์„œ ์ฐพ์•„๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. 

https://inferentialthinking.com/chapters/12/1/AB_Testing.html