Computer Science/Data Science

Naive Bayes Classifier

_cactus 2021. 3. 8. 00:02
๋ฐ˜์‘ํ˜•

๊ฐœ์š”

  • ๋‹จ์ˆœ๊ทœ์น™๋ชจํ˜•: ์˜ˆ์ธก๋ณ€์ˆ˜๊ฐ€ ํ•„์š” ์—†๋Š” ๋ชจํ˜•, ์ฃผ๋กœ ๊ณ ๊ธ‰ ๋ชจํ˜•๋“ค๊ณผ ๋น„๊ตํ•˜๊ธฐ ์œ„ํ•œ baseline

  • ๋‹จ์ˆœ ๋ฒ ์ด์ฆˆ ๋ถ„๋ฅ˜๋ชจํ˜•

     => ์ด ๊ธฐ๋ฒ•๋“ค์€ ๋ฐ์ดํ„ฐ ๊ตฌ์กฐ์— ๋Œ€ํ•œ ๊ฐ€์ •์„ ๊ฑฐ์˜ ํ•˜์ง€ ์•Š๋Š”๋‹ค๋Š” ๊ณตํ†ต์ ! (data-driven)

     (makes no assumption about the data)

 

๋‹จ์ˆœ๊ทœ์น™

  • ๋ชจ๋“  ์˜ˆ์ธก๋ณ€์ˆ˜๋ฅผ ๋ถ„๋ฅ˜ํ•œ ์ƒ์ฑ„์—์„œ ์–ด๋Š ํ•œ record๋ฅผ m๊ฐœ์˜ ์ง‘๋‹จ ์ค‘์— ์ œ์ผ ๋งŽ์€ ํ•˜๋‚˜(prevalent class)๋กœ ๋ถ„๋ฅ˜ํ•˜๋Š” ๋‹จ์ˆœํ•œ ๊ทœ์น™

๋‹จ์ˆœ ๋ฒ ์ด์ฆˆ ๋ถ„๋ฅ˜๋ชจํ˜•

  • ๋‹จ์ˆœ๊ทœ์น™๋ณด๋‹ค ์ •๊ตํ•œ ๋ฐฉ๋ฒ• : ๋‹จ์ˆœ๊ทœ์น™ + ์˜ˆ์ธก๋ณ€์ˆ˜ ์ •๋ณด

  • ๋‹ค๋ฅธ ๋ถ„๋ฅ˜๋ชจํ˜•๊ณผ ๋‹ฌ๋ฆฌ naive bayes classifier๋Š” ์˜ˆ์ธก๋ณ€์ˆ˜๊ฐ€ ๋ฒ”์ฃผํ˜•์ธ ๊ฒฝ์šฐ์—๋งŒ ์ ์šฉ๋จ

    • ๋”ฐ๋ผ์„œ ์ˆ˜์น˜ํ˜• ์˜ˆ์ธก๋ณ€์ˆ˜๋Š” ๋ฒ”์ฃผํ˜• ์˜ˆ์ธก๋ณ€์ˆ˜๋กœ ๋ณ€ํ™˜ํ•˜์—ฌ์•ผ ํ•จ

  • ๋‹จ์ˆœ ๋ฒ ์ด์ฆˆ ๊ธฐ๋ฒ•์€ ๋ฐ์ดํ„ฐ ์ง‘ํ•ฉ์ด ๋งค์šฐ ํด ๊ฒฝ์šฐ ์œ ์šฉ

    • ์˜ˆ) google๊ณผ ๊ฐ™์€ ์›น ๊ฒ€์ƒ‰ ์—”์ง„ ํšŒ์‚ฌ๋“ค์—์„œ ์‚ฌ์šฉ์ž๊ฐ€ ์ฒ ์ž๋ฅผ ์ž˜๋ชป ์ž…๋ ฅํ•  ๋•Œ ์ด๋ฅผ ์ˆ˜์ •ํ•˜๊ธฐ ์œ„ํ•ด ๋‹จ์ˆœ ๋ฒ ์ด์ฆˆ ๋ชจ๋ธ์„ ์‚ฌ์šฉ

  • ์กฐ๊ฑด๋ถ€ ํ™•๋ฅ  exact bayes classifier

    • ๋ถ„๋ฅ˜๋ฌธ์ œ์˜ ๋ชฉ์ : ์˜ˆ์ธก๋ณ€์ˆ˜๊ตฐ์ด ์ฃผ์–ด์งˆ ๋•Œ, ๊ฐ ์ง‘๋‹จ์— ์†Œํ•  ํ™•๋ฅ  ์ถ”์ • => ์กฐ๊ฑด๋ถ€ ํ™•๋ฅ 

    • ์‚ฌ๊ฑด B๊ฐ€ ์ฃผ์–ด์งˆ ๋•Œ, ์‚ฌ๊ฑด A๊ฐ€ ์ผ์–ด๋‚  ํ™•๋ฅ  P(A|B)

    • ์ผ๋ฐ˜์ ์œผ๋กœ m๊ฐœ์˜ ์ง‘๋‹จ C1,C2,...Cm ์˜ ๋ฐ˜์‘๋ณ€์ˆ˜(Y)์™€ X1,X2,...,Xp์ธ ์˜ˆ์ธก๋ณ€์ˆ˜(X)์— ๋Œ€ํ•ด P(Ci|X1,X2,...,Xp) i=1,2,...,m ์„ ์ธก์ •

    • ํ•˜๋‚˜์˜ record๋ฅผ ๋ถ„๋ฅ˜ํ•˜๊ธฐ ์œ„ํ•ด, ๊ฐ ์ง‘๋‹จ i์— ๋Œ€ํ•œ P(Ci|X1,X2,...,Xp)๋ฅผ ๊ณ„์‚ฐ -> ๊ฐ ์ง‘๋‹จ์— ์†ํ•  ๊ธฐํšŒ๋ฅผ ์ธก์ • -> ๊ทธ ํ›„ ๊ฐ€์žฅ ๋†’์€ ํ™•๋ฅ  ๊ฐ’์„ ๊ฐ€์ง€๋Š” ์ง‘๋‹จ์œผ๋กœ ๋ถ„๋ฅ˜

  • ์‹ค์ œ ์ ์šฉ์ƒ์˜ ์–ด๋ ค์›€

    • ์œ„์™€ ๊ฐ™์ด ์กฐ๊ฑด๋ถ€ ํ™•๋ฅ ์„ ์ถ”์ •ํ•  ๊ฒฝ์šฐ, ์˜ˆ์ธก๋ณ€์ˆ˜ ์ˆ˜ p๊ฐ€ 20์ •๋„๋กœ ๋งค์šฐ ํฌ๊ณ , ์ง‘๋‹จ ์ˆ˜ m์ด 2๋ผ๋ฉด, record๊ฐ€ ๋งŽ์€ ๊ฒฝ์šฐ์—๋Š” ์ž์‹ ๋“ค๊ณผ ์ •ํ™•ํžˆ ์ผ์น˜ํ•˜๋Š” ๊ทœ์น™๋“ค์„ ์ฐพ์„ ์ˆ˜ ์—†๋Š” ๋ฌธ์ œ ๋ฐœ์ƒ

    • relies on finding other records that share same predictor values as record-to-be-classified 

    • want to find ‘probability of belonging to class C, given specified values of predictors

    • ์˜ˆ: ํˆฌํ‘œ ์˜ˆ์ธก

      • ์•„๋ฌด๋ฆฌ ๋ฐ์ดํ„ฐ์˜ ํฌ๊ธฐ๊ฐ€ ํฌ๋”๋ผ๋„, 4๋ช…์˜ ์ž๋…€๊ฐ€ ์žˆ๊ณ , ์ดํ˜ผํ–ˆ์œผ๋ฉฐ, ์ค‘์„œ๋ถ€ ์ง€์—ญ์— ์‚ด๊ณ , ๊ณ ์†Œ๋“์ธต์ด๋ฉฐ, hispanic์˜ ๋‚จ์„ฑ์— ์†ํ•˜๋Š” ์‚ฌ๋žŒ๋“ค์€ ๊ทธ๋‹ค์ง€ ๋งŽ์ง€ ์•Š๋‹ค

    • ํ•ด๊ฒฐ๋ฐฉ์•ˆ: naive bayes

  • naive bayes

    • ๊ฐ ์ง‘๋‹จ ๋‚ด์˜ ์˜ˆ์ธก๋ณ€์ˆ˜์— ๋Œ€ํ•œ ๋…๋ฆฝ์„ฑ(independence)์˜ ๊ฐ€์ •์„ ๋‹จ์ˆœํ™”์‹œํ‚ค๊ธฐ (use multiplication rule)

    • ๋ชจ๋“  ์˜ˆ์ธก๋ณ€์ˆ˜๋“ค์ด ์ƒํ˜ธ ๋…๋ฆฝ์ ์ด๋‹ค -> ๊ณ„์‚ฐ๊ณผ์ • ๋‹จ์ˆœํ™” ๊ฐ€๋Šฅ -> ๋™์‹œ๋ฐœ์ƒ์˜ ํ™•๋ฅ  = ๋ชจ๋“  ๊ด€๋ จ ์˜ˆ์ธก๋ณ€์ˆ˜์˜ ํ•œ๊ณ„๋ณ€๋™๋ถ„์„ ์„œ๋กœ ๊ณฑํ•œ ๊ฐ’

  • bayes theorem ๋ฒ ์ด์ฆˆ ์ด๋ก 

    • ๋ฒ ์ด์ฆˆ ์ด๋ก : ์–ด๋–ค ํŠน์ • ํ›„์†์‚ฌ๊ฑด์ด ๋ฐœ์ƒํ•œ๋‹ค๊ณ  ํ•  ๋•Œ, ์ด์ „ ์‚ฌ๊ฑด์˜ ํ™•๋ฅ ์„ ์•Œ๋ ค์คŒ

    • ์˜ˆ) ๋งŒ์•ฝ ๋ถ„์‹๋ณด๊ณ ์— ๋Œ€ํ•ด ์†Œ์†ก์ด ์ œ๊ธฐ๋œ ๊ฒƒ์„ ์•Œ๊ณ  ์žˆ๋‹ค๋ฉด ํšŒ์‚ฌ๊ฐ€ ๋ถ„์‹ ์žฌ๋ฌด์ œํ‘œ๋ฅผ ์ œ์ถœํ•  ํ™•๋ฅ ์ด ์–ผ๋งˆ์ธ์ง€ ์•Œ ์ˆ˜ ์žˆ์Œ

    • ๋ฒ ์ด์ฆˆ ์ด๋ก ์€ record๊ฐ€ ์ง‘๋‹จ Ci์— ์†ํ•˜๋Š” ํ™•๋ฅ ์„ ๊ณ„์‚ฐํ•˜๊ธฐ ์œ„ํ•ด ๋‹ค์Œ ๊ณต์‹ ์‚ฌ์šฉ

  

  • ์ด ๊ณต์‹์€ ์˜ˆ์ธก๋ณ€์ˆ˜์— ๋Œ€ํ•œ ์ •๋ณด๋ฅผ ํฌํ•จํ•˜๋Š” ์ง‘๋‹จ Ci์— ์†ํ•  ์‚ฌํ›„ํ™•๋ฅ (posterior probability)๋กœ ์•Œ๋ ค์ ธ ์žˆ๋‹ค
    • P(Ci|X1,X2,...,Xp) = ์‚ฌํ›„ํ™•๋ฅ  (posterior probability)
    • P(Ci) = ์‚ฌ์ „ ํ™•๋ฅ (prior probability)
  • ๋”ฐ๋ผ์„œ bayes theorem์€ record์˜ ์†์„ฑ์ด ์ฃผ์–ด์งˆ ๋•Œ ํ•ด๋‹น record๊ฐ€ ์–ด๋Š ํ•œ ์ง‘๋‹จ์— ์†ํ•˜๋Š” ํ™•๋ฅ ์„ ๊ณ„์‚ฐํ•ด ๋‚ด๋Š” ๊ณต์‹์„ ์ œ๊ณต

  • ๋‹ค์Œ ๋‹จ๊ณ„:

 

 

 

  • ์˜ˆ) Financial Fraud

    • target variable: fraud / no fraud

    • predictors: prior pending legal charges (yes/no), size of firm (small/large)

      • ๊ฐ ํšŒ์‚ฌ์— ๋Œ€ํ•ด์„œ ๋ฒ•์  ์ฑ…์ž„์ด ์ œ์†Œ๋˜์—ˆ๋Š”์ง€ ์—ฌ๋ถ€, ๊ธฐ์—…์˜ ๊ทœ๋ชจ๊ฐ€ ํฐ์ง€/์ž‘์€์ง€ ์—ฌ๋ถ€, ์กฐ์‚ฌ ํ›„์— ์žฌ๋ฌด๋ณด๊ณ ๊ฐ€ ๋ถ„์‹ ๋˜๋Š” ์ •์ƒ์œผ๋กœ ํŒ๋ช…๋˜์—ˆ๋Š”์ง€์— ๋Œ€ํ•œ ์ •๋ณด 

๋ชฉํ‘œ: classify a small firm with charges field : (size=small, charges=yes)์˜ ๋ถ„๋ฅ˜

1. exact bayes calculation

  • P(fraud|yes,small) = P(yes,small|fraud)P(fraud)P(yes,small|fraud)P(fraud) + P(yes,small|truthful)P(truthful)
  • P(fraud | charges=yes, size=small) = ½ = 0.50 

2. naive bayes calculation

  • P(fraud|yes,small) = P(yes|fraud)P(small|fraud)P(fraud)P(yes|fraud)P(small|fraud)P(fraud) + P(yes|truthful)P(small|truthful)P(truthful)

  • ์œ„๋Š” multiplication rule์„ ์ ์šฉ..(๋‡Œ ํ”ผ์…œ..)

  • P(fraud | charges=yes, size=small) = 0.075/(0.075+0.067) = 0.53

  • exact bayes๋กœ ๊ณ„์‚ฐํ•œ ๊ฒฐ๊ณผ์™€ ๊ฐ’์ด ํฌ๊ฒŒ ์ฐจ์ด๊ฐ€ ๋‚˜์ง€ ์•Š๋Š”๋‹ค๋Š” ๊ฒƒ์„ ์•Œ ์ˆ˜ ์žˆ๋‹ค

  • all records are used in calculations, not just those matching predictor values

  • relies on assumption of independence between predictor variables within each class

๋‹จ์ˆœ ๋ฒ ์ด์ง€ ๋ถ„๋ฅ˜๋ชจํ˜•์˜ ์žฅ๋‹จ์ 

์žฅ์ 

  • handles purley categorical data well 

    • ๋ถ„์„์˜ ๋ชฉ์ ์ด ์ง‘๋‹จ์„ ๋ถ„๋ฅ˜ํ•˜๋Š” ๊ฒƒ์ด๊ฑฐ๋‚˜ ์–ด๋Š ํŠน์ • ์ง‘๋‹จ์— ์†ํ•  ํ™•๋ฅ ์— ๊ธฐ์ดˆํ•ด์„œ ๋ ˆ์ฝ”๋“œ์˜ ์ˆœ์œ„๋ฅผ ์ •ํ•˜๋Š” ๊ฒƒ์ด๋ผ๋ฉด naive bayes๋Š” ์•„์ฃผ ์ข‹์Œ

    • ๋งŒ์•ฝ ๋ชฉ์ ์ด ์ง‘๋‹จ์— ์†ํ•  ํ™•๋ฅ ์„ ์ถ”์ •ํ•˜๋Š” ๊ฒƒ์ผ ๋•Œ๋Š” ํŽธํ–ฅ๋œ ๊ฒฐ๊ณผ๋ฅผ ๋‚ณ์Œ

    • good for applications using lift(ex. response to mailing), less so for applications requiring probabilities(ex. credit scoring)

  • works well with very large data sets 

  • simple & computationally efficient ๋ชจํ˜•์ด ๋‹จ์ˆœํ•˜๊ณ  ๊ณ„์‚ฐ์ด ํšจ์œจ์ 

๋‹จ์ 

  • requires large number of records

  • problematic when a predictor category is not present in training data

    • ๋งŒ์•ฝ ํ•™์Šต์šฉ ๋ฐ์ดํ„ฐ๊ฐ€ ‘์š”ํŠธ ์†Œ์œ ’=1์ธ record๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ์ง€ ์•Š๋‹ค๋ฉด, ‘์š”ํŠธ ์†Œ์œ ’=1์ธ ์ƒˆ๋กœ์šด record์— ๋Œ€ํ•ด์„œ naive bayes๋Š” ๋ชฉํ‘œ๋ณ€์ˆ˜ ‘๋ณดํ—˜ ๊ตฌ๋งค’์— 0์˜ ํ™•๋ฅ ๊ฐ’์„ ๋ถ€์—ฌ 

    • ์˜ˆ์ธก๋ณ€์ˆ˜์˜ ๋ฒ”์ฃผ๊ฐ€ ํ•™์Šต์šฉ ๋ฐ์ดํ„ฐ์—์„œ ์กด์žฌํ•˜์ง€ ์•Š๋Š” ๊ฒฝ์šฐ ๋‹จ์ˆœ ๋ฒ ์ด์ง€๋Š” ์ด๋Ÿฌํ•œ ์˜ˆ์ธก๋ณ€์ˆ˜์˜ ๋ฒ”์ฃผ๋ฅผ ๊ฐ–๋Š” ์ƒˆ๋กœ์šด record๊ฐ€ 0์˜ ํ™•๋ฅ ๊ฐ’์„ ๊ฐ–๋Š”๋‹ค๊ณ  ๊ฐ€์ •ํ•œ๋‹ค

    • assigns 0 probability of response, ignoring information in other variables๋‹จ

    • ์˜ˆ) ๋ชฉํ‘œ๋ณ€์ˆ˜๊ฐ€ ‘๋ณดํ—˜ ๊ตฌ๋งค’์ด๊ณ  ์˜ˆ์ธก๋ณ€์ˆ˜๊ฐ€ ‘์š”ํŠธ ์†Œ์œ ’๋ผ๊ณ  ๊ฐ€์ •

 

728x90
๋ฐ˜์‘ํ˜•