
Computer Science / Data Science (17 posts)

[Random Forest] Random Forest: Explanation and Pros/Cons

์„ค๋ช… Random Forest๋Š” ์—ฌ๋Ÿฌ ๊ฐœ์˜ ๊ฒฐ์ • ํŠธ๋ฆฌ(decision tree)๋ฅผ ์กฐํ•ฉํ•ด ์˜ˆ์ธก ์„ฑ๋Šฅ์„ ํ–ฅ์ƒ์‹œํ‚ค๋Š” ์•™์ƒ๋ธ” ํ•™์Šต(ensemble learning) ๋ฐฉ๋ฒ• ์ค‘ ํ•˜๋‚˜์ž…๋‹ˆ๋‹ค. ์ฃผ๋กœ ๋ถ„๋ฅ˜(classification)์™€ ํšŒ๊ท€(regression) ๋ฌธ์ œ์— ์‚ฌ์šฉ๋˜๋ฉฐ, ๊ฐ๊ฐ์˜ ๊ฒฐ์ • ํŠธ๋ฆฌ๊ฐ€ ๋…๋ฆฝ์ ์œผ๋กœ ํ•™์Šต๋œ ํ›„, ์ตœ์ข… ์˜ˆ์ธก๊ฐ’์„ ๋‹ค์ˆ˜๊ฒฐ(voting) ๋˜๋Š” ํ‰๊ท ์„ ํ†ตํ•ด ๊ฒฐ์ •ํ•ฉ๋‹ˆ๋‹ค. ์ด ๊ณผ์ •์—์„œ ๊ฐ ํŠธ๋ฆฌ๋Š” ๋ฐ์ดํ„ฐ์˜ ์ผ๋ถ€์™€ ๋ณ€์ˆ˜์˜ ์ผ๋ถ€๋งŒ์„ ๋žœ๋คํ•˜๊ฒŒ ์‚ฌ์šฉํ•˜์—ฌ ์ƒ์„ฑ๋ฉ๋‹ˆ๋‹ค. ์ด๋ฅผ ํ†ตํ•ด ๊ฐœ๋ณ„ ํŠธ๋ฆฌ๊ฐ€ ๊ณผ์ ํ•ฉ(overfitting)๋˜๋Š” ๋ฌธ์ œ๋ฅผ ์ค„์ด๊ณ , ๋ชจ๋ธ์˜ ์˜ˆ์ธก ์ •ํ™•์„ฑ์„ ๋†’์ด๋Š” ๋ฐ ๊ธฐ์—ฌํ•ฉ๋‹ˆ๋‹ค. Random Forest์˜ ์ฃผ์š” ํŠน์ง•์€ ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค: 1.๋žœ๋ค์„ฑ ๋„์ž…: ๋ฐ์ดํ„ฐ์˜ ์ƒ˜ํ”Œ๊ณผ ํ”ผ์ฒ˜๋ฅผ ๋ฌด์ž‘์œ„๋กœ ์„ ํƒํ•˜์—ฌ ๊ฐ ํŠธ๋ฆฌ๋ฅผ ๊ตฌ์„ฑํ•˜..

[์ถ”์ฒœ์‹œ์Šคํ…œ] Collaborative Denoising AutoEncoders for Top-N Recommender Systems ๋…ผ๋ฌธ ๋ฆฌ๋ทฐ

Collaborative Denoising Auto-Encoders for Top-N Recommender Systems. 1. Introduction: CDAE (Collaborative Denoising AutoEncoder) is a model that applies a DAE to collaborative filtering for top-N recommendation. The model takes a corrupted user-item preference vector as input and learns its latent representation → this lets it better reconstruct the original, uncorrupted input. 2. Problem Definition. Notation: $U$: set of users; $I$: set of items; $O = (u, i, y_{ui})$: a user's preference for an item; implicit ..
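A rough PyTorch sketch of the structure described above (layer sizes, the sigmoid activation, and the dropout-style corruption are assumptions for illustration, not the paper's exact setup):

```python
import torch
import torch.nn as nn

class CDAE(nn.Module):
    """Denoising autoencoder with a per-user latent node, CDAE-style."""
    def __init__(self, n_users, n_items, hidden_dim=50, corruption=0.5):
        super().__init__()
        self.corrupt = nn.Dropout(corruption)                # corrupt the input
        self.encoder = nn.Linear(n_items, hidden_dim)
        self.user_embed = nn.Embedding(n_users, hidden_dim)  # user-specific node
        self.decoder = nn.Linear(hidden_dim, n_items)

    def forward(self, user_ids, pref_vec):
        h = torch.sigmoid(self.encoder(self.corrupt(pref_vec))
                          + self.user_embed(user_ids))       # latent representation
        return self.decoder(h)                               # reconstruct original input
```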

[๋จธ์‹ ๋Ÿฌ๋‹] Likelihood "์šฐ๋„" ๋ž€?

์šฐ๋ฆฌ๊ฐ€ ๋จธ์‹ ๋Ÿฌ๋‹์„ ๊ณต๋ถ€ํ•˜๋‹ค๋ณด๋ฉด MLE(Maximum LIkelihood Estimation)์„ ๋งŽ์ด ์ ‘ํ•œ๋‹ค. ์—ฌ๊ธฐ์„œ ์˜๋ฏธํ•˜๋Š” Likelihood๊ฐ€ ๋ฌด์—‡์ธ๊ฐ€? ์ตœ๋Œ€์šฐ๋„๋ฒ•(MLE)๋ž€ - ๋ชจ์ˆ˜์ ์ธ ๋ฐ์ดํ„ฐ ๋ฐ€๋„ ์ถ”์ • ๋ฐฉ๋ฒ• - ํŒŒ๋ผ๋ฏธํ„ฐ ๐œƒ = (๐œƒ1,๐œƒ2,..๐œƒn)์œผ๋กœ ๊ตฌ์„ฑ๋œ ์–ด๋–ค ํ™•๋ฅ ๋ฐ€๋„ํ•จ์ˆ˜ P์—์„œ ๊ด€์ธก๋œ ํ‘œ๋ณธ ๋ฐ์ดํ„ฐ ์ง‘ํ•ฉ์„ x = (x1,x2,..,xn)๋ผ ํ•  ๋•Œ, ์ด ํ‘œ๋ณธ๋“ค์—์„œ ํŒŒ๋ผ๋ฏธํ„ฐ ๐œƒ = (๐œƒ1,๐œƒ2,..๐œƒn)์„ ์ถ”์ •ํ•˜๋Š” ๋ฐฉ๋ฒ•์ด๋‹ค. ์ข€ ๋” ์‰ฌ์šด ์ดํ•ด๋ฅผ ์œ„ํ•ด ๊ทธ๋ฆผ๊ณผ ์—์‹œ๋ฅผ ๋ณด๋ฉด์„œ MLE๋ฅผ ๋” ์ž˜ ์ดํ•ดํ•ด๋ณด์ž ์˜ˆ์‹œ ) ์˜ˆ๋ฅผ ๋“ค์–ด ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๋ฐ์ดํ„ฐ๊ฐ€ ์žˆ๋‹ค๊ณ  ํ•˜์ž x = { 1, 4, 5, 6, 9 } ์ด๋•Œ ๋ฐ์ดํ„ฐ x๋Š” ์•„๋ž˜ ๊ทธ๋ฆผ์˜ ์ฃผํ™ฉ์ƒ‰ ๊ณก์„ ๊ณผ ํŒŒ๋ž€์ƒ‰ ๊ณก์„  ์ค‘ ์–ด๋–ค ๊ณก์„ ์œผ๋กœ๋ถ€ํ„ฐ ์ถ”์ถœ๋˜์—ˆ์„ ํ™•๋ฅ ์ด ๋” ๋†’์€๊ฐ€?..

[๋จธ์‹ ๋Ÿฌ๋‹์„ ์œ„ํ•œ ํ†ต๊ณ„์ง€์‹]

1. ๋จธ์‹ ๋Ÿฌ๋‹์„ ํ•˜๋Š”๋ฐ ํ†ต๊ณ„๊ฐ€ ํ•„์š”ํ•œ๊ฐ€์š”? - ๋ฐ์ดํ„ฐ๋ฅผ ๋‹ค๋ฃฐ ๋•Œ ํ†ต๊ณ„์—์„œ ์ฃผ๋กœ ์‚ฌ์šฉํ•˜๋Š” ๊ฐ€์ •์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. ๊ธฐ๋ณธ์ ์œผ๋กœ ์ „์ฒด๋ฅผ ์•Œ์ง€ ๋ชปํ•˜๋Š” ์ƒํ™ฉ์—์„œ sampling๋œ ๋ฐ์ดํ„ฐ๋งŒ์„ ๋ณด๊ณ  ๋ชจ๋ธ์„ ๋งŒ๋“ค๊ธฐ ๋•Œ๋ฌธ์— ๊ธฐ๋ณธ์ ์œผ๋กœ ๋ถˆํ™•์‹ค์„ฑ์„ ์ง€๋‹ ์ˆ˜ ๋ฐ–์— ์—†๊ธฐ ๋•Œ๋ฌธ์ด์ฃ . ๋ชจ๋ธ์„ ๋งŒ๋“œ๋Š” ์ผ์€ parameter๋ฅผ ์ถ”์ •ํ•˜๋Š” ์ผ์ด๋ผ๊ณ  ์ƒ๊ฐํ•ด๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด ๊ณผ์ •์—์„œ ๋ฐ์ดํ„ฐ ๋ถ„ํฌ ๋˜๋Š” ๋ชจ๋ธ์— ๋Œ€ํ•œ ๊ฐ€์ •์„ ํ•  ๋•Œ ํ†ต๊ณ„์ง€์‹ ํ•„์š”ํ•˜์ฃ . 2. ํ™•๋ฅ ๋ถ„ํฌ์— ์–ด๋–ค ๊ฒƒ๋“ค์ด ์žˆ๋‚˜์š”? ๊ทธ๋ฆฌ๊ณ  ์–ธ์ œ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋‚˜์š”? - uniform distribution, ์ •๊ทœ๋ถ„ํฌ, ๋ฒ ๋ฅด๋ˆ„์ด ๋ถ„ํฌ, ์ดํ•ญ๋ถ„ํฌ, ๋ฒ ํƒ€๋ถ„ํฌ, ๋””๋ฆฌํด๋ ˆ ๋ถ„ํฌ ๋“ฑ์ด ์žˆ์Šต๋‹ˆ๋‹ค. ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•œ ์ •๋ณด๋ฅผ ์ „ํ˜€ ๋ชจ๋ฅด๋Š” ์ƒํ™ฉ์—์„œ๋Š” ์ •๊ทœ๋ถ„ํฌ๋กœ ๊ฐ€์ •ํ•˜๋Š” ๊ฒƒ์ด ์ข‹์Šต๋‹ˆ๋‹ค. ๋ถ„๋ฅ˜ ๋ฌธ์ œ๋Š” ์ฃผ๋กœ ๋ฒ ๋ฅด๋ˆ„์ด ๋ถ„..

[Machine Learning] Understanding Multiple Linear Regression via a Logistic Regression Example

Logistic Regression ๋ฒ”์ฃผํ˜• ๋ณ€์ˆ˜ ์˜ˆ์ธก ๋ชจ๋ธ Logistic Regression์„ ์•Œ๊ธฐ์ „์— linear regression์„ ๋จผ์ € ์•Œ์•„์•ผ. Multiple Linear Regression (๋‹ค์ค‘์„ ํ˜•ํšŒ๊ท€) ์ˆ˜์น˜ํ˜• ์„ค๋ช…๋ณ€์ˆ˜ X์™€ ์—ฐ์†ํ˜• ์ˆซ์ž๋กœ ์ด๋ค„์ง„ ์ข…์†๋ณ€์ˆ˜ Y๊ฐ„์˜ ๊ด€๊ณ„๋ฅผ ์„ ํ˜•์œผ๋กœ ๊ฐ€์ •ํ•˜๊ณ  ์ด๋ฅผ ๊ฐ€์žฅ ์ž˜ ํ‘œํ˜„ํ•  ์ˆ˜ ์žˆ๋Š” ํšŒ๊ท€๊ณ„์ˆ˜๋ฅผ ๋ฐ์ดํ„ฐ๋กœ๋ถ€ํ„ฐ ์ถ”์ •ํ•˜๋Š” ๋ชจ๋ธ ์ด๋•Œ ํšŒ๊ท€๊ณ„์ˆ˜๋Š” ๋ชจ๋ธ์˜ ์˜ˆ์ธก๊ฐ’๊ณผ ์‹ค์ œ๊ฐ’์˜ ์ฐจ์ด(์˜ค์ฐจ์ œ๊ณฑํ•ฉ error sum of squared)์„ ์ตœ์†Œ๋กœ ํ•˜๋Š” ๊ฐ’ ์„ค๋ช…๋ณ€์ˆ˜๊ฐ€ p๊ฐœ์ธ ๋‹ค์ค‘์„ ํ˜•ํšŒ๊ท€์˜ ์ผ๋ฐ˜ ์‹ ์˜ˆ์‹œ - 1 ๋‚˜์ด์™€ ํ˜ˆ์•• ๋ฐ์ดํ„ฐ๊ฐ€ ์ฃผ์–ด์กŒ์„ ๋•Œ, ์˜ค์ฐจ์ œ๊ณฑํ•ฉ์„ ์ตœ์†Œ๋กœ ํ•˜๋Š” ํšŒ๊ท€๊ณ„์ˆ˜ ๊ตฌํ•˜๊ธฐ ์„ค๋ช…๋ณ€์ˆ˜ X : ๋‚˜์ด ์ข…์†๋ณ€์ˆ˜ Y : ํ˜ˆ์•• ์•ž์„œ ์ข…์†๋ณ€์ˆ˜ Y๋Š” ‘ํ˜ˆ์••’์œผ๋กœ ์—ฐ์†ํ˜• ์ˆซ์ž์˜€์Œ. ๊ทธ๋ ‡..

[Machine Learning] ์•™์ƒ๋ธ” ๊ธฐ๋ฒ•์ด๋ž€?

Ensemble ๊ธฐ๋ฒ• Ensemble Learning์ด๋ž€ ์—ฌ๋Ÿฌ๊ฐœ์˜ ๋ถ„๋ฅ˜๊ธฐ๋ฅผ ์ƒ์„ฑํ•˜๊ณ  ๊ทธ ์˜ˆ์ธก์„ ๊ฒฐํ•ฉํ•˜์—ฌ ๋ณด๋‹ค ์ •ํ™•ํ•œ ์˜ˆ์ธก์„ ๋‚ด๋Š” ๊ธฐ๋ฒ• ๊ฐ•๋ ฅํ•œ ํ•˜๋‚˜์˜ ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•˜๋Š” ๋Œ€์‹  ๋ณด๋‹ค ์•ฝํ•œ ๋ชจ๋ธ์„ ์—ฌ๋Ÿฌ๊ฐœ ์กฐํ•ฉํ•˜๋Š” ๋ฐฉ์‹ Ensemble Learning ์ข…๋ฅ˜ ์•™์ƒ๋ธ” ํ•™์Šต์€ 3๊ฐ€์ง€ ์œ ํ˜•์œผ๋กœ ๋ถ„๋ฅ˜๋จ Voting Bagging Boosting Voting ์—ฌ๋Ÿฌ๊ฐœ์˜ classifier๊ฐ€ ํˆฌํ‘œ๋ฅผ ํ†ตํ•ด ์ตœ์ข… ์˜ˆ์ธก๊ฒฐ๊ณผ ๊ฒฐ์ • ์„œ๋กœ ๋‹ค๋ฅธ ์•Œ๊ณ ๋ฆฌ์ฆ˜ ์—ฌ๋Ÿฌ๊ฐœ ๊ฒฐํ•ฉํ•˜์—ฌ ์‚ฌ์šฉ Voting ๋ฐฉ์‹ Hard Voting : ๋‹ค์ˆ˜์˜ classifier๊ฐ€ ์˜ˆ์ธกํ•œ ๊ฒฐ๊ณผ๊ฐ’์„ ์ตœ์ข… ๊ฒฐ๊ณผ๋กœ ์„ ์ • (๋‹ค์ˆ˜๊ฒฐ์˜ ๋ฒ•์น™) Soft Voting : ๋ชจ๋“  classifier๊ฐ€ ์˜ˆ์ธกํ•œ label๊ฐ’์˜ ๊ฒฐ์ • ํ™•๋ฅ  ํ‰๊ท ์„ ๊ตฌํ•œ ๋’ค ๊ฐ€์žฅ ํ™•๋ฅ ์ด ๋†’์€ label๊ฐ’์„ ์ตœ์ข…๊ฒฐ๊ณผ๋กœ ์„ ..

[CNN] Visualizing CNN Feature Maps and Filters

๋ชฉ์  : CNN layer๋“ค ์ค‘๊ฐ„์ค‘๊ฐ„ ์ถ”์ถœ๋˜๋Š” feature๋“ค์„ ์‹œ๊ฐํ™”ํ•ด๋ณด๋ฉด์„œ layer๋ฅผ ๊ฑฐ์น˜๋ฉด์„œ ์–ด๋– ํ•œ ๋ณ€ํ™”๊ฐ€ ์ผ์–ด๋‚˜๋Š”์ง€ ์•Œ์•„๋ณธ๋‹ค. CNN architecture ๋“ค์–ด๊ฐ€๊ธฐ์ „์—, CNN์„ ์ดํ•ดํ•˜๊ธฐ ์œ„ํ•ด ๊ผญ ํ•„์š”ํ•œ ์ •๋ณด๋“ค์— ๋Œ€ํ•ด์„œ ์ •๋ฆฌํ•ด๋ณธ๋‹ค. 1. input image์— ์šฐ๋ฆฌ๋Š” filter(=mask=kernel)๋ฅผ ์ ์šฉํ•˜์—ฌ feature map์„ ์ƒ์„ฑํ•œ๋‹ค. ์ด๋•Œ filter๋Š” ์ด๋ฏธ์ง€๊ฐ€ ๊ฐ€์ง€๊ณ  ์žˆ๋Š” edge, vertical line, horizontal line, bends์™€ ๊ฐ™์€ ์—ฌ๋Ÿฌ feature๋“ค์„ ๋‚˜ํƒ€๋‚ด์ฃผ๋„๋ก ๋„์™€์ค€๋‹ค. 2. ์ƒ์„ฑ๋œ feature map์— pooling์„ ์ ์šฉํ•œ๋‹ค. min, avg, max pooling๋“ฑ์„ ์“ธ ์ˆ˜ ์žˆ๊ณ , ๊ทธ ์ค‘์—์„œ max pooling์„ ์‚ฌ์šฉํ–ˆ์„ ๋•Œ ์„ฑ๋Šฅํ–ฅ์ƒ์„ ..

[Text Mining] Word Embedding: This Is All You Need!

ํ…์ŠคํŠธ ๋ฐ์ดํ„ฐ ํ‘œํ˜„ ๋ฐฉ์‹ ํ…์ŠคํŠธ ๊ธฐ๋ฐ˜์˜ ๋ชจ๋ธ์„ ๋งŒ๋“ค๊ธฐ ์œ„ํ•ด์„œ๋Š” ํ…์ŠคํŠธ ๋ฐ์ดํ„ฐ๋ฅผ ์ˆซ์ž๋กœ ํ‘œํ˜„ํ•ด์•ผํ•จ ํ…์ŠคํŠธ ๋ฐ์ดํ„ฐ๋ฅผ ํ‘œํ˜„ํ•˜๋Š” ๋ฐฉ์‹(feature representation)์œผ๋กœ sparse representation์ด ๋จผ์ € ๋“ฑ์žฅํ•˜์˜€๊ณ  sparse representaion์˜ ๋‹จ์ ์„ ๋ณด์™„ํ•˜๊ธฐ ์œ„ํ•ด dense representation๊ฐ€ ๋“ฑ์žฅ sparse representation์˜ ๋Œ€ํ‘œ์ ์ธ ๊ธฐ๋ฒ•์€ one-hot encoding์ด๊ณ  dense representation์˜ ๋Œ€ํ‘œ์ ์ธ ๊ธฐ๋ฒ•์€ word embedding one-hot encoding ์ปดํ“จํ„ฐ๋Š” ๋ฌธ์ž๋ฅผ ์ดํ•ดํ•˜์ง€ ๋ชปํ•˜๊ธฐ ๋•Œ๋ฌธ์— ์ˆซ์ž๋กœ ํ‘œํ˜„ํ•ด์ค˜์•ผํ•˜๋ฉฐ one-hot encoding์€ ์—ฌ๋Ÿฌ ํ‘œํ˜„๊ธฐ๋ฒ•๋“ค ์ค‘ ๊ฐ€์žฅ ๊ธฐ๋ณธ์ ์ธ ๋ฐฉ๋ฒ• sparse representation์—..

[Machine Learning] LightGBM์ด๋ž€? โœ” ์„ค๋ช… ๋ฐ ์žฅ๋‹จ์ 

๐Ÿ“Œ Remind LightGBM์— ๋“ค์–ด๊ฐ€๊ธฐ์ „์— ๋ณต์Šต ๊ฒธ reminding์„ ํ•ด๋ณด์ž. Light GBM์˜ GBM์€ Gradient Boosting Model๋กœ, tree๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•˜๋Š” ํ•™์Šต ์•Œ๊ณ ๋ฆฌ์ฆ˜์ด๋‹ค. ์ด GBM์˜ ํ•™์Šต๋ฐฉ์‹์„ ์‰ฝ๊ฒŒ๋งํ•˜๋ฉด, ํ‹€๋ฆฐ๋ถ€๋ถ„์— ๊ฐ€์ค‘์น˜๋ฅผ ๋”ํ•˜๋ฉด์„œ ์ง„ํ–‰ํ•œ๋‹ค๊ณ  ํ•  ์ˆ˜ ์žˆ๋‹ค. Gradient Boosting์—์„œ Boosting์€ ์—ฌ๋Ÿฌ๊ฐœ์˜ tree๋ฅผ ๋งŒ๋“ค๋˜, ๊ธฐ์กด์— ์žˆ๋Š” ๋ชจ๋ธ(tree)๋ฅผ ์กฐ๊ธˆ์”ฉ ๋ฐœ์ „์‹œ์ผœ์„œ ๋งˆ์ง€๋ง‰์— ์ด๋ฅผ ํ•ฉํ•˜๋Š” ๊ฐœ๋…์œผ๋กœ, Random Forest์˜ Bagging๊ธฐ๋ฒ•๊ณผ ๋‹ค๋ฅธ ๋ฐฉ๋ฒ•์ด๋‹ค. Boostingํ•˜๋Š” ๋ฐฉ์‹์—๋„ ํฌ๊ฒŒ 2๊ฐ€์ง€๊ฐ€ ์žˆ๋‹ค. 1. AdaBoost์™€ ๊ฐ™์ด ์ค‘์š”ํ•œ ๋ฐ์ดํ„ฐ(์ผ๋ฐ˜์ ์œผ๋กœ ๋ชจ๋ธ์ด ํ‹€๋ฆฐ ๋ฐ์ดํ„ฐ)์— ๋Œ€ํ•ด weight๋ฅผ ์ฃผ๋Š” ๋ฐฉ์‹ 2. GBDT์™€ ๊ฐ™์ด loss fun..

[Machine Learning] ๋จธ์‹ ๋Ÿฌ๋‹, ๋ชจ๋ธ์˜ ํŽธํ–ฅ(bias)๊ณผ ๋ถ„์‚ฐ(variance) : trade-off ๊ด€๊ณ„

๋จธ์‹ ๋Ÿฌ๋‹์—์„œ ํŽธํ–ฅ๊ณผ ๋ถ„์‚ฐ์€ ์–ธ์ œ ์“ฐ์ด๋Š” ์šฉ์–ด์ธ๊ฐ€? Supervised Learning(์ง€๋„ํ•™์Šต)์— ๋Œ€ํ•ด์„œ ๊ฐ„๋‹จํžˆ ์„ค๋ช…ํ•ด๋ณด์ž๋ฉด ์‚ฌ๋žŒ์ด ์ •ํ•ด์ค€ ์ •๋‹ต์ด ์žˆ๊ณ , ์šฐ๋ฆฌ์˜ ๋ชจ๋ธ์€ ๊ทธ ์ •๋‹ต์„ ์ž˜ ๋งž์ถ”๋Š” ๋ฐฉํ–ฅ์œผ๋กœ ํ•™์Šต(training)์„ ํ•œ๋‹ค. ์ด๋•Œ, ํ•™์Šต์„ ํ•˜๋ฉด์„œ ๋ชจ๋ธ์ด ๋‚ด๋†“๋Š” ์˜ˆ์ธก๊ฐ’๋“ค์˜ ๊ฒฝํ–ฅ์„ ํ‘œํ˜„ํ•˜๊ธฐ์œ„ํ•ด ํŽธํ–ฅ๊ณผ ๋ถ„์‚ฐ์ด๋ผ๋Š” ์šฉ์–ด๋ฅผ ์‚ฌ์šฉํ•œ๋‹ค. ์‰ฝ๊ฒŒ ๋งํ•˜์ž๋ฉด, ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค A. ์˜ˆ์ธก๊ฐ’๊ณผ ์ •๋‹ต ๊ฐ„์˜ ๊ด€๊ณ„๋ฅผ "ํŽธํ–ฅ"์œผ๋กœ ํ‘œํ˜„ (bias : model์˜ output๊ณผ ์‹ค์ œ๊ฐ’ ์‚ฌ์ด์˜ ์ œ๊ณฑ error, ์ •ํ™•๋„์™€ ๋น„์Šทํ•œ ๊ฐœ๋…) B. ์˜ˆ์ธก๊ฐ’๋ผ๋ฆฌ์˜ ๊ด€๊ณ„๋ฅผ "๋ถ„์‚ฐ"์œผ๋กœ ํ‘œํ˜„ (variance : model์ด ๊ฐ๊ธฐ ๋‹ค๋ฅธ train set์— ๋Œ€ํ•˜์—ฌ ์„ฑ๋Šฅ์˜ ๋ณ€ํ™”์ •๋„๊ฐ€ ๊ธ‰ํ•˜๊ฒŒ ๋ณ€ํ•˜๋Š”์ง€, ์•ˆ์ •์ ์œผ๋กœ ๋ณ€ํ•˜๋Š”์ง€๋ฅผ ๋‚˜ํƒ€๋‚ด๋Š” ์ฒ™๋„) [๋”ฅ๋Ÿฌ๋‹] Bia..

๋ฌธ์„œ์œ ์‚ฌ๋„

๋ฌธ์„œ์œ ์‚ฌ๋„ 0. Base ์˜ˆ๋ฅผ ๋“ค์–ด ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๋ฌธ์„œ๊ฐ€ ์žˆ๊ณ , ๋ฌธ์„œ๋ฅผ feature space์— ๋†“๋Š”๋‹ค๊ณ  ์ƒ๊ฐํ•ด๋ณด์ž ๊ฐ•์•„์ง€ ๊ท€์—ฝ๋‹ค ๋งค์šฐ ๊ฐ•์•„์ง€๊ฐ€ ๊ท€์—ฝ๋‹ค 1 1 0 ๊ฐ•์•„์ง€๊ฐ€ ๋งค์šฐ ๊ท€์—ฝ๋‹ค 1 1 1 ๊ณ ์–‘์ด๊ฐ€ ๋งค์šฐ ๊ท€์—ฝ๋‹ค 0 1 1 ๊ฐ ๋‹จ์–ด ‘๊ฐ•์•„์ง€’, ‘๊ณ ์–‘์ด’, ‘๋งค์šฐ’๋ฅผ ์ถ•์œผ๋กœ ํ•˜๋Š” ํŠน์„ฑ๊ณต๊ฐ„(feature space)์—์„œ ๋‹ค์Œ ๋ฌธ์„œ๋“ค์„ ํ•˜๋‚˜์˜ ์ขŒํ‘œ๋กœ ์ƒ๊ฐํ•  ์ˆ˜ ์žˆ์Œ ‘๊ฐ•์•„์ง€๊ฐ€ ๊ท€์—ฝ๋‹ค’ --> (1,1,0) ‘๊ฐ•์•„์ง€๊ฐ€ ๋งค์šฐ ๊ท€์—ฝ๋‹ค’ --> (1,1,1) ‘๊ณ ์–‘์ด๊ฐ€ ๋งค์šฐ ๊ท€์—ฝ๋‹ค’ --> (0,1,1) ๋‘ ๋‹จ์–ด ํ˜น์€ ๋ฌธ์žฅ์ด ์ฃผ์–ด์กŒ์„ ๋•Œ, ์œ ์‚ฌ๋„๋ฅผ ์ธก์ •ํ•˜๋Š” ๋ฐฉ๋ฒ•์€ ์—ฌ๋Ÿฌ๊ฐ€์ง€๊ฐ€ ์žˆ๋‹ค cosine similarity jaccard similarity euclidean distance manhattan distance..

Naive Bayes Classifier

๊ฐœ์š” ๋‹จ์ˆœ๊ทœ์น™๋ชจํ˜•: ์˜ˆ์ธก๋ณ€์ˆ˜๊ฐ€ ํ•„์š” ์—†๋Š” ๋ชจํ˜•, ์ฃผ๋กœ ๊ณ ๊ธ‰ ๋ชจํ˜•๋“ค๊ณผ ๋น„๊ตํ•˜๊ธฐ ์œ„ํ•œ baseline ๋‹จ์ˆœ ๋ฒ ์ด์ฆˆ ๋ถ„๋ฅ˜๋ชจํ˜• => ์ด ๊ธฐ๋ฒ•๋“ค์€ ๋ฐ์ดํ„ฐ ๊ตฌ์กฐ์— ๋Œ€ํ•œ ๊ฐ€์ •์„ ๊ฑฐ์˜ ํ•˜์ง€ ์•Š๋Š”๋‹ค๋Š” ๊ณตํ†ต์ ! (data-driven) (makes no assumption about the data) ๋‹จ์ˆœ๊ทœ์น™ ๋ชจ๋“  ์˜ˆ์ธก๋ณ€์ˆ˜๋ฅผ ๋ถ„๋ฅ˜ํ•œ ์ƒ์ฑ„์—์„œ ์–ด๋Š ํ•œ record๋ฅผ m๊ฐœ์˜ ์ง‘๋‹จ ์ค‘์— ์ œ์ผ ๋งŽ์€ ํ•˜๋‚˜(prevalent class)๋กœ ๋ถ„๋ฅ˜ํ•˜๋Š” ๋‹จ์ˆœํ•œ ๊ทœ์น™ ๋‹จ์ˆœ ๋ฒ ์ด์ฆˆ ๋ถ„๋ฅ˜๋ชจํ˜• ๋‹จ์ˆœ๊ทœ์น™๋ณด๋‹ค ์ •๊ตํ•œ ๋ฐฉ๋ฒ• : ๋‹จ์ˆœ๊ทœ์น™ + ์˜ˆ์ธก๋ณ€์ˆ˜ ์ •๋ณด ๋‹ค๋ฅธ ๋ถ„๋ฅ˜๋ชจํ˜•๊ณผ ๋‹ฌ๋ฆฌ naive bayes classifier๋Š” ์˜ˆ์ธก๋ณ€์ˆ˜๊ฐ€ ๋ฒ”์ฃผํ˜•์ธ ๊ฒฝ์šฐ์—๋งŒ ์ ์šฉ๋จ ๋”ฐ๋ผ์„œ ์ˆ˜์น˜ํ˜• ์˜ˆ์ธก๋ณ€์ˆ˜๋Š” ๋ฒ”์ฃผํ˜• ์˜ˆ์ธก๋ณ€์ˆ˜๋กœ ๋ณ€ํ™˜ํ•˜์—ฌ์•ผ ํ•จ ๋‹จ์ˆœ ๋ฒ ์ด์ฆˆ ๊ธฐ๋ฒ•์€ ๋ฐ์ดํ„ฐ ์ง‘ํ•ฉ์ด ๋งค์šฐ ํด..

Decision Tree ๊ฐ„.๋‹จ.๋ช….๋ฃŒ

Decision tree : ์˜์‚ฌ๊ฒฐ์ •๋‚˜๋ฌด ๋ถ„๋ฅ˜(classification)๊ณผ ํšŒ๊ท€๋ถ„์„(regression)์— ๋ชจ๋‘ ์‚ฌ์šฉ๋  ์ˆ˜ ์žˆ๊ธฐ ๋–„๋ฌธ์— CART(Classification And Regression Tree)๋ผ๊ณ  ๋ถˆ๋ฆผ node tree์˜ node : ์งˆ๋ฌธ/๋‹ต์„ ๋‹ด๊ณ  ์žˆ์Œ root node : ์ตœ์ƒ์œ„ node ์ตœ์ƒ์œ„ node์˜ ์†์„ฑ feature๊ฐ€ ๊ฐ€์žฅ ์ค‘์š”ํ•œ ํŠน์„ฑ leaf node : ๋งˆ์ง€๋ง‰ node (๋ง๋‹จ๋…ธ๋“œ) ๋งŒ์•ฝ tree์˜ ๋ชจ๋“  leaf node๊ฐ€ pure node๊ฐ€ ๋  ๋•Œ๊นŒ์ง€ ์ง„ํ–‰ํ•˜๋ฉด model์˜ ๋ณต์žก๋„๋Š” ๋งค์šฐ ๋†’์•„์ง€๊ณ  overfitting๋จ overfitting ๋ฐฉ์ง€ tree์˜ ์ƒ์„ฑ์„ ์‚ฌ์ „์— ์ค‘์ง€ : pre-prunning (=๊นŠ์ด์˜ ์ตœ๋Œ€๋ฅผ ์„ค์ •, max_depth) ๋ฐ์ดํ„ฐ๊ฐ€ ์ ์€ node ์‚ญ..

Random Forest ๊ฐ„.๋‹จ.๋ช….๋ฃŒ

Ensemble ์•™์ƒ๋ธ” ์—ฌ๋Ÿฌ ๊ฐœ์˜ ๋จธ์‹ ๋Ÿฌ๋‹ model์„ ์—ฐ๊ฒฐํ•˜์—ฌ ๊ฐ•๋ ฅํ•œ model์„ ๋งŒ๋“œ๋Š” ๊ธฐ๋ฒ• classifier/regression์— ์ „๋ถ€ ํšจ๊ณผ์  random forest์™€ gradient boosting์€ ๋‘˜๋‹ค model์„ ๊ตฌ์„ฑํ•˜๋Š” ๊ธฐ๋ณธ ์š”์†Œ๋กœ decision tree๋ฅผ ์‚ฌ์šฉํ•œ๋‹ค random forest ์กฐ๊ธˆ์”ฉ ๋‹ค ๋‹ค๋ฅธ ์—ฌ๋Ÿฌ decision tree์˜ ๋ฌถ์Œ ๋žœ๋ค ํฌ๋ ˆ์ŠคํŠธ์˜ ๋“ฑ์žฅ ๋ฐฐ๊ฒฝ : ๊ฐ๊ฐ์˜ tree๋Š” ๋น„๊ต์  ์˜ˆ์ธก์„ ์ž˜ ํ•  ์ˆ˜ ์žˆ์ง€๋งŒ, ๋ฐ์ดํ„ฐ์˜ ์ผ๋ถ€์— overfittingํ•˜๋Š” ๊ฒฝํ–ฅ์„ ๊ฐ€์ง ๋”ฐ๋ผ์„œ, ์ž˜ ์ž‘๋™ํ•˜์ง€๋งŒ ์„œ๋กœ ๋‹ค๋ฅธ ๋ฐฉํ–ฅ์œผ๋กœ overfitting๋œ tree๋ฅผ ๋งŽ์ด ๋งŒ๋“ค๊ณ  ๊ทธ ๊ฒฐ๊ณผ๋ฅผ ํ‰๊ท ๋‚ด๋ฉด overfitting์„ ์ค„์ผ ์ˆ˜ ์žˆ๋‹ค ์ด๋ ‡๊ฒŒ ํ•˜๋ฉด tree model์˜ ์˜ˆ์ธก ์„ฑ๋Šฅ์€ ์œ ์ง€ํ•˜๋˜ overf..

๋‹จ์ˆœ์„ ํ˜•ํšŒ๊ท€ / ๋‹ค์ค‘์„ ํ˜•ํšŒ๊ท€ ๊ฐ„.๋‹จ.๋ช….๋ฃŒ

๋‹จ์ˆœ์„ ํ˜•ํšŒ๊ท€ ํ•˜๋‚˜์˜ ํŠน์„ฑ์„ ์ด์šฉํ•ด์„œ ํƒ€๊ฒŸ ์˜ˆ์ธก y = wx + b y : ์˜ˆ์ธก๊ฐ’ x : ํŠน์„ฑ w : ๊ฐ€์ค‘์น˜/๊ณ„์ˆ˜(coefficient) b : ํŽธํ–ฅ(offset) ์ฃผ์–ด์ง„ sample data๋“ค์„ ์ด์šฉํ•˜์—ฌ ๊ฐ€์žฅ ์ ํ•ฉํ•œ w์™€ b๋ฅผ ์ฐพ์•„์•ผ ํ•จ -> ๋ณดํ†ต ๊ฒฝ์‚ฌํ•˜๊ฐ•๋ฒ•(gradient descent)๋ฅผ ์ด์šฉํ•ด์„œ ์ฐพ๋Š”๋‹ค ๋‹ค์ค‘์„ ํ˜•ํšŒ๊ท€ ์—ฌ๋Ÿฌ ๊ฐœ์˜ ํŠน์„ฑ์„ ์ด์šฉํ•ด์„œ ํƒ€๊ฒŸ ์˜ˆ์ธก y = w0x0 + w1x1 = w2x2 + ... + b ์—ญ์‹œ MSE๋ฅผ ์ตœ์†Œํ™”ํ•˜๋Š” ๊ฐ€์žฅ ์ ํ•ฉํ•œ w๋“ค๊ณผ b๋ฅผ ์ฐพ๋Š” ๊ฒƒ์ด ๋ชฉํ‘œ ๋ฌธ์ œ : ๊ณผ๋Œ€์ ํ•ฉ ๋  ๋•Œ๊ฐ€ ์ข…์ข… ์žˆ๋‹ค => ์ผ๋ฐ˜ํ™” ๋Šฅ๋ ฅ์ด ๋–จ์–ด์ง„๋‹ค ๋ฆฟ์ง€(Ridge)์™€ ๋ผ์˜(Lasso) ๋ฐฉ๋ฒ•์œผ๋กœ ํ•ด๊ฒฐ
