
[text mining] word embedding: this is all you need!

_cactus 2022. 3. 2. 13:09
๋ฐ˜์‘ํ˜•

ํ…์ŠคํŠธ ๋ฐ์ดํ„ฐ ํ‘œํ˜„ ๋ฐฉ์‹

ํ…์ŠคํŠธ ๊ธฐ๋ฐ˜์˜ ๋ชจ๋ธ์„ ๋งŒ๋“ค๊ธฐ ์œ„ํ•ด์„œ๋Š” ํ…์ŠคํŠธ ๋ฐ์ดํ„ฐ๋ฅผ ์ˆซ์ž๋กœ ํ‘œํ˜„ํ•ด์•ผํ•จ

ํ…์ŠคํŠธ ๋ฐ์ดํ„ฐ๋ฅผ ํ‘œํ˜„ํ•˜๋Š” ๋ฐฉ์‹(feature representation)์œผ๋กœ sparse representation์ด ๋จผ์ € ๋“ฑ์žฅํ•˜์˜€๊ณ  sparse representaion์˜ ๋‹จ์ ์„ ๋ณด์™„ํ•˜๊ธฐ ์œ„ํ•ด dense representation๊ฐ€ ๋“ฑ์žฅ

sparse representation์˜ ๋Œ€ํ‘œ์ ์ธ ๊ธฐ๋ฒ•์€ one-hot encoding์ด๊ณ  dense representation์˜ ๋Œ€ํ‘œ์ ์ธ ๊ธฐ๋ฒ•์€ word embedding

one-hot encoding

์ปดํ“จํ„ฐ๋Š” ๋ฌธ์ž๋ฅผ ์ดํ•ดํ•˜์ง€ ๋ชปํ•˜๊ธฐ ๋•Œ๋ฌธ์— ์ˆซ์ž๋กœ ํ‘œํ˜„ํ•ด์ค˜์•ผํ•˜๋ฉฐ one-hot encoding์€ ์—ฌ๋Ÿฌ ํ‘œํ˜„๊ธฐ๋ฒ•๋“ค ์ค‘ ๊ฐ€์žฅ ๊ธฐ๋ณธ์ ์ธ ๋ฐฉ๋ฒ•

sparse representation์—์„œ ๊ฐ€์žฅ ์œ ๋ช…ํ•œ ๊ธฐ๋ฒ• (sparse representation์— ๊ด€ํ•˜์—ฌ ํ›„์— ์„ค๋ช…)

๊ณผ์ •

1. ๋‹จ์–ด์ง‘ํ•ฉ ์ƒ์„ฑ

๋‹จ์–ด ์ง‘ํ•ฉ (=vocabulary)

: ํ…์ŠคํŠธ๋‚ด ๋ชจ๋“  ๋‹จ์–ด๋ฅผ ์ค‘๋ณต์„ ํ—ˆ์šฉํ•˜์ง€ ์•Š๊ณ  ๋ชจ์€ ์ง‘ํ•ฉ

์„œ๋กœ ๋‹ค๋ฅธ ๋‹จ์–ด๋“ค์˜ ์ง‘ํ•ฉ์„ ๋งํ•˜๋ฉฐ apple๊ณผ apples๊ณผ ๊ฐ™์ด ๋‹จ์–ด์˜ ๋ณ€ํ˜•ํ˜•ํƒœ ๋˜ํ•œ ๋‹ค๋ฅธ ๋‹จ์–ด๋กœ ๊ฐ„์ฃผ

2. ๊ณ ์œ ํ•œ ์ •์ˆ˜ ๋ถ€์—ฌ

๋‹จ์–ด์ง‘ํ•ฉ์ด ๋งŒ๋“ค์–ด์ง€๊ณ ๋‚˜๋ฉด ์ด ๋‹จ์–ด ์ง‘ํ•ฉ์— ๊ณ ์œ ํ•œ ๋ฒˆํ˜ธ(index)๋ฅผ ๋ถ€์—ฌ(encoding๊ณผ์ •)

์˜ˆ) ๋‹จ์–ด์ง‘ํ•ฉ์˜ ํฌ๊ธฐ๊ฐ€ 5000์ผ ๋•Œ

     apple : 1

     banana : 2

     car : 3

    โ‹ฎ

     is : 2000

    โ‹ฎ

     zoo : 5000

๋ฐ˜์‘ํ˜•

3. one-hot encoding

: ๋ฒกํ„ฐ์˜ ์ฐจ์›์€ ๋‹จ์–ด์ง‘ํ•ฉ์˜ ํฌ๊ธฐ์ด๋ฉฐ ํ‘œํ˜„ํ•˜๊ณ ์žํ•˜๋Š” ๋‹จ์–ด์˜ index์—๋Š” 1๋ถ€์—ฌ, ๊ทธ ์™ธ index์—๋Š” 0์„ ๋ถ€์—ฌํ•˜๋Š” vectorํ‘œํ˜„๋ฐฉ์‹

์˜ˆ) 2๋ฒˆ์˜ ๊ณ ์œ ํ•œ ์ •์ˆ˜๋ถ€์—ฌ์—์„œ ์‚ฌ์šฉ๋œ ์˜ˆ์‹œ์˜ ์—ฐ์žฅ์„ 

๋‹จ์–ด \ index 1 2 3 ... 2000 ... 5000
apple 1 0 0 0 0 0 0
banana 0 1 0 0 0 0 0
car 0 0 1 0 0 0 0
โ‹ฎ 0 0 0 โ‹ฑ 0 0 0
is 0 0 0 0 1 0 0
โ‹ฎ 0 0 0 0 0 โ‹ฑ 0
zoo 0 0 0 0 0 0 1
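
A minimal sketch of steps 1-3 in plain Python; the example sentence (and therefore the resulting indices) is made up for illustration.

```python
# Steps 1-3: build a vocabulary, assign indices, one-hot encode.
tokens = "the apple is next to the banana in the car".split()

# 1. Vocabulary: every distinct word, no duplicates
vocab = sorted(set(tokens))

# 2. Assign each word a unique integer index
word_to_index = {word: i for i, word in enumerate(vocab)}

# 3. One-hot encode: vector length = vocabulary size, 1 only at the word's own index
def one_hot(word):
    vec = [0] * len(vocab)
    vec[word_to_index[word]] = 1
    return vec

print(word_to_index)
print("apple ->", one_hot("apple"))
```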

ํ•œ๊ณ„ / ๋‹จ์ 

  • ๋‹จ์–ด์˜ ๊ฐœ์ˆ˜๊ฐ€ ์ฆ๊ฐ€ํ• ์ˆ˜๋ก ๋ฒกํ„ฐ๋ฅผ ์ €์žฅํ•˜๊ธฐ ์œ„ํ•ด ํ•„์š”ํ•œ ๊ณต๊ฐ„ ๋˜ํ•œ ์ปค์ง ⇒ ๋ฒกํ„ฐ์˜ ์ฐจ์› ์ฆ๊ฐ€
  • ์ž…๋ ฅ ๋ฐ์ดํ„ฐ์˜ ์ฐจ์›์ด ํฌ๋ฉด ์ฐจ์›์˜ ์ €์ฃผ๋ผ๋Š” ๋ฌธ์ œ๊ฐ€ ์ƒ๊น€
  • → ์ž…๋ ฅ๋ฐ์ดํ„ฐ์— 0์ด ๋„ˆ๋ฌด ๋งŽ์œผ๋ฉด ๋ฐ์ดํ„ฐ์—์„œ ์ •๋ณด๋ฅผ ๋ฝ‘์•„๋‚ด๊ธฐ ์–ด๋ ค์›Œ์ง → ๋ชจ๋ธ์˜ ํ•™์Šต์ด ์ž˜ ์•ˆ๋˜์–ด ์„ฑ๋Šฅ์ด ๋–จ์–ด์ง
  • ๋‹จ์–ด ๋ฒกํ„ฐ๊ฐ„ ์œ ์‚ฌ๋„๋ฅผ ํ‘œํ˜„ํ•˜์ง€ ๋ชปํ•จ โญ
    ์˜ˆ) ๋Š‘๋Œ€, ํ˜ธ๋ž‘์ด, ๊ฐ•์•„์ง€, ๊ณ ์–‘์ด๋ผ๋Š” 4๊ฐœ์˜ ๋‹จ์–ด์— ๋Œ€ํ•ด์„œ one-hot encoding์„ ํ•ด์„œ ๊ฐ๊ฐ [1,0,0,0], [0,1,0,0], [0,0,1,0], [0,0,0,1] ์ด๋ผ๋Š” one-hot vector๋ฅผ ๋ถ€์—ฌ๋ฐ›์•˜๋‹ค๊ณ  ๊ฐ€์ •
    ์ด๋•Œ one-hot vector๋กœ๋Š” ๊ฐ•์•„์ง€์™€ ๋Š‘๋Œ€๊ฐ€ ์œ ์‚ฌํ•˜๊ณ , ํ˜ธ๋ž‘์ด์™€ ๊ณ ์–‘์ด๊ฐ€ ์œ ์‚ฌํ•˜๋‹ค๋Š” ๊ฒƒ์„ ํ‘œํ˜„ํ•  ์ˆ˜๊ฐ€ ์—†์Œ
    ์ข€ ๋” ๊ทน๋‹จ์ ์œผ๋กœ๋Š” ๊ฐ•์•„์ง€, ๊ฐœ, ๋ƒ‰์žฅ๊ณ ๋ผ๋Š” ๋‹จ์–ด๊ฐ€ ์žˆ์„ ๋•Œ ๊ฐ•์•„์ง€๋ผ๋Š” ๋‹จ์–ด๊ฐ€ ๊ฐœ์™€ ๋ƒ‰์žฅ๊ณ ๋ผ๋Š” ๋‹จ์–ด ์ค‘ ์–ด๋–ค ๋‹จ์–ด์™€ ๋” ์œ ์‚ฌํ•œ์ง€๋„ ์•Œ ์ˆ˜ ์—†์Œ
     โ‚ ๋‹จ์–ด๊ฐ„์˜ ๊ด€๊ณ„๋ฅผ ๋ฐ˜์˜ํ•˜์ง€ ๋ชปํ•จ
    • ํ•ด๊ฒฐ๋ฐฉ๋ฒ• 
      ⇒ ๋‹จ์–ด์˜ ์ž ์žฌ ์˜๋ฏธ๋ฅผ ๋ฐ˜์˜ํ•˜์—ฌ ๋‹ค์ฐจ์› ๊ณต๊ฐ„์— ๋ฒกํ„ฐํ™” ํ•˜๋Š” ๊ธฐ๋ฒ• : word embedding์„ ํ†ตํ•œ dense representation
      1. count๊ธฐ๋ฐ˜ ๋ฒกํ„ฐํ™” ๋ฐฉ๋ฒ•
          - LSA(์ž ์žฌ์˜๋ฏธ๋ถ„์„), HAL ๋“ฑ
      2. ์˜ˆ์ธก๊ธฐ๋ฐ˜ ๋ฒกํ„ฐํ™” ๋ฐฉ๋ฒ•
          - RNNLM, Word2Vec, FastText ๋“ฑ
      3. count&์˜ˆ์ธก๊ธฐ๋ฐ˜ ๋ฒกํ„ฐํ™” ๋ชจ๋‘ ์‚ฌ์šฉํ•˜๋Š” ๋ฐฉ๋ฒ•
          - GloVe
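
A short check of the similarity limitation mentioned above: with one-hot vectors, every pair of distinct words has cosine similarity 0, so no notion of "closeness" survives. The vectors below reuse the wolf/tiger/puppy/cat example (the index order is arbitrary).

```python
import math

vectors = {
    "wolf":  [1, 0, 0, 0],
    "tiger": [0, 1, 0, 0],
    "puppy": [0, 0, 1, 0],
    "cat":   [0, 0, 0, 1],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Distinct one-hot vectors are always orthogonal, so the similarity is always 0:
# "puppy" looks exactly as (dis)similar to "wolf" as it does to "cat".
print(cosine(vectors["puppy"], vectors["wolf"]))  # 0.0
print(cosine(vectors["puppy"], vectors["cat"]))   # 0.0
```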

 

Sparse Representation

  • ์•ž์„œ one-hot encoding์„ ํ†ตํ•ด ๋‚˜์˜จ one-hot vector๋Š” ํ‘œํ˜„ํ•˜๊ณ ์žํ•˜๋Š” ๋‹จ์–ด์˜ index๋งŒ 1์ด๊ณ  ์ด์™ธ ๋‚˜๋จธ์ง€ index๋Š” ์ „๋ถ€ 0์œผ๋กœ ํ‘œํ˜„๋˜์–ด ์žˆ์Œ
    ⇒ ์ด๋ ‡๊ฒŒ ํ–‰๋ ฌ/๋ฒกํ„ฐ์˜ ๊ฐ’ ์ค‘ ๋Œ€๋ถ€๋ถ„์ด 0์œผ๋กœ ์ฑ„์›Œ์ ธ ์žˆ๋Š” ๋ฐฉ๋ฒ•์„ **ํฌ์†Œ ํ‘œํ˜„(sparse representation)**์ด๋ผ๊ณ  ํ•จ (one-hot vector๋Š” ๋”ฐ๋ผ์„œ sparse vector)
  • one-hot encoding์™ธ์—๋„ ์—ฌ๋Ÿฌ ํฌ์†Œํ‘œํ˜„ ๋ฐฉ์‹(DTM ๋“ฑ..)์ด ์กด์žฌํ•˜๋‚˜ one-hot encoding์ด ๊ฐ€์žฅ ์œ ๋ช…

 

Dense Representation

  • sparse representation๊ณผ ๋ฐ˜๋Œ€๋˜๋Š” ํ‘œํ˜„
  • distributed representation์ด๋ผ๊ณ  ๋ถˆ๋ฆฌ๊ธฐ๋„ ํ•จ
  • sparse representation์€ ๋‹จ์–ด ์ง‘ํ•ฉ ํฌ๊ธฐ = ๋ฒกํ„ฐ์˜ ์ฐจ์›
    dense representation์€ ์‚ฌ์šฉ์ž๊ฐ€ ์„ค์ •ํ•œ ๊ฐ’์œผ๋กœ ๋ฒกํ„ฐ์˜ ์ฐจ์› ์„ค์ •
  • sparse representation์€ 0๊ณผ 1๋งŒ ๊ฐ€์ง
    dense representation์€ ์‹ค์ˆ˜๊ฐ’์„ ๊ฐ€์ง
    → sparse representation์€ ๋ฒกํ„ฐ์˜ ๋Œ€๋ถ€๋ถ„์˜ ๊ฐ’์ด 0์„ ๊ฐ€์ง€๊ณ  ์žˆ์–ด sparseํ•˜์ง€๋งŒ dense representation์€ ๋ชจ๋“  ์ฐจ์›์ด ๊ฐ’์„ ๊ฐ–๊ณ  ์žˆ๋Š” ๋ฒกํ„ฐ๋กœ ํ‘œํ˜„๋˜๋ฏ€๋กœ sparse์˜ ๋ฐ˜๋Œ€๋ง์ธ dense๋ฅผ ์จ์„œ dense representation์ด๋ผ ํ•จ

์˜ˆ์‹œ ) 1000๊ฐœ์˜ ๋‹จ์–ด๊ฐ€ ์žˆ๋Š” ๋‹จ์–ด์ง‘ํ•ฉ์—์„œ ์‚ฌ๊ณผ๋ฅผ ํ‘œํ˜„ํ•˜๊ณ ์ž ํ•  ๋•Œ ์‚ฌ๊ณผ๋ฅผ ๊ธฐ์กด์˜ sparse representation์œผ๋กœ ํ‘œํ˜„ํ•˜๋ฉด

  • sparse representation : ์‚ฌ๊ณผ = [ 0 0 0 1 0 0 ... 0 0 ]
    1๊ฐœ์˜ 1๊ณผ 999๊ฐœ์˜ 0์œผ๋กœ ์ด๋ฃจ์–ด์ง„ ์ฐจ์›์˜ ํฌ๊ธฐ๊ฐ€ 1000์ธ ๋ฒกํ„ฐ๋กœ ํ‘œํ˜„๋จ

์‚ฌ๊ณผ๋ฅผ ์ฐจ์›์˜ ํฌ๊ธฐ๊ฐ€ 64์ธ dense representation์œผ๋กœ ํ‘œํ˜„ํ•˜๋ฉด

  • dense representation : ์‚ฌ๊ณผ = [ 0.15 1.2 0 1.8 -1.1 0.7 ... -3 -1.7 ]
    1000์ด์—ˆ๋˜ ์ฐจ์›์˜ ํฌ๊ธฐ๊ฐ€ 64๋กœ ์ค„์–ด๋“ค๋ฉด์„œ ๋ชจ๋“  ๊ฐ’์€ ์‹ค์ˆ˜๊ฐ€ ๋จ

    → ‘๋ฒกํ„ฐ์˜ ์ฐจ์›์ด ์กฐ๋ฐ€ํ•ด์กŒ๋‹ค’ํ•˜์—ฌ dense vector(๋ฐ€์ง‘ ๋ฒกํ„ฐ)๋ผ ํ•จ
    ⇒ ๋‹จ์–ด ๋ฒกํ„ฐ์˜ ๊ฐ’๋“ค์€ ๋จธ์‹ ๋Ÿฌ๋‹์„ ํ†ตํ•ด ํ•™์Šต๋จ

 

์žฅ์ 

  • ์ ์€ ์ฐจ์›์œผ๋กœ ๋Œ€์ƒ์„ ํ‘œํ˜„ํ•  ์ˆ˜ ์žˆ์Œ
  • ์ผ๋ฐ˜์ ์œผ๋กœ 20 ~ 200 ์ฐจ์› ์ •๋„๋ฅผ ์‚ฌ์šฉํ•˜๋ฉฐ 0์ด ๊ฑฐ์˜ ์—†๊ณ  ๊ฐ๊ฐ์˜ ์ฐจ์›๋“ค์ด ๋ชจ๋‘ ์ •๋ณด๋ฅผ ๋“ค๊ณ ์žˆ์Œ (sparse representation์˜ ๋ช‡ ์ฒœ ์ฐจ์›์— ๋น„ํ•ด ํ›จ์”ฌ ์ž‘์€ ์ฐจ์›)
    → ๋ชจ๋ธ ํ•™์Šต์ด ๋” ์ž˜๋จ
  • ๋” ํฐ ์ผ๋ฐ˜ํ™” ๋Šฅ๋ ฅ

์œ„์˜ ์žฅ์ ๋“ค์€ word embedding์ด ์ž˜ ํ•™์Šต๋˜์—ˆ๋‹ค๋Š” ์ „์ œํ•˜์— ์„ฑ๋ฆฝ

word embedding์— ๋Œ€ํ•œ ์„ค๋ช…๊ณผ ์–ด๋–ป๊ฒŒ ํ•™์Šต์‹œํ‚ค๋‚˜?

word embedding

  • embedding = to fit something into something else
    ⇒ each word/sentence is converted into an N-dimensional vector (N chosen by the user) and embedded into a vector space
    Put simply, in NLP "word embedding" is used in the sense of representing/mapping/converting a word into a vector
  • Word embedding removes sparsity:
    the result is not a vector that is mostly zeros, as with one-hot encoding, but a vector in which every dimension carries a value

one-hot vector vs embedding vector

 

|  | one-hot vector | embedding vector |
|---|---|---|
| dimension | high-dimensional | low-dimensional |
| also called | sparse vector | dense vector |
| how it is obtained | set manually | learned from training data |
| values | 0 or 1 | real numbers |

์•ž์„œ one-hot vector์˜ ๋‹จ์ ์œผ๋กœ ๋‹จ์–ด ๋ฒกํ„ฐ๊ฐ„ ์œ ์‚ฌ๋„ ๊ณ„์‚ฐ์ด ๋ถˆ๊ฐ€๋Šฅํ•˜๋‹ค๋Š” ์ ์ด ์žˆ์—ˆ์Œ

embedding vector๋Š” ์ด ๋‹จ์ ์„ ํ•ด๊ฒฐํ•ด์คŒ

 

 

 

word embedding์„ ํ•™์Šตํ•˜๋Š” ์—ฌ๋Ÿฌ ์•Œ๊ณ ๋ฆฌ์ฆ˜์œผ๋กœ LSA, Word2Vec, FastText, Glove ... ๋“ฑ์ด ์žˆ์œผ๋ฉฐ word embedding์„ ํ•™์Šตํ•˜๋Š” ๊ธฐ๋ฒ•์€ 2๊ฐ€์ง€๋กœ ๋‚˜๋‰จ

2๊ฐ€์ง€ ๊ธฐ๋ฒ•

  1. count-based methods
    : count how often a given word co-occurs with its neighboring words → map these statistics to a dense vector
    e.g. LSA
  2. predictive methods
    : predict a word directly from its neighboring words, which are represented as dense embedding vectors
    e.g. word2vec

 

word embedding ์œผ๋กœ ๊ฐ€์žฅ ์œ ๋ช…ํ•œ word2vec์€ predictive method์— ํ•ด๋‹น

Word2Vec

: the most famous word embedding method (developed at Google)

  • Korean Word2Vec
    This site shows the word2vec algorithm applied to Korean
    Example) 한국 - 서울 + 도쿄 (Korea - Seoul + Tokyo)
    • The example adds and subtracts word vectors (a hedged code sketch follows this list)
    • Adding and subtracting the vectors is reflected as adding and subtracting the meanings of the corresponding words
    • This confirms that the vectors capture the meaning of the words well

 

ํ•ต์‹ฌ ์•„์ด๋””์–ด

๋‹จ์–ด์˜ ์ฃผ๋ณ€์„ ๋ณด๋ฉด ๊ทธ ๋‹จ์–ด๋ฅผ ์•ˆ๋‹ค

word2vec์€ ์ง€๋„ํ•™์Šต์„ ๋‹ฎ์•˜์ง€๋งŒ ๋น„์ง€๋„ํ•™์Šต์ž„

 

 

word2vec์€ ํ…์ŠคํŠธ๋กœ๋ถ€ํ„ฐ word embedding์„ ํ•™์Šตํ•˜๋Š” ๊ณ„์‚ฐ ํšจ์œจ์„ฑ์ด ์ข‹์€ predictive modelk์ด๋ฉฐ 2๊ฐ€์ง€ ๋ฐฉ๋ฒ•์œผ๋กœ ๊ตฌํ˜„

  1. CBOW (Continuous Bag-Of-Words) : predict a word from its context
  2. Skip-Gram : predict the context from a word
    (skip-gram is the CBOW model flipped around; see the sketch below)
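
A hedged sketch of training both variants with gensim (assumed installed, 4.x API); the two-sentence corpus is made up, and real use needs a large tokenized corpus.

```python
from gensim.models import Word2Vec

corpus = [
    ["colorless", "green", "ideas", "sleep", "furiously"],
    ["the", "puppy", "chased", "the", "cat"],
]

cbow = Word2Vec(corpus, vector_size=64, window=2, min_count=1, sg=0)       # sg=0 -> CBOW
skipgram = Word2Vec(corpus, vector_size=64, window=2, min_count=1, sg=1)   # sg=1 -> Skip-Gram

print(cbow.wv["puppy"].shape)                 # (64,): a dense embedding vector
print(skipgram.wv.most_similar("puppy", topn=2))
```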

CBOW

: ์ฃผ๋ณ€๋‹จ์–ด(์ฆ‰ ๋งฅ๋ฝ)์œผ๋กœ target ๋‹จ์–ด๋ฅผ ์˜ˆ์ธกํ•˜๋„๋ก ํ•™์Šต

*์ฃผ๋ณ€๋‹จ์–ด : ์ผ๋ฐ˜์ ์œผ๋กœ target ๋‹จ์–ด์˜ ์ง์ „ ๋ช‡ ๋‹จ์–ด์™€ ์งํ›„ ๋ช‡ ๋‹จ์–ด๋ฅผ ๋œปํ•˜๋ฉฐ ์ด ์ฃผ๋ณ€ ๋‹จ์–ด์˜ ๋ฒ”์œ„๋ฅผ window๋ผ ํ•จ

window ์ ‘๊ทผ๋ฒ•

์˜ˆ์‹œ )

“green”์ด target ๋‹จ์–ด์ผ๋•Œ “colorless”๋ถ€ํ„ฐ “ideas”๊นŒ์ง€ ์ฐฝ๋ฌธ์ด๋ผ๊ณ  ์ƒ๊ฐํ•˜๊ณ  ์ด ๋‹จ์–ด๋“ค๋งŒ ๋ณด๋Š” ๊ฒƒ์ด window ์ ‘๊ทผ๋ฒ•

์•ž๊ณผ ๋’ค์—์„œ ๋ช‡ ๋‹จ์–ด๊นŒ์ง€ ๋ณผ์ง€๋Š” ์ง€์ •ํ•ด์ค„ ์ˆ˜ ์žˆ์Œ (=window size)

sliding window

“green”์„ target๋‹จ์–ด๋กœ ๋‘๊ณ  “colorless”๋ถ€ํ„ฐ “ideas”๊นŒ์ง€ ํ•œ๋ฒˆ ๋ณธ ๋‹ค์Œ,
window๋ฅผ ๋ฐ€์–ด์„œ target๋‹จ์–ด๋ฅผ “ideas”๋กœ ๋‘๊ณ  ๋ณธ ๋‹ค์Œ,
๋‹ค์‹œ window๋ฅผ ๋ฐ€์–ด์„œ target๋‹จ์–ด๋ฅผ “sleep”์œผ๋กœ ๋‘๊ณ  ๋ด„

⇒ ์ด๋ ‡๊ฒŒ window๋ฅผ ์ ์ฐจ ์˜†์œผ๋กœ ๋ฐ€๋ฉด์„œ target๋‹จ์–ด๋ฅผ ๊ณ„์† ๋ฐ”๊พธ๋Š” ๋ฐฉ์‹์„ sliding window๋ผ ํ•จ
    ๊ทธ๋ฆฌ๊ณ  ์ด๋ ‡๊ฒŒ ๋งŒ๋“ค์–ด์ง„ window ํ•˜๋‚˜ํ•˜๋‚˜๊ฐ€ ํ•™์Šต๋ฐ์ดํ„ฐ๊ฐ€ ๋จ

 

CBOW๋Š” ๋งฅ๋ฝ์œผ๋กœ ๋‹จ์–ด๋ฅผ ์˜ˆ์ธกํ•˜๋Š” ๋ฌธ์ œ๋ฅผ ํ’ˆ

์ฆ‰, input์œผ๋กœ ์ฃผ๋ณ€ ๋‹จ์–ด๊ฐ€ ์ž…๋ ฅ๋˜๊ณ , ์˜ˆ์ธกํ•ด์•ผํ•˜๋Š” output์€ target๋‹จ์–ด๊ฐ€ ๋จ

์ด ๊ณผ์ •์—์„œ ๋ชจ๋ธ์˜ parameter๋ฅผ ํ•™์Šตํ•˜๊ณ  ์ด๋ ‡๊ฒŒ ํ•™์Šต๋œ parameter๊ฐ€ ๋‹จ์–ด๋“ค์˜ ๋ฒกํ„ฐ ํ‘œํ˜„์ด ๋จ

 

 

ํŒŒ๋ผ๋ฏธํ„ฐ ํ•™์Šต ๋ฐฉ๋ฒ•

  • ์ผ๋ฐ˜์ ์ธ ๋จธ์‹ ๋Ÿฌ๋‹, ๋”ฅ๋Ÿฌ๋‹ ๋ชจ๋ธ์˜ ํ•™์Šต๋ฐฉ์‹๊ณผ ๋™์ผ
  • ์ฒ˜์Œ parameter๋Š” ๋žœ๋ค์œผ๋กœ ์ดˆ๊ธฐํ™”๋œ ์ƒํƒœ๋กœ ์‹œ์ž‘ (random initialization)
    ์ด parameter๋กœ ์˜ˆ์ธก → ์‹ค์ œ๊ฐ’๊ณผ ์ฐจ์ด๊ฐ€ ์ƒ๊ธด๋งŒํผ parameter๋“ค์„ ์กฐ์ •
    ์œ„ ๊ณผ์ •์„ ํ•™์Šต ๋ฐ์ดํ„ฐ์…‹์„ ๋Œ์•„๊ฐ€๋ฉฐ ๋ฐ˜๋ณต (backpropagation)
    ์›๋ฆฌ : gradient descent (cost function์ด ์ตœ์†Œํ™”๋˜๋Š” ์ชฝ์œผ๋กœ parameter ์—…๋ฐ์ดํŠธ)

 

 

์˜ˆ์‹œ )

V = size of the vocabulary (vocab size = number of words)

N = hidden layer size (= the dimension N chosen by the user)

 

input : a one-hot encoded vector

input์œผ๋กœ ์ฃผ๋ณ€๋‹จ์–ด ์ฆ‰, target๋‹จ์–ด์˜ ์•ž ๋‹จ์–ด๊ฐ€ ๋“ค์–ด๊ฐ. ์ด ๋‹จ์–ด๋Š” V๊ฐœ์˜ ์š”์ˆ˜๊ฐ’ ์ค‘ ํ•˜๋‚˜๋งŒ 1์ด๊ณ  ๋‚˜๋จธ์ง€๋Š” ๋ชจ๋‘ 0์ธ ๋ฒกํ„ฐ๋กœ ํ‘œํ˜„๋จ

→ ์ด๋ ‡๊ฒŒ ๋‹จ์–ด์˜ ๊ฐœ์ˆ˜๋งŒํผ์˜ ์ฐจ์›์„ ๊ฐ–๋Š” input layer๊ฐ€ hidden layer์—์„œ embedding ํฌ๊ธฐ๋งŒํผ์˜ ์ฐจ์›์˜ ๋ฒกํ„ฐ๋กœ mapping๋จ

 

 

output layer์—์„œ๋Š” ๋‹ค์‹œ ๋‹จ์–ด์˜ ๊ฐœ์ˆ˜๋งŒํผ์˜ ์ฐจ์›์„ ๊ฐ€์ง

output์€ target๋‹จ์–ด์ด๋ฏ€๋กœ ๋‹จ์–ด์˜ ๊ฐœ์ˆ˜๋งŒํผ์˜ ๊ฒฝ์šฐ์˜ ์ˆ˜๊ฐ€ ์žˆ๊ธฐ ๋•Œ๋ฌธ (?)

 

 
