
[Day 15] DL Basic - Generative Model Ⅰ & Ⅱ

๐Ÿ“ Generative Model โ…  

    - A generative model means much more than something that simply generates samples.

      -> Generation (sampling)

      -> Density estimation (anomaly detection)

         => A model that can do both is called an explicit model.

      -> Unsupervised representation learning (feature learning)

 

  🔥 How can we represent p(x)?

    - Bernoulli distribution: a (biased) coin flip

      -> A single number suffices: P(heads) = p, P(tails) = 1 - p

    - Categorical distribution: a (biased) m-sided die

      -> A 6-sided die needs 5 numbers, since the probabilities must sum to 1.
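These two parameterizations can be sketched in a few lines of numpy; the probability values below are illustrative, not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Bernoulli: one parameter p fully specifies the distribution.
p = 0.7                                   # P(heads); P(tails) = 1 - p
coin = rng.random(10) < p                 # 10 biased coin flips

# Categorical: an m-sided die needs m - 1 free parameters,
# because the last probability is fixed by the sum-to-1 constraint.
probs = np.array([0.1, 0.1, 0.2, 0.2, 0.2, 0.2])  # biased 6-sided die
assert np.isclose(probs.sum(), 1.0)
die = rng.choice(6, size=10, p=probs)     # 10 biased die rolls
```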

 

    - In reality, however, representing even a single image with a full joint distribution requires an enormous number of parameters (2^n - 1 for n binary pixels). How can we reduce this?

      -> If all variables are assumed fully independent, only n parameters are needed, but then the model cannot generate or represent the images we actually want.

      -> So what should we do? Find a middle ground!

    => Conditional independence

      -> Such a model can be built using the chain rule, Bayes' rule, and conditional independence.
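To make the trade-off concrete, here is a sketch of the parameter counts for n binary variables under the three modeling assumptions discussed above; the Markov chain (each x_i depends only on x_{i-1}) stands in for the "middle ground" that conditional independence buys.

```python
# Parameters needed to specify p(x1, ..., xn) for binary variables.
def full_joint(n):
    # Fully dependent: one probability per outcome, minus the sum-to-1 constraint.
    return 2 ** n - 1

def fully_independent(n):
    # Each variable gets its own Bernoulli parameter.
    return n

def markov_chain(n):
    # x_i depends only on x_{i-1}: p(x1) plus p(x_i | x_{i-1}) for two contexts.
    return 1 + 2 * (n - 1)

# For a 28x28 binary image (n = 784), full_joint is astronomically large,
# while the other two stay linear in n.
print(full_joint(10), fully_independent(10), markov_chain(10))
```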

 

  🔥 Auto-regressive model

    - Suppose we have 28x28 binary pixels; the goal is to learn p(x) = p(x1, x2, ..., x784), where x ∈ {0,1}^784.

    - This can be expressed using the chain rule.

    - An ordering over the pixels must be assigned - the performance and the model change depending on how the ordering is assigned.

    - Each variable x_i is influenced by the variables up to x_{i-1}.
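The chain-rule factorization log p(x) = Σ_i log p(x_i | x_{<i}) can be sketched as below. The logistic conditional here is a hypothetical toy choice (not the lecture's model), chosen only because it is the simplest valid conditional; any proper p(x_i | x_{<i}) would do.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def autoregressive_log_prob(x, W, b):
    """log p(x) = sum_i log p(x_i | x_{<i}) for a binary vector x.

    Toy conditional (illustrative): p(x_i = 1 | x_{<i}) = sigmoid(W[i, :i] @ x[:i] + b[i]).
    Note the input to step i grows with i, as in any auto-regressive model.
    """
    log_p = 0.0
    for i in range(len(x)):
        p_i = sigmoid(W[i, :i] @ x[:i] + b[i])  # for i = 0 the dot product is 0
        log_p += np.log(p_i if x[i] == 1 else 1.0 - p_i)
    return log_p

rng = np.random.default_rng(0)
n = 8                                   # tiny "image" instead of 784 pixels
W = rng.normal(size=(n, n)) * 0.1
b = np.zeros(n)
x = rng.integers(0, 2, size=n).astype(float)
print(autoregressive_log_prob(x, W, b))  # a valid log-probability, always <= 0
```

Because each factor is a proper conditional distribution, the resulting p(x) automatically normalizes over all 2^n configurations.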

 

  🔥 NADE: Neural Autoregressive Density Estimator

    - Makes the i-th variable dependent on the preceding elements.

      -> From the neural network's point of view, the input dimension (and hence the weight size) keeps changing across steps.

    - Explicit model: it does not only generate, it also computes the probability P(xi | x1:i-1).

    - For continuous variables, this can be handled with a mixture of Gaussians.
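A minimal numpy sketch of NADE's forward pass, assuming the weight sharing from the NADE paper: the hidden state at step i reuses the first i columns of one shared matrix W, which is exactly the "input dimension keeps changing" point above. Sizes and weights are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nade_cond_probs(x, W, V, b, c):
    """NADE forward pass: returns p(x_i = 1 | x_{<i}) for every i.

    h_i = sigmoid(W[:, :i] @ x[:i] + c)   # hidden state; input grows with i
    p_i = sigmoid(V[i] @ h_i + b[i])      # conditional for the i-th pixel
    """
    D = len(x)
    probs = np.empty(D)
    for i in range(D):
        h = sigmoid(W[:, :i] @ x[:i] + c)
        probs[i] = sigmoid(V[i] @ h + b[i])
    return probs

rng = np.random.default_rng(0)
D, H = 6, 4                              # 6 visible units, 4 hidden units (toy)
W = rng.normal(size=(H, D)) * 0.1
V = rng.normal(size=(D, H)) * 0.1
b, c = np.zeros(D), np.zeros(H)
x = rng.integers(0, 2, size=D).astype(float)
probs = nade_cond_probs(x, W, V, b, c)   # each entry lies in (0, 1)
```

Multiplying (or summing the logs of) these conditionals gives the explicit density p(x), which is what makes NADE an explicit model.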

 

  🔥 Pixel RNN

    - An auto-regressive model; different algorithms are used depending on the ordering (Row LSTM, Diagonal BiLSTM).

 

 

๐Ÿ“ Generative Model โ…ก

  🔥 Variational Auto-Encoder

    - VI (Variational Inference): the goal is to optimize a variational distribution that best matches the posterior distribution.

      *Posterior distribution: pθ(z|x), where z is the latent variable

        -> The probability distribution of the random variable we care about, given the observation.

        -> In practice, computing it exactly is often nearly impossible.

        -> P(x|z) is usually called the likelihood.

      *Variational distribution qφ(z|x)

        -> A distribution that approximates the posterior as closely as possible.

        -> We need to minimize a loss: the KL divergence between it and the true posterior.

    

    - But the target (the true posterior) is unknown, so how can we approach it?

      -> Use the ELBO trick.

 

    - The ELBO can be decomposed into two parts: a Reconstruction Term and a Prior Fitting Term.

      -> Reconstruction Term: strictly speaking this makes the VAE an implicit model; it is the auto-encoder's reconstruction loss.

      -> Prior Fitting Term: makes the distribution of the latent points resemble the prior distribution.
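Written out (the standard VAE derivation, with qφ the variational distribution and p(z) the prior), the decomposition is:

```latex
\ln p_\theta(x) \;\ge\;
\underbrace{\mathbb{E}_{q_\phi(z|x)}\!\big[\ln p_\theta(x|z)\big]}_{\text{Reconstruction Term}}
\;-\;
\underbrace{D_{\mathrm{KL}}\!\big(q_\phi(z|x)\,\|\,p(z)\big)}_{\text{Prior Fitting Term}}
```

The gap between the two sides is exactly D_KL(qφ(z|x) ‖ pθ(z|x)), which is why maximizing the ELBO implicitly minimizes the KL divergence to the intractable true posterior.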

    - Limitations

      -> The likelihood is hard to compute (it is an intractable model).

      -> The Prior Fitting Term must be differentiable, so it is hard to use diverse latent prior distributions.

      -> In most cases an isotropic Gaussian is used as the prior distribution.
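The isotropic Gaussian prior is popular precisely because the prior fitting term then has a simple differentiable closed form. A sketch, assuming a diagonal-Gaussian variational distribution parameterized by mean and log-variance:

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ).

    This is the prior fitting term of a VAE with an isotropic Gaussian
    prior: 0.5 * sum(sigma^2 + mu^2 - 1 - log sigma^2).
    """
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)

mu = np.zeros(4)
log_var = np.zeros(4)
print(kl_to_standard_normal(mu, log_var))  # 0.0: q already equals the prior
```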

 

    - The Adversarial Auto-Encoder was developed to address this: it replaces the prior fitting term of the Variational Auto-Encoder with a GAN objective.

 

  🔥 GAN (Generative Adversarial Network)

 

    - The Generator and the Discriminator improve their performance together during training (a min/max game).

    - Compared with a VAE, the structure is as in the figure below.

 

    - The Discriminator's optimization is as follows:

      -> trained towards the max

    - The Generator's optimization is as follows:

      -> trained in the direction that minimizes the Jensen-Shannon divergence
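For reference, the min/max objective these two steps refer to is the standard GAN formulation (Goodfellow et al., 2014):

```latex
\min_G \max_D \; V(D, G) =
\mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big] +
\mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

For a fixed generator G, the inner maximum is attained at the optimal discriminator, and substituting it back reveals the Jensen-Shannon divergence:

```latex
D_G^*(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_G(x)}, \qquad
V(D_G^*, G) = 2\,\mathrm{JSD}\big(p_{\text{data}} \,\|\, p_G\big) - \log 4
```

This is why training the generator against an optimal discriminator amounts to minimizing the JSD between the data distribution and the generator's distribution.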

 

  🔥 DCGAN: brought empirical findings such as "deep convolutional networks work well" and "Leaky-ReLU works well".

 

  🔥 Info-GAN: lets the GAN focus on a specific mode of the data (via an auxiliary latent code).

 

  🔥 Text2Image: generates an image given a text description.

 

  🔥 Puzzle-GAN: a GAN that reconstructs the original image from its sub-patches.

 

  🔥 Cycle-GAN: translates images between domains. Uses a cycle-consistency loss - contains two GAN structures.

 

  🔥 Star-GAN: lets you specify the mode (target domain) of the generated image.

 

  🔥 Progressive-GAN: generates high-resolution images by progressively growing the training resolution from 4x4 up to 1024x1024.
