[Day 13] DL Basic - CNN & Computer Vision Applications

📍 Convolutional Neural Network

  - Whether the input is a continuous variable, a discrete variable, or a 2D image, the output is computed with the corresponding convolution formula.

  - Applying 2D image convolution with different kernels produces effects such as Blur, Emboss, and Outline.

In the 2D case, 'I' is the image and 'K' is the kernel.
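For reference, the standard textbook forms of convolution for the three cases can be written as below. The symbols f, g, t and the index names are the usual conventions and are an assumption here, not notation confirmed by this post; I and K follow the note above.

```latex
\begin{align*}
\text{Continuous:}\quad (f * g)(t) &= \int f(\tau)\, g(t - \tau)\, d\tau \\
\text{Discrete:}\quad (f * g)(t) &= \sum_{i} f(i)\, g(t - i) \\
\text{2D image:}\quad (I * K)(i, j) &= \sum_{m} \sum_{n} I(m, n)\, K(i - m,\, j - n)
\end{align*}
```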

 

  🔥 CNN

    - A CNN consists of convolution layers, pooling layers, and fully connected (FC) layers.

    - The convolution and pooling layers perform feature extraction.

    - The FC layers perform decision making, but the recent trend is to avoid them.

      WHY? Moving to FC layers makes the number of parameters grow enormously, and dropping them keeps that number down.
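To get a feel for why FC layers dominate the parameter count, here is a minimal sketch that counts the weights of a single dense layer. The helper name `dense_params` and the 7 x 7 x 512 -> 4096 shapes are illustrative assumptions, not values taken from this post.

```python
def dense_params(in_features: int, out_features: int, bias: bool = True) -> int:
    """Number of learnable parameters in one fully connected layer."""
    weights = in_features * out_features
    return weights + (out_features if bias else 0)


# Hypothetical shapes: a 7 x 7 x 512 feature map flattened into a 4096-unit layer.
flattened = 7 * 7 * 512
print(dense_params(flattened, 4096, bias=False))  # 102760448
```

A single layer of this (assumed) size already holds about 100M weights, which is the kind of growth the note above refers to.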

 

    - Stride: how many positions the filter skips each time it is applied.

 

    - Padding: adding extra values around the input -> so that the edges of the image are also covered.

      In most cases zero padding is used.
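Stride and padding together determine the spatial size of the output. The sketch below uses the commonly quoted formula (size + 2 * padding - kernel) // stride + 1 and a plain list-of-lists zero-padding helper; the function names are hypothetical and chosen only for this illustration.

```python
def conv_output_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length along one spatial axis of a convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def zero_pad(image, padding: int):
    """Surround a 2D list-of-lists image with `padding` rows/columns of zeros."""
    width = len(image[0]) + 2 * padding
    border = [[0] * width for _ in range(padding)]
    body = [[0] * padding + list(row) + [0] * padding for row in image]
    return border + body + [[0] * width for _ in range(padding)]


print(conv_output_size(7, 3))                       # 5: no padding shrinks the output
print(conv_output_size(7, 3, padding=1))            # 7: padding 1 keeps the size
print(conv_output_size(7, 3, stride=2, padding=1))  # 4: stride 2 roughly halves it
print(zero_pad([[1, 2], [3, 4]], 1))
```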

 

    - Example) How many parameters does the layer in the figure have?

      -> The convolution uses 3 X 3 X 128 kernels and the output has 64 channels.

      -> Therefore there are 3 X 3 X 128 X 64 = 73,728 parameters.
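The same count can be reproduced with a tiny helper. This is only a sketch: `conv_params` is a hypothetical name, and bias terms are left out by default to match the arithmetic above.

```python
def conv_params(kernel_h: int, kernel_w: int, in_channels: int,
                out_channels: int, bias: bool = False) -> int:
    """Learnable parameters of one convolution layer."""
    weights = kernel_h * kernel_w * in_channels * out_channels
    return weights + (out_channels if bias else 0)


# 3 x 3 kernel, 128 input channels, 64 output channels
print(conv_params(3, 3, 128, 64))  # 73728
```

Note that the spatial size of the input does not appear anywhere: the parameter count depends only on the kernel size and the channel counts.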

 

    - 1 X 1 convolution: dimension reduction (fewer channels, and therefore fewer parameters); lets layers be stacked deeper while greatly reducing the number of parameters (bottleneck architecture).

 

📍 Modern Convolutional Neural Networks

  🔥 AlexNet(2012, Parameter = 60M)

    - After it won ILSVRC, deep learning became the established approach.

    - Feature: the network is split into 2 parts (trained separately because of GPU limitations at the time)

      -> Uses 11 X 11 filters (in the first layer), with 5 Conv layers and 3 Dense layers (FC)

    - Key idea: uses ReLU, Data augmentation, and Dropout

      -> Also uses Local Response Normalization (LRN) and Overlapping pooling

 

  🔥 VGGNet(2014, parameter = 110M)

    - Uses 3 X 3 filters, and uses 1 X 1 filters for the FC part to reduce the number of parameters

 

  📌 Why use 3X3?

    - It gives a good receptive field -> two stacked 3X3 convolutions cover the same receptive field as a single 5X5 (with fewer parameters)

    - Since then, filter sizes rarely go beyond 7
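A short sketch of the 3X3 argument: for stride-1 convolutions the receptive field grows by (kernel - 1) per layer, so two 3X3 layers reach 5, while the weight count is lower than for one 5X5 layer. The channel count of 128 is a hypothetical value picked for illustration.

```python
def stacked_receptive_field(kernel_sizes) -> int:
    """Receptive field of stacked stride-1 convolutions."""
    field = 1
    for kernel in kernel_sizes:
        field += kernel - 1
    return field


def conv_weights(kernel: int, in_channels: int, out_channels: int) -> int:
    """Weights of a square-kernel convolution, ignoring bias."""
    return kernel * kernel * in_channels * out_channels


channels = 128  # hypothetical: same number of input and output channels
two_3x3 = 2 * conv_weights(3, channels, channels)
one_5x5 = conv_weights(5, channels, channels)

print(stacked_receptive_field([3, 3]), stacked_receptive_field([5]))  # 5 5
print(two_3x3, one_5x5)  # 294912 409600
```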

 

 

  🔥 GoogLeNet(2015, parameter = 4M)

    - Has 22 layers and applies the NiN (Network in Network) idea through Inception blocks

 

    - Inception block: a single input is sent through several branches with different receptive fields, and the results are concatenated, which gives good performance

      -> The 1X1 filters in the block reduce the number of parameters

 

  📌 Why does using 1X1 reduce the number of parameters?

    - A 1X1 filter can reduce the number of channels before the input passes through the other filters, which cuts the parameter count to roughly 30% of the original
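The "roughly 30%" figure can be checked with a small calculation. The sketch below assumes a 3X3 convolution from 128 to 128 channels, compared with a 1X1 reduction to 32 channels followed by a 3X3 back to 128; these channel counts are assumptions for illustration and are not stated in this post.

```python
def conv_weights(kernel: int, in_channels: int, out_channels: int) -> int:
    """Weights of a square-kernel convolution, ignoring bias."""
    return kernel * kernel * in_channels * out_channels


# Direct path: one 3x3 convolution, 128 -> 128 channels.
direct = conv_weights(3, 128, 128)

# Reduced path: 1x1 convolution 128 -> 32, then 3x3 convolution 32 -> 128.
reduced = conv_weights(1, 128, 32) + conv_weights(3, 32, 128)

print(direct, reduced, round(reduced / direct, 2))  # 147456 40960 0.28
```

Both paths produce an output with the same spatial size and the same 128 channels; only the number of weights differs.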

 

  🔥 ResNet(2015)

    - Deeper NNs are harder to train (this is not overfitting) -> as more layers are stacked, training gets worse and performance actually drops

    => So let's add a skip connection! -> the network trains better

    - Batch Norm is applied before the activation function

    - Uses the bottleneck architecture. -> As architectures evolve, performance goes up and the number of parameters goes down
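As a rough illustration of the skip connection, the sketch below computes relu(x + F(x)) on plain Python lists. The branch functions stand in for the convolution layers and are purely hypothetical; the point is that when the branch outputs zeros, the block simply passes its (non-negative) input through.

```python
def relu(values):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, v) for v in values]


def residual_block(x, residual_fn):
    """Add the input back onto the branch output, then apply ReLU."""
    fx = residual_fn(x)
    return relu([a + b for a, b in zip(x, fx)])


def zero_branch(x):
    """Hypothetical branch that has learned nothing yet."""
    return [0.0 for _ in x]


def half_branch(x):
    """Hypothetical branch that outputs half of its input."""
    return [0.5 * v for v in x]


x = [1.0, 2.0, 3.0]
print(residual_block(x, zero_branch))  # [1.0, 2.0, 3.0]
print(residual_block(x, half_branch))  # [1.5, 3.0, 4.5]
```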

 

  🔥 DenseNet

    - Where ResNet adds the features together, DenseNet concatenates them

      -> This makes the number of channels keep growing as layers are stacked (and the number of parameters grows with it)

    => So let's reduce the number of channels every so often!

    - Combines Dense Blocks with Transition Blocks (the Transition Block is where the channels are reduced)
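The interplay of Dense Blocks and Transition Blocks can be traced with a few lines of arithmetic. This is a sketch under assumptions: each dense layer is taken to add `growth_rate` channels by concatenation, each transition to halve the channels, and the numbers (64 initial channels, growth rate 32, blocks of 6, 12, 24, 16 layers) are example values not given in this post.

```python
def densenet_channels(initial: int, growth_rate: int, block_sizes, compression: float = 0.5):
    """Trace the channel count after each dense block and transition block."""
    channels = initial
    trace = []
    for index, num_layers in enumerate(block_sizes):
        # Dense block: every layer concatenates `growth_rate` new channels.
        channels += num_layers * growth_rate
        trace.append(("dense", channels))
        # Transition block (between dense blocks): shrink the channel count.
        if index < len(block_sizes) - 1:
            channels = int(channels * compression)
            trace.append(("transition", channels))
    return trace


for stage, channels in densenet_channels(64, 32, (6, 12, 24, 16)):
    print(stage, channels)
```

Without the transitions the same configuration would end at 64 + 58 * 32 = 1920 channels instead of 1024.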

 

📍 Computer Vision Applications(Semantic Segmentation and Detection)

  🔥 Semantic Segmentation

    - Given an image, classify every pixel (ex. usable for autonomous driving)

    - Remove the dense layer and build a Fully Convolutional Network -> in terms of parameters the two are exactly the same

    ** Then why do it this way?

      -> A model that could only classify can now produce a segmentation map or a heatmap, because a convolution can be applied to an input of any spatial size
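A minimal sketch of the "same parameters" claim: a dense layer on a flattened H x W x C feature map has exactly as many weights as a convolution whose kernel covers the whole H x W map. The 4 x 4 x 16 feature map and 10 classes are hypothetical sizes chosen for the example.

```python
def dense_weights(height: int, width: int, channels: int, outputs: int) -> int:
    """Weights of a dense layer applied to a flattened feature map."""
    return height * width * channels * outputs


def conv_weights(kernel_h: int, kernel_w: int, in_channels: int, out_channels: int) -> int:
    """Weights of a convolution layer, ignoring bias."""
    return kernel_h * kernel_w * in_channels * out_channels


def conv_output_size(size: int, kernel: int) -> int:
    """Output length for a stride-1 convolution without padding."""
    return size - kernel + 1


h, w, c, classes = 4, 4, 16, 10  # hypothetical feature map and class count
print(dense_weights(h, w, c, classes))  # 2560
print(conv_weights(h, w, c, classes))   # 2560

# On the original 4x4 map the convolution yields a 1x1 output (one prediction);
# on a larger 8x8 map the same weights yield a 5x5 grid of predictions (a heatmap).
print(conv_output_size(4, 4), conv_output_size(8, 4))  # 1 5
```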

 

  🔥 R-CNN

    - region proposals -> compute features -> Classification(SVM)

      -> Every proposal extracted from the image has to be passed through the CNN (time ↑)

 

  🔥 SPPNet

    - Runs the CNN only once on the image, then takes just the tensor that corresponds to each Bounding Box

 

  🔥 Fast R-CNN

    - Works almost the same way as SPPNet, but uses a NN at the back end, which improves speed

 

  🔥 Faster R-CNN

    - Makes the region proposal step learnable as well (RPN, Region Proposal Network)

 

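  🔥 YOLO(You Only Look Once)

    - Extremely fast object detection algorithm

    - Detects objects in a single look at the image (fast because there is no separate region proposal step)

    - Output tensor size: S x S x (B*5 + C), where S x S is the grid, B is the number of boxes per cell (each with x, y, w, h, confidence), and C is the number of classes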
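The output size formula is easy to evaluate directly. The values S = 7, B = 2, C = 20 below are an assumed example configuration, not numbers given in this post, and `yolo_output_shape` is a hypothetical helper name.

```python
def yolo_output_shape(grid: int, boxes: int, classes: int):
    """Shape of the prediction tensor: S x S x (B*5 + C)."""
    # Each box carries 5 numbers: x, y, w, h, and a confidence score.
    return (grid, grid, boxes * 5 + classes)


print(yolo_output_shape(7, 2, 20))  # (7, 7, 30)
```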

 
