[Lecture 10] Recurrent Neural Networks



Nolja놀자 2022. 7. 6. 19:56

one to many : Image Captioning

many to one : sentiment classification

many to many : machine translation

# RNN(Recurrent Neural Network)

1) Vanilla RNN

The key point is that h_{t-1} influences the computation of h_t.
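A minimal sketch of this recurrence in plain Python, assuming the usual vanilla form h_t = tanh(W_hh * h_{t-1} + W_xh * x_t). The helper names `matvec`, `rnn_step`, `rnn_forward` and the toy weights are illustrative assumptions, not part of the lecture notes.

```python
import math

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rnn_step(h_prev, x, W_hh, W_xh):
    """One vanilla RNN step: h_t = tanh(W_hh h_{t-1} + W_xh x_t)."""
    a = matvec(W_hh, h_prev)
    b = matvec(W_xh, x)
    return [math.tanh(ai + bi) for ai, bi in zip(a, b)]

def rnn_forward(xs, h0, W_hh, W_xh):
    """Apply the same weights at every step and return all hidden states."""
    hs = [h0]
    for x in xs:
        hs.append(rnn_step(hs[-1], x, W_hh, W_xh))
    return hs

# Toy sizes (assumed): hidden size 2, input size 1.
W_hh = [[0.5, -0.3], [0.2, 0.1]]
W_xh = [[1.0], [-1.0]]
xs = [[1.0], [0.0], [-1.0]]
hs = rnn_forward(xs, [0.0, 0.0], W_hh, W_xh)
```

Each new state is computed from the previous one, so `hs[t]` depends on every input up to step t.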

* Why use tanh?

2) Computational Graph

In backprop, the gradients with respect to W from every time step are summed, because the same W is reused at each step.
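The summation can be sketched with a scalar RNN, assuming h_t = tanh(w * h_{t-1} + u * x_t) and a loss equal to the final hidden state. The names `forward` and `grad_w` and the scalar setup are assumptions made only to keep the example short.

```python
import math

def forward(w, u, xs, h0=0.0):
    """Scalar RNN forward pass; returns [h_0, h_1, ..., h_T]."""
    hs = [h0]
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + u * x))
    return hs

def grad_w(w, u, xs, h0=0.0):
    """dL/dw for L = h_T, accumulated over every time step."""
    hs = forward(w, u, xs, h0)
    dh = 1.0   # dL/dh_T
    dw = 0.0
    for t in range(len(xs), 0, -1):
        da = dh * (1.0 - hs[t] ** 2)  # back through tanh
        dw += da * hs[t - 1]          # this step's contribution to dL/dw
        dh = da * w                   # gradient passed on to h_{t-1}
    return dw
```

The `dw +=` line is the sum over time steps: the shared weight receives one contribution per step.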

* Many to Many

The loss can also be handled differently: it can be computed at every time step and summed.

* One to many

* seq to seq : many to one + one to many

# Example : Character-level Language model

many to many

* softmax is a function that turns a vector of numbers into a vector of probabilities

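- sigmoid : used for binary classification -> outputs do not sum to 1 -> a larger output means the input is more likely to belong to that class
- softmax : used for multi-class classification -> outputs sum to 1 -> measures the relative contribution of each class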

[Source] Softmax vs Sigmoid | by JINSOL KIM
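The difference between the two functions can be checked with a short sketch. The score values are made up for illustration, and subtracting the maximum inside `softmax` is a common numerical-stability trick rather than something stated in the notes.

```python
import math

def sigmoid(z):
    """Squash one score into (0, 1), independently of the other scores."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    """Turn a list of scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.1]                    # hypothetical class scores
probs = softmax(scores)                     # sums to 1
sig = [sigmoid(s) for s in scores]          # each in (0, 1), no sum constraint
```

In a character-level language model, `softmax` would be applied to the output scores over the vocabulary at each time step.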

# Truncated Back Prop

Because it is hard to backprop through the entire sequence at once,

this trick runs backprop over one subsequence (chunk) at a time.
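A rough sketch of the idea with a scalar RNN, assuming a squared-error loss at every step. The functions `chunks`, `sequence_loss`, and `truncated_bptt_grads` and the toy data are hypothetical. The hidden state is carried forward across chunks, but gradients stop at each chunk boundary.

```python
import math

def chunks(seq, k):
    """Split seq into consecutive pieces of length k (the last may be shorter)."""
    return [seq[i:i + k] for i in range(0, len(seq), k)]

def sequence_loss(w, u, xs, ys):
    """Sum of 0.5 * (h_t - y_t)^2 over the whole sequence."""
    h, total = 0.0, 0.0
    for x, y in zip(xs, ys):
        h = math.tanh(w * h + u * x)
        total += 0.5 * (h - y) ** 2
    return total

def truncated_bptt_grads(w, u, xs, ys, k):
    """Return one dL/dw per chunk; backprop never crosses a chunk boundary."""
    h, grads = 0.0, []
    for xc, yc in zip(chunks(xs, k), chunks(ys, k)):
        hs = [h]
        for x in xc:
            hs.append(math.tanh(w * hs[-1] + u * x))
        dw, dh = 0.0, 0.0
        for t in range(len(xc), 0, -1):
            dh += hs[t] - yc[t - 1]        # loss gradient at this step
            da = dh * (1.0 - hs[t] ** 2)   # back through tanh
            dw += da * hs[t - 1]
            dh = da * w
        grads.append(dw)
        h = hs[-1]  # carry the state forward, treated as a constant
    return grads

xs = [1.0, -0.5, 0.3, 0.8, -1.0]
ys = [0.5, 0.0, 0.2, 0.4, -0.5]
grads = truncated_bptt_grads(0.8, 0.5, xs, ys, k=2)
```

In a training loop, a weight update would typically follow each chunk's gradient. With `k` equal to the sequence length, this reduces to ordinary backprop through time.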

# Image Captioning

CNN + RNN

# Visual Question Answering

# Vanilla RNN Gradient Flow

h_t = tanh(W_hh * h_{t-1} + W_xh * x_t)

* make it sequence

The gradient with respect to h0 contains many factors of W (one per time step).

1) Exploding gradient problem : Largest singular value > 1

-> Gradient clipping
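Gradient clipping by norm can be sketched as follows. The function name `clip_by_norm` and the threshold value are assumptions for illustration. Deep learning libraries ship their own versions of this utility.

```python
import math

def clip_by_norm(grad, threshold):
    """Rescale grad so its L2 norm is at most threshold."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm > threshold:
        scale = threshold / norm
        return [g * scale for g in grad]
    return list(grad)  # small gradients pass through unchanged

clipped = clip_by_norm([3.0, 4.0], threshold=1.0)  # norm 5 is scaled down to norm 1
```

The direction of the gradient is preserved and only its length is capped.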

2) Vanishing gradient problem : Largest singular value < 1

-> change RNN architecture -> LSTM
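Both problems can be seen in a scalar toy case, assuming a linear recurrence h_t = w * h_{t-1} with the tanh ignored, so that the largest singular value is simply |w|. The function name `grad_scale_through_time` is hypothetical.

```python
def grad_scale_through_time(w, steps):
    """Size of dh_T/dh_0 for h_t = w * h_{t-1}: one factor of w per step."""
    g = 1.0
    for _ in range(steps):
        g *= w
    return g

exploding = grad_scale_through_time(1.5, 50)  # |w| > 1: grows without bound
vanishing = grad_scale_through_time(0.5, 50)  # |w| < 1: shrinks toward 0
```

With a matrix W the same repeated multiplication happens, which is why its largest singular value decides which regime applies.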

# LSTM

It uses four gates.

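input gate (i) : how much to let in as input -> sigmoid (0 to 1)

forget gate (f) : how much to forget -> sigmoid (0 to 1)

output gate (o) : how much to reveal as output -> sigmoid (0 to 1)

gate gate(?) (g) : how much to write to the cell -> tanh (-1 to 1)

-> The four gates do not all use the same nonlinearity: i, f, and o use sigmoid, while g uses tanh.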
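A minimal sketch of one LSTM step built from these four gates, assuming the standard update c_t = f * c_{t-1} + i * g and h_t = o * tanh(c_t). The dictionary layout of `W` (one matrix per gate, applied to the stacked vector of h_{t-1} and x_t) and the omission of biases are simplifying assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, W):
    """One LSTM step. W maps 'i', 'f', 'o', 'g' to a matrix of shape (H, H + D)."""
    v = h_prev + x  # list concatenation: stacked vector [h_{t-1}; x_t]

    def pre(name):
        # Pre-activation of one gate: W[name] times the stacked vector.
        return [sum(w * a for w, a in zip(row, v)) for row in W[name]]

    i = [sigmoid(a) for a in pre("i")]    # input gate, in (0, 1)
    f = [sigmoid(a) for a in pre("f")]    # forget gate, in (0, 1)
    o = [sigmoid(a) for a in pre("o")]    # output gate, in (0, 1)
    g = [math.tanh(a) for a in pre("g")]  # gate gate, in (-1, 1)

    # Forget part of the old cell, then write the gated candidate into it.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c

# Toy check (assumed sizes H = 2, D = 1): with all-zero weights every sigmoid
# gate is 0.5 and g is 0, so the cell state is simply halved.
W_zero = {k: [[0.0] * 3 for _ in range(2)] for k in "ifog"}
h, c = lstm_step([1.0], [0.0, 0.0], [2.0, -4.0], W_zero)
```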

* A closer look at LSTM

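** Backpropagation

Backprop along the cell state does not multiply by W; it uses only the cell state (an elementwise multiplication by the forget gate). -> computational efficiency

+ No tanh has to be backpropagated through on this path.

+ W was involved in computing the cell state during the forward pass, so W is still accounted for during backprop.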
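The cell-state path can be sketched in isolation, assuming the update c_t = f * c_{t-1} + i * g with the gate values held fixed (in a standard LSTM the gates depend on h_{t-1} and x_t, not directly on c_{t-1}). The function names are hypothetical.

```python
def cell_update(c_prev, f, i, g):
    """Forward: c_t = f * c_{t-1} + i * g, all elementwise."""
    return [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]

def backprop_cell(dc, f):
    """Backward along the cell state: dL/dc_{t-1} = dL/dc_t * f, elementwise.
    No matrix multiply by W and no tanh on this path."""
    return [dk * fk for dk, fk in zip(dc, f)]

dc_prev = backprop_cell([1.0, 1.0], [0.9, 0.5])  # gradient is scaled by f only
```

Because the scaling factor is the forget gate, which can change at every step, the gradient is not forced through the same W repeatedly as in the vanilla RNN.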
