[Lecture 6] Training Neural Networks I

Nolja놀자 2022. 5. 18. 14:50

Review of the previous lecture

What we cover today

1) Setup

: activation functions, preprocessing, weight initialization, regularization, gradient checking

2) training dynamics

: babysitting the learning process, parameter updates, hyperparameter optimization

3) evaluation

: model ensembles

Activation Functions

* sigmoid

- Squashes the output into the range [0, 1].

[ Problems ]

- When x is very negative or very positive (saturated), the gradient vanishes.

- The sigmoid output is not zero-centered.

- exp() is relatively expensive to compute.
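
A minimal NumPy sketch of the saturation and zero-centering problems listed above. The helper names `sigmoid` and `sigmoid_grad` are chosen here for illustration and are not from the lecture.

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    """Local gradient d(sigmoid)/dx = s * (1 - s)."""
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.array([-10.0, 0.0, 10.0])
print(sigmoid(x))       # every output is positive, so not zero-centered
print(sigmoid_grad(x))  # peaks at x = 0 and is nearly zero at both ends
```

The local gradient is at most 0.25 (at x = 0), so even in the best case each sigmoid layer scales the upstream gradient down.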

* tanh(x)

[ What it fixes ]

- Zero-centered

[ Problems ]

- When x is very negative or very positive (saturated), the gradient vanishes.
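
A short sketch of both points for tanh, assuming a symmetric grid of inputs: the outputs average to roughly zero, but the gradient 1 - tanh(x)^2 still vanishes at the tails. `tanh_grad` is an illustrative helper name.

```python
import numpy as np

def tanh_grad(x):
    """Local gradient d(tanh)/dx = 1 - tanh(x)^2."""
    return 1.0 - np.tanh(x) ** 2

x = np.linspace(-5.0, 5.0, 101)   # symmetric inputs around zero
print(np.tanh(x).mean())          # approximately 0: zero-centered outputs
print(tanh_grad(np.array([-5.0, 0.0, 5.0])))  # small at the ends, 1 at x = 0
```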

* ReLU

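[ What it fixes ]

- Does not saturate (in the positive region)

- Computationally efficient

- Converges faster in practice than sigmoid/tanh

- More biologically plausible than sigmoid

-> Used in AlexNet

[ Problems ]

- Not zero-centered

- Zero gradient for negative inputs

Dead ReLU: a unit whose input is always negative, so it never activates and never updates.

Causes: bad initialization, too high a learning rate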
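
A minimal sketch of the ReLU forward and backward pass, including what a dead unit looks like. The function names and the sample values are illustrative assumptions.

```python
import numpy as np

def relu(x):
    """Forward pass: max(0, x) elementwise."""
    return np.maximum(0.0, x)

def relu_backward(dout, x):
    """Backward pass: pass the upstream gradient only where x was positive."""
    return dout * (x > 0)

x = np.array([-2.0, -0.5, 0.5, 2.0])
print(relu(x))                             # negatives are clipped to 0
print(relu_backward(np.ones_like(x), x))   # gradient is 0 for negative inputs

# A "dead" unit: its pre-activation is negative for every example in the batch,
# so no gradient flows back and its weights stop updating.
pre_activation = np.array([-3.0, -1.2, -0.1])
dead_grad = relu_backward(np.ones_like(pre_activation), pre_activation)
print(dead_grad.sum())
```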

* Leaky ReLU

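[ What it fixes ]

- Does not saturate

- Computationally efficient

- Converges faster in practice than sigmoid/tanh

- More biologically plausible than sigmoid

- The gradient does not die for negative inputs.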
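
A sketch of Leaky ReLU under the assumption of a fixed negative slope `alpha = 0.01` (the notes do not state a value; 0.01 is a common choice). The function names are illustrative.

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    """Forward pass: x for positive inputs, alpha * x otherwise."""
    return np.where(x > 0, x, alpha * x)

def leaky_relu_backward(dout, x, alpha=0.01):
    """Backward pass: slope is 1 for positive inputs, alpha otherwise."""
    return dout * np.where(x > 0, 1.0, alpha)

x = np.array([-2.0, 3.0])
print(leaky_relu(x))                            # small negative output instead of 0
print(leaky_relu_backward(np.ones_like(x), x))  # nonzero gradient on the negative side
```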

* ELU

- Has all the benefits of ReLU

- Requires computing exp(), which costs time.
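
A sketch of ELU, assuming the usual definition f(x) = x for x > 0 and alpha * (exp(x) - 1) otherwise, with `alpha = 1.0` as an assumed default. The exp() call on the negative branch is where the extra cost comes from.

```python
import numpy as np

def elu(x, alpha=1.0):
    """ELU: identity for positive inputs, alpha * (exp(x) - 1) otherwise."""
    # Clamp to <= 0 before exp() so large positive inputs cannot overflow
    # in the branch that np.where discards.
    negative_part = alpha * (np.exp(np.minimum(x, 0.0)) - 1.0)
    return np.where(x > 0, x, negative_part)

x = np.array([-100.0, -1.0, 0.0, 2.0])
print(elu(x))  # negative inputs saturate smoothly toward -alpha
```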

* tip

Data preprocessing

1) preprocess the data

normalize -> zero-centering, PCA, whitening

In practice: subtract the mean image
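
A minimal sketch of mean-image subtraction. The random arrays stand in for a hypothetical image dataset of shape (N, 32, 32, 3); the key point is that the mean is computed on the training set only and reused for the test set.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 100 training and 20 test images, 32x32 RGB, values in [0, 255)
X_train = rng.uniform(0.0, 255.0, size=(100, 32, 32, 3))
X_test = rng.uniform(0.0, 255.0, size=(20, 32, 32, 3))

mean_image = X_train.mean(axis=0)        # one mean value per pixel and channel
X_train_centered = X_train - mean_image  # zero-centered training data
X_test_centered = X_test - mean_image    # same training-set mean applied at test time

print(mean_image.shape)
print(abs(X_train_centered.mean(axis=0)).max())  # approximately 0
```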

Weight Initialization

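If every weight starts at the same value (e.g., all zeros), all neurons receive the same gradient and update identically.

1) small random numbers

-> Works for small networks, but does not work well in deep networks:

all activations become zero!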
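
A small experiment sketching why small random weights fail in a deep network. The sizes (10 layers, 200 units, batch of 256), the 0.01 scale, and the tanh nonlinearity are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
num_layers, hidden = 10, 200               # hypothetical network size
h = rng.standard_normal((256, hidden))     # unit Gaussian input batch

stds = []
for layer in range(num_layers):
    W = 0.01 * rng.standard_normal((hidden, hidden))  # small random numbers
    h = np.tanh(h @ W)                                # forward through one layer
    stds.append(h.std())
    print(f"layer {layer + 1}: activation std = {stds[-1]:.2e}")
```

The standard deviation of the activations shrinks at every layer until they are effectively zero. Since the gradient with respect to each weight matrix is proportional to the layer's input activations, the weight updates shrink toward zero as well.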

Batch Normalization

Normalizes each dimension of the input to a unit Gaussian (zero mean, unit variance).

Inserted after a fully-connected or convolutional layer.

Inserted before the nonlinearity.
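
A sketch of the training-time forward pass for a fully-connected layer's output of shape (N, D). The learnable scale `gamma` and shift `beta` and the constant `eps` are not mentioned in the notes above; they are included as assumptions because the usual formulation has them. Running averages for test time are omitted.

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch, then scale and shift."""
    mu = x.mean(axis=0)                    # per-dimension batch mean
    var = x.var(axis=0)                    # per-dimension batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # zero mean, unit variance
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 10))  # hypothetical layer output
out = batchnorm_forward(x, gamma=np.ones(10), beta=np.zeros(10))
print(out.mean(axis=0))  # approximately 0 in every dimension
print(out.std(axis=0))   # approximately 1 in every dimension
```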

Babysitting the Learning Process

1) data preprocessing

2) choose the architecture

tuning learning rate

cross-validation strategy

Is it because the lr is too high, or because there is not enough regularization?

Hyperparameters to tune: network architecture, learning rate, ..

Look at the loss curve

Check whether the weights are too large
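
One way to sketch the hyperparameter search step is to sample candidates uniformly in log space. The helper name `sample_hyperparams` and the exponent ranges are assumptions for illustration; the training and validation step is left as a comment because it depends on the model.

```python
import numpy as np

def sample_hyperparams(n, rng, lr_exp=(-6.0, -3.0), reg_exp=(-5.0, 0.0)):
    """Draw n (learning rate, regularization) pairs uniformly in log10 space."""
    lrs = 10.0 ** rng.uniform(lr_exp[0], lr_exp[1], size=n)
    regs = 10.0 ** rng.uniform(reg_exp[0], reg_exp[1], size=n)
    return list(zip(lrs, regs))

rng = np.random.default_rng(0)
for lr, reg in sample_hyperparams(5, rng):
    # In a real run: train briefly with (lr, reg), then record validation accuracy.
    print(f"lr = {lr:.2e}, reg = {reg:.2e}")
```

Sampling exponents rather than raw values spreads the candidates evenly across orders of magnitude, which suits multiplicative quantities such as the learning rate.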
