In this post I want to share the keywords from the deep learning courses I took on Coursera, so that I can keep them in mind for later.
NEURAL NETWORKS AND DEEP LEARNING
- Supervised learning with neural networks
- Binary classification (e.g., logistic regression)
- Logistic regression, SVM -> traditional learning algorithms
- Cost function
- Gradient descent
- Derivatives
- Vectorization and broadcasting
- Numpy, iPython, Jupyter
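To tie the items above together, here is a minimal NumPy sketch (my own illustration, not course code; the function names and learning rate are made up) of one vectorized gradient-descent step for logistic regression, using broadcasting instead of a loop over the training examples:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_step(X, y, w, b, lr=0.01):
    """X: (n_features, m), y: (1, m), w: (n_features, 1), b: scalar."""
    m = X.shape[1]
    a = sigmoid(w.T @ X + b)          # broadcasting adds b to every column
    cost = -np.mean(y * np.log(a) + (1 - y) * np.log(1 - a))
    dz = a - y                        # derivative of the cost w.r.t. z
    dw = (X @ dz.T) / m
    db = np.sum(dz) / m
    return w - lr * dw, b - lr * db, cost
```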
- Activation functions (Softmax, ReLU, leaky ReLU, tanh (hyperbolic tangent), swish (a self-gated activation function)) – see the sketch below
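A quick NumPy sketch of these activation functions (my own, not from the course):

```python
import numpy as np

def relu(z):       return np.maximum(0, z)
def leaky_relu(z): return np.where(z > 0, z, 0.01 * z)   # small slope for z < 0
def tanh(z):       return np.tanh(z)                      # hyperbolic tangent
def swish(z):      return z / (1.0 + np.exp(-z))          # z * sigmoid(z), self-gated

def softmax(z):
    e = np.exp(z - np.max(z))                              # shift for numerical stability
    return e / e.sum()
```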
- Forward / Back propagation
- Random initialization
- Shallow neural networks
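A minimal sketch of a shallow (one hidden layer) network that puts random initialization, forward propagation and backpropagation together; the layer sizes and the 0.01 scale are only illustrative, not course-specified:

```python
import numpy as np

def init_params(n_x, n_h, n_y=1):
    W1 = np.random.randn(n_h, n_x) * 0.01   # small random values, not zeros
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros((n_y, 1))
    return W1, b1, W2, b2

def forward_backward(X, Y, W1, b1, W2, b2):
    m = X.shape[1]
    Z1 = W1 @ X + b1
    A1 = np.tanh(Z1)                          # hidden layer activation
    Z2 = W2 @ A1 + b2
    A2 = 1.0 / (1.0 + np.exp(-Z2))            # sigmoid output for binary classification
    dZ2 = A2 - Y
    dW2 = (dZ2 @ A1.T) / m
    db2 = dZ2.sum(axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)        # tanh'(Z1) = 1 - A1^2
    dW1 = (dZ1 @ X.T) / m
    db1 = dZ1.sum(axis=1, keepdims=True) / m
    return A2, (dW1, db1, dW2, db2)
```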
- CNN (for image)
- Recurrent NN (Sequence data)
- Deep reinforcement learning
- Regression (standard NN)
- Structured data – databases; unstructured data – audio, image, text
- TensorFlow, Chainer, Theano, PyTorch
- Normalization
- Standard score
- T test
- Standardized moment
- Coefficient of variation
- Feature scaling (min-max)
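The two scaling approaches in one small sketch (my own illustration): z-score (standard score) and min-max feature scaling, applied per feature column:

```python
import numpy as np

def z_score(X):
    return (X - X.mean(axis=0)) / X.std(axis=0)                     # mean 0, variance 1

def min_max(X):
    return (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))    # scaled to [0, 1]
```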
- Circuit Theory
- Parameters
- W, b
- Hyperparameters
- learning rate
- #iterations
- #hidden layers
- #hidden units
- activation function
- momentum
- minibatch size
- regularization
IMPROVING DEEP NEURAL NETWORKS: HYPERPARAMETER TUNING, REGULARIZATION AND OPTIMIZATION
- Dataset -> Training / Dev (hold-out validation) / Test sets
- For large datasets the split can be 98/1/1; traditionally it is 70/30 or 60/20/20.
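A tiny illustration of such a 98/1/1 split (the function name and seed are my own, not from the course):

```python
import numpy as np

def split_98_1_1(X, y, seed=0):
    m = X.shape[0]
    idx = np.random.default_rng(seed).permutation(m)      # shuffle before splitting
    n_train, n_dev = int(0.98 * m), int(0.01 * m)
    train, dev, test = np.split(idx, [n_train, n_train + n_dev])
    return (X[train], y[train]), (X[dev], y[dev]), (X[test], y[test])
```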
- Bias / variance.
- high bias = underfitting
- bigger network (always works)
- train longer (NN architecture search) (does not always work)
- high variance = overfitting
- more data
- regularization
- L1 regularization
- L2 regularization (lambd) – more commonly recommended and preferred
- Dropout regularization (keep prob) – see the sketch after this list
- Data augmentation
- Early stopping
- NN architecture search
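The keep prob item above refers to inverted dropout; a minimal sketch of it on one layer's activations during training (my own illustration):

```python
import numpy as np

def inverted_dropout(A, keep_prob=0.8):
    mask = np.random.rand(*A.shape) < keep_prob   # keep each unit with probability keep_prob
    A = A * mask                                   # drop the rest
    return A / keep_prob                           # scale up so the expected value is unchanged
```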
- Speed-up the training
- normalizing the inputs
- subtract mean
- normalize variance
- vanishing / exploding gradients
- weight initialization of deep networks
- Xavier initialization
- He initialization
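The two initialization schemes as they are usually written (a sketch; n_in and n_out are the fan-in and fan-out of a layer):

```python
import numpy as np

def he_init(n_out, n_in):
    return np.random.randn(n_out, n_in) * np.sqrt(2.0 / n_in)   # works well with ReLU

def xavier_init(n_out, n_in):
    return np.random.randn(n_out, n_in) * np.sqrt(1.0 / n_in)   # works well with tanh
```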
- gradient checking -> backpropagation control
- don't use in training
- dtheta, dtheta approx.
- remember regularization
- does not work with dropout
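A sketch of gradient checking (my own illustration): compare the backprop gradient dtheta against a two-sided numerical approximation, only as a debugging step:

```python
import numpy as np

def grad_check(J, theta, dtheta, eps=1e-7):
    """J: cost as a function of the flattened parameter vector theta; dtheta: backprop gradient."""
    approx = np.zeros_like(theta)
    for i in range(theta.size):
        plus, minus = theta.copy(), theta.copy()
        plus[i] += eps
        minus[i] -= eps
        approx[i] = (J(plus) - J(minus)) / (2 * eps)   # two-sided difference
    diff = np.linalg.norm(dtheta - approx) / (np.linalg.norm(dtheta) + np.linalg.norm(approx))
    return diff   # roughly < 1e-7 looks fine, > 1e-3 suggests a bug
```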
- Optimization algorithms
- (stochastic) gradient descent
- momentum
- RMSProp
- Adam
- Mini batch
- Exponentially weighted averages
- Bias correction
- Learning rate decay
- The problem of local optima
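A sketch of the Adam update, which combines momentum and RMSProp with bias-corrected exponentially weighted averages; the hyperparameter values below are the commonly used defaults, not anything course-specific:

```python
import numpy as np

def adam_step(w, dw, v, s, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    v = beta1 * v + (1 - beta1) * dw          # momentum: weighted average of gradients
    s = beta2 * s + (1 - beta2) * dw ** 2     # RMSProp: weighted average of squared gradients
    v_hat = v / (1 - beta1 ** t)              # bias correction for the first steps
    s_hat = s / (1 - beta2 ** t)
    w = w - lr * v_hat / (np.sqrt(s_hat) + eps)
    return w, v, s
```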
- HYPERPARAMETER TUNING
- try random values
- convnets, ResNets
- panda (babysitting) approach if compute resources are limited, or baby fish (caviar) approach if they are not
- batch normalization
- covariate shift
- softmax regression
- hardmax: the biggest entry becomes 1, the others 0
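A small illustration of the difference (the logits are made up):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())        # probabilities that sum to 1
    return e / e.sum()

def hardmax(z):
    out = np.zeros_like(z, dtype=float)
    out[np.argmax(z)] = 1.0        # 1 for the biggest entry, 0 for the rest
    return out

z = np.array([2.0, 1.0, 0.1])
print(softmax(z))    # ≈ [0.66, 0.24, 0.10]
print(hardmax(z))    # [1., 0., 0.]
```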
- Frameworks
- Caffe/Caffe2
- CNTK
- DL4J
- Keras
- Lasagne
- MXNet
- PaddlePaddle
- TensorFlow
- Theano
- Torch
STRUCTURING MACHINE LEARNING PROJECTS
- Orthogonalization (needed for training to be successful enough; like tuning a radio, one knob per goal) – the chain of assumptions:
- fit training set well on cost function
- bigger NN or better optimization algorithms
- fit dev. set well on cost function
- regularization or bigger training set
- fit test set well on cost function
- bigger dev set
- performs well in real world
- if not, the dev set is not set correctly or the cost function is not evaluating the right thing
- Single number evaluation metric
- P (precision) (of the examples predicted as cat, e.g., 95% really are cats)
- R (recall) (e.g., 90% of the actual cats were correctly identified)
- F1 score – harmonic mean of precision and recall (the classifier with the higher F1 is more successful) – see the sketch after this list
- Satisficing and optimizing metric
- decide which metric will be satisficing and which will be optimizing
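A sketch of precision, recall and F1 using the cat example above (the counts are made up):

```python
def precision_recall_f1(tp, fp, fn):
    p = tp / (tp + fp)         # of the examples predicted as cat, how many really are cats
    r = tp / (tp + fn)         # of the actual cats, how many were found
    f1 = 2 * p * r / (p + r)   # harmonic mean of precision and recall
    return p, r, f1

print(precision_recall_f1(tp=90, fp=5, fn=10))   # ≈ (0.95, 0.90, 0.92)
```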
- Train/dev/test sets distribution
- When to change dev/test sets and metrics
- Human level performance
- avoidable bias / bayes optimal error (best possible error)
- reducing bias/variance
- surpassing human-level performance
- ERRORS
- gap between human-level error and training error = avoidable bias
- train bigger model
- train longer
- train better optimization algorithms (momentum, RMSProp, Adam)
- NN architecture / hyperparameter search (RNN/CNN)
- gap between training error and dev error = variance
- more data
- regularization (L2, dropout, data augmentation)
- NN architecture / hyperparameter search
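A made-up numeric example of reading these gaps: with human-level error 1%, training error 8% and dev error 10%, avoidable bias (7%) dominates variance (2%), so the bias-reduction tactics above come first.

```python
human, train, dev = 0.01, 0.08, 0.10
avoidable_bias = train - human          # 0.07
variance = dev - train                  # 0.02
print("focus on", "bias" if avoidable_bias > variance else "variance")
```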