
Deep Learning Coursera Notes

In this post I want to share the key terms from the deep learning courses I took on Coursera, so that I can recall them later.


NEURAL NETWORKS AND DEEP LEARNING

  • Supervised learning with neural networks
  • Binary classification (e.g., logistic regression)
  • Logistic regression, SVM -> traditional learning algorithms
  • Cost function
  • Gradient descent
  • Derivatives
  • Vectorization and broadcasting (see the logistic-regression sketch after this list)
  • NumPy, IPython, Jupyter
  • Activation functions (softmax, ReLU, leaky ReLU, tanh (hyperbolic tangent), swish (a self-gated activation function)) (see the sketch after this list)
  • Forward / Back propagation
  • Random initialization
  • Shallow neural networks
  • CNN (for images)
  • Recurrent NN (for sequence data)
  • Deep reinforcement learning
  • Regression (standard NN)
  • Structured data (databases) vs. unstructured data (audio, images, text)
  • TensorFlow, Chainer, Theano, PyTorch
  • Normalization
    • Standard score
    • T-test
    • Standardized moment
    • Coefficient of variation
    • Feature scaling (min-max)
  • Circuit Theory
  • Parameters
    • W, b
  • Hyperparameters
    • learning rate
    • #iterations
    • #hidden layers
    • #hidden units
    • activation function
    • momentum
    • minibatch size
    • regularization
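
As a reminder of how the cost function, gradient descent, and vectorization fit together, below is a minimal NumPy sketch of one vectorized gradient-descent step for logistic regression. The names (w, b, X, Y, lr) are my own illustrative choices, not notation fixed by the course.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def logistic_step(w, b, X, Y, lr=0.01):
        # X: (n_x, m) inputs, Y: (1, m) labels in {0, 1}
        # w: (n_x, 1) weights, b: scalar bias (broadcast over all m examples)
        m = X.shape[1]
        A = sigmoid(w.T @ X + b)                 # forward pass, shape (1, m)
        cost = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))
        dZ = A - Y                               # dJ/dz for all examples at once
        dw = (X @ dZ.T) / m                      # (n_x, 1), no explicit loop
        db = dZ.sum() / m
        return w - lr * dw, b - lr * db, cost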
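
The activation functions listed above are one-liners in NumPy. A quick sketch, assuming the usual conventions (0.01 slope for leaky ReLU, swish as x * sigmoid(x)):

    import numpy as np

    def relu(z):                    # max(0, z), element-wise
        return np.maximum(0.0, z)

    def leaky_relu(z, a=0.01):      # small slope a for z < 0
        return np.where(z > 0, z, a * z)

    def swish(z):                   # self-gated: z * sigmoid(z)
        return z / (1.0 + np.exp(-z))

    def softmax(z):                 # per column, shifted for numerical stability
        e = np.exp(z - z.max(axis=0, keepdims=True))
        return e / e.sum(axis=0, keepdims=True)

    # tanh is available directly as np.tanh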

IMPROVING DEEP NEURAL NETWORKS: HYPERPARAMETER TUNING, REGULARIZATION AND OPTIMIZATION

  • Dataset -> Training / Dev (hold-out validation) / Test sets
    • For very large datasets the split can be 98/1/1; traditionally it is 70/30 or 60/20/20.
  • Bias / variance
    • high bias = underfitting
      • bigger network (always works)
      • train longer (NN architecture search) (does not always work)
    • high variance = overfitting
      • more data
      • regularization
        • L1 regularization
        • L2 regularization (lambd) – the more commonly recommended and preferred option
        • Dropout regularization (keep_prob) (see the inverted-dropout sketch after this list)
        • Data augmentation
        • Early stopping
      • NN architecture search
  • Speeding up training
    • normalizing the inputs
      • subtract mean
      • normalize variance
    • vanishing / exploding gradients
    • weight initialization for deep networks
      • Xavier initialization
      • He initialization (see the sketch after this list)
    • gradient checking -> verifying the backpropagation implementation (see the sketch after this list)
      • don't use in training, only to debug
      • dtheta, dtheta approx.
      • remember regularization
      • does not work with dropout
  • Optimization algorithms
    • (stochastic) gradient descent
    • momentum
    • RMSProp
    • Adam (see the update sketch after this list)
  • Mini batch
  • Exponentially weighted averages
  • Bias correction
  • Learning rate decay
  • The problem of local optima
  • HYPERPARAMETER TUNING
    • try random values
    • ConvNets, ResNets
    • panda approach (babysitting a single model, if compute resources are limited) or baby fish (caviar) approach (training many models in parallel, if not)
    • batch normalization (see the sketch after this list)
    • covariate shift
    • softmax regression
    • hardmax (largest element -> 1, all others -> 0)
  • Frameworks
    • Caffe/Caffe2
    • CNTK
    • DL4J
    • Keras
    • Lasagne
    • MXNet
    • PaddlePaddle
    • TensorFlow
    • Theano
    • Torch
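
A minimal sketch of inverted dropout for one layer's activations, using the course's keep_prob idea; dividing by keep_prob keeps the expected activation scale unchanged, so no extra rescaling is needed at test time.

    import numpy as np

    def inverted_dropout(A, keep_prob=0.8):
        # A: activations of one layer, shape (n_units, m)
        D = np.random.rand(*A.shape) < keep_prob   # boolean mask, True = keep
        A = A * D                                  # zero out ~(1 - keep_prob) of units
        return A / keep_prob                       # rescale so E[A] is unchanged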
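
He and Xavier initialization differ only in the variance of the random weights. A sketch, where layer_dims is a hypothetical list of layer sizes:

    import numpy as np

    def init_params(layer_dims, method="he"):
        # layer_dims e.g. [n_x, 64, 32, 1]; returns W1, b1, W2, b2, ...
        params = {}
        for l in range(1, len(layer_dims)):
            fan_in = layer_dims[l - 1]
            # He: Var(W) = 2/fan_in (pairs well with ReLU);
            # Xavier: Var(W) = 1/fan_in (pairs well with tanh)
            scale = np.sqrt((2.0 if method == "he" else 1.0) / fan_in)
            params["W" + str(l)] = np.random.randn(layer_dims[l], fan_in) * scale
            params["b" + str(l)] = np.zeros((layer_dims[l], 1))
        return params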
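
Gradient checking compares the analytic derivative dtheta from backpropagation against a two-sided numerical approximation. A sketch for a single scalar parameter, where J is assumed to be a callable cost function of that parameter:

    def grad_check(J, theta, dtheta, eps=1e-7):
        # two-sided difference: more accurate than the one-sided version
        dtheta_approx = (J(theta + eps) - J(theta - eps)) / (2 * eps)
        # relative difference: ~1e-7 is great, ~1e-3 suggests a bug
        diff = abs(dtheta_approx - dtheta) / (abs(dtheta_approx) + abs(dtheta))
        return dtheta_approx, diff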
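
Adam combines momentum and RMSProp, both exponentially weighted averages, and applies bias correction to each. A sketch of the update for a single parameter; the hyperparameter defaults (0.9, 0.999, 1e-8) are the commonly used values:

    import numpy as np

    def adam_step(w, dw, v, s, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        # v, s start at 0; t is the 1-based step counter
        v = beta1 * v + (1 - beta1) * dw          # momentum: EWA of gradients
        s = beta2 * s + (1 - beta2) * dw ** 2     # RMSProp: EWA of squared gradients
        v_hat = v / (1 - beta1 ** t)              # bias correction (matters early on)
        s_hat = s / (1 - beta2 ** t)
        return w - lr * v_hat / (np.sqrt(s_hat) + eps), v, s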
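
Batch normalization standardizes a layer's pre-activations Z over the mini-batch and then lets the network learn the scale and shift back through gamma and beta. A sketch of the forward computation only (training-time statistics; the test-time running averages are omitted):

    import numpy as np

    def batch_norm_forward(Z, gamma, beta, eps=1e-8):
        # Z: pre-activations of one layer, shape (n_units, m)
        mu = Z.mean(axis=1, keepdims=True)        # per-unit mean over the mini-batch
        var = Z.var(axis=1, keepdims=True)        # per-unit variance
        Z_norm = (Z - mu) / np.sqrt(var + eps)    # standardize
        return gamma * Z_norm + beta              # learned per-unit scale and shift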

STRUCTURING MACHINE LEARNING PROJECTS

  • Orthogonalization (needed to tune training successfully; like the knobs on a radio, each control should affect one separate thing)
    • fit training set well in cost function
      • bigger NN or better optimization algorithms
    • fit dev. set well on cost function
      • regularization or bigger training set
    • fit test set well on cost function
      • bigger dev set
    • performs well in real world
      • if not, the dev set is not set correctly or the cost function is not measuring the right thing
  • Single number evaluation metric
    • P (precision) (e.g., of the examples predicted as cats, 95% really are cats)
    • R (recall) (e.g., 90% of the actual cats were correctly identified)
    • F1 score – harmonic mean of precision and recall (the higher the F1, the more successful the model; see the sketch at the end of this section)
  • Satisficing and optimizing metric
    • decide which metric will be satisficing and which will be optimizing
  • Train/dev/test sets distribution
  • When to change dev/test sets and metrics
  • Human level performance
    • avoidable bias / Bayes optimal error (best possible error)
    • reducing bias/variance
    • surpassing human-level performance
  • ERRORS
    • human-level error vs. training error gap -> avoidable bias
      • train a bigger model
      • train longer
      • use better optimization algorithms (momentum, RMSProp, Adam)
      • NN architecture (RNN/CNN) / hyperparameter search
    • training error vs. dev error gap -> variance
      • more data
      • regularization (L2, dropout, augmentation)
      • NN architecture / hyperparameter search
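
As noted above, F1 is the harmonic mean of precision and recall, not the arithmetic average. A small sketch from true-positive/false-positive/false-negative counts (the cat numbers match the examples above):

    def f1_score(tp, fp, fn):
        precision = tp / (tp + fp)   # of the predicted cats, how many are cats
        recall = tp / (tp + fn)      # of the actual cats, how many were found
        return 2 * precision * recall / (precision + recall)

    # e.g. precision = 0.95, recall = 0.90  ->  F1 ≈ 0.9243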