In Machine Learning (ML), there are many terms and concepts, and not all of them are used consistently. When you start studying Artificial Intelligence, you will come across a lot of them. In this article, I will try to explain some of those concepts in a simple way.
If you’re like me, you like simple and direct explanations. With that in mind, I will first explain the two types of parameters in ML and then describe some of the most common hyperparameters.
There are two types of parameters: model parameters and hyperparameters.
Model parameters are the parameters that the algorithm adjusts during the training phase. In Machine Learning, we train a model to predict or classify data. During training, we feed the model with data, and the model adjusts its parameters in order to make correct predictions or classifications. In other words, the model is a mathematical formula whose parameters must be learned from the data. Typical examples are the weights and biases of a neural network.
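To make this concrete, here is a minimal sketch of a model with a single input, `y_hat = w * x + b`, where the weight `w` and the bias `b` are the model parameters. It assumes plain gradient descent on the mean squared error as the training method; the function name `train_linear_model` and the toy data are hypothetical, chosen only for illustration.

```python
# A one-feature linear model: y_hat = w * x + b.
# w (weight) and b (bias) are the model parameters that training adjusts.

def train_linear_model(xs, ys, learning_rate=0.05, epochs=500):
    w, b = 0.0, 0.0  # model parameters, initialised to zero
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Move each parameter a small step against its gradient
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Hypothetical toy data generated from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]
w, b = train_linear_model(xs, ys)
print(round(w, 2), round(b, 2))
```

It should print values very close to 2 and 1: the model recovered the parameters used to generate the data from the examples alone. Note that `learning_rate` and `epochs` are not learned; they are fixed before training starts, which is what makes them hyperparameters.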
The other type of parameters is hyperparameters. Hyperparameters cannot be learned directly during training; the data scientist sets them before training begins. A common process in ML is to try different values for these hyperparameters, train a model with each set of values, and keep the values that give the best results.
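This try-and-compare process can be sketched as a simple grid search: train one model per combination of hyperparameter values and keep the combination with the lowest error on held-out data. The sketch assumes a one-parameter model `y_hat = w * x`, gradient descent training, and the mean squared error on a small validation set; the function names (`train`, `mse`, `grid_search`), the grid values and the data are all hypothetical.

```python
import itertools

def train(xs, ys, learning_rate, epochs):
    """Fit the one-parameter model y_hat = w * x with gradient descent."""
    w = 0.0  # model parameter, learned from the data
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * grad
    return w

def mse(w, xs, ys):
    """Mean squared error of the model on a dataset."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grid_search(train_data, val_data, grid):
    """Train one model per hyperparameter combination; keep the lowest validation error."""
    best_params, best_loss = None, float("inf")
    names = list(grid)
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))  # e.g. {"learning_rate": 0.01, "epochs": 10}
        w = train(*train_data, **params)
        loss = mse(w, *val_data)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss

# Hypothetical toy data generated from y = 2x
train_data = ([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
val_data = ([5.0, 6.0], [10.0, 12.0])
grid = {"learning_rate": [0.001, 0.01, 0.05], "epochs": [10, 100]}

best_params, best_loss = grid_search(train_data, val_data, grid)
print(best_params, best_loss)
```

On this toy data the largest learning rate with the most epochs wins, but on real problems a learning rate that is too high can make training diverge, which is why several values are tried and compared.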
Imagine you’re tuning a radio by turning its knobs. In ML, you turn the knobs (the hyperparameters) to get the best model.
Two of the main considerations when deciding the values for the hyperparameters are the time required to train and the amount of memory available.
There are many types of hyperparameters, and different types of neural networks require different ones. Some of them are:
Learning rate: the learning rate controls how much the model parameters change in each update, in other words, how fast the model learns. If it’s too high, the updates can overshoot (jump past) a good solution; if it’s too low, training becomes very slow.
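A quick way to see this effect is to run gradient descent on the toy function f(x) = x², whose minimum is at x = 0 and whose gradient is 2x. This sketch assumes plain gradient descent; the helper `gradient_descent`, the starting point and the three learning-rate values are illustrative choices.

```python
def gradient_descent(learning_rate, start=5.0, steps=20):
    """Minimise f(x) = x**2 (minimum at x = 0) using its gradient f'(x) = 2x."""
    x = start
    for _ in range(steps):
        x -= learning_rate * 2 * x  # step against the gradient
    return x

for lr in (0.01, 0.1, 1.1):
    print(f"learning rate {lr}: x after 20 steps = {gradient_descent(lr):.4f}")
```

With 0.01, x creeps toward 0 slowly; with 0.1, it ends up close to 0; with 1.1, every step overshoots the minimum by more than it corrects, so x flips sign and grows instead of shrinking.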
Number of Epochs: the number of times the training algorithm goes through the entire training dataset. One epoch means that the algorithm has seen all the training data once.
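The sketch below trains a one-parameter model `y_hat = w * x` for a given number of epochs and records the training loss after each one. It assumes full-batch gradient descent, so each epoch here makes exactly one parameter update; the function name `train_for_epochs` and the data are hypothetical.

```python
def train_for_epochs(xs, ys, epochs, learning_rate=0.05):
    """Fit y_hat = w * x with full-batch gradient descent, recording the loss per epoch."""
    w = 0.0
    loss_history = []
    for _ in range(epochs):
        # One epoch: every training sample is used once to compute the gradient
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * grad
        # Mean squared error on the training data after this epoch
        loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
        loss_history.append(loss)
    return w, loss_history

xs, ys = [1.0, 2.0, 3.0], [3.0, 6.0, 9.0]  # hypothetical toy data from y = 3x
w, loss_history = train_for_epochs(xs, ys, epochs=5)
for epoch, loss in enumerate(loss_history, start=1):
    print(f"epoch {epoch}: loss = {loss:.4f}")
```

The loss shrinks with every epoch here; with more epochs `w` gets ever closer to 3. On real data, too many epochs can also lead to overfitting, so the number of epochs is tuned like any other hyperparameter.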
Batch Size: the number of training samples the network processes before it updates the model parameters. In essence, training loops over the data one batch at a time, making a prediction for each sample in the batch. At the end of each batch, the predictions are compared to the expected values, the error is calculated, and the parameters are updated.
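A minimal sketch of mini-batch training for the model `y_hat = w * x`: the data is split into batches, and the parameter `w` is updated once at the end of each batch. The helpers `make_batches` and `train_minibatch`, the learning rate and the data are hypothetical.

```python
def make_batches(samples, batch_size):
    """Split the samples into consecutive batches (the last one may be smaller)."""
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]

def train_minibatch(xs, ys, batch_size, epochs, learning_rate=0.005):
    """Fit y_hat = w * x; the parameter w is updated once per batch."""
    w = 0.0
    updates = 0
    data = list(zip(xs, ys))
    for _ in range(epochs):
        for batch in make_batches(data, batch_size):
            # Predict for every sample in the batch, then average the error gradient
            grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= learning_rate * grad  # one parameter update per batch
            updates += 1
    return w, updates

xs = [float(i) for i in range(1, 11)]  # 10 hypothetical samples
ys = [2 * x for x in xs]               # generated from y = 2x
w, updates = train_minibatch(xs, ys, batch_size=4, epochs=5)
print(w, updates)
```

With 10 samples and a batch size of 4, each epoch has 3 batches (4 + 4 + 2 samples), so 5 epochs give 15 parameter updates. A smaller batch size means more, noisier updates per epoch; a larger one means fewer updates but more memory per step.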
Activation Function: a function that converts a neuron’s input signal (the weighted sum of its inputs) into its output signal. In simple words, the activation function decides how important the input is. Based on the input, the activation function determines whether, and how strongly, the neuron should fire.
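Here are three common activation functions (binary step, sigmoid and ReLU) applied to the weighted sum of a single neuron’s inputs. This is a minimal sketch; `neuron` is a hypothetical helper, and real networks apply these functions to whole layers at once.

```python
import math

def step(z):
    """Binary step: the neuron 'fires' (1) only if its input is positive."""
    return 1.0 if z > 0 else 0.0

def sigmoid(z):
    """Squashes any input into (0, 1): large inputs approach 1, very negative ones approach 0."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """Rectified linear unit: passes positive inputs through, outputs 0 otherwise."""
    return max(0.0, z)

def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus the bias, passed through the activation function."""
    z = sum(i * w for i, w in zip(inputs, weights)) + bias
    return activation(z)

print(neuron([1.0, 2.0], [0.5, -1.0], 0.0, relu))  # weighted sum is -1.5, so ReLU gives 0.0
```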
Loss Function: tells us how far the predicted result is from the expected result. Based on this function, the model tunes its parameters in order to minimize the value of the loss function.
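Two widely used loss functions, sketched below: mean squared error, often used when predicting numbers, and binary cross-entropy, often used for yes/no classification with predicted probabilities. The small `eps` clipping in the second function is an implementation assumption to avoid taking log(0).

```python
import math

def mean_squared_error(predicted, expected):
    """Average squared distance between predictions and targets."""
    return sum((p - e) ** 2 for p, e in zip(predicted, expected)) / len(expected)

def binary_cross_entropy(predicted_probs, labels, eps=1e-12):
    """Penalises confident wrong probabilities heavily; labels are 0 or 1."""
    total = 0.0
    for p, y in zip(predicted_probs, labels):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)

print(mean_squared_error([2.5, 0.0], [3.0, -0.5]))  # → 0.25
print(binary_cross_entropy([0.9, 0.1], [1, 0]))    # confident and correct: small loss
print(binary_cross_entropy([0.6, 0.4], [1, 0]))    # less confident: larger loss
```

In both cases, the closer the predictions are to the expected values, the lower the loss, which is exactly the quantity training tries to minimize.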
Dropout: with dropout, the model ‘ignores’ a randomly chosen set of neurons during training. It’s used to prevent overfitting. Overfitting occurs when the model does very well on the training data but doesn’t perform as well on unseen data, because it has learned the training data in too much detail.
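A minimal sketch of dropout applied to a list of neuron outputs. It assumes the common "inverted dropout" variant, in which the surviving outputs are scaled by 1 / (1 - rate) during training so that nothing needs to change at prediction time; the function name `dropout` and its `rate` argument are illustrative.

```python
import random

def dropout(activations, rate, training=True, rng=None):
    """Inverted dropout: during training, zero each activation with probability `rate`."""
    if not training or rate == 0.0:
        return list(activations)  # at prediction time every neuron is kept
    if rng is None:
        rng = random.Random()
    keep_prob = 1.0 - rate
    # Each neuron is kept with probability keep_prob; kept outputs are scaled up
    # so that the expected value of every output stays the same.
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]

outputs = [0.2, 1.5, 0.7, 0.9, 1.1, 0.4]
print(dropout(outputs, rate=0.5, rng=random.Random(0)))  # training: roughly half are zeroed
print(dropout(outputs, rate=0.5, training=False))        # prediction: unchanged
```

Because a different random set of neurons is dropped each time, the network cannot rely on any single neuron, which makes it harder to memorize the training data.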
Validation percentage: the share of the data set aside for validation. Normally, you divide the dataset into three subsets: one for training, one for validation and one for testing. If you only evaluate the network on the training data, it will most likely look good only at predicting those values. The validation set contains data the network hasn’t been trained on, so it shows how well the model handles unseen data and helps detect, and therefore reduce, overfitting.
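The sketch below shuffles a dataset and splits it into training, validation and test sets according to the chosen percentages. The function name, the 15% defaults and the fixed `seed` are hypothetical choices; shuffling before splitting is also an assumption that suits independent samples.

```python
import random

def train_val_test_split(samples, val_pct=0.15, test_pct=0.15, seed=42):
    """Shuffle the samples, then split them into training, validation and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # shuffle so each subset is representative
    n_val = round(len(shuffled) * val_pct)
    n_test = round(len(shuffled) * test_pct)
    test_set = shuffled[:n_test]
    val_set = shuffled[n_test:n_test + n_val]
    train_set = shuffled[n_test + n_val:]  # everything left over is used for training
    return train_set, val_set, test_set

train_set, val_set, test_set = train_val_test_split(range(100))
print(len(train_set), len(val_set), len(test_set))  # → 70 15 15
```

The model is trained on `train_set`, hyperparameters are compared on `val_set`, and `test_set` is kept aside for one final check at the end.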
Optimizer: the algorithm that updates the model parameters in order to minimize (or maximize) the objective function, such as the loss function. The overall purpose of the optimizer is to find a set of parameters that minimizes (or maximizes) that function.
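To show what an optimizer does, here are two of them behind the same `step(params, grads)` interface: plain gradient descent and gradient descent with momentum, which accumulates a decaying sum of past gradients. The classes `SGD` and `Momentum`, the helper `minimise` and the objective f(w) = (w - 3)² are illustrative sketches, not the API of any particular library.

```python
class SGD:
    """Plain gradient descent: move each parameter against its gradient."""
    def __init__(self, learning_rate):
        self.learning_rate = learning_rate

    def step(self, params, grads):
        return [p - self.learning_rate * g for p, g in zip(params, grads)]


class Momentum:
    """Gradient descent with momentum: steps follow a 'velocity' built from past gradients."""
    def __init__(self, learning_rate, beta=0.9):
        self.learning_rate = learning_rate
        self.beta = beta          # how much of the previous velocity is kept
        self.velocity = None

    def step(self, params, grads):
        if self.velocity is None:
            self.velocity = [0.0] * len(params)
        self.velocity = [self.beta * v + g for v, g in zip(self.velocity, grads)]
        return [p - self.learning_rate * v for p, v in zip(params, self.velocity)]


def minimise(optimizer, steps=100):
    """Minimise the objective f(w) = (w - 3)**2, whose minimum is at w = 3."""
    params = [0.0]
    for _ in range(steps):
        grads = [2 * (params[0] - 3)]  # derivative of the objective
        params = optimizer.step(params, grads)
    return params[0]

print(minimise(SGD(learning_rate=0.01)))
print(minimise(Momentum(learning_rate=0.01)))
```

On this toy objective, with the same learning rate and number of steps, the momentum optimizer ends much closer to 3 than plain gradient descent. That is not guaranteed on every problem, which is why the choice of optimizer is itself something to experiment with.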
These are some of the many hyperparameters in Machine Learning.
There are no universally right or wrong values for them; good values depend on the model and the data.
You have to run a lot of tests with different sets of values and check which ones work best. It’s really a trial-and-error process.