Mathematics for ML and DL
Ever wondered what goes on inside neural networks? Ever tried to calculate the gradients of the loss function after each pass? If yes, then you must have realized the role mathematical concepts play here. If not, then you are in the right place.
In this blog post we will look at the mathematical topics that are most crucial for understanding and appreciating machine learning algorithms. Machine Learning has become very popular with developers and undergraduate students, given the plethora of courses out there and the ease of building and running neural networks (thanks to Google Colab!). But how many of these “machine-learners” actually understand the maths behind what is going on? Is it important? Well, not so much if you just want to play around, but an emphatic yes if you want to pursue research projects or come up with new ideas of your own.
Having said this, let’s dive into our topics.
Linear Algebra
Linear algebra is the study of vectors (n-dimensional in general) and the operations associated with them. In the context of Machine Learning, we use vectors, matrices and tensors to concisely represent data. By doing so, we eliminate ambiguity and can write the equations occurring in ML models in a succinct, shorthand form.
The above image represents the computation taking place in a single neuron: we calculate the weighted sum of the inputs and pass it through a non-linearity (the function f() here).
We can see the compact mathematical representation of this equation in the image. Here b + x.w is obtained through matrix multiplication and addition, both topics from linear algebra.
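As a minimal sketch, here is what that single-neuron computation might look like in NumPy (the input values, weights, bias, and the choice of ReLU as f are all illustrative assumptions, not values from the post):

```python
import numpy as np

def relu(z):
    # A common choice for the non-linearity f
    return np.maximum(0, z)

# Illustrative values: 3 inputs, 3 weights, 1 bias
x = np.array([0.5, -1.2, 3.0])   # input vector
w = np.array([0.4, 0.1, -0.6])   # weight vector
b = 0.2                          # bias

# Weighted sum followed by the non-linearity: y = f(b + x.w)
y = relu(np.dot(x, w) + b)
print(y)
```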
Now, let’s look at a bigger picture of the entire neural network from LA’s perspective (LA - Linear Algebra and not Los Angeles 😉).
Consider the NN architecture below, with 3 units in the input layer, 4 in each of the 2 hidden layers, and 1 unit in the final layer (looks familiar 🤔 hmmm, maybe a binary classification task).
Can we represent the final output in a single line as a function of the input? (Hint: the diagram has weight matrices.) The answer is yes: if we apply the formula shown above at each layer and compose all three, we are done. Seems easy, doesn't it? But how does this help? Well, this representation comes in handy when we are calculating gradients of the loss function with respect to the network parameters during backpropagation, which we will look at next. A sketch of such a composed forward pass is shown below.
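Here is a hedged sketch of the full forward pass for the 3-4-4-1 network above, written as one composed expression (the random initialization, ReLU hidden activations, and sigmoid output are assumptions for illustration):

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Weight matrices and biases for a 3 -> 4 -> 4 -> 1 network
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 4)), np.zeros(4)
W3, b3 = rng.normal(size=(1, 4)), np.zeros(1)

x = np.array([1.0, 2.0, 3.0])  # input vector

# The whole network is just the layer equation applied three times:
y = sigmoid(W3 @ relu(W2 @ relu(W1 @ x + b1) + b2) + b3)
print(y)  # a single probability, as expected for binary classification
```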
Also, the concepts of basis, eigenvalues, eigenvectors, and singular value decomposition (SVD) are crucial; for example, they form the pillars of PCA (principal component analysis), a very important technique in Machine Learning (which we won't be discussing here). Having seen this, let's move on to the next topic: Multivariate Calculus.
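As a quick, hedged illustration of how these concepts appear in code, NumPy exposes both the eigendecomposition and the SVD directly (the matrix here is an arbitrary example):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Eigenvalues and eigenvectors of a symmetric matrix
eigvals, eigvecs = np.linalg.eigh(A)
print(eigvals)   # [1. 3.]

# Singular value decomposition: A = U @ diag(S) @ Vt
U, S, Vt = np.linalg.svd(A)
print(S)         # [3. 1.]
```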
Multivariate Calculus
Wikipedia defines calculus as the mathematical study of continuous change, and so it is. Most of us are familiar with integration and differentiation from calculus, but what role do they play here? As mentioned, we can view ML models as a computational box in which the input undergoes changes in stages and is finally transformed into the output (a complicated way of saying that the output is some function of the input 😉). Now, during training our goal is to reduce the loss (i.e., to optimize the model on the training data). This is where the theory of continuous change comes in.
To reduce the loss function we have to tweak the weights and biases (collectively, the parameters of the model); more specifically, we have to move “opposite to the direction of the gradient”.
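In symbols, this is the standard gradient descent update, where θ denotes the parameters, η the learning rate, and L the loss:

$$\theta \leftarrow \theta - \eta \, \nabla_{\theta} L(\theta)$$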
Consider this graph (an error surface): let us assume the z-axis represents the error and the x and y axes represent the parameters of the model. To reduce the error (or loss), we have to move in the x-y plane in the direction opposite to the gradient of the loss function with respect to the parameters x and y (note: I'm not discussing the proof here). This commonly involves vector differentiation and applying the chain rule through the loss function down to some kth layer. Thankfully, we don't have to perform it ourselves (we have TensorFlow and PyTorch!). However, it is important to know how it works.
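Here is a minimal, hedged sketch of that descent on a toy error surface (the bowl-shaped loss L(x, y) = x² + y², the starting point, and the learning rate are all assumptions for illustration):

```python
import numpy as np

def loss(p):
    # A toy bowl-shaped error surface: L(x, y) = x^2 + y^2
    return p[0] ** 2 + p[1] ** 2

def grad(p):
    # Analytic gradient of the toy loss: [2x, 2y]
    return np.array([2 * p[0], 2 * p[1]])

p = np.array([3.0, -4.0])  # starting point in the x-y plane
lr = 0.1                   # learning rate

for step in range(50):
    p = p - lr * grad(p)   # move opposite to the gradient

print(p, loss(p))  # both should now be close to zero
```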
The image below shows the chain rule of differentiation, which backpropagation applies layer by layer.
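In its simplest scalar form, for a single weight w with pre-activation z = wx + b and prediction ŷ = f(z), it reads:

$$\frac{\partial L}{\partial w} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z} \cdot \frac{\partial z}{\partial w}$$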
Having a general idea of calculus helps while carrying out research and building new models from scratch. Having said this, let's move on to the final topic: Probability and Statistics.
Probability and Statistics
the bedrock of ML
Probability is the study of uncertainty. It is a science that quantifies the likelihood of events of interest occurring in a given setting. But how is it related to ML? Well, isn't ML about developing predictive models from uncertain data? This is where probability comes in. Probability is used almost everywhere in ML: from defining cross-entropy as the loss function for a classification task, to sampling data from specific Gaussian distributions, to algorithms like Naive Bayes, to Bayesian Machine Learning (a separate branch of ML based on Bayesian inference). Having a clear understanding of probability is a must if you want to excel in ML.
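For instance, here is a hedged NumPy sketch of the binary cross-entropy loss (the labels and predicted probabilities are made-up example values):

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # Clip predictions to avoid log(0)
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1, 0, 1, 1])          # ground-truth labels
y_pred = np.array([0.9, 0.2, 0.8, 0.6])  # predicted probabilities
print(binary_cross_entropy(y_true, y_pred))
```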
The formula above shows the Kullback-Leibler (KL) divergence, used to measure how much one probability distribution differs from another.
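For discrete distributions P and Q, it is defined as:

$$D_{KL}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}$$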
Why is statistics used alongside probability? Statistics is the study of data, including its collection, organization and analysis. Statistical quantities such as the mean, median, quartiles, variance, and standard deviation are handled as expectation values of probability distribution functions, along with their other properties.
Since we deal with huge amounts of data in ML models, the task of organizing it and preparing it to be fed to the model is handled by statistics (e.g., we apply batch normalization before feeding image data to deep CNNs (convolutional neural networks)).
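In its simplest per-feature form, this kind of normalization is just standardization with the batch mean and variance. The sketch below shows that simplified idea, not the full learnable batch-norm layer, and the batch values are made up:

```python
import numpy as np

def normalize_batch(x, eps=1e-5):
    # Standardize each feature (column) using the batch mean and variance
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

batch = np.array([[10.0, 200.0],
                  [12.0, 180.0],
                  [11.0, 220.0]])
print(normalize_batch(batch))  # each column now has ~zero mean, unit variance
```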
The above figure shows a common data distribution used in ML: the Gaussian (normal) distribution.
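For reference, its probability density function, with mean μ and standard deviation σ, is:

$$p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$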
Probability and statistics are arguably the most important mathematical topics used in ML.
Resources for Learning
- Linear Algebra
- Probability and Statistics
Conclusion
This post aims to help the reader understand why mathematics is used in ML, and how. The explanations provided are meant as motivation for further learning. I have also provided resources for the topics discussed.
For more ML and DL content, stay tuned!