The chain rule of probability is a theorem that lets you write any joint distribution of random variables as a product of conditional probabilities. You need to understand it if you are reading any type of Bayesian literature, where probability distributions are routinely described in terms of conditional probabilities, so I am going to derive it here.

Given the joint probability distribution of N random variables:

$$
P(X_1, X_2, \ldots, X_N)
$$

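We can use the definition of conditional probability, $P(A \mid B) = P(A, B) / P(B)$, to factor the joint probability as follows:

$$
P(X_1, \ldots, X_N) = P(X_N \mid X_1, \ldots, X_{N-1}) \, P(X_1, \ldots, X_{N-1})
$$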

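So let's take N = 4 and apply the same factorization repeatedly to the remaining joint term:

$$
\begin{aligned}
P(X_1, X_2, X_3, X_4) &= P(X_4 \mid X_1, X_2, X_3) \, P(X_1, X_2, X_3) \\
&= P(X_4 \mid X_1, X_2, X_3) \, P(X_3 \mid X_1, X_2) \, P(X_1, X_2) \\
&= P(X_4 \mid X_1, X_2, X_3) \, P(X_3 \mid X_1, X_2) \, P(X_2 \mid X_1) \, P(X_1)
\end{aligned}
$$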

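And in its generic form:

$$
P(X_1, \ldots, X_N) = \prod_{k=1}^{N} P(X_k \mid X_1, \ldots, X_{k-1})
$$

Here the k = 1 factor has nothing to condition on, so it is simply $P(X_1)$.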
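If you want to convince yourself numerically, here is a minimal sketch that checks the factorization for N = 4. Everything in it is an assumption made for illustration: the joint distribution is a made-up table over four binary variables, and the helper names `marginal`, `conditional`, and `chain_rule` are hypothetical, not from any library.

```python
import itertools
import random

random.seed(0)

# Hypothetical joint distribution over four binary variables (X1, X2, X3, X4):
# random positive weights, normalized so they sum to 1.
outcomes = list(itertools.product([0, 1], repeat=4))
weights = [random.random() + 0.01 for _ in outcomes]
total = sum(weights)
joint = {x: w / total for x, w in zip(outcomes, weights)}


def marginal(prefix):
    """P(X1..Xk = prefix): sum the joint over the remaining variables."""
    k = len(prefix)
    return sum(p for x, p in joint.items() if x[:k] == prefix)


def conditional(prefix):
    """P(Xk = prefix[-1] | X1..X(k-1) = prefix[:-1]), from the definition
    of conditional probability. With an empty condition this is just P(X1)."""
    return marginal(prefix) / marginal(prefix[:-1])


def chain_rule(x):
    """Product of P(Xk | X1..X(k-1)) for k = 1..N."""
    product = 1.0
    for k in range(1, len(x) + 1):
        product *= conditional(x[:k])
    return product


# Largest gap between the joint table and the chain-rule product.
max_error = max(abs(joint[x] - chain_rule(x)) for x in outcomes)
print(max_error < 1e-12)
```

The product telescopes: each conditional's denominator cancels the previous factor's numerator, which is why the result matches the joint table up to floating-point rounding.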

After you derive it, it’s actually pretty easy to understand and remember.