Probability vs Statistics

Probability vs Statistics: Who is Who?

This has got to be one of the most common questions people ask me. I say people in general, not students. Students typically do not know the difference, but they don't ask, because as soon as they finish their stats class, they are out and forget everything they ever learned.

But there are people out there who have some inclination for math and science and who are interested in data in general, especially what we commonly call statistical data, whether macroeconomic, demographic or scientific.

Data enthusiasts love working with data, interpreting its consequences and comparing things, and that is all great. Hobbyists, though, usually see only a blurry line between statistics and probability, and they may not know the difference between them.

Likelihood vs probability

First of all, let us clarify the relationship between likelihood and probability. Strictly speaking, a probability is a number between 0 and 1 that quantifies the chance of a certain event occurring (a probability close to 0 means the event has LITTLE chance of occurring, and a probability close to 1 means the event has a VERY GOOD chance of occurring).

Now, in everyday language, likelihood refers to exactly the same thing: it is a way of quantifying the chance of an event occurring. Saying that something has a high likelihood means that it has a good chance of occurring.

So one could say that, in everyday usage, likelihood and probability are the same, only that probability is the formal term, quantified with a number between 0 and 1, while likelihood is a looser way of referring to a probability. (In formal statistics, "likelihood" also has a narrower technical meaning: the probability of the observed data, viewed as a function of the unknown parameter. That is not the sense used here.)

Example of the Difference between probability and statistics

A few examples make the difference between probability and statistics clear. Assume that you have a fair coin, which has a probability p = 1/2 of landing heads (and therefore a probability 1/2 of landing tails).

Suppose I toss the coin 10 times and want to compute the probability of getting exactly 6 heads. That probability is approximately 0.2051, and it is computed as \(C_{10, 6} \times 0.5^6 \times 0.5^4 \approx 0.2051\) (do you know how to compute that? You need to use the BINOMIAL distribution). That is an example of what probability does.
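The coin calculation can be checked with a few lines of Python. This is only a minimal sketch using the standard library; the function name `binomial_pmf` is a hypothetical helper made up for illustration.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials,
    each with success probability p (binomial distribution)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Fair coin (p = 0.5), 10 tosses, exactly 6 heads.
prob_six_heads = binomial_pmf(6, 10, 0.5)
print(round(prob_six_heads, 4))  # 0.2051
```

Here `comb(10, 6)` plays the role of \(C_{10, 6}\), the number of ways to choose which 6 of the 10 tosses land heads.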

Now, say you hear claims from people playing the coin-tossing game who feel that the coin cannot be fair, because it lands heads too often. You, as a stats detective, examine the coin and notice nothing weird, and then you toss it 100 times and get 65 heads.

You say "if the coin were fair, getting as many as 65 heads out of 100 tosses would be quite unlikely, so I conclude that there must be foul play and the coin is not fair". That is an example of statistics.
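One way to put a number on the detective's "quite unlikely" is to compute, assuming a fair coin, the probability of getting 65 or more heads in 100 tosses. The sketch below does that with the standard library only. The helper name `upper_tail` and the 0.05 cutoff are assumptions added for illustration; the text above does not fix a threshold.

```python
from math import comb


def upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) when X follows a binomial distribution with n trials
    and success probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Chance of seeing at least 65 heads in 100 tosses IF the coin is fair.
p_value = upper_tail(65, 100, 0.5)

alpha = 0.05  # assumed cutoff for "too unlikely"; not taken from the text
print(p_value < alpha)  # True
```

The tail probability comes out far below the assumed cutoff, which is the kind of evidence that leads the detective to doubt the coin is fair.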

So then, probability as a discipline concerns itself with calculating probabilities using probability distributions that have given population parameters.

Statistics, on the other hand, concerns itself with collecting sample information (finding "evidence", like a detective) to make plausibility claims about population parameters.

Difference between mathematical and statistical probability

Once we have a better understanding of how probability theory operates, we start to realize that there are different strategies for constructing probability distributions. On the one hand, certain random phenomena are really easy to model by assuming that all simple events are equally probable.

For example, when we roll a die that is perfectly symmetric and homogeneous in its construction, one could argue that no side is more likely to show than any other. So you would say that each side has equal probability.

Since there are 6 sides, each with the same probability, and the total probability is 1, each side must have a probability of 1/6 of showing up. That is an example of mathematical (or theoretical) probability.
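The equal-probability reasoning for the die can be sketched in Python by counting favorable outcomes over total outcomes. The helper `classical_probability` is a hypothetical name used only for this illustration, and the "even face" event is an extra example that is not in the text above.

```python
from fractions import Fraction

# The six equally likely faces of a symmetric die.
faces = [1, 2, 3, 4, 5, 6]


def classical_probability(event, outcomes) -> Fraction:
    """Favorable outcomes divided by total outcomes, valid only when
    every outcome is assumed to be equally likely."""
    favorable = [o for o in outcomes if event(o)]
    return Fraction(len(favorable), len(outcomes))


p_three = classical_probability(lambda face: face == 3, faces)
p_even = classical_probability(lambda face: face % 2 == 0, faces)
print(p_three)  # 1/6
print(p_even)   # 1/2
```

No data is collected here: the probabilities follow purely from the symmetry assumption.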

Now, assume that you are an engineer building a road in the jungle with your team, in a place where it rains a lot. You know that when it rains the work can be severely affected, especially when there is a storm.

You don't know if it will rain today, but you know from historical records that, out of 10000 days during this season, 300 had a storm. Then, empirically, you judge that the probability of a storm today is 300/10000 = 0.03. That is an example of statistical (or empirical) probability.
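In code, the empirical estimate is just a relative frequency computed from the records. A minimal sketch, where the function name `empirical_probability` is a hypothetical helper:

```python
def empirical_probability(event_count: int, total_count: int) -> float:
    """Relative frequency of an event in historical records."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return event_count / total_count


# Figures from the example: 300 storm days out of 10000 recorded days.
p_storm = empirical_probability(300, 10_000)
print(p_storm)  # 0.03
```

Unlike the die, nothing here is derived from symmetry; the estimate is only as good as the historical records behind it.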

In Summary, what is the difference between Probability and Statistics?

Summarizing, probability and statistics stack up as follows:

Probability and statistics are like the two sides of a coin: they are tightly related and, in a sense, opposite to each other.
Probability uses different strategies to organize and systematize the calculation of the probability (likelihood) of events related to a specific phenomenon, in a context where events with unknown outcomes occur (like tossing a coin).
Statistics, like a detective, attempts to say something about the probability properties of a certain phenomenon based on a small amount of information about it, namely a sample. That is, a few pieces of information are collected and we attempt to make claims about the big picture from them.
