How many eggs will I get? Linear Regression can tell you.

Allan Bond
6 min read · Jun 22, 2021

Have you heard the expression “don’t count your chickens before they hatch”? Well, with linear regression, now you don’t have to. This introduction will teach you more about linear regression and how to explain it simply, whether to your child or just for your own learning.

Photo by Karim MANJRA on Unsplash

So you want to know how many chickens, or eggs rather, you are going to get over a given time? Well, linear regression is here to save the day!

For this exercise, we will be using what is called simple linear regression. Don’t worry too much about the “simple” part; it just means there is only one determining factor (here, the number of chickens) rather than several.

What do chickens have to do with linear regression?

Not much really, they actually prefer exploratory data analysis (they peck around until they find something interesting).

Jokes aside, there is a lot you can do with data obtained from chickens that can help you learn about linear regression.

Key points about chickens:

  • They lay eggs
  • We can count eggs
  • Many breeds of chicken lay one egg per day
  • Heritage breeds may not lay all year round

So, as you can see, there are some things about chickens that mean the ratio of eggs per day per chicken is not always going to be 1 to 1.

A chicken could be a rooster (cannot lay eggs), may be going through a moult (won’t always lay eggs), may go off lay (e.g. during winter), or could go broody (babies might be on the way!). All these factors can reduce the number of eggs that a flock of chickens will produce on a given day, month, or year.

What is linear regression?

The chicken and the egg is just one example of a linear relationship. Others include:

  • Buying fruit — Price may drop per unit when buying in bulk
  • Fuel consumption while driving (fuel goes down as you drive)
  • How many jelly beans in a jar given its weight

Let’s get back to the chicken and egg example. The following image shows a made up chicken and egg story where, as we increase the number of chickens, we also get more eggs. Notice that we do not get exactly 100 eggs from 100 chickens? This goes back to the factors we mentioned at the start, where we could have roosters, or some chickens may not be laying for a variety of reasons.

Image by Author

“Surely we would know how many chickens we have and how many of them are roosters?” I hear you ask. True, but imagine a business that has 100, 1000, or even 10000 chickens. They may want to see how many eggs they are getting per chicken and a snapshot of production as seen above would help them do that.
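To get a feel for why a flock rarely gives one egg per chicken, here is a small simulation sketch. It assumes every chicken lays on a given day with the same fixed chance; the function name `simulate_daily_eggs` and the 0.8 figure are made up for illustration and are not taken from any real flock.

```python
import random


def simulate_daily_eggs(num_chickens, lay_probability=0.8, seed=42):
    """Count eggs for one day, assuming each chicken lays with a fixed chance.

    lay_probability is a made-up number standing in for roosters, moulting,
    broodiness and seasonal breaks; it does not come from real flock data.
    """
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    return sum(1 for _ in range(num_chickens) if rng.random() < lay_probability)


for flock_size in (100, 1000, 10000):
    eggs = simulate_daily_eggs(flock_size)
    print(f"{flock_size} chickens -> {eggs} eggs ({eggs / flock_size:.2f} per chicken)")
```

Changing `lay_probability` is a quick way to explore how many non-laying birds it takes before the egg count drops noticeably.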

Another useful plot would be a time series showing eggs laid per day, week, month, or year.

Image by Author (egg production for 10 chickens per month)

In the plot above you can see a little bit of up and down in the number of eggs being laid, even over a month. The red line going through the middle is called the regression line. It is the straight line that best fits the data shown, passing as close as possible to all of the points. Each regression line is represented mathematically using a linear equation in the form of:

y = mx + c

where y is the predicted value (eggs), m is the slope of the line, x is the predictor (days), and c is the y-intercept (where the regression line crosses the y-axis).
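If you are curious where m and c come from, one common approach is "least squares", which picks the line with the smallest total squared gap between the line and the points. The sketch below shows that calculation in plain Python. The function name `fit_line` and the egg counts are hypothetical; they are not the data behind the plots in this post.

```python
def fit_line(xs, ys):
    """Return slope m and intercept c of the least-squares line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # Intercept: the fitted line always passes through (mean_x, mean_y).
    c = mean_y - m * mean_x
    return m, c


# Hypothetical running egg counts for days 1 to 5 (not the article's data).
days = [1, 2, 3, 4, 5]
eggs = [16, 23, 30, 36, 44]
m, c = fit_line(days, eggs)
print(f"eggs = {m:.2f} * day + {c:.2f}")
```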

Out of interest, the equation of the line for the data we have is:

daily egg production = 6.87 × day + 9.24

So on day 5 we can expect approximately 43–44 eggs from our 10 chickens. That is not too bad, since 50 would have been the maximum by day 5 if all 10 were laying one egg per day.
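The day 5 estimate is just the equation with 5 plugged in for the day. As a minimal sketch, here it is as a tiny Python function; the slope and intercept come from the equation above, while the name `predict_eggs` is my own choice for illustration.

```python
SLOPE = 6.87      # m, from the equation above
INTERCEPT = 9.24  # c, from the equation above


def predict_eggs(day):
    """Predicted eggs for a given day using y = m*x + c."""
    return SLOPE * day + INTERCEPT


print(round(predict_eggs(5), 2))  # 43.59
```

Try a few other days to see how the prediction climbs by the slope, 6.87, each time the day goes up by one.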

So we have an equation! But is the equation reliable?

Not every data set that is out there is going to be as ‘neat’ as the one I generated. I deliberately made it so that the values go up with each day and there are no outliers in the data that could influence our model.

One way of seeing if the line is a good fit for the data is something called the R-squared value, or coefficient of determination. In short, this tells you how much of the variation in the y value is explained by the x value, on a scale from 0 (none of it) to 1 (all of it), and it is closely related to correlation.

For example, the R-squared value for our egg-laying model is 0.9996, which is very close to 1! This means that almost all of the variation in the number of eggs is explained by the number of days that have passed.
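For anyone who wants to see the sum behind an R-squared value, here is a rough sketch in plain Python. It compares the leftover error around the line with the total spread of the data. The egg counts, the slope of 6.9 and the intercept of 9.1 are hypothetical values chosen for this example, not the figures behind the 0.9996 above.

```python
def r_squared(xs, ys, m, c):
    """Share of the variation in ys explained by the line y = m*x + c."""
    mean_y = sum(ys) / len(ys)
    # Unexplained variation: squared gaps between actual and predicted values.
    ss_res = sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))
    # Total variation: squared gaps between actual values and their mean.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data that sits close to, but not exactly on, the line.
days = [1, 2, 3, 4, 5]
eggs = [16, 23, 30, 36, 44]
print(round(r_squared(days, eggs, m=6.9, c=9.1), 4))
```

The closer the points hug the line, the smaller the unexplained part becomes and the nearer the result gets to 1.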

Now imagine that, for a brief period of time, no chickens were laying eggs and some weird stuff was happening on the farm. What could that look like?

Image by Author

Now take a minute to think about what could be going on with the chickens on this farm. There’s a good chance that in this case we may have a record keeping problem and someone is making up the data… maybe it’s the chickens. I suspect fowl play!!

We did some checking and it looks like the author may be getting creative with the data. That’s right, it was me! And I would do it again in the name of education!

So back to the point. The R-squared value for the dodgy plot is a low 0.2905, which means that, while the number of days has some influence on the number of eggs being laid, it is far from being the only factor.
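You can watch a gap in the records drag the score down with a quick experiment. The sketch below leans on a handy fact: for a straight-line fit with a single predictor, R-squared is the same as the correlation squared. Both lists of egg counts are invented for this illustration, so the numbers it prints will not match the 0.2905 from the plot.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


days = list(range(1, 11))
# Hypothetical counts: a steady climb, and the same climb with a zero-egg gap.
steady = [16, 23, 30, 36, 44, 50, 58, 64, 71, 78]
dodgy = [16, 23, 30, 0, 0, 0, 58, 64, 71, 78]

# For a straight-line fit with one predictor, R-squared equals r squared.
print("steady:", round(pearson_r(days, steady) ** 2, 4))
print("dodgy: ", round(pearson_r(days, dodgy) ** 2, 4))
```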

So what does this mean?

It means that you should always check to see if the data in front of you is actually legit, but even more so, it’s important to understand that there are very few, if any, real world data sets out there that are completely error free.

The other major takeaway here is that linear regression can be used in a variety of areas to provide insights into how things work and make useful predictions about the future.

If you want some amazing examples of highly correlated relationships that don’t make sense, then check out Spurious Correlations! It’s a great resource for explaining why correlation does not always equal causation. But that is a topic for another post!

I hope you have gained some appreciation of linear regression and how it can be used. I certainly had fun writing it!

This post was written to provide an introduction to linear regression for parents/carers and teachers who are looking for easy to follow and understand content. I hope that with the information I have provided you’ll feel confident passing this knowledge on.

For more resources on teaching your children computer programming and data science, take a look at some of my previous posts listed below:

If you have enjoyed this article then please connect with me via LinkedIn or Twitter.

Allan Bond

Data science enthusiast with a passion for solving problems using my knowledge in Biotechnology, Business Administration and Data Science.