# Simple Logistic Regression Analysis

In certain situations, the linear regression model isn't appropriate. For example, the values you want to predict may be probabilities, which must always lie between zero and one. A linear regression imposes no such restriction on its predictions. A logistic regression does.

The best way to explain logistic regression analysis may be with an example. Say we are interested in modelling the probability of a newborn having a birth defect given the mother's age. One could survey a number of mothers at various ages (X in our table), counting at each age both those with normal births (coded as 0 in our table) and those with abnormal ones (coded as 1 in our table).

After collecting some data, it is time to run the logistic regression. To do this you need, for each value of X, the total of all observations (both normal and abnormal births), the observed probability of the phenomenon (a birth defect in our case), the odds of occurrence (infants with birth defects over those born normally), and the natural logarithm of the calculated odds.
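The four quantities listed above can be sketched in a few lines of Python. This is only an illustration: the function name `derived_columns` and the counts passed to it are made up for the example and are not the BabyZone data.

```python
import math

def derived_columns(count_0, count_1):
    """Return (total, probability, odds, log_odds) for one value of X.

    count_0: number of observations coded as 0 (normal births)
    count_1: number of observations coded as 1 (births with a defect)
    """
    if count_0 <= 0 or count_1 <= 0:
        # With a zero count the odds are 0 or undefined, so the log odds cannot be computed.
        raise ValueError("both counts must be positive to compute log odds")
    total = count_0 + count_1          # all observations at this X
    probability = count_1 / total      # observed probability of the phenomenon
    odds = count_1 / count_0           # occurrences over non-occurrences
    log_odds = math.log(odds)          # natural logarithm of the odds
    return total, probability, odds, log_odds

# Hypothetical counts for a single age group: 990 normal births, 10 with a defect.
total, probability, odds, log_odds = derived_columns(990, 10)
print(total, probability, odds, log_odds)
```

Note that an age group with no defects (or no normal births) has no finite log odds, so it cannot be used in this calculation as it stands.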

A weighted linear regression is then fit with X (age) as the independent variable and the observed logarithm of the calculated odds as the dependent variable. The weights are the total number of observations at each X (again, both normal births and those with defects per age).
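A minimal sketch of such a weighted fit, using the standard closed-form weighted least squares formulas for a single predictor, might look like the following. The function name `weighted_linear_fit` and the sample numbers are assumptions for illustration; the calculator's own implementation is not shown on this page.

```python
def weighted_linear_fit(xs, ys, weights):
    """Weighted least squares for y = intercept + slope * x.

    xs:      values of the independent variable (e.g. ages)
    ys:      values of the dependent variable (e.g. observed log odds)
    weights: one non-negative weight per point (e.g. total observations at each x)
    """
    w_sum = sum(weights)
    # Weighted means of x and y.
    x_bar = sum(w * x for w, x in zip(weights, xs)) / w_sum
    y_bar = sum(w * y for w, y in zip(weights, ys)) / w_sum
    # Weighted sums of cross-products and squares about the weighted means.
    s_xy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(weights, xs, ys))
    s_xx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, xs))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical ages, log odds, and totals (not real survey data).
ages = [20, 30, 40]
log_odds = [-7.0, -5.5, -4.2]
totals = [1500, 1200, 800]
intercept, slope = weighted_linear_fit(ages, log_odds, totals)
print(intercept, slope)
```

Points with larger totals pull the fitted line toward themselves more strongly, which is the purpose of weighting by the number of observations.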

This weighted linear regression produces an equation of the form $\ln(\text{odds}) = \text{intercept} + \text{slope} \cdot X$ (i.e., we regressed the logarithm of the calculated odds on the Xs). If we exponentiate both sides, we get $\text{odds} = e^{\text{intercept} + \text{slope} \cdot X}$. The probability of observing our phenomenon (again, a birth with a defect in this case) is then equal to $\text{odds} / (1 + \text{odds})$.
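Written out step by step, with $a$ and $b$ as shorthand (introduced here only for brevity) for the intercept and slope, and $p$ for the probability:

```latex
\begin{align*}
\ln(\text{odds}) &= a + bX \\
\text{odds} &= e^{a + bX} \\
p &= \frac{\text{odds}}{1 + \text{odds}}
   = \frac{e^{a + bX}}{1 + e^{a + bX}}
   = \frac{1}{1 + e^{-(a + bX)}}
\end{align*}
```

Because $e^{a + bX}$ is always positive, the last expression lies strictly between zero and one for any X, which is exactly the restriction a probability requires.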

You can use the table below to enter your own observed data and see the resulting logistic curve in a graph to the right. The graph shows your own data alongside the probabilities the model predicts. The title of the graph is the equation used to generate the predicted probabilities.

To illustrate the use of this tool, I borrowed data from BabyZone and put it in the calculator like so...

#### Example of how calculator handles data on birth defects by Mother's age

Notice from the graph that our logistic model appears to do a very good job of modelling our actual data. In fact, you can't even see the actual data (blue dots) behind their predicted counterparts (orange dots). But they are there. Also notice that the data is somewhat sparse. We have no observed probability of birth defects for women aged 39, for instance. But now we have a logistic equation we can use to predict what that probability would be. We just put an age (39) in for X and the model equation generates the corresponding probability: $\dfrac{e^{-13.595 + 0.237 \cdot 39}}{1 + e^{-13.595 + 0.237 \cdot 39}} \approx 0.013$. So, the model predicts that a woman of age 39 has about a 1.3% chance of having a child with a birth defect.
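The same arithmetic can be checked with a short Python sketch. The intercept (-13.595) and slope (0.237) are the values quoted above; the function name `predicted_probability` is just an illustrative choice.

```python
import math

def predicted_probability(intercept, slope, x):
    """Convert a fitted line on the log-odds scale into a probability at x."""
    log_odds = intercept + slope * x   # ln(odds) = intercept + slope * X
    odds = math.exp(log_odds)          # undo the natural logarithm
    return odds / (1.0 + odds)         # probability = odds / (1 + odds)

# Coefficients quoted in the text, evaluated at age 39.
p_39 = predicted_probability(-13.595, 0.237, 39)
print(round(p_39, 3))  # 0.013
```

Any other age can be substituted for 39 in the same way, though predictions far outside the range of the observed ages should be treated with caution.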

Now it's your turn to try. Enter the number of data points you have below and the table will appear. Then enter your observed Xs, along with the number of Ys coded as 0 and the number of Ys coded as 1 at each X. Your results will be generated automatically.

#### Logistic Regression Calculator:

Enter the number of data points:

| i | ii | iii | iv | v | vi | vii | viii |
|---|---|---|---|---|---|---|---|
| X | Instances of Y coded as 0 | Instances of Y coded as 1 | Total (ii + iii) | Y as observed probability | Y as odds | Y as log odds | Predicted probability |