Generalized linear model
Thus far our focus has been on describing interactions or associations between two or three categorical variables, mostly via single summary statistics and with significance testing. Models can handle more complicated situations and analyze the simultaneous effects of multiple variables, including mixtures of categorical and continuous variables.
The structural form of the model describes the patterns of interactions and associations. The model parameters provide measures of the strength of associations. In models, the focus is on estimating the model parameters, and the basic inference tools (e.g., point estimation, hypothesis testing, and confidence intervals) are applied to these parameters.
For a review of classical linear regression, if you wish, see the handout labeled LinRegExample. These models are fit by least squares and weighted least squares. The first widely used software package for fitting generalized linear models was called GLIM. The table below provides a good summary of GLMs following Agresti. For a more detailed discussion refer to Agresti.
Following are examples of GLM components for models that we are already familiar with, such as linear regression, and for some of the models that we will cover in this class, such as logistic regression and log-linear models. Simple linear regression models how the mean (expected value) of a continuous response variable depends on a set of explanatory variables, where the index i stands for each data point: Yi = β0 + β1xi + εi.
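As a minimal sketch of simple linear regression viewed as a GLM (normal random component, identity link, linear predictor β0 + β1x), the least-squares estimates can be computed in closed form. The function name and toy data below are illustrative, not from the original notes:

```python
# Simple linear regression as a GLM: normal random component,
# identity link, linear predictor beta0 + beta1 * x.
# Closed-form least-squares estimates (hypothetical helper).

def fit_simple_linear(x, y):
    """Return (intercept, slope) minimizing the sum of squared errors."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx          # slope
    b0 = my - b1 * mx       # intercept
    return b0, b1

# Toy data lying exactly on y = 2 + 3x, so the fit recovers those values.
x = [0, 1, 2, 3]
y = [2, 5, 8, 11]
b0, b1 = fit_simple_linear(x, y)
print(b0, b1)  # 2.0 3.0
```

With noisy data the same formulas give the usual OLS fit; the GLM view simply makes the three components (random, systematic, link) explicit.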
Binary logistic regression models are also known as logit models when the predictors are all categorical. A log-linear model models the expected cell counts as a function of levels of categorical variables, e.g., in a two-way or three-way contingency table. Log-linear models are more general than logit models, and some logit models are equivalent to certain log-linear models.
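The logit link used by binary logistic regression maps a probability in (0, 1) onto the whole real line, where the linear predictor lives, and its inverse maps back. A small sketch (function names are illustrative):

```python
import math

# logit(pi) = log(pi / (1 - pi)): probability -> real line.
# Its inverse (the logistic function) maps a linear predictor back
# to a probability.

def logit(p):
    return math.log(p / (1 - p))

def inv_logit(eta):
    return 1 / (1 + math.exp(-eta))

# Round trip: the inverse link undoes the link.
print(round(inv_logit(logit(0.8)), 6))  # 0.8
print(logit(0.5))                        # 0.0
```

This is why a logistic model can use an unconstrained linear predictor on the right-hand side while still producing valid probabilities.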
A log-linear model is also equivalent to a Poisson regression model when all explanatory variables are discrete. For additional details see Agresti.

Eberly College of Science.

When discussing models, we will keep in mind:

- Objective
- Model structure (e.g., variables, formula, equation)
- Model assumptions
- Parameter estimates and interpretation: Do you recall the interpretation of the intercept and the slope?
- Model fit: e.g., R², residual analysis, F-statistic
- Model selection: From a plethora of possible predictors, which variables do we include?

Random Component - specifies the probability distribution of the response variable; also called a noise model or error model. How is random error added to the prediction that comes out of the link function?
Systematic Component - specifies the explanatory variables X1, X2, ..., Xk in the model, more specifically their linear combination in creating the so-called linear predictor; e.g., β0 + β1x1 + β2x2.
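The linear predictor is just a linear combination of the explanatory variables; a one-line sketch (coefficients and values below are hypothetical):

```python
# Systematic component: eta = beta0 + beta1*x1 + ... + betak*xk.
# Illustrative helper with made-up coefficients.

def linear_predictor(beta, x):
    """beta[0] is the intercept; beta[1:] pair off with the x's."""
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))

print(linear_predictor([1.0, 2.0, -0.5], [3.0, 4.0]))  # 1 + 6 - 2 = 5.0
```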
Link Function - specifies the link between the random and systematic components. It says how the expected value of the response relates to the linear predictor of explanatory variables; e.g., η = E(Yi) (the identity link) for classical regression, or η = log(π/(1 − π)) = logit(π) for binary logistic regression.

Assumptions:

- The dependent variable Yi does NOT need to be normally distributed, but it typically assumes a distribution from an exponential family (e.g., binomial, Poisson, normal).
- The GLM does NOT assume a linear relationship between the dependent variable and the independent variables, but it does assume a linear relationship between the transformed response (in terms of the link function) and the explanatory variables; e.g., logit(π) = β0 + β1x for binary logistic regression.
- Independent (explanatory) variables can even be power terms or some other nonlinear transformations of the original independent variables.
- The homogeneity of variance does NOT need to be satisfied. In fact, it is not even possible in many cases given the model structure, and overdispersion (when the observed variance is larger than what the model assumes) may be present.
- Errors need to be independent but NOT normally distributed.
- The model uses maximum likelihood estimation (MLE) rather than ordinary least squares (OLS) to estimate the parameters, and thus relies on large-sample approximations.
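To make the MLE point concrete, here is a hedged sketch, for the simplest possible case: an intercept-only binary logistic model fit by Newton-Raphson (Fisher scoring). In this special case the MLE has a known closed form, logit of the sample proportion, so the iteration's answer can be checked. The function name and data are illustrative:

```python
import math

# MLE via Newton-Raphson for an intercept-only logistic model.
# Score: d loglik / d beta = sum(y) - n*pi; Fisher info: n*pi*(1-pi).
# The fixed point is beta = logit(sample proportion).

def fit_intercept_only_logistic(y, steps=25):
    n = len(y)
    s = sum(y)                      # number of successes
    beta = 0.0                      # start at pi = 0.5
    for _ in range(steps):
        pi = 1 / (1 + math.exp(-beta))
        score = s - n * pi
        info = n * pi * (1 - pi)
        beta += score / info        # Newton / Fisher-scoring update
    return beta

y = [1, 1, 1, 0]                    # sample proportion 0.75
beta = fit_intercept_only_logistic(y)
print(round(beta, 4))               # log(0.75/0.25) = log 3 ~ 1.0986
```

There is no closed-form OLS-style solution once predictors are added; real software iterates a weighted version of exactly this update (iteratively reweighted least squares).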
Notice that with multiple linear regression we have more than one explanatory variable, e.g., X1, X2, ..., Xk. The X's are explanatory variables (they can be continuous, discrete, or both) and are linear in the parameters, e.g., β0 + β1x1i + ... + βkxki.
Again, transformations of the X's themselves are allowed, as in linear regression; this holds for any GLM. For Poisson regression, the responses are counts and their distribution is Poisson (random component), with a linear predictor in the explanatory variables (systematic component) and a log link.

Summary of advantages of GLMs over traditional (OLS) regression:

- We do not need to transform the response Y to have a normal distribution.
- The choice of link is separate from the choice of random component, so we have more flexibility in modeling.
- If the link produces additive effects, then we do not need constant variance.
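As a companion to the logistic sketch, an intercept-only Poisson GLM with log link can also be fit by Fisher scoring; here the MLE satisfies exp(β) = sample mean of the counts, which gives us something to check against. Names and data below are illustrative:

```python
import math

# Intercept-only Poisson GLM with log link: log E(Y) = beta.
# Score: sum(y) - n*exp(beta); Fisher info: n*exp(beta).

def fit_intercept_only_poisson(counts, steps=25):
    n = len(counts)
    s = sum(counts)
    beta = 0.0                       # start at fitted mean exp(0) = 1
    for _ in range(steps):
        mu = math.exp(beta)
        score = s - n * mu
        info = n * mu
        beta += score / info         # Fisher-scoring update
    return beta

counts = [2, 3, 1, 4, 5]             # sample mean 3
beta = fit_intercept_only_poisson(counts)
print(round(math.exp(beta), 4))      # 3.0: fitted mean equals sample mean
```

Note how only the score and information formulas changed relative to the logistic example: the random component and link are swapped out while the fitting machinery stays the same, which is exactly the flexibility the advantages list describes.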
The models are fitted via maximum likelihood estimation, so the estimators have optimal large-sample properties. All the inference tools and model checking that we will discuss for log-linear and logistic regression models apply to other GLMs too, e.g., Wald and likelihood-ratio tests, deviance, residuals, and confidence intervals. There is often a single procedure in a software package that captures all the models listed above, e.g., glm() in R or PROC GENMOD in SAS.
But there are some limitations of GLMs too: the systematic component is restricted to a linear function of the parameters, and the responses must be independent. There are ways around these restrictions, e.g., models with random effects.