Offset and Weights in GLM Regression

Introduction

In GLMs, we often need to account for exposure or adjust for other factors. Offsets and weights both play a role here, but they do different things. Let’s break down their differences and understand when to use each one. Let Y = (Y_1, Y_2, \dots, Y_n) denote the response variable, let X_j = (X_{j1}, \dots, X_{jm}), j = 1, \dots, n, be the predictor variables for observation j, and let s_j = \text{time}_j be the exposure time of observation j.

1. Offset

An offset is a covariate included in the linear predictor with a coefficient fixed at 1, so it is not estimated. With a log link it acts as a multiplicative scaling factor for the mean of the response variable. Typically, offsets are used with Poisson models to represent exposure. For instance, if you’re modeling count data (e.g., number of events), an offset can account for varying exposure times. In model-formula notation, an offset is incorporated in a Poisson GLM with Y \sim Poi(\lambda) as follows:

    \[\text{Model: } Y \sim X_1 + X_2 + \ldots + \text{offset}(\log(s))\]

This makes sense: compared with a Poisson regression model without varying exposure, the exposure simply multiplies the mean, \lambda_j = s_j e^{\theta^T X_j}, or equivalently \log \lambda_j = \log s_j + \theta^T X_j. This is the correct way to incorporate exposure into a Poisson regression.

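Up to additive terms that do not depend on \theta (namely Y_j \log s_j and -\log Y_j!), the log likelihood is therefore given by

    \[\log(L) = \sum_{j=1}^n \left( Y_j \theta^T X_j - s_j e^{\theta^T X_j} \right)\]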
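The following is a minimal sketch, using only the Python standard library and made-up toy data, of why the dropped terms do not matter. It evaluates both the exact Poisson log likelihood with mean s_j e^{\theta^T X_j} and the shortened expression above, and checks that their difference is the same for two different values of \theta. The function names and the data are assumptions chosen for illustration.

```python
import math

def linear_predictor(theta, x_j):
    """theta^T x_j for one observation."""
    return sum(t * x for t, x in zip(theta, x_j))

def offset_loglik_kernel(theta, X, y, s):
    """sum_j [ y_j * theta^T x_j - s_j * exp(theta^T x_j) ]."""
    total = 0.0
    for x_j, y_j, s_j in zip(X, y, s):
        eta = linear_predictor(theta, x_j)
        total += y_j * eta - s_j * math.exp(eta)
    return total

def full_poisson_loglik(theta, X, y, s):
    """Exact Poisson log likelihood with mean s_j * exp(theta^T x_j)."""
    total = 0.0
    for x_j, y_j, s_j in zip(X, y, s):
        lam = s_j * math.exp(linear_predictor(theta, x_j))
        # log pmf: y*log(lam) - lam - log(y!)
        total += y_j * math.log(lam) - lam - math.lgamma(y_j + 1)
    return total

# Toy data (hypothetical): intercept plus one covariate, counts, exposures
X = [(1.0, 0.0), (1.0, 1.0), (1.0, 2.0), (1.0, 3.0)]
y = [2, 3, 9, 12]
s = [1.0, 0.5, 2.0, 1.5]

theta_a = (0.2, 0.4)
theta_b = (-0.3, 0.9)

gap_a = full_poisson_loglik(theta_a, X, y, s) - offset_loglik_kernel(theta_a, X, y, s)
gap_b = full_poisson_loglik(theta_b, X, y, s) - offset_loglik_kernel(theta_b, X, y, s)

# The two gaps agree, so both expressions have the same maximizer in theta
print(gap_a, gap_b)
```

Because the gap is constant in \theta, maximizing either expression yields the same estimate.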

2. Using Y_j/s_j as response variable

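We still assume that Y_j \sim Poi(\lambda_j). Treating the rate Y_j/s_j as if it were Poisson distributed as well is then incorrect: the rate is generally not integer-valued, and its variance \lambda_j/s_j^2 no longer equals its mean \lambda_j/s_j. To see this, we take a look at the log likelihood obtained by plugging Y_j/s_j into the Poisson form (again up to terms that do not depend on \theta), which differs from the offset approach:

    \[\log(L) = \sum_{j=1}^n \left( \frac{Y_j}{s_j} \theta^T X_j - e^{\theta^T X_j} \right)\]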
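A small worked case may make the difference concrete. The sketch below assumes an intercept-only model (X_j = 1, a single parameter \theta) and made-up toy data. Setting the derivative of the rate likelihood above to zero gives e^{\theta} = \frac{1}{n}\sum_j Y_j/s_j, the unweighted mean of the individual rates, whereas the offset model with \lambda_j = s_j e^{\theta} gives e^{\theta} = \sum_j Y_j / \sum_j s_j. The function names are hypothetical.

```python
def naive_rate_estimate(y, s):
    """Maximizer of sum_j [ (y_j/s_j)*theta - exp(theta) ]: mean of the rates."""
    rates = [y_j / s_j for y_j, s_j in zip(y, s)]
    return sum(rates) / len(rates)

def offset_estimate(y, s):
    """Maximizer of sum_j [ y_j*theta - s_j*exp(theta) ]: total events / total exposure."""
    return sum(y) / sum(s)

# Toy data (hypothetical): event counts and exposure times
y = [2, 3, 9, 12]
s = [1.0, 0.5, 2.0, 1.5]

naive = naive_rate_estimate(y, s)
offset = offset_estimate(y, s)

# The two estimates of exp(theta) differ whenever the exposures are unequal
print(naive, offset)
```

The unweighted version gives every observation the same influence regardless of how long it was observed, which is why the two estimates disagree here.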

3. Weights

Weights, on the other hand, are quite different. They adjust the variance of the response variable. When using weights, the scale parameter (related to the variance) is divided by the weight value of each observation. Records with weight values that are missing or less than or equal to 0 are typically excluded from the analysis. Weights are commonly employed in GLMs to handle heteroscedasticity, i.e. unequal variances. With weights s_j, each observation's contribution to the log likelihood is multiplied by its weight:

    \[\log(L) = \sum_{j=1}^n s_j \left( Y_j \theta^T X_j - e^{\theta^T X_j} \right)\]
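To illustrate what this weighted likelihood does when applied to the raw counts, here is a minimal sketch assuming an intercept-only model (X_j = 1) and made-up toy data. Setting the derivative with respect to \theta to zero gives e^{\theta} = \sum_j s_j Y_j / \sum_j s_j, an exposure-weighted mean of the counts, which is not the number of events per unit of exposure. The function names are hypothetical.

```python
def weighted_count_estimate(y, s):
    """Maximizer of sum_j s_j*(y_j*theta - exp(theta)): weighted mean of counts."""
    return sum(s_j * y_j for y_j, s_j in zip(y, s)) / sum(s)

def events_per_unit_exposure(y, s):
    """Total events divided by total exposure, for comparison."""
    return sum(y) / sum(s)

# Toy data (hypothetical): event counts and exposure times
y = [2, 3, 9, 12]
s = [1.0, 0.5, 2.0, 1.5]

weighted = weighted_count_estimate(y, s)
per_exposure = events_per_unit_exposure(y, s)

# Weighting the counts changes their influence but does not rescale the mean
print(weighted, per_exposure)
```

In this toy case the weighted estimate stays on the scale of counts per observation, so weights alone do not turn a count model into a rate model.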

Conclusion:

To incorporate exposure in a Poisson GLM, using an offset is the method of choice. However, a weighted Poisson regression that models the rates Y_j/s_j with weights s_j gives the same parameter estimates, since:

    \[\log(L) = \sum_{j=1}^n s_j \left( \frac{Y_j}{s_j} \theta^T X_j - e^{\theta^T X_j} \right) = \sum_{j=1}^n \left( Y_j \theta^T X_j - s_j e^{\theta^T X_j} \right)\]
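The identity can also be checked numerically. The sketch below, using only the Python standard library and made-up toy data, evaluates the weighted rate expression and the offset expression at several arbitrary values of \theta and compares them. The function names, data, and \theta values are assumptions chosen for illustration.

```python
import math

def linear_predictor(theta, x_j):
    """theta^T x_j for one observation."""
    return sum(t * x for t, x in zip(theta, x_j))

def offset_kernel(theta, X, y, s):
    """sum_j [ y_j * eta_j - s_j * exp(eta_j) ]."""
    total = 0.0
    for x_j, y_j, s_j in zip(X, y, s):
        eta = linear_predictor(theta, x_j)
        total += y_j * eta - s_j * math.exp(eta)
    return total

def weighted_rate_kernel(theta, X, y, s):
    """sum_j s_j * [ (y_j / s_j) * eta_j - exp(eta_j) ]."""
    total = 0.0
    for x_j, y_j, s_j in zip(X, y, s):
        eta = linear_predictor(theta, x_j)
        total += s_j * ((y_j / s_j) * eta - math.exp(eta))
    return total

# Toy data (hypothetical): intercept plus one covariate, counts, exposures
X = [(1.0, 0.0), (1.0, 1.0), (1.0, 2.0), (1.0, 3.0)]
y = [2, 3, 9, 12]
s = [1.0, 0.5, 2.0, 1.5]

thetas = [(0.0, 0.0), (0.2, 0.4), (-0.3, 0.9), (1.1, -0.5)]
differences = [
    abs(offset_kernel(th, X, y, s) - weighted_rate_kernel(th, X, y, s))
    for th in thetas
]

# All differences are zero up to floating-point rounding
print(differences)
```

Since the two functions of \theta coincide everywhere, they share the same maximizer and the same curvature at it.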
