Generalized Linear Models and Categorical Data Analysis in R

Course Topics

Ordinary linear regression (OLR) assumes that the response variable is continuous. Generalized linear models (GLMs) extend OLR to response variables that may be either continuous or discrete (e.g. binary outcomes or counts/frequencies).

This course covers:

  1. What are GLMs? When should we use them?
  2. How do GLMs work?
  3. Categorical data analysis, including contingency table analysis, measures of association, tests of independence, and tests of symmetry.
  4. How to use R to fit GLMs using real data.

Below are three data examples which will be used in the course.

Example 1:

Researcher A is interested in how variables such as GRE score, GPA, and the prestige of the undergraduate institution affect admission to graduate school. In this scenario, the response, admission status (admit/no admit), is binary.

Data set link: 
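
A binary response like the one in Example 1 is commonly modeled with logistic regression, i.e. a GLM with a binomial family and logit link. The following is a minimal sketch, not the course's own analysis: because no data set is linked here, it simulates stand-in data, and the names `admissions`, `gre`, `gpa`, `rank`, and `admit`, as well as the coefficients used to generate the data, are assumptions made purely for illustration.

```r
# Simulated stand-in for the admissions data (NOT the course data set).
set.seed(1)
n <- 400
admissions <- data.frame(
  gre  = round(rnorm(n, mean = 580, sd = 115)),
  gpa  = round(runif(n, min = 2.3, max = 4.0), 2),
  rank = factor(sample(1:4, n, replace = TRUE))  # hypothetical prestige rank
)

# Arbitrary "true" relationship, used only to generate a 0/1 response.
eta <- -4 + 0.002 * admissions$gre + 0.8 * admissions$gpa -
  0.5 * (as.integer(admissions$rank) - 1)
admissions$admit <- rbinom(n, size = 1, prob = plogis(eta))

# Logistic regression: binomial family with its default logit link.
fit_admit <- glm(admit ~ gre + gpa + rank,
                 data = admissions, family = binomial)
summary(fit_admit)

# Exponentiated coefficients can be read as odds ratios.
exp(coef(fit_admit))
```

With real data, the simulation step would be replaced by reading the file (for example with `read.csv()`), and the formula would use whatever column names that file provides.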

Example 2:

Researcher B wants to predict the number of awards that a newly admitted student will earn, based on the type of program in which the student is enrolled (vocational, general or academic) and the student's score on the final math exam.

Data set link: 
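
The response in Example 2 is a count, for which Poisson regression (a GLM with a Poisson family and log link) is a common starting point. The sketch below is an illustration under stated assumptions rather than the course's own code: the data are simulated because no data set is linked here, and the names `awards`, `prog`, `math`, and `num_awards`, along with the generating coefficients, are hypothetical.

```r
# Simulated stand-in for the awards data (NOT the course data set).
set.seed(2)
n <- 200
prog_levels <- c("general", "academic", "vocational")
awards <- data.frame(
  prog = factor(sample(prog_levels, n, replace = TRUE), levels = prog_levels),
  math = round(rnorm(n, mean = 52, sd = 9))
)

# Arbitrary log-linear relationship, used only to generate counts.
log_mu <- -4 + 0.06 * awards$math + ifelse(awards$prog == "academic", 0.8, 0)
awards$num_awards <- rpois(n, lambda = exp(log_mu))

# Poisson regression: poisson family with its default log link.
fit_awards <- glm(num_awards ~ prog + math,
                  data = awards, family = poisson)
summary(fit_awards)

# Expected number of awards for a hypothetical new student.
new_student <- data.frame(
  prog = factor("academic", levels = prog_levels),
  math = 60
)
pred_awards <- predict(fit_awards, newdata = new_student, type = "response")
pred_awards
```

Using `type = "response"` returns the prediction on the count scale; the default `type = "link"` would return it on the log scale.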

Example 3:

A Physicians’ Health Study Research Group at Harvard Medical School wants to study the relationship between aspirin use (Placebo/Aspirin) and heart attacks (Fatal Attack/Nonfatal Attack/No Attack).

Data are summarized in the table below.

                       Myocardial Infarction
Treatment    Fatal Attack    Nonfatal Attack    No Attack
Placebo                18                171       10,845
Aspirin                 5                 99       10,933

In Example 3, both variables are categorical, so categorical data analysis techniques (e.g. tests of independence) will be explained and implemented.
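
As a minimal sketch of one such technique, the counts from Example 3 can be entered as a 2 x 3 contingency table and passed to Pearson's chi-squared test of independence via `chisq.test()`. The object names `mi` and `mi_test` are chosen here for illustration; the course may use a different approach or additional tests.

```r
# Counts from the aspirin / myocardial infarction table in Example 3.
mi <- matrix(
  c(18, 171, 10845,
     5,  99, 10933),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    Treatment = c("Placebo", "Aspirin"),
    MI        = c("Fatal Attack", "Nonfatal Attack", "No Attack")
  )
)

# Row-wise proportions: share of each outcome within each treatment group.
prop.table(mi, margin = 1)

# Pearson's chi-squared test of independence between treatment and outcome.
mi_test <- chisq.test(mi)
mi_test

# Expected counts under independence, useful for checking the
# usual rule of thumb that expected cell counts are not too small.
mi_test$expected
```

For a 2 x 3 table the test has (2 - 1) x (3 - 1) = 2 degrees of freedom; `chisq.test()` applies a continuity correction only to 2 x 2 tables.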

[video:https://vimeo.com/111638203]
