Examples of 3-Way Contingency Tables (R)

Posted on September 27, 2014 by

0


5.1 – Three Examples

A three-way contingency table is a cross-classification of observations by the levels of three categorical variables. More generally, k-way contingency tables classify observations by levels of k categorical variables. Levels may be ordinal or nominal.

Example 1 – Death Penalty

A 2 × 2 × 2 Table. Radlet (1981) studied effects of racial characteristics on whether individuals convicted of homicide received the death penalty. The 326 subjects were defendants in homicide indictments in 20 Florida counties during 1976-1977 (from Agresti (1990)).

Defendant’s Race
Victim’s Race
Death penalty
Yes
No
White
White
19
132
Black
0
9
Black
White
11
52
Black
6
97

Is there an association between death penalty, defendant’s race and victim’s race?  What kind of association? There is a similar example in Agresti (2007), Sec. 2.7.2.  We can display this table in another way as follows:

Defendant’s Race Victim Race Death Penalty
nijk
white white yes 19
white white no 132
white black yes 0
white black no 9
black white yes 11
black white no 52
black black yes 6
black black no 97
326

Note that this way of looking at the data “forgets” that the eight categories come from three variables with two levels each.

Example 2 – Boy Scouts and Juvenile Delinquency

A 3 × 2 × 2 table. This table below classifies n = 800 boys according to socioeconomic status (S), whether a boy is scout (B), and his juvenile delinquency (D) status:

Socioeconomic status
Boy scout
Delinquent
Yes
No
Low
Yes
11
43
No
42
169
Medium
Yes
14
104
No
20
132
High
Yes
8
196
No
2
59

Is the boy’s delinquent status independent of his socioeconomic status and/or his boy scout status?

Example 3 – Graduate Admissions

A 6 × 2 × 2 table. The table below lists graduate admissions information for the six largest departments at U.C. Berkeley in the fall of 1973.

Dept.
Men rejected
Men accepted
Women rejected
Women accepted
A
313
512
19
89
B
207
353
8
17
C
205
120
391
202
D
278
139
244
131
E
138
53
299
94
F
351
22
317
24

Let D = department, S = sex, and A = admission status (rejected or accepted). Is there a gender admission bias? Does admission differ across different departments?

Below is the acceptance and rejection data for the 6 departments. The data are presented in a 2 × 2 table looking at acceptance and rejection for males and females, ignoring the departments. In a table like this we could only ask if the rejections or admissions differ by gender, but nothing about the departments.

Male Female total
Accepted 1199 557 1756
Rejected 1492 1278 2770
total 2691 1835 4526
Think about the following question, then click on the icon to the left display an answer.

Does the aggregate data accurately reflect the presence or absence of gender bias, or must we look at departments individually?

Here is an example of a 2 × 6 table. The same data as above, however, this time the data are presented looking at acceptance and rejection counts by department, ignoring the gender.

Department
A
B
C
D
E
F
total
Accepted
601
370
322
270
147
46
1756
Rejected
332
215
596
522
437
668
2770
total
933
585
918
792
584
714
4526

Boy Scouts CMH Test

The option in R is mantelhaen.test() and used in the file boys.R as shown below:

#### Test of conditional independence
#### Use of oddsratio(), confit() from {vcd}
#### Cochran-Mantel-Haenszel test
#### Also testing via Log-linear models
############################################

#### Input the table

deliquent=c("no","yes")
scout=c("no", "yes")
SES=c("low", "med","high")
table=expand.grid(deliquent=deliquent,scout=scout,SES=SES)
count=c(169,42,43,11,132,20,104,14,59,2,196,8)
table=cbind(table,count=count)
table
temp=xtabs(count~deliquent+scout+SES,table)
temp

#### Create "flat" contigency tables

ftable(temp)

#### Let's see how we can create various subtables
#### One-way Table SES

Frequency=as.vector(margin.table(temp,3))
CumFrequency=cumsum(Frequency)
cbind(SES,Frequency=Frequency,Percentage=Frequency/sum(Frequency),CumFrequency=CumFrequency,CumPercentage=CumFrequency/sum(Frequency))

#### One-way Table scout

Frequency=as.vector(margin.table(temp,2))
CumFrequency=cumsum(Frequency)
cbind(scout,Frequency=Frequency,Percentage=Frequency/sum(Frequency),CumFrequency=CumFrequency,CumPercentage=CumFrequency/sum(Frequency))

#### One-way Table deliquent

Frequency=as.vector(margin.table(temp,1))
CumFrequency=cumsum(Frequency)
cbind(deliquent,Frequency=Frequency,Percentage=Frequency/sum(Frequency),CumFrequency=CumFrequency,CumPercentage=CumFrequency/sum(Frequency))

#### Test the Mutual Independence, step by step 
#### compute the expected frequencies

E=array(NA,dim(temp))
for (i in 1:dim(temp)[1]) {
for (j in 1:dim(temp)[2]) {
for (k in 1:dim(temp)[3]) {
E[i,j,k]=(margin.table(temp,3)[k]*margin.table(temp,2)[j]*margin.table(temp,1))[i]/(sum(temp))^2
}}}
E

#### compute the X^2, and G^2

df=(length(temp)-1)-(sum(dim(temp))-3)
X2=sum((temp-E)^2/E)
X2
1-pchisq(X2,df)
G2=2*sum(temp*log(temp/E))
G2
1-pchisq(G2,df)


#### Test for Mutual indpendence (and other models) by considering analysis of all two-way tables
#### Two-way Table SES*scout
#### This is a test of Marginal Independence for SES and Scout

SES_Scout=margin.table(temp,c(3,2))
result=chisq.test(SES_Scout)
result
result$expected
result=chisq.test(SES_Scout,correct = FALSE)
result
SES_Scout=list(Frequency=SES_Scout,RowPercentage=prop.table(SES_Scout,2))


#### Two-way Table SES*deliquent
#### temp=xtabs(count~scout+SES+deliquent,table)
#### SES_deliquent=addmargins(temp)[1:2,1:3,3]
#### This is a test of Marginal Independence for SES and deliquent status

SES_deliquent=margin.table(temp,c(3,1))
result=chisq.test(SES_deliquent)
result
result$expected
result=chisq.test(SES_deliquent,correct = FALSE)
result
SES_deliquent=list(Frequency=SES_deliquent,RowPercentage=prop.table(SES_deliquent,2))

#### Two-way Table deliquent*scout
#### This is a test of Marginal Independence for Deliquent status and the Scout status

deliquent_scout=margin.table(temp,c(1,2))
result=chisq.test(deliquent_scout)
result$expected
result=chisq.test(deliquent_scout,correct = FALSE)
result
deliquent_scout1=list(Frequency=deliquent_scout,RowPercentage=prop.table(deliquent_scout,1))

#### compute the log(oddsraio), oddsratio and its 95% CI using {vcd} package

lor=oddsratio(deliquent_scout)
lor
OR=exp(lor)
OR
OR=oddsratio(deliquent_scout, log=FALSE)
OR
CI=exp(confint(lor))
CI
CI=confint(OR)
CI


#### Table of deliquent*scout at different level of SES

temp

#### Test for Joint Independence of (D,BS)
#### creating 6x2 table, BS x D

SESscout_deliquent=ftable(temp, row.vars=c(3,2))
result=chisq.test(SESscout_deliquent)
result

#### Test for Marginal Independence (see above analysis of two-way tables)

#### Test for conditional independence
#### To get partial tables of DB for each level of S

temp[,,1]
chisq.test(temp[,,1], correct=FALSE)
temp[,,2]
chisq.test(temp[,,2], correct=FALSE)
temp[,,3]
chisq.test(temp[,,3], correct=FALSE)
X2=sum(chisq.test(temp[,,1], correct=FALSE)$statistic+chisq.test(temp[,,2], correct=FALSE)$statistic+chisq.test(temp[,,3], correct=FALSE)$statistic)
1-pchisq(X2,df=3)

#### Cochran-Mantel-Haenszel test

mantelhaen.test(temp)
mantelhaen.test(temp,correct=FALSE) 

#### Breslow-Day test
#### make sure to first source/run breslowday.test_.R

breslowday.test(temp)

################################################################
#### Log-linear Models
#### Let's see how we would do this via Log_linear models 
#### We will look at the details later in the course
#### test of conditional independence
#### via loglinear model, but the object has to be a table!

is.table(temp)  #### to check if table
temp<-as.table(temp) #### to save as table
temp.condind<-loglin(temp, list(c(1,3), c(2,3)), fit=TRUE, param=TRUE) #### fit the cond.indep. model
temp.condind
1-pchisq(temp.condind$lrt, temp.condind$df)

#### There is no way to do Breslow-Day stats in R; 
#### you need to write your own function or fit the homogeneous association model
#### to test for identity of odds-ratios

#### test of homogenous association model

temp.hom<-loglin(temp, list(c(1,3), c(2,3), c(1,2)), fit=TRUE, param=TRUE)
temp.hom
1-pchisq(temp.hom$lrt, temp.hom$df)


#### Here is how to do the same but with the glm() function in R
#### test of conditional independence
#### via loglinear mode, but using glm() funcion
#### the object now needs to be dataframe

temp.data<-as.data.frame(temp)
temp.data
temp.condind<-glm(temp.data$Freq~temp.data$scout*temp.data$SES+temp.data$deliquent*temp.data$SES, family=poisson())
summary(temp.condind)
fitted(temp.condind)

#### test of homogenous association model

temp.hom<-glm(temp.data$Freq~temp.data$scout*temp.data$SES+temp.data$deliquent*temp.data$SES+temp.data$scout*temp.data$deliquent, family=poisson())
summary(temp.hom)
fitted(temp.hom)


https://onlinecourses.science.psu.edu/stat504/node/103

Advertisements