Show Menu
Cheatography

r psy1 Cheat Sheet (DRAFT) by

R Programming for Psychometrics

This is a draft cheat sheet. It is a work in progress and is not finished yet.

Intro

This is material to prepare for applic­ation in R in Psycho­met­rics.
Resources: website
Material made by: Susu Zhang
(*Addi­tional)

CTT Item Analysis

1.1) Response data
resp <- read.table('resp.txt', header = F, sep = '\t')

The imported data set resp contains 100 subjects’ dichotomous responses to 40 GRE questions.
Output 1.1

1.2) CTT Item Analysis
The total score of each subject can be obtained by the row sums.
total.score <- rowSums(resp)

Output 1.2
The item difficulty in CTT can be obtained by calculating the proportion of correct answers of each item.
item.diff <- colMeans(resp)

Output 1.3

1.3) Item discrimination
The item discrimination in CTT can be obtained by the point biserial correlation between the item response and the total score.
n.items <- ncol(resp)        # number of items

total.score <- rowSums(resp) # total score

item.disc <- numeric(n.items)      # output vector

for(i in 1:n.items){         # sequence

  item.disc[i] <- cor(total.score, resp[,i])}  # body 

Output 1.4
(*Addi­tional)

Output 1.1

Output 1.2

Output 1.3

Output 1.4

 

Validity

A test has validity if it measures what it purports to measure. There are three types of validities; content validity, criterion-related validity, and construct validity.

test1 <- read.table("test1.txt")

test2 <- read.table("test2.txt")


2.1) Criterion Related Validity
total_X <- rowSums(test1)

total_Y <- rowSums(test2)

rho_xy <- cor(total_X, total_Y)

The validity coefficient is 0.6024.

2.2) Correction for attenuation
coeff.alpha <- function(responses){

  # Get number of items (N) and individuals

  n.items <- ncol(responses)

  n.persons <- nrow(responses)

  # Get individual total scores

  x <- rowSums(responses)

  # Get observed-score variance of whole test (X)

  var.x <- var(x)*(n.persons-1)/n.persons

  # Get observed-score variance of each item (Y_j)

  var.y <- numeric(n.items)

  for(i in 1:n.items){

    var.y[i] <- var(responses[,i])*(n.persons-1)/n.persons

  }

  # Apply the alpha formula

  alpha <- (n.items/(n.items-1))*(1 - sum(var.y)/var.x)

  return(alpha)

}

rho_xx <- coeff.alpha(test1)

rho_xx

## [1] 0.6061799

rho_yy <- coeff.alpha(test2)

rho_yy

## [1] 0.6439105

rho_txty <- rho_xy / (sqrt(rho_xx) * sqrt(rho_yy))

rho_txty

## [1] 0.9641909

Obtain the true score correlation.
(*Addi­tional)

Decision Table

Decision table or a contingency table based on a test score and a criterion score. From the contingency table, students will be able to understand and obtain the hit rate, sensitivity, specificity, and base rate in R. Test scores are often used for making screening decisions.

3.1) Decision Table
test1 <- read.table("test1.txt")

test2 <- read.table("test2.txt")

# total scores

X <- rowSums(test1)

Y <- rowSums(test2)

predicted <- (X >= 13)

actual <- (Y >= 13)

head(predicted)

## [1] FALSE  TRUE FALSE FALSE FALSE FALSE

head(actual)

## [1] FALSE FALSE  TRUE FALSE FALSE FALSE

sum(predicted)

## [1] 44

sum(actual)

## [1] 33

mean(predicted)

## [1] 0.44

mean(actual)

## [1] 0.33

match <- cbind(predicted, actual)

decision <- table(actual, predicted)

decision

Output 3.1

3.2) Hit Rate
mean(predicted == actual)

## [1] 0.71

decision[1,1] # Correct rejection

## [1] 47

decision[2,2] # Hit

## [1] 24

sum(decision) # Number of examinees

## [1] 100

(decision[1,1] + decision[2,2]) / sum(decision) # hit rate

## [1] 0.71


3.3) Sensitivity and Specificity
decision[2,2] # hit

## [1] 24

decision[2,1] # miss

## [1] 9

decision[2,2] / (decision[2,2] + decision[2,1]) # sensitivity

## [1] 0.7272727

decision[1,1] # correct rejection

## [1] 47

decision[1,2] # false alarm

## [1] 20

decision[1,1] / (decision[1,1] + decision[1,2]) # specificity

## [1] 0.7014925

mean(predicted[actual == TRUE] == TRUE) # sensitivity

## [1] 0.7272727

mean(predicted[actual == FALSE] == FALSE) # specificity

## [1] 0.7014925


3.4) Base Rate
mean(actual)

## [1] 0.33

(decision[2,1] + decision[2,2]) / sum(decision)

## [1] 0.33

sum(decision[2,]) / sum(decision)

## [1] 0.33

rowSums(decision)[2] / sum(decision)

## TRUE 

## 0.33
(*Addi­tional)

Output 3.1

Test Constr­uction

Suppose that we developed a new test containing N newly written items. The test is designed to measure a unidimensional latent construct. Also, the test items are assumed to be essentially tau− equivalent. This N-item test is administered to a pretest sample.

4.1) Item Difficulty
responses <- read.table(".")

Y <- read.table(".")

difficulty <- colMeans(responses)

difficulty

Output 4.1

4.2) Item Discrimination
X <- rowSums(responses) # total score


discrimination <- numeric(ncol(responses))  # outcome vector

for(i in 1:ncol(responses)){

  discrimination[i] <- cor(responses[,i], X) # Pearson correlation between the i-th item score and the total score

}

discrimination

Output 4.2

4.3) Item-score SD
item_sd <- sqrt(difficulty * (1-difficulty))

item_sd

Output 4.3

4.4) Item Reliability
item_rel <- item_sd * discrimination

round(item_rel, 2)  # round to 2 digits

Output 4.4

4.5) Item Validity
r_iy <- numeric(ncol(responses))

for(i in 1:ncol(responses)){

  r_iy[i] <- cor(responses[,i], Y) # replace each value by r_iy

}


item_val <- item_sd * r_iy

round(item_val, 2)

Output 4.5
(*Addi­tional)

Output 4.1

Output 4.2

Output 4.3

Output 4.4

Output 4.5

 

Explor­atory Factor Analysis

Factor analysis (FA) assumes that there are a number of underlying (latent) factors affecting the observed scores on items/tests. In other words, the traits underlying a test might be multidimensional.

scores <- read.table(".")

cor.subtests <- cor(scores)

factanal(x = scores, factors = 1)

factanal(x = scores, factors = 2)

cov.mat <- cov(scores)

factanal(covmat=cov.mat, factors = 2, n.obs = nrow(scores))

output <- factanal(x = scores, factors = 2, scores = 'regression')

varimax <- factanal(scores, factors = 2, rotation="varimax", scores="regression")

promax <- factanal(scores, factors = 2, rotation="promax", scores="regression")


library(psych)

output2 <- fa(scores, # input data

              nfactors = 2, # number of factors

              rotate = "varimax", # rotation

              scores = "regression") # factor score estimation

output2$loadings # factor loadings

Output 5
(*Addi­tional)

Output 5

Item Response Theory

Item response theory models the nonlinear relationship between the examinee’s trait and the probability of response.

GRE40 <- read.table(".")

par(mfrow=c(1,3))

hist(GRE40$a)

hist(GRE40$b)

hist(GRE40$c)


Output 6.1
Output 6.2

irt.p <- function(theta,a,b,c){   #ability and item parameters as scalars

  p <- c + (1-c)/(1 + exp(-1.7a(theta-b)))

  return(p)

}


irt.p(theta = 0, a = 1, b = -.2, c = .25)

## [1] 0.6881429

irt.p(theta = -1, a = 1, b = -.2, c = .25)

## [1] 0.4031802

theta_seq <- seq(from = -4, to = 4, by = .1)


p_item1 <- numeric(length(theta_seq))

for(t in 1:length(theta_seq)){

  p_item1[t] <- irt.p(theta = theta_seq[t],

                   a = GRE40$a[1],

                   b = GRE40$b[1],

                   c = GRE40$c[1])

}

plot(theta_seq, p_item1, type = 'l', ylim=c(0,1))

Output 6.3
(*Addi­tional)

Output 6.1

Output 6.2

Output 6.3

Item Response Theory