Intro
This is material to prepare for applying R to psychometric analyses.
Resources: website
Material made by: Susu Zhang
CTT Item Analysis
1.1) Response data
resp <- read.table('resp.txt', header = F, sep = '\t')
The imported data set resp contains 100 subjects’ dichotomous responses to 40 GRE questions.
Output 1.1
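As a quick, optional check (a sketch assuming resp.txt was read as above), the dimensions and the first few rows can be inspected:
dim(resp)            # should be 100 rows (subjects) by 40 columns (items)
head(resp[, 1:5])    # first five items for the first six subjects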
1.2) CTT Item Analysis
The total score of each subject can be obtained by the row sums.
total.score <- rowSums(resp)
Output 1.2
The item difficulty in CTT can be obtained by calculating the proportion of correct answers for each item, i.e., the column means.
item.diff <- colMeans(resp)
Output 1.3
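For a quick overview of how difficulty is distributed across the 40 items (an optional check, not part of the original analysis), the summary statistics can be printed:
summary(item.diff)   # minimum, quartiles, and maximum of the proportion-correct values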
1.3) Item discrimination
The item discrimination in CTT can be obtained by the point biserial correlation between the item response and the total score.
n.items <- ncol(resp)           # number of items
total.score <- rowSums(resp)    # total score
item.disc <- numeric(n.items)   # output vector
for(i in 1:n.items){            # sequence
  item.disc[i] <- cor(total.score, resp[,i])   # body: point-biserial correlation for item i
}
Output 1.4
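Equivalently (a small optional sketch; item.disc2 is an illustrative name), the loop can be replaced by apply(), which computes the same correlations column by column:
item.disc2 <- apply(resp, 2, function(item) cor(total.score, item))   # same values as item.disc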
Validity
A test has validity if it measures what it purports to measure. There are three types of validity: content validity, criterion-related validity, and construct validity.
test1 <- read.table("test1.txt")
test2 <- read.table("test2.txt")
2.1) Criterion Related Validity
total_X <- rowSums(test1)
total_Y <- rowSums(test2)
rho_xy <- cor(total_X, total_Y)
The validity coefficient is 0.6024.
2.2) Correction for attenuation
coeff.alpha <- function(responses){
  # Get number of items (N) and individuals
  n.items <- ncol(responses)
  n.persons <- nrow(responses)
  # Get individual total scores
  x <- rowSums(responses)
  # Get observed-score variance of whole test (X)
  var.x <- var(x)*(n.persons-1)/n.persons
  # Get observed-score variance of each item (Y_j)
  var.y <- numeric(n.items)
  for(i in 1:n.items){
    var.y[i] <- var(responses[,i])*(n.persons-1)/n.persons
  }
  # Apply the alpha formula
  alpha <- (n.items/(n.items-1))*(1 - sum(var.y)/var.x)
  return(alpha)
}
rho_xx <- coeff.alpha(test1)
rho_xx
## [1] 0.6061799
rho_yy <- coeff.alpha(test2)
rho_yy
## [1] 0.6439105
rho_txty <- rho_xy / (sqrt(rho_xx) * sqrt(rho_yy))
rho_txty
## [1] 0.9641909
Dividing the observed validity coefficient by the square roots of the two reliability coefficients corrects for attenuation and gives the correlation between true scores: 0.6024 / (sqrt(0.6062) * sqrt(0.6439)) ≈ 0.964.
Decision Table
Test scores are often used for making screening decisions. A decision table is a contingency table that cross-classifies a test score against a criterion score; from it, students will be able to understand and obtain the hit rate, sensitivity, specificity, and base rate in R.
3.1) Decision Table
test1 <- read.table("test1.txt")
test2 <- read.table("test2.txt")
# total scores
X <- rowSums(test1)
Y <- rowSums(test2)
predicted <- (X >= 13)
actual <- (Y >= 13)
head(predicted)
## [1] FALSE TRUE FALSE FALSE FALSE FALSE
head(actual)
## [1] FALSE FALSE TRUE FALSE FALSE FALSE
sum(predicted)
## [1] 44
sum(actual)
## [1] 33
mean(predicted)
## [1] 0.44
mean(actual)
## [1] 0.33
match <- cbind(predicted, actual)
decision <- table(actual, predicted)
decision
Output 3.1
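As an optional aid for reading the table (a sketch based on how decision was constructed above, with actual in the rows and predicted in the columns), the four cells can be labeled:
cell.labels <- matrix(c("correct rejection", "false alarm",
                        "miss", "hit"),
                      nrow = 2, byrow = TRUE,
                      dimnames = dimnames(decision))
cell.labels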
3.2) Hit Rate
mean(predicted == actual)
## [1] 0.71
decision[1,1] # Correct rejection
## [1] 47
decision[2,2] # Hit
## [1] 24
sum(decision) # Number of examinees
## [1] 100
(decision[1,1] + decision[2,2]) / sum(decision) # hit rate
## [1] 0.71
3.3) Sensitivity and Specificity
decision[2,2] # hit
## [1] 24
decision[2,1] # miss
## [1] 9
decision[2,2] / (decision[2,2] + decision[2,1]) # sensitivity
## [1] 0.7272727
decision[1,1] # correct rejection
## [1] 47
decision[1,2] # false alarm
## [1] 20
decision[1,1] / (decision[1,1] + decision[1,2]) # specificity
## [1] 0.7014925
mean(predicted[actual == TRUE] == TRUE) # sensitivity
## [1] 0.7272727
mean(predicted[actual == FALSE] == FALSE) # specificity
## [1] 0.7014925
3.4) Base Rate
mean(actual)
## [1] 0.33
(decision[2,1] + decision[2,2]) / sum(decision)
## [1] 0.33
sum(decision[2,]) / sum(decision)
## [1] 0.33
rowSums(decision)[2] / sum(decision)
## TRUE
## 0.33
Test Construction
Suppose that we developed a new test containing N newly written items. The test is designed to measure a unidimensional latent construct, and the items are assumed to be essentially tau-equivalent. This N-item test is administered to a pretest sample.
4.1) Item Difficulty
responses <- read.table(".")   # item responses of the pretest sample
Y <- read.table(".")           # criterion scores
difficulty <- colMeans(responses)
difficulty
Output 4.1
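Under essential tau-equivalence, coefficient alpha equals the test's reliability, so the pretest reliability can be estimated by reusing the coeff.alpha() function from Section 2.2 (an optional sketch; alpha.pretest is an illustrative name, and the function must already be defined in the session):
alpha.pretest <- coeff.alpha(responses)   # coefficient alpha of the N-item pretest
alpha.pretest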
4.2) Item Discrimination
X <- rowSums(responses) # total score
discrimination <- numeric(ncol(responses)) # outcome vector
for(i in 1:ncol(responses)){
  discrimination[i] <- cor(responses[,i], X)   # Pearson correlation between the i-th item score and the total score
}
discrimination
Output 4.2
4.3) Item-score SD
item_sd <- sqrt(difficulty * (1-difficulty))
item_sd
Output 4.3
4.4) Item Reliability
item_rel <- item_sd * discrimination
round(item_rel, 2) # round to 2 digits
Output 4.4
4.5) Item Validity
r_iy <- numeric(ncol(responses))
for(i in 1:ncol(responses)){
  r_iy[i] <- cor(responses[,i], Y)   # correlation between the i-th item score and the criterion score
}
item_val <- item_sd * r_iy
round(item_val, 2)
Output 4.5
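Items with higher item reliability and item validity indices are generally preferred when assembling the final test. As an optional sketch (the cutoff of 0.1 is purely illustrative, not part of the original material), the items can be ranked and screened by these indices:
order(item_val, decreasing = TRUE)       # items ordered from highest to lowest validity index
which(item_rel > 0.1 & item_val > 0.1)   # items exceeding illustrative cutoffs on both indices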
Exploratory Factor Analysis
Factor analysis (FA) assumes that there are a number of underlying (latent) factors affecting the observed scores on items/tests. In other words, the traits underlying a test might be multidimensional.
scores <- read.table(".")
cor.subtests <- cor(scores)         # correlations among the subtest scores
factanal(x = scores, factors = 1)   # one-factor solution
factanal(x = scores, factors = 2)   # two-factor solution
cov.mat <- cov(scores)
factanal(covmat = cov.mat, factors = 2, n.obs = nrow(scores))   # same model, fit from the covariance matrix
output <- factanal(x = scores, factors = 2, scores = 'regression')   # also estimate factor scores
varimax <- factanal(scores, factors = 2, rotation = "varimax", scores = "regression")   # orthogonal rotation
promax <- factanal(scores, factors = 2, rotation = "promax", scores = "regression")     # oblique rotation
library(psych)
output2 <- fa(scores, # input data
nfactors = 2, # number of factors
rotate = "varimax", # rotation
scores = "regression") # factor score estimation
output2$loadings # factor loadings
Output 5
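The estimated factor scores can also be inspected (an optional sketch; both factanal() with scores = "regression" and psych::fa() store them in the $scores element):
head(output$scores)    # regression factor score estimates from factanal()
head(output2$scores)   # factor score estimates from psych::fa()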
Item Response Theory
Item response theory (IRT) models the nonlinear relationship between an examinee's latent trait and the probability of a correct response to an item.
GRE40 <- read.table(".")
par(mfrow = c(1, 3))   # three histograms side by side
hist(GRE40$a)          # item discrimination parameters
hist(GRE40$b)          # item difficulty parameters
hist(GRE40$c)          # item guessing (lower-asymptote) parameters
Output 6.1
Output 6.2
irt.p <- function(theta, a, b, c){   # ability and item parameters as scalars
  p <- c + (1-c)/(1 + exp(-1.7*a*(theta-b)))   # 3PL item response function
  return(p)
}
irt.p(theta = 0, a = 1, b = -.2, c = .25)
## [1] 0.6881429
irt.p(theta = -1, a = 1, b = -.2, c = .25)
## [1] 0.4031802
theta_seq <- seq(from = -4, to = 4, by = .1)
p_item1 <- numeric(length(theta_seq))
for(t in 1:length(theta_seq)){
  p_item1[t] <- irt.p(theta = theta_seq[t],
                      a = GRE40$a[1],
                      b = GRE40$b[1],
                      c = GRE40$c[1])
}
plot(theta_seq, p_item1, type = 'l', ylim = c(0, 1))   # item characteristic curve of item 1
Output 6.3
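The same calculation extends from a single item to the whole test: summing the 40 item probabilities at each theta value gives the test characteristic curve, i.e., the expected total score. A minimal sketch, assuming GRE40 holds the a, b, and c columns used above (the arithmetic inside irt.p() vectorizes over items):
tcc <- numeric(length(theta_seq))
for(t in 1:length(theta_seq)){
  tcc[t] <- sum(irt.p(theta = theta_seq[t], a = GRE40$a, b = GRE40$b, c = GRE40$c))
}
plot(theta_seq, tcc, type = 'l', ylim = c(0, 40))   # test characteristic curve for the 40 items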