6  Assumptions and Properties of IRT Models

Author

Derek C. Briggs and Claude Code (Opus 4.6 & 4.7)

6.1 Overview

IRT models rest on several key assumptions and have important properties that distinguish them from classical test theory. Understanding these is essential for proper application and interpretation.

| Assumptions | Properties |
|---|---|
| Local Independence | Parameter Invariance |
| Appropriate Dimensionality | Scale Indeterminacy |
| Functional Form (ICC shape) | |
| Continuous latent variable | |

6.2 Assumption 1: Local Independence

6.2.1 Definition

Item responses are statistically independent, conditional on the latent variable \(\theta\).

Put differently: item responses are correlated only because of their shared dependence on \(\theta\).

6.2.2 Mathematical Expression

This assumption gives us the important result:

\[P(X_{1p} = 1 \text{ and } X_{2p} = 1 | \theta_p) = P(X_{1p} = 1 | \theta_p) \times P(X_{2p} = 1 | \theta_p)\]

More generally, for a full response pattern:

\[P(X_1, X_2, \ldots, X_I | \theta) = \prod_{i=1}^{I} P(X_i | \theta)\]

This is the same assumption made in factor analysis. For multidimensional IRT, we assume item responses are independent conditional on all dimensions of \(\theta\) included in the model.
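Under local independence, the likelihood of any full response pattern is just a product of item-level probabilities. A minimal sketch in base R, using made-up 2PL item parameters:

```r
# Made-up 2PL parameters for a three-item test
a <- c(1.2, 0.8, 1.5)   # discriminations
b <- c(-0.5, 0.0, 1.0)  # difficulties
theta <- 0.5            # one examinee's ability

# Item-level probabilities of a correct response under the 2PL
p <- plogis(a * (theta - b))

# Probability of the response pattern (1, 1, 0): multiply P(correct)
# for correct items and 1 - P(correct) for incorrect items
x <- c(1, 1, 0)
pattern_prob <- prod(p^x * (1 - p)^(1 - x))
pattern_prob  # roughly 0.31
```

This same product over items is the building block of the likelihood used when estimating IRT parameters.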

6.2.3 Quick Aside: Useful Probability Rules

These rules are essential for understanding IRT:

  1. Complement Rule: \(P(A) = 1 - P(\neg A)\)
    • Probability of A is 1 minus the probability of A not happening
  2. Multiplication Rule (Independence): \(P(A \text{ and } B) = P(A) \times P(B)\)
    • When A and B are independent
    • This is the joint probability of A and B occurring
    • Local independence allows us to use this for item responses
  3. Conditional Probability: \(P(A|B) = \frac{P(A \text{ and } B)}{P(B)}\)
    • Probability of A given that B happened
  4. Bayes’ Rule: \(P(A|B) = \frac{P(B|A) \times P(A)}{P(B)}\)
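A quick numeric check of these rules, with made-up probabilities:

```r
# Made-up probabilities for illustration
p_A <- 0.3          # P(A)
p_B_given_A <- 0.8  # P(B | A)
p_B <- 0.5          # P(B)

# Complement rule: P(not A) = 1 - P(A)
p_not_A <- 1 - p_A                      # 0.7

# Bayes' rule: P(A | B) = P(B | A) * P(A) / P(B)
p_A_given_B <- p_B_given_A * p_A / p_B  # 0.48

# Conditional probability rearranged: P(A and B) = P(A | B) * P(B)
p_A_and_B <- p_A_given_B * p_B          # 0.24
```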

6.2.4 What Causes Violations of Local Independence?

  1. Multidimensionality: Test measures more than one latent trait
  2. Item chaining: Answer to one item depends on previous item
  3. Testlet effects: Items share common stimulus (e.g., reading passage)
  4. Speededness: Time pressure creates dependence among later items
  5. Method effects: Similar item formats create additional covariance

6.2.5 Checking Local Independence

Code
library(mirt)

# Load example data
forma <- read.csv("../Data/pset1_formA.csv")
forma <- forma[, 1:15]

# Fit model
mod <- mirt(forma, 1, itemtype = "2PL", verbose = FALSE)

# Residual correlations can indicate local dependence
# (suppress printed output - just keep the visualization)
residuals_ld <- suppressMessages(residuals(mod, type = "LD", verbose = FALSE))

# Visualize (upper triangle of matrix)
library(ggplot2)

# Convert to long format for plotting
ld_matrix <- as.matrix(residuals_ld)
ld_df <- expand.grid(Item1 = 1:15, Item2 = 1:15)
ld_df$LD <- as.vector(ld_matrix)
ld_df <- ld_df[ld_df$Item1 < ld_df$Item2, ]

ggplot(ld_df, aes(x = factor(Item1), y = factor(Item2), fill = LD)) +
  geom_tile() +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red", midpoint = 0,
                       name = "LD\nStatistic") +
  labs(x = "Item", y = "Item", title = "Local Dependence Check: Residual Correlations") +
  theme_minimal()

Large positive values suggest items may be locally dependent.


6.3 Assumption 2: Appropriate Dimensionality

6.3.1 Definition

The model contains the appropriate number of latent dimensions to account for the covariance among items.

  • For unidimensional IRT: One \(\theta\) is sufficient
  • For multidimensional IRT: Multiple \(\theta\)s are needed

6.3.2 Key Points

  • If a test is multidimensional and we fit only a single latent variable, this can cause local item dependence
  • Dimensionality refers to the number of latent variables needed to model the response data
  • This may or may not correspond to the hypothesized dimensionality of the theoretical construct
  • Most IRT models used in practice are unidimensional

6.3.3 Assessing Dimensionality

Code
par(mfrow = c(1, 2))

# Eigenvalue plot (scree plot)
eigenvalues <- eigen(cor(forma))$values
plot(eigenvalues, type = "b", pch = 19,
     xlab = "Component Number", ylab = "Eigenvalue",
     main = "Scree Plot")
abline(h = 1, lty = 2, col = "red")

# Ratio of first to second eigenvalue
cat("First eigenvalue:", round(eigenvalues[1], 2), "\n")
First eigenvalue: 5.29 
Code
cat("Second eigenvalue:", round(eigenvalues[2], 2), "\n")
Second eigenvalue: 1.8 
Code
cat("Ratio (1st/2nd):", round(eigenvalues[1]/eigenvalues[2], 2), "\n")
Ratio (1st/2nd): 2.94 
Code
# Cumulative variance explained
cum_var <- cumsum(eigenvalues) / sum(eigenvalues)
plot(cum_var, type = "b", pch = 19,
     xlab = "Number of Components", ylab = "Cumulative Variance Explained",
     main = "Cumulative Variance", ylim = c(0, 1))
abline(h = 0.8, lty = 2, col = "red")

Code
par(mfrow = c(1, 1))

Guidelines:

  • A dominant first eigenvalue suggests unidimensionality
  • Ratio of first to second eigenvalue > 3 often indicates essential unidimensionality
  • But these are rough guidelines, not strict rules
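These guidelines can be sanity-checked by simulation. A sketch that generates truly unidimensional 2PL responses (all parameter values made up) and inspects the eigenvalue ratio:

```r
set.seed(42)
n_persons <- 1000
n_items <- 15

# Made-up unidimensional 2PL parameters
a <- runif(n_items, 0.8, 2.0)  # discriminations
b <- rnorm(n_items)            # difficulties
theta <- rnorm(n_persons)      # abilities

# P(X = 1) = logistic(a_i * (theta_p - b_i)); persons in rows, items in columns
p <- plogis(sweep(outer(theta, b, "-"), 2, a, "*"))
resp <- matrix(rbinom(n_persons * n_items, 1, p), n_persons, n_items)

# With unidimensional data, the first eigenvalue should dominate
ev <- eigen(cor(resp))$values
ev[1] / ev[2]
```

Repeating this with responses driven by two uncorrelated \(\theta\)s would shrink the ratio noticeably.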

6.4 Assumption 3: Functional Form (ICC Shape)

6.4.1 Definition

We assume that the probability of correct responses follows the logistic form (or normal ogive):

\[P(X_{ip} = 1 | \theta_p) = c_i + (1 - c_i) \frac{\exp(a_i(\theta_p - b_i))}{1 + \exp(a_i(\theta_p - b_i))}\]

6.4.2 Key Characteristics

  1. Monotonically increasing probability as a function of \(\theta\)
  2. S-shaped (sigmoidal) curve
  3. Described by the \(a\), \(b\), and \(c\) parameters
  4. Lower asymptote determined by \(c\)
  5. Upper asymptote is 1 (or can be less with a 4PL model)
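These characteristics can be verified directly from the formula. A minimal sketch, where `p_3pl` is a hypothetical helper implementing the 3PL expression above, evaluated at made-up parameter values:

```r
# Hypothetical helper implementing the 3PL formula
p_3pl <- function(theta, a, b, c) {
  c + (1 - c) * plogis(a * (theta - b))
}

# Made-up parameters: a = 1.5, b = 0, c = 0.2
p_3pl(-10, 1.5, 0, 0.2)  # near the lower asymptote c = 0.2
p_3pl(10, 1.5, 0, 0.2)   # near the upper asymptote of 1
p_3pl(0, 1.5, 0, 0.2)    # at theta = b: c + (1 - c)/2 = 0.6
```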
Code
theta <- seq(-4, 4, 0.1)

# 3PL response function (assumed defined in an earlier chapter;
# reproduced here so this chunk runs on its own)
calc_prob <- function(theta, a, b, c) {
  c + (1 - c) * plogis(a * (theta - b))
}

par(mfrow = c(1, 2))

# Standard ICC
plot(theta, calc_prob(theta, 1.5, 0, 0.2), type = "l", lwd = 3, col = "blue",
     xlab = expression(theta), ylab = expression(P(X == 1)),
     main = "Standard 3PL ICC Form",
     ylim = c(0, 1))
abline(h = c(0.2, 1), lty = 2, col = "gray")
text(-3, 0.25, "c (lower asymptote)", col = "gray40")

# What if ICC were NOT monotonic? (This would violate the assumption)
theta_range <- seq(-4, 4, 0.1)
non_monotonic <- 0.5 + 0.3 * sin(theta_range)  # Hypothetical non-monotonic

plot(theta_range, non_monotonic, type = "l", lwd = 3, col = "red",
     xlab = expression(theta), ylab = expression(P(X == 1)),
     main = "Hypothetical Non-Monotonic ICC\n(Would Violate Assumption)",
     ylim = c(0, 1))

Code
par(mfrow = c(1, 1))

6.4.3 Checking ICC Form with Empirical Plots

Code
# Compare model-implied ICC to empirical ICC
library(mirt)

# Check empirical vs model-implied ICCs for 3 items.
# (itemfit's empirical plots are lattice-based, so par(mfrow) does not
# arrange them; each one prints separately.)
for (item in 1:3) {
  print(itemfit(mod, group.bins = 10, empirical.plot = item))
}

If the empirical points deviate systematically from the model-implied curve, the functional form assumption may be violated.


6.5 Property 1: Parameter Invariance

6.5.1 Definition

If the model fits the data, item parameter estimates should be the same regardless of the group used to estimate them.

  • We should get the same \(a\), \(b\), \(c\) parameters whether we use high-ability or low-ability examinees
  • Similarly, person parameters should be the same regardless of which items are used

6.5.2 Why This Matters

| CTT | IRT |
|---|---|
| Item p-values depend on sample | Item parameters are invariant (if model fits) |
| Discrimination depends on sample variance | Discrimination is a stable property |
| Need matched samples for comparison | Can compare across different samples |
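The sample dependence of CTT item statistics is easy to demonstrate by simulation. A sketch using one made-up 2PL item administered to two groups of different ability:

```r
set.seed(1)

# One made-up 2PL item: a = 1.2, b = 0.5
p_correct <- function(theta) plogis(1.2 * (theta - 0.5))

# Two samples that differ in mean ability
theta_low  <- rnorm(5000, mean = -1)
theta_high <- rnorm(5000, mean = 1)

# The CTT p-value (proportion correct) swings with the sample...
pval_low  <- mean(rbinom(5000, 1, p_correct(theta_low)))
pval_high <- mean(rbinom(5000, 1, p_correct(theta_high)))
pval_low
pval_high

# ...even though a = 1.2 and b = 0.5 are fixed properties of the item's
# response function, whichever group responds
```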

6.5.3 Analogy: Linear Regression

In linear regression, we assume the slope and intercept are the same regardless of which subset of data we use to estimate them.

Code
set.seed(123)

# Generate population data
n <- 1000
x <- rnorm(n, 0, 1)
y <- 0.5 + 0.7 * x + rnorm(n, 0, 0.3)

# Split into "low" and "high" groups
low_group <- x < 0
high_group <- x >= 0

par(mfrow = c(1, 2))

# Full data
plot(x, y, pch = 16, col = rgb(0, 0, 0, 0.3),
     main = "Full Data", xlab = "X", ylab = "Y")
abline(lm(y ~ x), col = "blue", lwd = 2)

# Separate groups
plot(x[low_group], y[low_group], pch = 16, col = rgb(1, 0, 0, 0.5),
     xlim = range(x), ylim = range(y),
     main = "Low (Red) vs High (Blue) Groups", xlab = "X", ylab = "Y")
points(x[high_group], y[high_group], pch = 16, col = rgb(0, 0, 1, 0.5))
abline(lm(y[low_group] ~ x[low_group]), col = "red", lwd = 2, lty = 2)
abline(lm(y[high_group] ~ x[high_group]), col = "blue", lwd = 2, lty = 2)

Code
par(mfrow = c(1, 1))

# Compare estimates
cat("Full data: slope =", round(coef(lm(y ~ x))[2], 3), "\n")
Full data: slope = 0.726 
Code
cat("Low group: slope =", round(coef(lm(y[low_group] ~ x[low_group]))[2], 3), "\n")
Low group: slope = 0.724 
Code
cat("High group: slope =", round(coef(lm(y[high_group] ~ x[high_group]))[2], 3), "\n")
High group: slope = 0.72 

6.6 Property 2: Scale Indeterminacy

6.6.1 The Problem

Because we cannot observe \(a\), \(b\), \(c\), or \(\theta\) directly, and they occur together in the model, there is inherent indeterminacy in the scale.

Consider the expression \(a_i(\theta_p - b_i)\).

Now define new values:

  • \(b_i^* = (b_i + k_1) / k_2\)
  • \(\theta_p^* = (\theta_p + k_1) / k_2\)
  • \(a_i^* = k_2 \times a_i\)
  • \(c_i^* = c_i\)

It will be true that: \(a_i(\theta_p - b_i) = a_i^*(\theta_p^* - b_i^*)\)
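The identity is easy to verify numerically with arbitrary values for the parameters and the constants \(k_1\) and \(k_2\):

```r
# Arbitrary item and person parameters
a <- 1.3; b <- 0.4; theta <- -0.8
# Arbitrary transformation constants
k1 <- 2; k2 <- 0.5

# Transformed parameters
b_star <- (b + k1) / k2
theta_star <- (theta + k1) / k2
a_star <- k2 * a

# The logit is unchanged, so every model-implied probability is unchanged
a * (theta - b)                     # -1.56
a_star * (theta_star - b_star)      # -1.56
```

Because the two logits are identical for every person and item, no amount of response data can distinguish the original scale from the transformed one.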

6.6.2 Resolving Scale Indeterminacy

There are two main approaches:

6.6.2.1 1. Item-Side Anchoring

  • Place constraints on item parameters (e.g., mean item difficulty = 0)
  • Freely estimate the \(\theta\) distribution
  • More common in Rasch modeling and in Europe

6.6.2.2 2. Person-Side Anchoring

  • Fix distribution of \(\theta\) to have mean = 0 and SD = 1
  • Freely estimate item parameters
  • More common in US and in achievement testing

6.6.3 Practical Implications

  1. Comparing across studies: Must ensure same scale/anchoring
  2. Linking: Need anchor items or anchor persons to equate scales
  3. Interpretation: The absolute value of \(\theta\) is arbitrary; only relative positions matter

6.7 Summary

6.7.1 Assumptions

| Assumption | Description | Violation Consequences |
|---|---|---|
| Local Independence | Responses independent given \(\theta\) | Biased parameter estimates |
| Dimensionality | Correct number of latent dimensions | Local dependence, poor fit |
| Functional Form | ICC follows logistic/normal ogive | Poor item fit |
| Continuous \(\theta\) | Latent variable is continuous | (Rarely problematic) |

6.7.2 Properties

| Property | Description | Practical Implication |
|---|---|---|
| Parameter Invariance | Parameters same across groups | Enables fair comparison |
| Scale Indeterminacy | Scale defined up to linear transform | Need anchoring conventions |

6.7.3 Key Takeaways

  1. Local independence is crucial - violations can seriously bias estimates
  2. Dimensionality should be checked before fitting unidimensional models
  3. Parameter invariance is a property, not an assumption - it holds if model fits
  4. Scale indeterminacy means we need conventions to interpret parameters
  5. Always check model assumptions before interpreting results!