6  Assumptions and Properties of IRT Models

Author

Derek C. Briggs and Claude Code (Opus 4.6 & 4.7)

6.1 Overview

IRT models rest on several key assumptions and have important properties that distinguish them from classical test theory. Understanding these is essential for proper application and interpretation.

| Assumptions | Properties |
|---|---|
| Local Independence | Parameter Invariance |
| Appropriate Dimensionality | Scale Indeterminacy |
| Functional Form (ICC shape) | |
| Continuous latent variable | |

6.2 Assumption 1: Local Independence

6.2.1 Definition

Item responses are statistically independent, conditional on the latent variable \(\theta\).

Put differently: item responses are correlated only because of their shared dependence on \(\theta\).

6.2.2 Mathematical Expression

This assumption gives us the important result:

\[P(X_{1p} = 1 \text{ and } X_{2p} = 1 | \theta_p) = P(X_{1p} = 1 | \theta_p) \times P(X_{2p} = 1 | \theta_p)\]

More generally, for a full response pattern:

\[P(X_1, X_2, \ldots, X_I | \theta) = \prod_{i=1}^{I} P(X_i | \theta)\]

This is the same assumption made in factor analysis. For multidimensional IRT, we assume item responses are independent conditional on all dimensions of \(\theta\) included in the model.
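Under local independence, the likelihood of any full response pattern is just a product of item-level probabilities. A minimal sketch in base R, using made-up 2PL item parameters:

```r
# Made-up 2PL parameters for a three-item test
a <- c(1.2, 0.8, 1.5)   # discriminations
b <- c(-0.5, 0.0, 1.0)  # difficulties
theta <- 0.5            # one examinee's ability

# Item-level probabilities of a correct response under the 2PL
p <- plogis(a * (theta - b))

# Probability of the response pattern (1, 1, 0): multiply P(correct)
# for correct items and 1 - P(correct) for incorrect items
x <- c(1, 1, 0)
pattern_prob <- prod(p^x * (1 - p)^(1 - x))
pattern_prob  # roughly 0.31
```

This same product over items is the building block of the likelihood used when estimating IRT parameters.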

6.2.3 Quick Aside: Useful Probability Rules

These rules are essential for understanding IRT:

  1. Complement Rule: \(P(A) = 1 - P(\neg A)\)
    • Probability of A is 1 minus the probability of A not happening
  2. Multiplication Rule (Independence): \(P(A \text{ and } B) = P(A) \times P(B)\)
    • When A and B are independent
    • This is the joint probability of A and B occurring
    • Local independence allows us to use this for item responses
  3. Conditional Probability: \(P(A|B) = \frac{P(A \text{ and } B)}{P(B)}\)
    • Probability of A given that B happened
  4. Bayes’ Rule: \(P(A|B) = \frac{P(B|A) \times P(A)}{P(B)}\)
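A quick numeric check of these rules, with made-up probabilities:

```r
# Made-up probabilities for illustration
p_A <- 0.3          # P(A)
p_B_given_A <- 0.8  # P(B | A)
p_B <- 0.5          # P(B)

# Complement rule: P(not A) = 1 - P(A)
p_not_A <- 1 - p_A                      # 0.7

# Bayes' rule: P(A | B) = P(B | A) * P(A) / P(B)
p_A_given_B <- p_B_given_A * p_A / p_B  # 0.48

# Conditional probability rearranged: P(A and B) = P(A | B) * P(B)
p_A_and_B <- p_A_given_B * p_B          # 0.24
```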

6.2.4 What Causes Violations of Local Independence?

  1. Multidimensionality: Test measures more than one latent trait
  2. Item chaining: Answer to one item depends on previous item
  3. Testlet effects: Items share common stimulus (e.g., reading passage)
  4. Speededness: Time pressure creates dependence among later items
  5. Method effects: Similar item formats create additional covariance

6.2.5 Checking Local Independence

Code
library(mirt)

# Load example data
forma <- read.csv("../Data/pset1_formA.csv")
forma <- forma[, 1:15]

# Fit model
mod <- mirt(forma, 1, itemtype = "2PL", verbose = FALSE)

# Residual correlations can indicate local dependence
# (suppress printed output - just keep the visualization)
residuals_ld <- suppressMessages(residuals(mod, type = "LD", verbose = FALSE))

# Visualize (upper triangle of matrix)
library(ggplot2)

# Convert to long format for plotting
ld_matrix <- as.matrix(residuals_ld)
ld_df <- expand.grid(Item1 = 1:15, Item2 = 1:15)
ld_df$LD <- as.vector(ld_matrix)
ld_df <- ld_df[ld_df$Item1 < ld_df$Item2, ]

ggplot(ld_df, aes(x = factor(Item1), y = factor(Item2), fill = LD)) +
  geom_tile() +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red", midpoint = 0,
                       name = "LD\nStatistic") +
  labs(x = "Item", y = "Item", title = "Local Dependence Check: Residual Correlations") +
  theme_minimal()

Large positive values suggest items may be locally dependent.


6.3 Assumption 2: Appropriate Dimensionality

6.3.1 Definition

The model contains the appropriate number of latent dimensions to account for the covariance among items.

  • For unidimensional IRT: One \(\theta\) is sufficient
  • For multidimensional IRT: Multiple \(\theta\)s are needed

6.3.2 Key Points

  • If a test is multidimensional and we fit only a single latent variable, this can cause local item dependence
  • Dimensionality refers to the number of latent variables needed to model the response data
  • This may or may not correspond to the hypothesized dimensionality of the theoretical construct
  • Most IRT models used in practice are unidimensional

6.3.3 Assessing Dimensionality

Code
par(mfrow = c(1, 2))

# Eigenvalue plot (scree plot)
eigenvalues <- eigen(cor(forma))$values
plot(eigenvalues, type = "b", pch = 19,
     xlab = "Component Number", ylab = "Eigenvalue",
     main = "Scree Plot")
abline(h = 1, lty = 2, col = "red")

# Ratio of first to second eigenvalue
cat("First eigenvalue:", round(eigenvalues[1], 2), "\n")
First eigenvalue: 5.29 
Code
cat("Second eigenvalue:", round(eigenvalues[2], 2), "\n")
Second eigenvalue: 1.8 
Code
cat("Ratio (1st/2nd):", round(eigenvalues[1]/eigenvalues[2], 2), "\n")
Ratio (1st/2nd): 2.94 
Code
# Cumulative variance explained
cum_var <- cumsum(eigenvalues) / sum(eigenvalues)
plot(cum_var, type = "b", pch = 19,
     xlab = "Number of Components", ylab = "Cumulative Variance Explained",
     main = "Cumulative Variance", ylim = c(0, 1))
abline(h = 0.8, lty = 2, col = "red")

Code
par(mfrow = c(1, 1))

Guidelines:

  • A dominant first eigenvalue suggests unidimensionality
  • Ratio of first to second eigenvalue > 3 often indicates essential unidimensionality
  • But these are rough guidelines, not strict rules
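These guidelines can be sanity-checked by simulation. A sketch that generates truly unidimensional 2PL responses (all parameter values made up) and inspects the eigenvalue ratio:

```r
set.seed(42)
n_persons <- 1000
n_items <- 15

# Made-up unidimensional 2PL parameters
a <- runif(n_items, 0.8, 2.0)  # discriminations
b <- rnorm(n_items)            # difficulties
theta <- rnorm(n_persons)      # abilities

# P(X = 1) = logistic(a_i * (theta_p - b_i)); persons in rows, items in columns
p <- plogis(sweep(outer(theta, b, "-"), 2, a, "*"))
resp <- matrix(rbinom(n_persons * n_items, 1, p), n_persons, n_items)

# With unidimensional data, the first eigenvalue should dominate
ev <- eigen(cor(resp))$values
ev[1] / ev[2]
```

Repeating this with responses driven by two uncorrelated \(\theta\)s would shrink the ratio noticeably.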

6.4 Assumption 3: Functional Form (ICC Shape)

6.4.1 Definition

We assume that the probability of correct responses follows the logistic form (or normal ogive):

\[P(X_{ip} = 1 | \theta_p) = c_i + (1 - c_i) \frac{\exp(a_i(\theta_p - b_i))}{1 + \exp(a_i(\theta_p - b_i))}\]

6.4.2 Key Characteristics

  1. Monotonically increasing probability as a function of \(\theta\)
  2. S-shaped (sigmoidal) curve
  3. Described by the \(a\), \(b\), and \(c\) parameters
  4. Lower asymptote determined by \(c\)
  5. Upper asymptote is 1 (or can be less with a 4PL model)
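These characteristics can be verified directly from the formula. A minimal sketch, where `p_3pl` is a hypothetical helper implementing the 3PL expression above, evaluated at made-up parameter values:

```r
# Hypothetical helper implementing the 3PL formula
p_3pl <- function(theta, a, b, c) {
  c + (1 - c) * plogis(a * (theta - b))
}

# Made-up parameters: a = 1.5, b = 0, c = 0.2
p_3pl(-10, 1.5, 0, 0.2)  # near the lower asymptote c = 0.2
p_3pl(10, 1.5, 0, 0.2)   # near the upper asymptote of 1
p_3pl(0, 1.5, 0, 0.2)    # at theta = b: c + (1 - c)/2 = 0.6
```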
Code
theta <- seq(-4, 4, 0.1)

# 3PL response function (assumed defined in an earlier chapter;
# reproduced here so this chunk runs on its own)
calc_prob <- function(theta, a, b, c) {
  c + (1 - c) * plogis(a * (theta - b))
}

par(mfrow = c(1, 2))

# Standard ICC
plot(theta, calc_prob(theta, 1.5, 0, 0.2), type = "l", lwd = 3, col = "blue",
     xlab = expression(theta), ylab = expression(P(X == 1)),
     main = "Standard 3PL ICC Form",
     ylim = c(0, 1))
abline(h = c(0.2, 1), lty = 2, col = "gray")
text(-3, 0.25, "c (lower asymptote)", col = "gray40")

# What if ICC were NOT monotonic? (This would violate the assumption)
theta_range <- seq(-4, 4, 0.1)
non_monotonic <- 0.5 + 0.3 * sin(theta_range)  # Hypothetical non-monotonic

plot(theta_range, non_monotonic, type = "l", lwd = 3, col = "red",
     xlab = expression(theta), ylab = expression(P(X == 1)),
     main = "Hypothetical Non-Monotonic ICC\n(Would Violate Assumption)",
     ylim = c(0, 1))

Code
par(mfrow = c(1, 1))

6.4.3 Checking ICC Form with Empirical Plots

Code
# Compare model-implied ICC to empirical ICC
library(mirt)

# Check empirical vs model-implied ICCs for 3 items.
# (itemfit's empirical plots are lattice-based, so par(mfrow) does not
# arrange them; each one prints separately.)
for (item in 1:3) {
  print(itemfit(mod, group.bins = 10, empirical.plot = item))
}

If the empirical points deviate systematically from the model-implied curve, the functional form assumption may be violated.


6.5 Property 1: Parameter Invariance

6.5.1 Definition

If the model fits the data, item parameter estimates should be the same regardless of the group used to estimate them.

  • We should get the same \(a\), \(b\), \(c\) parameters whether we use high-ability or low-ability examinees
  • Similarly, person parameters should be the same regardless of which items are used

6.5.2 Why This Matters

| CTT | IRT |
|---|---|
| Item p-values depend on sample | Item parameters are invariant (if model fits) |
| Discrimination depends on sample variance | Discrimination is a stable property |
| Need matched samples for comparison | Can compare across different samples |
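The sample dependence of CTT item statistics is easy to demonstrate by simulation. A sketch using one made-up 2PL item administered to two groups of different ability:

```r
set.seed(1)

# One made-up 2PL item: a = 1.2, b = 0.5
p_correct <- function(theta) plogis(1.2 * (theta - 0.5))

# Two samples that differ in mean ability
theta_low  <- rnorm(5000, mean = -1)
theta_high <- rnorm(5000, mean = 1)

# The CTT p-value (proportion correct) swings with the sample...
pval_low  <- mean(rbinom(5000, 1, p_correct(theta_low)))
pval_high <- mean(rbinom(5000, 1, p_correct(theta_high)))
pval_low
pval_high

# ...even though a = 1.2 and b = 0.5 are fixed properties of the item's
# response function, whichever group responds
```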

6.5.3 Analogy: Linear Regression

In linear regression, we assume the slope and intercept are the same regardless of which subset of data we use to estimate them.

Code
set.seed(123)

# Generate population data
n <- 1000
x <- rnorm(n, 0, 1)
y <- 0.5 + 0.7 * x + rnorm(n, 0, 0.3)

# Split into "low" and "high" groups
low_group <- x < 0
high_group <- x >= 0

par(mfrow = c(1, 2))

# Full data
plot(x, y, pch = 16, col = rgb(0, 0, 0, 0.3),
     main = "Full Data", xlab = "X", ylab = "Y")
abline(lm(y ~ x), col = "blue", lwd = 2)

# Separate groups
plot(x[low_group], y[low_group], pch = 16, col = rgb(1, 0, 0, 0.5),
     xlim = range(x), ylim = range(y),
     main = "Low (Red) vs High (Blue) Groups", xlab = "X", ylab = "Y")
points(x[high_group], y[high_group], pch = 16, col = rgb(0, 0, 1, 0.5))
abline(lm(y[low_group] ~ x[low_group]), col = "red", lwd = 2, lty = 2)
abline(lm(y[high_group] ~ x[high_group]), col = "blue", lwd = 2, lty = 2)

Code
par(mfrow = c(1, 1))

# Compare estimates
cat("Full data: slope =", round(coef(lm(y ~ x))[2], 3), "\n")
Full data: slope = 0.726 
Code
cat("Low group: slope =", round(coef(lm(y[low_group] ~ x[low_group]))[2], 3), "\n")
Low group: slope = 0.724 
Code
cat("High group: slope =", round(coef(lm(y[high_group] ~ x[high_group]))[2], 3), "\n")
High group: slope = 0.72 

6.6 Property 2: Scale Indeterminacy

6.6.1 The Problem

Because we cannot observe \(a\), \(b\), \(c\), or \(\theta\) directly, and they occur together in the model, there is inherent indeterminacy in the scale.

Consider the expression \(a_i(\theta_p - b_i)\).

Now define new values:

  • \(b_i^* = (b_i + k_1) / k_2\)
  • \(\theta_p^* = (\theta_p + k_1) / k_2\)
  • \(a_i^* = k_2 \times a_i\)
  • \(c_i^* = c_i\)

It will be true that: \(a_i(\theta_p - b_i) = a_i^*(\theta_p^* - b_i^*)\)
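The identity is easy to verify numerically with arbitrary values for the parameters and the constants \(k_1\) and \(k_2\):

```r
# Arbitrary item and person parameters
a <- 1.3; b <- 0.4; theta <- -0.8
# Arbitrary transformation constants
k1 <- 2; k2 <- 0.5

# Transformed parameters
b_star <- (b + k1) / k2
theta_star <- (theta + k1) / k2
a_star <- k2 * a

# The logit is unchanged, so every model-implied probability is unchanged
a * (theta - b)                     # -1.56
a_star * (theta_star - b_star)      # -1.56
```

Because the two logits are identical for every person and item, no amount of response data can distinguish the original scale from the transformed one.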

6.6.2 Resolving Scale Indeterminacy

There are two main approaches:

6.6.2.1 1. Item-Side Anchoring

  • Place constraints on item parameters (e.g., mean item difficulty = 0)
  • Freely estimate the \(\theta\) distribution
  • More common in Rasch modeling and in Europe

6.6.2.2 2. Person-Side Anchoring

  • Fix distribution of \(\theta\) to have mean = 0 and SD = 1
  • Freely estimate item parameters
  • More common in US and in achievement testing

6.6.3 Practical Implications

  1. Comparing across studies: Must ensure same scale/anchoring
  2. Linking: Need anchor items or anchor persons to equate scales
  3. Interpretation: The absolute value of \(\theta\) is arbitrary; only relative positions matter

6.7 Summary

6.7.1 Assumptions

| Assumption | Description | Violation Consequences |
|---|---|---|
| Local Independence | Responses independent given \(\theta\) | Biased parameter estimates |
| Dimensionality | Correct number of latent dimensions | Local dependence, poor fit |
| Functional Form | ICC follows logistic/normal ogive | Poor item fit |
| Continuous \(\theta\) | Latent variable is continuous | (Rarely problematic) |

6.7.2 Properties

| Property | Description | Practical Implication |
|---|---|---|
| Parameter Invariance | Parameters same across groups | Enables fair comparison |
| Scale Indeterminacy | Scale defined up to linear transform | Need anchoring conventions |

6.7.3 Key Takeaways

  1. Local independence is crucial - violations can seriously bias estimates
  2. Dimensionality should be checked before fitting unidimensional models
  3. Parameter invariance is a property, not an assumption - it holds if model fits
  4. Scale indeterminacy means we need conventions to interpret parameters
  5. Always check model assumptions before interpreting results!