Index of Key Terms
This alphabetized glossary covers the terms a student should be comfortable with at the end of EDUC 8720. Each entry links to the chapter (and, where applicable, the section) where the term is developed. Click the term to jump there.
Jump to: A · B · C · D · E · F · G · H · I · J · L · M · N · O · P · Q · R · S · T · U · V · W
A
1PL model. One-parameter logistic IRF in which items vary only in difficulty. Equivalent to the Rasch model up to a scaling constant.
2PL model. Two-parameter logistic IRF that adds a discrimination parameter \(a\), allowing ICCs to differ in slope.
3PL model. Three-parameter logistic IRF that adds a lower-asymptote (“guessing”) parameter \(c\) to handle multiple-choice items.
Anchor items. Items common to two forms, used to place the forms on a common scale in a NEAT design.
Anchored calibration. A single concurrent run of mirt with anchor-item parameters fixed from a prior calibration.
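The 1PL/2PL/3PL entries above differ only in which item parameters are free. A minimal Python sketch (the function name irf is ours, and no 1.7 scaling constant is applied; software such as mirt may use a different parameterization):

```python
import math

def irf(theta, a=1.0, b=0.0, c=0.0):
    """3PL item response function; c=0 gives the 2PL, a=1 and c=0 the 1PL."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

# At theta = b, the 2PL gives .5; the 3PL gives (1 + c) / 2.
p_2pl = irf(0.0, a=1.2, b=0.0)           # 0.5
p_3pl = irf(0.0, a=1.2, b=0.0, c=0.2)    # 0.6
```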
B
Bias (item bias). A validity judgment: an item functions differentially and the differential functioning is attributable to a construct-irrelevant source.
C
Classical Item Analysis. The diagnostic examination of item statistics (difficulty, discrimination, distractor behavior) prior to fitting an IRT model.
Classical Test Theory (CTT). The measurement framework in which observed scores are decomposed into true score plus random error.
Compensatory MIRT. A multidimensional IRT model in which low ability on one dimension can be offset by high ability on another.
Computerized Adaptive Testing (CAT). A test administration in which each examinee receives a personalized sequence of items chosen in real time to maximize measurement precision.
Concurrent calibration. A single-run equating strategy in which item parameters from multiple forms are estimated simultaneously on a common metric.
Content constraints (CAT). Rules that enforce coverage of content specifications during adaptive item selection.
Cronbach’s alpha. A lower bound on reliability under the assumption of essentially tau-equivalent forms; the most commonly reported internal-consistency index.
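To make the internal-consistency idea behind Cronbach’s alpha concrete, a self-contained Python sketch (the helper name cronbach_alpha and the toy data are ours; a real analysis would use a psychometrics package):

```python
def cronbach_alpha(scores):
    """scores: list of per-person lists of item scores, all the same length."""
    k = len(scores[0])
    def var(xs):  # population variance, used consistently for items and totals
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)
    item_vars = [var([row[j] for row in scores]) for j in range(k)]
    total_var = var([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

toy = [[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 0, 0]]
alpha = cronbach_alpha(toy)  # 0.75 for this toy matrix
```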
D
Differential Item Functioning (DIF). A statistical finding that examinees from different groups with equal ability have different probabilities of a correct response.
Discrimination parameter (\(a\)). Proportional to the slope of the ICC at its inflection point; controls how sharply the probability of a correct response changes with \(\theta\).
Distractor analysis. Examination of the performance of incorrect multiple-choice options to detect miskeyed or non-functioning distractors.
E
EAP (Expected A Posteriori). The mean of the posterior distribution of \(\theta\) given a response pattern; the default person-score estimator in mirt.
EM algorithm. The Expectation-Maximization algorithm used in MMLE to alternate between imputing expected latent trait values and re-estimating item parameters.
Equating. Adjusting the scores from alternate forms so that they are interchangeable, satisfying Lord’s equating criteria.
Equipercentile equating. A non-parametric equating method that matches scores at equal percentile ranks.
ETS delta. A log-transformation of the Mantel-Haenszel odds ratio with thresholds (A/B/C) for classifying DIF severity.
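The ETS delta entry amounts to mapping the Mantel-Haenszel common odds ratio \(\hat{\alpha}_{MH}\) onto the delta metric via \(-2.35\ln\hat{\alpha}_{MH}\). A sketch (function name is ours; operational A/B/C classification also involves significance conditions not shown here):

```python
import math

def ets_delta(alpha_mh):
    """MH D-DIF: map the Mantel-Haenszel common odds ratio to the delta metric."""
    return -2.35 * math.log(alpha_mh)

# alpha_mh = 1 (no DIF) maps to delta = 0; roughly, |delta| < 1 -> category A,
# |delta| >= 1.5 -> C (with significance conditions), B in between.
no_dif = ets_delta(1.0)  # 0.0
```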
F
Fisher information. The expected curvature of the log-likelihood; the basis of item and test information functions.
Fixing item parameters. Using mirt’s pars = 'values' table and related mechanisms to constrain selected item parameters during estimation.
G
Generalized Partial Credit Model (GPCM). A polytomous IRT model that extends the PCM by allowing items to vary in discrimination.
Graded Response Model (GRM). Samejima’s polytomous IRT model for ordered categories, formulated via cumulative category boundaries.
Guessing parameter (\(c\)). The lower asymptote of the ICC; the probability of a correct response as \(\theta \to -\infty\).
H
Haebara method. A characteristic-curve linking method that minimizes the squared difference between anchor-item ICCs across forms.
I
Impact. A between-group difference in expected item performance that reflects real ability differences, not bias.
Information function (item). The expected curvature of the log-likelihood at a given \(\theta\); the item’s contribution to measurement precision.
Information function (test). The sum of item information functions; its inverse square root is the CSEM for \(\theta\).
Invariance (parameter). The IRT property that item parameters do not depend on the sample of examinees, and person parameters do not depend on the set of items.
Item bank. The calibrated pool of items from which a CAT algorithm selects.
Item Characteristic Curve (ICC). A function relating \(\theta\) to the probability of a correct response for a given item.
Item difficulty (\(b\)). The location on the \(\theta\) scale at the ICC’s inflection point; in the 1PL and 2PL, the \(\theta\) at which the probability of a correct response equals .5.
Item purification. The iterative DIF procedure in which flagged items are excluded from the matching criterion and the analysis is re-run.
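The two information-function entries above can be sketched for the 2PL, where item information is \(a^2P(1-P)\) and the CSEM is the inverse square root of the summed item informations (function names are ours):

```python
import math

def p2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1 / (1 + math.exp(-a * (theta - b)))

def item_info(theta, a, b):
    """2PL item information: a^2 * P * (1 - P); peaks at theta = b."""
    p = p2pl(theta, a, b)
    return a ** 2 * p * (1 - p)

def csem(theta, items):
    """Conditional SEM: 1 / sqrt(test information) over (a, b) pairs."""
    return 1 / math.sqrt(sum(item_info(theta, a, b) for a, b in items))

# One item with a = 2 at its own difficulty: information 4 * .25 = 1, CSEM 1.
# Four such items: test information 4, so the CSEM halves to .5.
```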
J
Joint Maximum Likelihood Estimation (JMLE). A parameter-estimation strategy that estimates item and person parameters simultaneously by alternating Newton-Raphson steps.
L
Latent trait (\(\theta\)). The unobserved attribute that an IRT model places examinees on.
Linking. Placing item and person parameters from separate calibrations on a common scale, usually via linear or characteristic-curve transformations.
Local independence. The IRT assumption that, conditional on \(\theta\), item responses are statistically independent.
Logistic function. The sigmoidal link function \(\{1 + \exp[-\text{logit}]\}^{-1}\) used in the 1PL/2PL/3PL.
Logit. The log-odds of a correct response; the linear predictor in a logistic IRT model.
lordif algorithm. Iterative ordinal-logistic-regression DIF detection with IRT-based matching criterion and purification, implemented in the R package of the same name.
M
Mantel-Haenszel. A contingency-table DIF procedure that computes a common odds ratio across ability-matched strata.
MAP (Maximum A Posteriori). The mode of the posterior of \(\theta\) given a response pattern.
Marginal Maximum Likelihood Estimation (MMLE). A parameter-estimation strategy that integrates \(\theta\) out of the joint likelihood using a prior, then maximizes via EM.
Mean-mean linking. A linking transformation based on matching the means of anchor-item \(b\)’s and \(a\)’s across forms.
Mean-sigma linking. A linking transformation based on matching means and standard deviations of anchor-item \(b\)’s across forms.
mirt. The R package for multidimensional item response theory that we use for almost all estimation and plotting.
Multidimensional IRT (MIRT). IRT models in which the latent trait is a vector \(\boldsymbol{\theta}\) rather than a scalar.
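Mean-sigma linking above reduces to matching two moments of the anchor-item \(b\)’s. A sketch with invented anchor values (the function name is ours):

```python
# Mean-sigma linking: find A, B so that b_new = A * b_old + B places the old
# form's parameters on the target form's scale (a-parameters transform as a / A).

def mean_sigma(b_target, b_source):
    def mean(xs):
        return sum(xs) / len(xs)
    def sd(xs):
        m = mean(xs)
        return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5
    A = sd(b_target) / sd(b_source)
    B = mean(b_target) - A * mean(b_source)
    return A, B

# Toy anchors differing only by a shift of .5: A = 1.0, B = -0.5.
A, B = mean_sigma([-1.0, 0.0, 1.0], [-0.5, 0.5, 1.5])
```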
N
NEAT design. Non-Equivalent groups with Anchor Test—a data collection design in which two samples take different forms linked by common items.
Newton-Raphson. The iterative optimization algorithm used in JMLE estimation steps.
Nominal Response Model (NRM). Bock’s polytomous IRT model for unordered response categories.
Non-uniform DIF. A DIF pattern in which the two groups’ ICCs cross — the item favors different groups at different \(\theta\) levels.
Normal ogive. The CDF of a standard normal; an alternative to the logistic as the IRF link function.
O
Oblique rotation. A factor rotation that allows dimensions to be correlated (e.g., oblimin, promax).
P
Parallel forms. Tests assumed to have identical true scores and identical error variances across examinees.
Partial Credit Model (PCM). Masters’ polytomous IRT model for ordered categories built from adjacent-category logits.
Person parameter estimation. Estimation of \(\theta\) for each examinee using MLE, MAP, or EAP.
Point-biserial correlation. The correlation between a dichotomous item score and the total test score; a classical discrimination index.
Posterior distribution. The distribution of \(\theta\) given the observed responses, combining the prior and the likelihood.
Prior distribution. The assumed population distribution of \(\theta\) (typically \(\mathcal{N}(0,1)\)) used in Bayesian person scoring and in MMLE.
Q
Quadrature. Numerical integration over a set of support points with associated weights; used in MMLE to integrate over the latent trait.
R
Rasch model. The one-parameter logistic model; in the Rasch tradition, treated as a requirement for fundamental measurement rather than as a special case of the 2PL.
Rating Scale Model (RSM). Andrich’s polytomous model in which threshold locations are constrained to be the same across items.
Reliability coefficient. The proportion of observed-score variance attributable to true-score variance.
S
Scale constraints (mirt). The specific identification choices in mirt (mean-variance on \(\theta\) or anchoring on an item) that pin down the metric.
Scale indeterminacy. The fact that the IRT \(\theta\) metric is identified only up to a linear transformation; requires an anchoring choice.
SEM (Standard Error of Measurement, CTT). \(s_X\sqrt{1-r_{xx'}}\); the classical measure of individual-score imprecision.
SEM / CSEM (IRT). The conditional standard error of measurement, computed from the test information function as \(1/\sqrt{I(\theta)}\).
Spearman-Brown formula. Projects how reliability changes when test length is multiplied by a factor \(k\).
Specific objectivity. Rasch’s requirement that comparisons between persons be independent of the items used, and vice versa.
Stocking-Lord method. A characteristic-curve linking method that minimizes the squared difference between anchor-item TCCs across forms.
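Two of the CTT formulas in this section, the Spearman-Brown projection and the classical SEM, in a short Python sketch (function names and the toy numbers are ours):

```python
import math

def spearman_brown(r, k):
    """Projected reliability when test length is multiplied by k."""
    return k * r / (1 + (k - 1) * r)

def ctt_sem(sd_x, r):
    """Classical standard error of measurement: s_X * sqrt(1 - reliability)."""
    return sd_x * math.sqrt(1 - r)

# Doubling a test with reliability .80 projects to about .89;
# an observed-score sd of 10 with reliability .84 gives an SEM of about 4.
r_double = spearman_brown(0.80, 2)
sem = ctt_sem(10.0, 0.84)
```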
T
Test Characteristic Curve (TCC). The sum of item ICCs as a function of \(\theta\); the expected test score.
Thurstone’s method of absolute scaling. A classical vertical-scaling technique based on linking grade-level \(z\)-scores on common items.
True score. The expected value of an examinee’s observed score over hypothetical parallel administrations.
U
Unidimensionality. The IRT assumption that a single latent trait accounts for the responses.
Uniform DIF. A DIF pattern in which the ICCs of the two groups are parallel — one group is uniformly favored across \(\theta\).
V
Varimax rotation. An orthogonal rotation that maximizes the variance of squared loadings within each factor.
Vertical scaling. Construction of a score scale that spans multiple grade levels so longitudinal growth can be described.
W
Wright map / Item-person map. A joint visualization of item difficulties and person abilities on the same logit scale.