Psychometric Modeling with IRT
A Companion Compilation for EDUC 8720 (Spring 2026)
Preface
This book is a hands-on companion to EDUC 8720: Psychometric Modeling: IRT, a doctoral seminar at the University of Colorado Boulder. It collects, in one place, the R Markdown tutorials that I developed with the help of Claude Code across the Spring 2026 semester, so that students have a single searchable reference for the psychometric modeling concepts they are expected to learn.
What the Course Is About
EDUC 8720 focuses on item response theory (IRT) and its applications in educational and psychological testing. IRT models express the probability that a person responds in a higher category of a test or survey item as a function of item parameters (difficulty, discrimination, and sometimes guessing or upper-asymptote parameters) and a person parameter (typically denoted \(\theta\), representing the latent attribute being measured). The course covers the core features of IRT—model specification and interpretation, parameter estimation, and evaluation of model fit—before turning to extensions for polytomously scored items and models that relax unidimensionality (multidimensional IRT). In the second half of the course we examine the practical uses to which IRT is put:
- ensuring that test scores are comparable over time even as items and examinees change (test equating and linking),
- building score scales that span multiple grades so that we can make inferences about longitudinal growth (vertical scaling),
- constructing item banks that can be administered adaptively (computerized adaptive testing), and
- evaluating whether items function fairly across groups (differential item functioning).
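To make the parameterization described above concrete, here is the three-parameter logistic (3PL) model for a dichotomously scored item, a standard form in the Lord/Birnbaum tradition (shown only for orientation; the chapters on dichotomous IRT develop it fully):

\[
P(X_{ij} = 1 \mid \theta_i) = c_j + (1 - c_j)\,\frac{1}{1 + \exp\!\left[-a_j(\theta_i - b_j)\right]}
\]

where \(a_j\), \(b_j\), and \(c_j\) are the discrimination, difficulty, and lower-asymptote (guessing) parameters of item \(j\), and \(\theta_i\) is person \(i\)'s latent attribute. Setting \(c_j = 0\) yields the 2PL; further constraining \(a_j\) to be equal across items yields the Rasch/1PL form.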
Students approach these topics from two distinct traditions—the Rasch tradition, which emphasizes parameter separation and invariance as prerequisites for fundamental measurement, and the Lord/Birnbaum tradition, which treats IRT primarily as a flexible toolbox for efficient test construction. Both traditions receive attention here.
All computation uses the R language, primarily through the mirt package. A few additional packages (e.g. lordif, mirtCAT, psych) appear where appropriate.
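To give a flavor of the workflow, here is a minimal sketch of fitting a unidimensional 2PL model with mirt, using the LSAT7 example data that ships with the package (the mirt syntax chapter walks through calls like these in detail):

```r
library(mirt)

# LSAT7 ships with mirt as a compressed response-pattern table;
# expand.table() expands it to one row per examinee
responses <- expand.table(LSAT7)

# Fit a unidimensional model with all items treated as 2PL
mod <- mirt(responses, model = 1, itemtype = "2PL", verbose = FALSE)

# Item parameters on the traditional IRT metric (a, b)
coef(mod, IRTpars = TRUE, simplify = TRUE)

# EAP estimates of theta for each examinee
theta <- fscores(mod, method = "EAP")
```

This is a sketch, not a template: real analyses should be preceded by the classical item diagnostics covered in the early chapters and followed by the model-fit checks covered later.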
What This Book Is Not
This compilation is not a textbook replacement. It does not reproduce the required readings listed in the syllabus (Baker & Kim 2017; Embretson & Reise 2000; the ITEMS modules and journal articles posted to Canvas). Rather, it sits alongside those readings as a place where the computational side of the ideas is worked through in R, end to end.
Nor is this a substitute for class discussion, office hours, or the oral exam. The course’s stated goal is functional literacy in IRT: enough vocabulary and hands-on fluency to read the psychometric literature critically, apply IRT models to real data with the help of references like this one, and interpret the output sensibly. Everything here is in service of that goal.
How This Book Is Organized
The chapters follow the arc of the semester rather than strict dependency order, but the progression is cumulative:
- Chapters 1–2 revisit the classical test theory foundation from EDUC 8710 and introduce classical item analysis, which is the diagnostic work that should precede any IRT fit.
- Chapters 3–6 develop the basics of IRT for dichotomously scored items: deriving the item characteristic curve (ICC), interpreting it, building up the test characteristic curve (TCC) and information functions, and stating and checking the assumptions of the model.
- Chapter 8 is a practical guide to the mirt package syntax that the remaining chapters rely on.
- Chapter 9 presents the Rasch model and the *Rasch model paradigm* as a distinct approach that positions IRT as a model for measurement.
- Chapter 10 clarifies the scale constraints that identify the IRT metric in mirt.
- Chapters 11–13 work through parameter estimation from the person side (EAP, MAP, MLE) and the item side (JMLE and MMLE with EM).
- Chapter 14 addresses model fit.
- Chapter 15 extends the machinery to polytomously scored items (PCM, GPCM, GRM, RSM, NRM).
- Chapters 16–21 turn to applications: fixing item parameters, linking and equating, differential item functioning, vertical scaling, computerized adaptive testing, and multidimensional IRT.
- The final chapter is an Index of Key Terms—a clickable glossary that takes you directly to the section where each term is developed.
Chapters vary in depth. Some are short enough to read in one sitting; others (especially the parameter-estimation chapters) unpack derivations slowly and are meant to be worked through with the relevant reading open alongside.
On the Use of AI
This compilation was co-authored by the instructor with Claude (Anthropic’s conversational AI assistant), using the Claude Code environment with (primarily) Opus 4.6. The syllabus discusses at some length what responsible student use of AI looks like in this course. I will say here only what I tell students: AI is a legitimate tool for checking and extending your understanding, provided you have already done the work of establishing that understanding yourself. Every chapter in this book was drafted through a back-and-forth process in which I specified the pedagogical structure, the order of topics, and the interpretive framing, while Claude handled the heavy lifting of consistent R code, visual presentation, and prose scaffolding. I edited every chapter for accuracy and emphasis. Errors that remain are my own.
How to Use This Book
- Read linearly if you are taking the course—each chapter builds on earlier ones.
- Jump around if you are using this as a reference after the course—the Index of Key Terms at the end is designed for that.
- Clone the repository if you want to knit the chapters yourself. The original R Markdown files live in topic-specific folders in the Measurement and Psychometrics repository; the chapters here are copies that preserve each tutorial’s working directory so that the data files load correctly.
- Do not treat this as a substitute for the required readings. Functional literacy requires both the conceptual grounding the readings provide and the computational fluency the tutorials build.
Derek C. Briggs
University of Colorado Boulder
April 19, 2026