Return to PACER
Technical Reference

Statistical Methods
Reference

PACER is built on a fully transparent statistical foundation. This reference documents every method — the probability models, estimation algorithms, and numerical procedures — exactly as they are implemented. Nothing is a black box. Throughout, PACER employs modern computational methods including parallel processing, numerical stabilization, and efficient algorithms designed to perform well at scale (Doran, 2023).

Item Analysis

Classical Test Theory

CTT characterizes items and tests using observed-score statistics from an \(N\times K\) response matrix (\(N\) respondents, \(K\) items). Items may be dichotomous (0/1) or polytomous (non-negative integers). All variance estimates use the sample \((n-1)\) denominator throughout. Missing data are handled pairwise (default) or listwise.

Item-Level Descriptive Statistics

Mean, SD, and Standard Error

Let \(\mathcal{V}_j\) be the set of respondents with non-missing responses to item \(j\) and \(n_j=|\mathcal{V}_j|\). Then

\[\bar{x}_j=\frac{1}{n_j}\sum_{i\in\mathcal{V}_j}x_{ij},\quad s_j=\sqrt{\frac{\sum_{i\in\mathcal{V}_j}(x_{ij}-\bar{x}_j)^2}{n_j-1}},\quad \widehat{SE}_j=\frac{s_j}{\sqrt{n_j}}\]

Relative Difficulty (p-value)

\[p_j=\frac{\bar{x}_j}{x_j^{\max}}\]

Items with \(p_j<0.15\) or \(p_j>0.85\) are flagged. For dichotomous items this equals the classical p-value.
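As a concrete illustration, the mean, SD, standard error, and relative difficulty can be computed in a few lines. This is a minimal sketch, not PACER's implementation; the function name and the use of None for missing responses are choices made here:

```python
import math

def item_descriptives(responses, max_score=1):
    """CTT descriptives for one item; None marks a missing response."""
    xs = [x for x in responses if x is not None]     # per-item (pairwise-style) deletion
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))   # (n-1) denominator
    se = sd / math.sqrt(n)
    p = mean / max_score                              # relative difficulty
    return {"n": n, "mean": mean, "sd": sd, "se": se,
            "p": p, "flagged": p < 0.15 or p > 0.85}

stats = item_descriptives([1, 0, 1, 1, None, 0, 1, 1])
```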

Discrimination Indices

Adjusted Point-Biserial (Item-Rest Correlation)

Item \(j\) is correlated with the sum of the remaining \(K-1\) items, eliminating the inflation from item \(j\) appearing in its own total:

\[T_i^{(-j)}=\sum_{k\neq j}x_{ik},\qquad r_{pb,j}=\frac{\sum_i(x_{ij}-\bar{x}_j)(T_i^{(-j)}-\overline{T^{(-j)}})}{\sqrt{\sum_i(x_{ij}-\bar{x}_j)^2\cdot\sum_i(T_i^{(-j)}-\overline{T^{(-j)}})^2}}\]

This is Pearson's \(r\), computed pairwise over non-missing observations. Items with \(r_{pb,j}<0.20\) are flagged.
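A minimal sketch of the item-rest correlation, assuming complete data (the function name is illustrative, not PACER's API):

```python
import math

def item_rest_correlation(X, j):
    """Adjusted point-biserial: Pearson r between item j and the total
    of the remaining items (rest-score)."""
    xs = [row[j] for row in X]
    rest = [sum(row) - row[j] for row in X]   # total excluding item j
    n = len(X)
    mx, mr = sum(xs) / n, sum(rest) / n
    sxy = sum((x - mx) * (r - mr) for x, r in zip(xs, rest))
    sxx = sum((x - mx) ** 2 for x in xs)
    srr = sum((r - mr) ** 2 for r in rest)
    return sxy / math.sqrt(sxx * srr)

X = [[1, 1, 1], [1, 0, 1], [0, 0, 1], [0, 0, 0], [1, 1, 0]]
r_pb = item_rest_correlation(X, 0)
```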

Biserial and Polyserial Correlation

These indices estimate the latent linear association assuming the true variable is continuous and normal but observed only in coarsened categories. For an item with \(M\) categories, let \(\hat{F}_m\) be the cumulative proportion through category \(m\). The internal threshold is \(z_m=\Phi^{-1}(1-\hat{F}_m)\) and its standard normal ordinate is \(\varphi(z_m)\):

\[r_{\text{bis},j}=\frac{s_j\cdot r_{pb,j}}{\displaystyle\sum_{m=1}^{M-1}\varphi(z_m)}\]

The last category is excluded because its threshold is \(+\infty\). For \(M=2\) this reduces to the classical biserial formula. \(\Phi^{-1}\) is computed via the Beasley-Springer-Moro rational approximation (accuracy \(\approx10^{-9}\)).

Reliability and Standard Error of Measurement

Cronbach's \(\alpha\)

\[\alpha=\frac{K}{K-1}\!\left(1-\frac{\sum_{j=1}^K s_j^2}{s_T^2}\right)\]

where \(s_j^2\) is the sample variance of item \(j\) and \(s_T^2\) is the sample variance of total scores (Cronbach, 1951). Both use the \((n-1)\) denominator.
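The formula can be checked with a small worked example; this sketch assumes complete data and is not PACER's implementation:

```python
def cronbach_alpha(X):
    """Cronbach's alpha from a complete N x K response matrix."""
    K = len(X[0])
    def svar(xs):                        # sample variance, (n-1) denominator
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    item_vars = sum(svar([row[j] for row in X]) for j in range(K))
    total_var = svar([sum(row) for row in X])
    return (K / (K - 1)) * (1 - item_vars / total_var)

alpha = cronbach_alpha([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]])
```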

Alpha-if-Item-Deleted

For each item \(j\), \(\alpha\) is recomputed on the \((K-1)\)-item submatrix formed by removing column \(j\). An increase signals that item \(j\) is reducing internal consistency.

Standard Error of Measurement

\[\mathrm{SEM}=s_T\sqrt{1-\alpha}\]

IRT Tools — Models

Item Response Models

PACER supports binary and polytomous items in any mixture within a single test. The scaling constant \(D\) defaults to 1.7 (close to the normal-ogive metric; \(D=1\) gives the pure logistic metric). All probabilities are clamped to \([\varepsilon,\,1-\varepsilon]\) with \(\varepsilon\approx1.49\times10^{-8}\) to prevent \(\log 0\) in the likelihood.

Binary Item Response Models


Three-Parameter Logistic (3PL)

\[P(X=1\mid\theta,a,b,c)=c+(1-c)\,\frac{1}{1+e^{-Da(\theta-b)}}\]

Parameters:

\(a\): Discrimination — slope at the ICC inflection point; constrained \(a>0\).
\(b\): Difficulty — \(\theta\) at the inflection point; \(P(b)=(1+c)/2\).
\(c\): Pseudo-guessing — lower asymptote on the probability scale; estimated on the logit scale internally.
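A minimal sketch of the 3PL ICC, which also verifies the \(P(b)=(1+c)/2\) property at \(\theta=b\) (the function name is illustrative):

```python
import math

D = 1.7  # scaling constant, matching the document's default

def icc_3pl(theta, a, b, c):
    """3PL item characteristic curve: P(X=1 | theta)."""
    return c + (1 - c) / (1 + math.exp(-D * a * (theta - b)))

p_at_b = icc_3pl(0.0, a=1.2, b=0.0, c=0.2)   # at theta = b, P = (1 + c) / 2
p_2pl = icc_3pl(0.0, a=1.0, b=0.0, c=0.0)    # c = 0 reduces to the 2PL
```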

2PL and 1PL / Rasch

Setting \(c=0\) gives the 2PL. The 1PL additionally fixes \(a=1\) and estimates a free population standard deviation \(\sigma\) that scales the quadrature nodes: \(\theta_k^*=\sigma z_k\), where \(\{z_k\}\) are the standard GH nodes.

Graded Response Model (GRM)


The GRM (Samejima, 1969) models items with \(K+1\) ordered categories \((0,1,\ldots,K)\). The boundary response function gives the cumulative probability of responding in category \(k\) or higher:

\[P^*(X\ge k\mid\theta)=\frac{1}{1+e^{-Da(\theta-d_k)}},\quad k=1,\ldots,K\]

with ordered thresholds \(d_1<d_2<\cdots<d_K\). With the conventions \(P^*(X\ge 0\mid\theta)=1\) and \(P^*(X\ge K+1\mid\theta)=0\), the category probabilities are obtained by differencing:

\[P(X=k\mid\theta)=P^*(X\ge k\mid\theta)-P^*(X\ge k+1\mid\theta)\]

Boundary probabilities use the sign convention \(p_{kq}=\sigma(Da(d_k-\theta_q))\), so larger \(d_k\) means a harder threshold. Category probabilities are formed by differencing consecutive rows of the \(K\times Q\) boundary matrix.
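The differencing scheme can be sketched directly, using the sentinel boundaries \(P^*(X\ge 0)=1\) and \(P^*(X\ge K+1)=0\) (an illustrative sketch, not PACER's internal code):

```python
import math

D = 1.7

def grm_category_probs(theta, a, d):
    """GRM category probabilities P(X=k | theta), k = 0..K, obtained by
    differencing boundary curves; d = ordered thresholds [d_1, ..., d_K]."""
    # boundary curves with sentinels P*(X>=0)=1 and P*(X>=K+1)=0
    pstar = [1.0] + [1 / (1 + math.exp(-D * a * (theta - dk))) for dk in d] + [0.0]
    return [pstar[k] - pstar[k + 1] for k in range(len(d) + 1)]

probs = grm_category_probs(0.0, a=1.0, d=[-1.0, 1.0])
```

With symmetric thresholds around \(\theta\), the extreme categories get equal probability and the middle category dominates.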

Generalized Partial Credit Model (GPCM) and PCM


The GPCM (Muraki, 1992) uses adjacent-category transitions. For an item with \(K+1\) categories \((0,\ldots,K)\), define the log-numerator of category \(k\):

\[\eta_k(\theta)=Da\!\left(k\theta-\sum_{j=1}^{k}d_j\right),\quad k=0,1,\ldots,K\]

with \(\eta_0\equiv0\) and a leading \(d_0=0\) prepended to the step-parameter vector. Category probabilities follow the multinomial logit (softmax):

\[P(X=k\mid\theta)=\frac{e^{\eta_k(\theta)}}{\sum_{j=0}^{K}e^{\eta_j(\theta)}}\]

Evaluated via log-sum-exp: \(P(X=k)=\exp(\eta_k-M)/\sum_j\exp(\eta_j-M)\) where \(M=\max_j\eta_j\), preventing overflow at extreme \(\theta\). The PCM (Masters, 1982) is the GPCM with \(a\equiv1\).
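The log-sum-exp stabilization can be sketched as follows; the helper name is illustrative:

```python
import math

D = 1.7

def gpcm_category_probs(theta, a, d):
    """GPCM category probabilities via a log-sum-exp-stabilized softmax.
    d = step parameters [d_1, ..., d_K]; d_0 = 0 is implicit."""
    eta, cum = [0.0], 0.0               # eta_0 = 0 by definition
    for k in range(1, len(d) + 1):
        cum += d[k - 1]
        eta.append(D * a * (k * theta - cum))
    M = max(eta)                        # subtracting the max prevents overflow
    exps = [math.exp(e - M) for e in eta]
    Z = sum(exps)
    return [e / Z for e in exps]

probs = gpcm_category_probs(0.0, a=1.0, d=[-0.5, 0.5])
```

Even at extreme \(\theta\) the exponentials stay bounded, so the probabilities remain a valid distribution.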

Convention: The step-parameter vector \(\bm{d}\) always has length equal to the number of categories, with \(d_0=0\) prepended. Scores are 1-based internally; scores used in expected-value and information calculations are 0-based.

Rating Scale Model (RSM)


The RSM (Andrich, 1978) constrains the PCM so that all items share identical step-difficulty offsets \(\tau_1,\ldots,\tau_K\) (\(\tau_0=0\) prepended by convention). Each item has a unique location \(\delta_i\), and its step parameters are

\[d_{ij}=\delta_i+\tau_j,\quad j=0,1,\ldots,K\]

Response probabilities are evaluated with the PCM formula using these compound parameters. The free parameter vector is \([\delta_1,\ldots,\delta_I,\,\tau_1,\ldots,\tau_K]\).

IRT Tools — Calibration

Item Calibration: Marginal Maximum Likelihood via EM

PACER estimates item parameters by marginal maximum likelihood (MML), integrating the latent trait \(\theta\) out of the likelihood. Because \(\theta\) is unknown, direct maximization of the item parameters is not feasible; instead PACER forms the marginal likelihood and maximizes it using the Bock-Aitkin (1981) EM algorithm. Unique response patterns are identified before the E-step to eliminate redundant likelihood evaluations.

The Marginal Likelihood

The goal is to estimate the full vector of item parameters \(\bm{\gamma}\). Since the person parameters \(\theta_i\) are also unknown, they are integrated out under an assumed population distribution \(g(\theta\mid\mu,\sigma)\):

\[\mathcal{L}(\bm{\gamma}\mid\bm{X})=\prod_{i=1}^{N}\int_{-\infty}^{\infty}p(\bm{x}_i\mid\theta_i,\bm{\gamma})\,g(\theta\mid\mu,\sigma)\,d\theta\]

The integral has no closed form and is evaluated by Gauss-Hermite quadrature over \(Q\) nodes:

\[\mathcal{L}(\bm{\gamma}\mid\bm{X})=\prod_{i=1}^{N}\sum_{k=1}^{Q}p(\bm{x}_i\mid\theta_k,\bm{\gamma})\,w_k\]

In practice, logs of individual pattern likelihoods are accumulated rather than computing products, ensuring numerical stability. When the data contain both binary and polytomous items (\(J\) binary, \(M\) polytomous), the pattern likelihood factors as a product over both item types:

\[p(\bm{x}_i\mid\theta,\bm{\gamma})=\prod_{j=1}^{J}P_j(x_{ij}\mid\theta)\cdot\prod_{l=1}^{M}P_l(x_{il}\mid\theta)\]

where \(P_j\) is the binary ICC (1PL/2PL/3PL) and \(P_l\) is the polytomous category probability (GPCM or GRM). Both item types are estimated within the same EM loop.
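The quadrature approximation above can be sketched as a small function; prob, the node values, and the weights below are placeholders chosen for illustration:

```python
import math

def marginal_loglik(patterns, nodes, weights, prob):
    """Marginal log-likelihood sum_i log( sum_k L_k(x_i) w_k ), accumulating
    logs of pattern likelihoods; prob(j, x, theta) = P(X_j = x | theta)."""
    EPS = 1.49e-8                        # probability clamp from the document
    ll = 0.0
    for x in patterns:
        m = 0.0
        for theta, w in zip(nodes, weights):
            L = 1.0
            for j, xj in enumerate(x):
                if xj is None:           # missing response: item drops out
                    continue
                L *= min(max(prob(j, xj, theta), EPS), 1 - EPS)
            m += L * w
        ll += math.log(m)
    return ll

# single 1PL-like item (scaling folded into the slope), 3-node toy quadrature
sig = lambda t: 1 / (1 + math.exp(-t))
prob = lambda j, x, th: sig(th) if x == 1 else 1 - sig(th)
ll = marginal_loglik([[1], [0]], [-1.0, 0.0, 1.0], [0.25, 0.5, 0.25], prob)
```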

Model Identification

For the 2PL and 3PL, the population distribution is fixed at \(N(0,1)\) to identify the latent scale. For the 1PL (Rasch model), only the mean need be fixed; PACER therefore estimates \(N(0,\sigma)\), with \(\sigma\) as a free parameter interpretable as the common discrimination across all items.

Starting Values

Because the EM algorithm is iterative, starting values are required. Three strategies are available:

Random: draws each parameter from a uniform distribution over the admissible parameter space. Useful when the likelihood may be multimodal; running the analysis multiple times with different random starts is a recommended diagnostic.

Fixed: sets \(a_j=1\), \(b_j=0\), \(c_j=0.25\) for all binary items, with ordered random starting values for polytomous step parameters.

Logistic: uses logistic regression intercepts to initialize the difficulty parameters (and proportional-odds intercepts for polytomous step parameters), with \(a\) and \(c\) initialized as in the fixed option. This is the default and generally performs best.

E-Step: Posterior Weights and Sufficient Statistics

Given current item parameters, the likelihood of response pattern \(\bm{x}_i\) at node \(\theta_k\) is

\[L_k(\bm{x}_i)=\prod_{j=1}^{J}P_j(X_j=x_{ij}\mid\theta_k)\]

The posterior weight is

\[h_{ik}=\frac{L_k(\bm{x}_i)\,w_k}{\displaystyle\sum_{k'=1}^{Q}L_{k'}(\bm{x}_i)\,w_{k'}}\]

The marginal log-likelihood used to monitor convergence is

\[\ell=\sum_{i=1}^{N}\log\sum_{k=1}^{Q}L_k(\bm{x}_i)\,w_k\]

Binary Sufficient Statistics

For each binary item \(j\) and node \(k\), summed over respondents \(\mathcal{C}_j\) who answered item \(j\):

\[r_{jk}=\sum_{i\in\mathcal{C}_j}h_{ik}\,x_{ij},\qquad f_{jk}=\sum_{i\in\mathcal{C}_j}h_{ik}\]

\(r_{jk}\) is the expected number of correct responses at node \(k\); \(f_{jk}\) is the expected total count.
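The E-step computations, posterior weights, marginal log-likelihood, and the sufficient statistics \(r_{jk}\) and \(f_{jk}\), fit in a short routine. A sketch over unique patterns with counts, using a toy constant ICC (not PACER's implementation):

```python
import math

def e_step(patterns, counts, nodes, weights, icc):
    """One EM E-step for binary items: sufficient statistics r_jk, f_jk
    and the marginal log-likelihood, accumulated over unique patterns."""
    J, Q = len(patterns[0]), len(nodes)
    r = [[0.0] * Q for _ in range(J)]
    f = [[0.0] * Q for _ in range(J)]
    ll = 0.0
    for pat, c in zip(patterns, counts):
        Lw = []
        for theta, w in zip(nodes, weights):
            L = 1.0
            for j, x in enumerate(pat):
                p = icc(j, theta)
                L *= p if x == 1 else 1 - p
            Lw.append(L * w)
        m = sum(Lw)                      # pattern marginal likelihood
        ll += c * math.log(m)
        for k in range(Q):
            h = c * Lw[k] / m            # posterior weight, scaled by count
            for j, x in enumerate(pat):
                f[j][k] += h
                r[j][k] += h * x
    return r, f, ll

# one item, 4 respondents collapsed to 2 unique patterns, toy flat ICC
r, f, ll = e_step([[1], [0]], [3, 1], [-1.0, 1.0], [0.5, 0.5],
                  lambda j, th: 0.5)
```

Note that \(\sum_k f_{jk}\) recovers the number of respondents who answered item \(j\).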

Polytomous Sufficient Statistics

For polytomous item \(j\), the sufficient statistic at node \(k\) for category \(s\) is

\[\tilde{n}_{jsk}=\sum_{i\in\mathcal{C}_j}h_{ik}\,\mathbf{1}[x_{ij}=s]\]

In the pattern-based implementation, the posterior weight for unique pattern \(p\) at node \(k\) is \(r_{p,k}=c_p\cdot w_k\cdot L_k/m_p\) where \(c_p\) is the pattern count and \(m_p=\sum_k L_k w_k\).

M-Step: Binary Items (2PL / 1PL / 3PL)

The M-step maximizes the expected complete-data log-likelihood for item \(j\):

\[Q_j(a_j,b_j,c_j)=\sum_{k=1}^{Q}\bigl[r_{jk}\ln P_{jk}+(f_{jk}-r_{jk})\ln(1-P_{jk})\bigr]\]

PACER computes analytical gradients for the 2PL. The gradients with respect to \(a_j\) and \(b_j\) are:

\[\frac{\partial Q_j}{\partial a_j}=\sum_k\frac{L_k\,w_k}{m}\cdot\frac{x_j-P_{jk}}{P_{jk}(1-P_{jk})}\cdot D(\theta_k-b_j)P_{jk}(1-P_{jk})\]
\[\frac{\partial Q_j}{\partial b_j}=\sum_k\frac{L_k\,w_k}{m}\cdot\frac{x_j-P_{jk}}{P_{jk}(1-P_{jk})}\cdot(-Da_j)P_{jk}(1-P_{jk})\]

In the multigroup path, the gradient accumulates across all groups \(g\) using group-scaled nodes \(\theta_{gk}=\mu_g+\sigma_g z_k\):

\[\frac{\partial(-\ell_j)}{\partial a_j}=-\sum_g\sum_k D(\theta_{gk}-b_j)\bigl(r_g[j,k]-f_g[j,k]\,P_{jk}^{(g)}\bigr)\]

For the 3PL, \(c_j\) is parameterized on the logit scale internally; its gradient is computed by forward differences.

M-Step: GPCM and PCM

The complete-data log-likelihood for GPCM item \(j\) is the multinomial log-likelihood in expected category counts:

\[Q_j^{\text{GPCM}}(a_j,\bm{d}_j)=\sum_{k=1}^{Q}\sum_{s=0}^{K_j}\tilde{n}_{jsk}\ln P(X_j=s\mid\theta_k;\,a_j,\bm{d}_j)\]

The analytical gradient with respect to \(a_j\) uses the conditional mean of the log-numerator:

\[\bar{\eta}_k=\sum_{s=0}^{K}P(X_j=s\mid\theta_k)\cdot\!\left(s\theta_k-\sum_{m=1}^{s}d_m\right)\]
\[\frac{\partial Q_j^{\text{GPCM}}}{\partial a_j}=D\sum_k\sum_{\text{pat}}r_{\text{pat},k}\Bigl[\Bigl(s_{\text{pat}}\theta_k-\sum_{m=1}^{s_{\text{pat}}}d_m\Bigr)-\bar{\eta}_k\Bigr]\]

For step parameter \(d_m\) (\(m=1,\ldots,K\)), let \(P_k^{\ge m}=\sum_{s\ge m}P(X=s\mid\theta_k)\) be the exceedance probability. Then:

\[\frac{\partial Q_j^{\text{GPCM}}}{\partial d_m}=Da_j\sum_k\sum_{\text{pat}}r_{\text{pat},k}\Bigl[P_k^{\ge m}-\mathbf{1}[s_{\text{pat}}\ge m]\Bigr]\]

This is a score-residual: model-expected exceedance minus the observed indicator. For the PCM, \(a_j\equiv1\) and the \(a\)-gradient is not computed.

M-Step: GRM

The complete-data log-likelihood for GRM item \(j\) is:

\[Q_j^{\text{GRM}}(a_j,\bm{d}_j)=\sum_k\sum_{\text{pat}}r_{\text{pat},k}\ln P(X_j=s_{\text{pat}}\mid\theta_k;\,a_j,\bm{d}_j)\]

Let \(f_k^{(m)}=P^*(X\ge m\mid\theta_k)\cdot[1-P^*(X\ge m\mid\theta_k)]\) be the logistic slope of the \(m\)-th boundary. The gradient with respect to \(a_j\) depends on which category was observed:

\[\frac{\partial Q_j^{\text{GRM}}}{\partial a_j}=\sum_k\frac{r_{\text{pat},k}}{P_{sk}}\cdot\Delta_k(s)\]

The term \(\Delta_k(s)\) depends on the observed category \(s\):

\[\Delta_k(s)=\begin{cases}D(d_1-\theta_k)\,f_k^{(1)}, & s=0\\ D(d_s-\theta_k)\,f_k^{(s)}-D(d_{s-1}-\theta_k)\,f_k^{(s-1)}, & 0<s<K\\ -D(d_K-\theta_k)\,f_k^{(K)}, & s=K\end{cases}\]

The gradient with respect to threshold \(d_m\) is nonzero only for the two categories adjacent to it:

\[\frac{\partial Q_j^{\text{GRM}}}{\partial d_m}=\sum_k\frac{r_{\text{pat},k}}{P_{sk}}\Bigl[\mathbf{1}[s=m]\cdot Da_j f_k^{(m)}+\mathbf{1}[s=m-1]\cdot(-Da_j f_k^{(m)})\Bigr]\]

GRM item gradients are parallelized over items at the M-step.

M-Step: RSM

The RSM free parameter vector is \(\bm{\xi}=[\delta_1,\ldots,\delta_I,\,\tau_1,\ldots,\tau_K]\). Within the optimizer, item \(i\)'s step vector is \(\bm{d}_i=[0,\,\delta_i+\tau_1,\ldots,\delta_i+\tau_K]\) and probabilities are evaluated with the PCM formula (\(a=1\)). The gradients follow from the GPCM step gradient with \(a=1\):

\[\frac{\partial Q^{\text{RSM}}}{\partial\delta_i}=D\sum_k\sum_{\text{pat}}r_{\text{pat},k}\sum_{j=1}^{K}\Bigl[P_{ik}^{\ge j}-\mathbf{1}[s_i\ge j]\Bigr]\]
\[\frac{\partial Q^{\text{RSM}}}{\partial\tau_t}=\sum_{i=1}^{I}D\sum_k\sum_{\text{pat}}r_{\text{pat},k}\Bigl[P_{ik}^{\ge t}-\mathbf{1}[s_i\ge t]\Bigr]\]

The \(\tau_t\) gradient is the sum of per-item step gradients across all \(I\) items, since \(\partial d_{it}/\partial\tau_t=1\) for all \(i\). Item gradients are computed in parallel; \(\tau\) gradients are accumulated sequentially to avoid data races.

Parameter Priors

Priors are incorporated by augmenting the marginal log-likelihood with log-prior terms, one per item parameter:

\[\ln p(\bm{\gamma}\mid\bm{X})=\ln\mathcal{L}(\bm{\gamma}\mid\bm{X})+\sum_{j=1}^{J}\bigl[\ln f(a_j)+\ln f(b_j)+\ln f(c_j)\bigr]\]

\(a\) (discrimination): Log-normal \(\mathcal{LN}(\mu_a,\sigma_a^2)\); log-density contribution \(-(\ln a-\mu_a)^2/(2\sigma_a^2)\) (default: \(\mu_a=0\), \(\sigma_a=1\)).
\(b\) (difficulty): Normal \(\mathcal{N}(\mu_b,\sigma_b^2)\); log-density contribution \(-(b-\mu_b)^2/(2\sigma_b^2)\) (default: \(\mu_b=0\), \(\sigma_b=1\)).
\(c\) (guessing): Beta \(\mathcal{B}(\alpha,\beta)\) or logit-normal; log-density contribution \((\alpha-1)\ln c+(\beta-1)\ln(1-c)\).

Priors on \(a\) and \(b\) are optional. A prior on \(c\) is applied by default when estimating the 3PL and can be disabled for very large samples.

Beta Prior for the Guessing Parameter

The beta distribution is parameterized by shape parameters \(\alpha\) and \(\beta\), but it is more natural to specify a prior mean and standard deviation. PACER converts these to shape parameters via the method of moments. Starting from

\[\mu=\frac{\alpha}{\alpha+\beta},\qquad\sigma^2=\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}\]

solving for \(\alpha\) and \(\beta\) gives

\[\alpha=\left(\frac{1-\mu}{\sigma^2}-\frac{1}{\mu}\right)\mu^2,\qquad\beta=\alpha\!\left(\frac{1}{\mu}-1\right)\]

The defaults \(\mu_c=0.20\) and \(\sigma_c=0.087\) correspond to \(\mathcal{B}(4,16)\), which places most mass below 0.35 and is appropriate for items with three to five response options. As an alternative, a logit-normal prior can be selected, which places the prior on \(\text{logit}(c)\) and may be preferable when the guessing parameter is expected to be very small.
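The moment conversion is easy to verify numerically; with the default mean and SD it recovers shapes close to \(\mathcal{B}(4,16)\):

```python
def beta_shapes_from_moments(mu, sigma):
    """Method-of-moments conversion from a prior mean/SD to Beta(alpha, beta)
    shape parameters, following the document's formulas."""
    alpha = ((1 - mu) / sigma ** 2 - 1 / mu) * mu ** 2
    beta = alpha * (1 / mu - 1)
    return alpha, beta

# the PACER defaults: mean 0.20, SD 0.087 -> approximately Beta(4, 16)
a_shape, b_shape = beta_shapes_from_moments(0.20, 0.087)
```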

Multigroup Calibration (Bock-Zimowski)

For data from multiple groups, PACER implements the Bock-Zimowski (1997) multigroup MML-EM. Each group \(g\) has distribution \(\theta\sim N(\mu_g,\sigma_g^2)\); the reference group (\(g=0\)) is fixed at \(N(0,1)\) for identification.

Group-Scaled Nodes

\[\theta_{gk}=\mu_g+\sigma_g z_k\]

Weights \(\{w_k\}\) are invariant because the Jacobian and density ratio cancel exactly.

Multigroup E-Step

For respondent \(i\) in group \(g\), probabilities are evaluated at \(\theta_{gk}\):

\[h_{ik}=\frac{L_k(\bm{x}_i\mid\theta_{gk})\,w_k}{\sum_{k'}L_{k'}(\bm{x}_i\mid\theta_{gk'})\,w_{k'}}\]

Sufficient statistics \(r_g[j,k]\) and \(f_g[j,k]\) are accumulated per group on the same quadrature index \(k\).

Group Distribution Update (Closed Form)

Let \(W_{gk}=\sum_{i\in g}h_{ik}\). After the item M-step:

\[\hat{\mu}_g=\frac{\sum_k W_{gk}\,\theta_{gk}}{\sum_k W_{gk}},\qquad\hat{\sigma}_g=\sqrt{\frac{\sum_k W_{gk}\,\theta_{gk}^2}{\sum_k W_{gk}}-\hat{\mu}_g^2}\]

The outer loop iterates until

\[\frac{|\ell^{(t+1)}-\ell^{(t)}|}{1+|\ell^{(t)}|}<\tau_\ell\quad\text{and}\quad\max_{g}\,\bigl(\,|\Delta\mu_g|,\;|\Delta\sigma_g|\,\bigr)<10^{-4}\]

where \(\tau_\ell=\max(\text{Tol},\,10^{-5})\). The convergence condition on the group parameters requires that both the mean and standard deviation shift by less than \(10^{-4}\) for every non-reference group.

Optimizer: L-BFGS-B

Item parameters are estimated with L-BFGS-B, a limited-memory quasi-Newton method with box constraints. The inverse Hessian approximation is built from the most recent \(m=10\) curvature pairs \(\{(\bm{s}_t,\bm{y}_t)\}\) where \(\bm{s}_t=\bm{x}_{t+1}-\bm{x}_t\) and \(\bm{y}_t=\nabla f_{t+1}-\nabla f_t\). The two-loop recursion computes the search direction \(\bm{p}_t=-H_t^{-1}\nabla f_t\) without explicitly forming \(H_t^{-1}\). Box constraints are enforced by projecting onto the active constraint set. Convergence: \(|\Delta f|/(0.1+|f|)<\tau\) (default \(\tau=10^{-8}\)). Convergence is evaluated relative to the current function value to ensure scale-invariant termination.

IRT Tools — Scoring

Ability Estimation

Given calibrated item parameters, PACER estimates each respondent's latent trait \(\hat\theta\) from their observed response vector. Four methods are available. All operate on sparse response vectors so that respondents with missing items are scored without imputation.

Maximum Likelihood Estimation (MLE)

\[\hat\theta_{\text{MLE}}=\arg\max_\theta\sum_j\ln P_j(x_j\mid\theta)\]

A dense grid from \(-8\) to \(+8\) in steps of 0.05 locates the global basin; gradient descent on the negative log-likelihood, with backtracking line search, then refines from the best grid point. Extreme scores: perfect and null raw scores produce unbounded estimates, so PACER subtracts 0.5 from (or, for null scores, adds 0.5 to) the response on the item with the smallest discrimination \(a_j\) to yield a finite estimate. The standard error is obtained from the observed information via numerical second differentiation:

\[\widehat{SE}_{\text{MLE}}=\left[-\hat\ell''(\hat\theta)\right]^{-1/2},\quad\hat\ell''(\hat\theta)\approx\frac{\hat\ell(\hat\theta+h)-2\hat\ell(\hat\theta)+\hat\ell(\hat\theta-h)}{h^2},\quad h=10^{-5}\]
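A stripped-down sketch of grid-based MLE scoring for 2PL items with the second-difference SE; the gradient refinement and the 0.5 extreme-score adjustment are omitted here, and the function name is illustrative:

```python
import math

D = 1.7

def mle_theta(items, responses):
    """Grid-search MLE of theta for 2PL items (a, b) pairs, plus an SE
    from the observed information via a central second difference.
    Sketch only: refinement and extreme-score handling omitted."""
    def loglik(theta):
        ll = 0.0
        for (a, b), x in zip(items, responses):
            p = 1 / (1 + math.exp(-D * a * (theta - b)))
            ll += math.log(p) if x == 1 else math.log(1 - p)
        return ll
    grid = [-8 + 0.05 * i for i in range(321)]     # dense grid over [-8, 8]
    theta = max(grid, key=loglik)                  # locate the global basin
    h = 1e-5
    d2 = (loglik(theta + h) - 2 * loglik(theta) + loglik(theta - h)) / h ** 2
    return theta, (-d2) ** -0.5

theta, se = mle_theta([(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)], [1, 1, 0])
```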

Maximum A Posteriori (MAP)

\[\hat\theta_{\text{MAP}}=\arg\max_\theta\left[\sum_j\ln P_j(x_j\mid\theta)+\ln\pi(\theta)\right],\quad\ln\pi(\theta)=-\tfrac{1}{2}\!\left(\tfrac{\theta-\mu}{\sigma}\right)^{\!2}-\ln(\sigma\sqrt{2\pi})\]

The normal prior \(N(\mu,\sigma^2)\) (defaults: \(\mu=0\), \(\sigma=1\)) is added to the log-likelihood and the same minimizer is used. MAP is always finite, including for extreme scores.

Expected A Posteriori (EAP)

EAP is the posterior mean under a normal prior, approximated on a uniform quadrature grid spanning \(\mu\pm4\sigma\) in \(Q\) steps with weights proportional to \(\phi((\theta_k-\mu)/\sigma)\) (normalized to sum to one):

\[\hat\theta_{\text{EAP}}=\frac{\sum_{k=1}^{Q}\theta_k\,L_k(\bm{x})\,w_k}{\sum_{k=1}^{Q}L_k(\bm{x})\,w_k}\]

The posterior standard deviation serves as the standard error:

\[\widehat{SE}_{\text{EAP}}=\sqrt{\frac{\sum_k(\theta_k-\hat\theta_{\text{EAP}})^2\,L_k(\bm{x})\,w_k}{\sum_k L_k(\bm{x})\,w_k}}\]
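A sketch of EAP scoring on the uniform grid described above; with a flat likelihood the posterior equals the prior, so the estimate recovers the prior mean (function name is illustrative):

```python
import math

def eap_theta(loglik_at, mu=0.0, sigma=1.0, Q=61):
    """EAP estimate and posterior-SD standard error on a uniform grid over
    mu +/- 4*sigma, with normal-density weights normalized to sum to one."""
    nodes = [mu - 4 * sigma + 8 * sigma * k / (Q - 1) for k in range(Q)]
    w = [math.exp(-0.5 * ((t - mu) / sigma) ** 2) for t in nodes]
    W = sum(w)
    post = [math.exp(loglik_at(t)) * wk / W for t, wk in zip(nodes, w)]
    Z = sum(post)
    eap = sum(t * p for t, p in zip(nodes, post)) / Z
    sd = math.sqrt(sum((t - eap) ** 2 * p for t, p in zip(nodes, post)) / Z)
    return eap, sd

# flat likelihood: the posterior is just the prior, so EAP is about mu
eap, sd = eap_theta(lambda t: 0.0)
```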

TCC Scoring

TCC scoring (Lord, 1980) inverts the Test Characteristic Curve \(\tau(\theta)=\sum_j E[X_j\mid\theta]\) to find

\[\tau(\hat\theta_{\text{TCC}})=X_{\text{obs}}\]

solved on \([-8,8]\) by Brent's root-finding method. The SE is \(1/\sqrt{I(\hat\theta_{\text{TCC}})}\) from the test information function.

Boundary Condition

A unique TCC estimate exists only when the raw score exceeds the sum of the guessing parameters across all binary items:

\[\sum_{j=1}^{J}c_j < X_{\text{obs}}\]

When this condition is not met — that is, when the observed score is no higher than what could be achieved by guessing alone — TCC assigns the same lower-bound \(\hat\theta\) to all such examinees. Similarly, a perfect raw score produces the upper-bound value defined by the search interval. The default interval \([-8,8]\) covers the range used in practice; if scores fall outside this range the interval endpoints can be widened.

Post-Calibration EAP Thetas

After calibration, PACER computes a \(\hat\theta\) for every respondent using 11-point GH quadrature with a diffuse prior \(N(0,10)\) (\(\sigma=\sqrt{10}\approx3.162\)). These thetas are used for DIF matching and item-fit analysis, not score reporting.

IRT Tools — Test Characteristics

Test Characteristic Curve, Information, and SEM

Test Characteristic Curve (TCC)

\[\text{TCC}(\theta)=\sum_{j=1}^{J}E[X_j\mid\theta]\]

1PL / 2PL / 3PL: \(E[X_j\mid\theta]=P_j(\theta)\) — the ICC.
GRM: \(\sum_{k=0}^{K}k\cdot P(X=k\mid\theta)\); closed form \(\sum_{k=1}^{K}P^*(X\ge k\mid\theta)\).
GPCM / PCM: \(\sum_{k=0}^{K}k\cdot P(X=k\mid\theta)\) with softmax probabilities.
RSM: same as PCM with \(a=1\) and \(d_{ij}=\delta_i+\tau_j\).

When proportional TCC is enabled, output is divided by the maximum possible total score \(X_{\max}=\sum_j K_j\) (the top category of each item; 1 for binary items) to give a 0–1 scale.

Binary Item Information

For 1PL and 2PL:

\[I_j(\theta)=D^2a_j^2\,P_j(\theta)[1-P_j(\theta)]\]

For 3PL:

\[I_j(\theta)=D^2a_j^2\,\frac{[P_j(\theta)-c_j]^2[1-P_j(\theta)]}{P_j(\theta)(1-c_j)^2}\]

GRM Item Information

Samejima (1969):

\[I_j(\theta)=D^2a_j^2\sum_{k=0}^{K}\frac{[f_k(\theta)-f_{k+1}(\theta)]^2}{P(X=k\mid\theta)}\]

where \(f_k(\theta)=P^*(X\ge k\mid\theta)[1-P^*(X\ge k\mid\theta)]\) is the logistic slope of the \(k\)-th boundary, with sentinels \(f_0\equiv0\) and \(f_{K+1}\equiv0\). Concretely, consecutive logistic-slope values are differenced and each squared difference is divided by the corresponding category probability, then summed and scaled by \(D^2a_j^2\).

GPCM / PCM / RSM Item Information

Information equals \(D^2a^2\) times the conditional variance of the item score (Muraki, 1992):

\[I_j(\theta)=D^2a_j^2\,\text{Var}(X_j\mid\theta)=D^2a_j^2\Bigl[E[X_j^2\mid\theta]-\bigl(E[X_j\mid\theta]\bigr)^2\Bigr]\]

The test-level SEM at each \(\theta\) is

\[\text{SEM}(\theta)=\frac{1}{\sqrt{I(\theta)}}=\frac{1}{\sqrt{\sum_j I_j(\theta)}}\]
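For 2PL items, item information and the test-level SEM reduce to a few lines (an illustrative sketch, not PACER's API):

```python
import math

D = 1.7

def info_2pl(theta, a, b):
    """2PL item information: I_j(theta) = D^2 a^2 P (1 - P)."""
    p = 1 / (1 + math.exp(-D * a * (theta - b)))
    return D ** 2 * a ** 2 * p * (1 - p)

def sem(theta, items):
    """Test-level SEM(theta) = 1 / sqrt(sum_j I_j(theta))."""
    return 1 / math.sqrt(sum(info_2pl(theta, a, b) for a, b in items))

s = sem(0.0, [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)])
```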

IRT Tools — Equating

IRT Equating and Linking

IRT linking places parameter estimates from a new test form onto the scale of an existing item bank using common anchor items. PACER supports six methods in three families. Mixed binary and polytomous items are supported; GRM uses the cumulative-logistic expected score; GPCM/PCM/RSM use the adjacent-category expected score.

The Linear Scale Transformation

All methods estimate \((A,B)\) mapping the new-form theta scale to the bank scale: \(\theta_{\text{bank}}=A\theta_{\text{form}}+B\). Item parameters transform as

\[a_{\text{linked}}=\frac{a_{\text{form}}}{A},\quad b_{\text{linked}}=Ab_{\text{form}}+B,\quad c_{\text{linked}}=c_{\text{form}},\quad d_{\text{linked},k}=Ad_{\text{form},k}+B\]

The reverse transformation is \(a_{\text{rev}}=Aa\), \(b_{\text{rev}}=(b-B)/A\), \(d_{\text{rev},k}=(d_k-B)/A\).
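The forward transformation can be sketched as a small helper; the dictionary layout is an assumption of this example, not PACER's data format:

```python
def link_params(items, A, B):
    """Apply theta_bank = A * theta_form + B to item parameters.
    items: dicts with keys 'a', 'b', 'c' and optional 'd' step/threshold list."""
    out = []
    for it in items:
        linked = {"a": it["a"] / A, "b": A * it["b"] + B, "c": it["c"]}
        if "d" in it:
            linked["d"] = [A * dk + B for dk in it["d"]]
        out.append(linked)
    return out

linked = link_params([{"a": 1.2, "b": 0.5, "c": 0.2}], A=1.1, B=-0.3)
```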

Stocking-Lord Methods

Stocking and Lord (1983) minimize the weighted integral of squared TCC differences over \(Q\) GH nodes. Let \(\tau_1(\theta)\) be the bank TCC and \(\tau_2^*(A,B;\theta)\) the linked new-form TCC.

SL.uni — Unidirectional

\[F_{\text{uni}}(A,B)=\sum_k w_k\bigl[\tau_1(\theta_k)-\tau_2^*(A,B;\theta_k)\bigr]^2\]

SL.sym — Symmetric

\[F_{\text{sym}}(A,B)=F_{\text{uni}}(A,B)+\sum_k w_k\bigl[\tau_1^*(\theta_k)-\tau_2(\theta_k)\bigr]^2\]

where \(\tau_1^*\) is the bank TCC reverse-transformed to the form scale.

Haebara Methods

Haebara (1980) replaces TCC comparison with item-by-item ICC comparison. For binary items \(P_{jk}\) is the 3PL ICC; for polytomous items it is \(E[X_j\mid\theta_k]\).

HB.uni — Unidirectional

\[F_{\text{HB,uni}}(A,B)=\sum_k w_k\sum_j\bigl[P_{jk}^{\text{bank}}-P_{jk}^{*(A,B)}\bigr]^2\]

HB.sym — Symmetric

Adds the same sum in the reverse direction — bank ICC reverse-transformed to the form scale vs. the original form ICC.

All four characteristic-curve objectives are minimized by two-dimensional Nelder-Mead simplex (starting simplex side 0.5, up to 2,000 iterations, tolerance \(10^{-10}\)).

Mean/Sigma and Mean/Mean

Mean/Sigma (MS)

\[A=\frac{s_b^{\text{bank}}}{s_b^{\text{form}}},\qquad B=\bar{b}^{\text{bank}}-A\bar{b}^{\text{form}}\]

The location pool consists of the \(b\)-parameters for binary items; all \(d\)-parameters pooled as a flat vector for GPCM/PCM (or averaged per item when the single-location option is selected); the GRM boundary parameters; and, for RSM items, the item location \(\delta_i\) only.

Mean/Mean (MM)

\[A=\frac{\bar{a}^{\text{bank}}}{\bar{a}^{\text{form}}},\qquad B=\bar{b}^{\text{bank}}-A\bar{b}^{\text{form}}\]

dropD²: Iterative Anchor Purification

After the initial equating, a per-item discrepancy statistic is computed:

\[D^2_j=\sum_{k=1}^{Q}w_k\bigl[P_j^{\text{bank}}(\theta_k)-P_j^{\text{linked}}(\theta_k)\bigr]^2\]

For polytomous items \(P_j\) is the expected score \(E[X_j\mid\theta_k]\). Values are standardized to

\[\tilde{D}^2_j=\frac{D^2_j-\bar{D}^2}{s_{D^2}}\]

The item with the largest \(|\tilde{D}^2_j|\) exceeding the cutoff (default 2.5) is removed; equating is re-run on the reduced anchor set. The process repeats until no item exceeds the cutoff or fewer than two anchors remain.

CDM Tools

Cognitive Diagnostic Models

CDMs classify examinees into mastery profiles over \(K\) binary latent attributes \(\bm{\alpha}_i=(\alpha_{i1},\ldots,\alpha_{iK})\in\{0,1\}^K\). The Q-matrix \(\bm{Q}\in\{0,1\}^{J\times K}\) specifies which attributes are required by each item.

DINA Model

The Deterministic Inputs, Noisy And-gate (DINA) model sets the ideal response indicator as \(\xi_{ij}=\prod_{k=1}^{K}\alpha_{ik}^{q_{jk}}\). A respondent achieves \(\xi_{ij}=1\) only if they have mastered all required attributes. The response probability accommodates slipping (\(s_j\)) and guessing (\(g_j\)):

\[P(X_{ij}=1\mid\bm{\alpha}_i)=(1-s_j)^{\xi_{ij}}\,g_j^{1-\xi_{ij}}\]

DINO Model

The Deterministic Inputs, Noisy Or-gate (DINO) model requires mastery of at least one required attribute:

\[\omega_{ij}=1-\prod_{k=1}^{K}(1-\alpha_{ik})^{q_{jk}},\qquad P(X_{ij}=1\mid\bm{\alpha}_i)=(1-s_j)^{\omega_{ij}}\,g_j^{1-\omega_{ij}}\]
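Both gate rules can be sketched directly from the definitions (function names are illustrative):

```python
def dina_prob(alpha, q, s, g):
    """DINA: P(X=1 | alpha) for one item with slip s and guess g.
    alpha and q are 0/1 vectors (mastery profile, Q-matrix row)."""
    xi = all(a >= qk for a, qk in zip(alpha, q))    # all required mastered
    return 1 - s if xi else g

def dino_prob(alpha, q, s, g):
    """DINO: mastery of at least one required attribute suffices."""
    omega = any(a == 1 and qk == 1 for a, qk in zip(alpha, q))
    return 1 - s if omega else g

p_master = dina_prob([1, 1], [1, 1], s=0.1, g=0.2)    # xi = 1: 1 - s
p_partial = dina_prob([1, 0], [1, 1], s=0.1, g=0.2)   # xi = 0: g
p_or = dino_prob([1, 0], [1, 1], s=0.1, g=0.2)        # omega = 1: 1 - s
```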

EM Estimation of Item Parameters

Item parameters (\(s_j\), \(g_j\)) and class probabilities (\(\pi_c\)) are estimated jointly by marginal maximum likelihood over the \(L=2^K\) latent classes \(\bm{\alpha}^{(c)}\) with prior class probabilities \(\pi_c\ge0\), \(\sum_c\pi_c=1\). The algorithm is initialized with equal class probabilities (\(\pi_c=1/L\)) and uniform starting values for \(g_j\) and \(s_j\). The population distribution is re-estimated at each step, so the posteriors are empirical Bayes estimates.

E-Step

Let \(\Pr(\bm{\alpha}_l\mid\bm{x}_i)\) be the posterior probability that examinee \(i\) has skill profile \(\bm{\alpha}_l\). The expected number of examinees with profile \(\bm{\alpha}_l\) is

\[I_l=\sum_{i=1}^{N}\Pr(\bm{\alpha}_l\mid\bm{x}_i)\]

and the expected number of those examinees who answered item \(j\) correctly is

\[R_{jl}=\sum_{i=1}^{N}\Pr(\bm{\alpha}_l\mid\bm{x}_i)\,x_{ij}\]

Let \(I^0_{jl}\) be the expected count of examinees who lack at least one attribute required for item \(j\) (i.e., \(\xi_{ijl}=0\)) and \(R^0_{jl}\) be the subset of those who answered correctly. Analogously, \(I^1_{jl}\) and \(R^1_{jl}\) correspond to examinees who possess all required attributes (\(\xi_{ijl}=1\)).

M-Step

With these expected counts, the guessing and slip parameters are updated in closed form:

\[\hat{g}_j=\frac{\sum_l R^0_{jl}}{\sum_l I^0_{jl}},\qquad\hat{s}_j=\frac{\sum_l\bigl(I^1_{jl}-R^1_{jl}\bigr)}{\sum_l I^1_{jl}}\]

Class probabilities are updated as \(\hat\pi_c=N^{-1}\sum_i\tau_{ic}\). The EM loop alternates between E-step and M-step until the change in parameters across iterations falls below the convergence tolerance.

Skill Profile Estimation

Once item parameters are calibrated, each examinee's skill profile is estimated from the likelihood of their response pattern \(\bm{x}_i\) given each of the \(L=2^K\) possible profiles:

\[\mathcal{L}(\bm{\alpha}_l;\bm{x}_i)=\prod_{j=1}^{J}\Pr(x_{ij}=1\mid\bm{\alpha}_l,g_j,s_j)^{x_{ij}}\left[1-\Pr(x_{ij}=1\mid\bm{\alpha}_l,g_j,s_j)\right]^{1-x_{ij}}\]

Three estimators are available.

Maximum Likelihood (MLE)

The MLE selects the profile that maximizes the likelihood across all \(L\) possible classifications:

\[\hat{\bm{\alpha}}^{\text{MLE}}_i=\underset{\bm{\alpha}_l}{\operatorname{argmax}}\;\mathcal{L}(\bm{\alpha}_l;\bm{x}_i)\]

Because skill profiles are discrete, this is computed by exhaustive evaluation over all \(L\) profiles. The MLE may be multimodal — two or more profiles can share the same maximum likelihood. PACER flags such cases and reports all profiles achieving the maximum.

Maximum A Posteriori (MAP)

The MAP incorporates a prior distribution \(\Pr(\bm{\alpha}_l)\) over skill profiles (supplied by the user), maximizing the posterior:

\[\hat{\bm{\alpha}}^{\text{MAP}}_i=\underset{\bm{\alpha}_l}{\operatorname{argmax}}\;\mathcal{L}(\bm{\alpha}_l;\bm{x}_i)\,\Pr(\bm{\alpha}_l)\]

where \(\Pr(\bm{\alpha}_l)\) is a discrete prior such that \(\sum_l\Pr(\bm{\alpha}_l)=1\). Like the MLE, the MAP can be multimodal.

Expected A Posteriori (EAP)

The EAP uses the full posterior distribution over profiles. The posterior for profile \(\bm{\alpha}_l\) given \(\bm{x}_i\) is

\[\Pr(\bm{\alpha}_l\mid\bm{x}_i)=\frac{\mathcal{L}(\bm{\alpha}_l;\bm{x}_i)\,\Pr(\bm{\alpha}_l)}{\displaystyle\sum_{l'=1}^{L}\mathcal{L}(\bm{\alpha}_{l'};\bm{x}_i)\,\Pr(\bm{\alpha}_{l'})}\]

Let \(\bm{\tau}_i = [\Pr(\bm{\alpha}_1\mid\bm{x}_i),\ldots,\Pr(\bm{\alpha}_L\mid\bm{x}_i)]\) be the \(1\times L\) posterior vector and \(\bm{P}\) the \(L\times K\) binary matrix of all possible skill profiles. The EAP estimate is

\[\hat{\bm{\alpha}}^{\text{EAP}}_i=\bm{\tau}_i\,\bm{P}\]

This is a \(1\times K\) vector where each element is the marginal posterior probability of mastery for the corresponding attribute — equivalent to \(\sum_{l:\,\alpha_k^{(l)}=1}\Pr(\bm{\alpha}_l\mid\bm{x}_i)\). Unlike the MLE and MAP, the EAP yields continuous mastery probabilities between 0 and 1 and is always unique.

Examinee Skill Profile Estimation

Once item parameters are calibrated, each examinee's mastery profile is characterized by their full posterior distribution over the \(2^K\) latent classes. Three quantities are reported.

Modal Profile (MAP Classification)

The modal-class estimate assigns the examinee to the latent class with the highest posterior probability:

\[\hat{c}_i = \arg\max_{c\in\{1,\ldots,2^K\}}\;\tau_{ic}\]

The estimated attribute profile is \(\hat{\bm{\alpha}}_i = \bm{\alpha}^{(\hat{c}_i)}\), a binary vector of length \(K\) indicating mastery (1) or non-mastery (0) of each attribute. This is the primary classification output.

Marginal Attribute Mastery Probabilities

The marginal probability of mastering attribute \(k\) is obtained by summing the posterior over all latent classes in which attribute \(k\) is mastered:

\[P(\alpha_{ik}=1\mid\bm{x}_i)=\sum_{c:\,\alpha_k^{(c)}=1}\tau_{ic}\]

These \(K\) probabilities provide a continuous measure of mastery confidence. A value near 1 indicates strong evidence of mastery; near 0 indicates strong evidence of non-mastery; intermediate values indicate classification uncertainty for that attribute.

Classification Certainty

The posterior entropy of the class distribution for examinee \(i\) quantifies overall classification certainty:

\[H_i = -\sum_{c=1}^{2^K}\tau_{ic}\ln\tau_{ic}\]

Low entropy indicates that most posterior weight is concentrated on a single class (high certainty). High entropy indicates spread across multiple classes, flagging examinees whose profiles should be interpreted cautiously. The minimum possible entropy is 0 (perfect certainty); the maximum is \(\ln(2^K)=K\ln 2\) (uniform uncertainty).
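The three reported quantities can be sketched together in NumPy (illustrative posterior values for a hypothetical \(K=2\) model, not PACER's code):

```python
import numpy as np

# Posterior vector tau_i over the 2^K = 4 latent classes (illustrative values).
tau = np.array([0.70, 0.15, 0.10, 0.05])
alpha = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # class -> attribute profile

# Modal (MAP) classification: class with the largest posterior mass.
c_hat = int(np.argmax(tau))
profile_hat = alpha[c_hat]

# Marginal mastery probabilities: sum tau over classes where attribute k = 1.
marginal = tau @ alpha

# Posterior entropy: 0 = perfect certainty, K ln 2 = uniform uncertainty.
H = -np.sum(np.where(tau > 0, tau * np.log(tau), 0.0))
print(c_hat, profile_hat, marginal, H)
```

With this posterior the modal class is the first (non-mastery of both attributes), yet the marginals are not near 0, which is exactly the kind of examinee the entropy flag is meant to surface.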

Item Analysis — DIF

Differential Item Functioning

DIF occurs when respondents from different groups who have the same ability have systematically different probabilities of answering an item correctly. PACER implements the Mantel-Haenszel procedure (Holland & Thayer, 1988) for binary items and the Liu-Agresti (1996) generalization for polytomous items. All procedures first stratify respondents into \(K\) equal-frequency bins (default \(K=10\)) using a matching variable — either total score or IRT theta. Binning uses linear interpolation on the empirical quantile function so that equal proportions of respondents fall in each stratum; the lowest observed score is included in the first bin.

Mantel-Haenszel Procedure (Binary Items)

Within each stratum \(s\), a \(2\times2\) contingency table is formed:

\[\begin{array}{c|cc|c}&\text{Correct}&\text{Incorrect}&\text{Total}\\\hline\text{Reference}&A_s&B_s&n_{R,s}\\\text{Focal}&C_s&D_s&n_{F,s}\\\hline&n_{1,s}&n_{2,s}&n_s\end{array}\]

The MH common odds ratio pools across strata:

\[\hat\alpha_{\text{MH}}=\frac{\displaystyle\sum_s A_s D_s/n_s}{\displaystyle\sum_s B_s C_s/n_s}\]

The ETS D-DIF statistic:

\[\text{MH D-DIF}=-2.35\ln\hat\alpha_{\text{MH}}\]

Negative D-DIF means the reference group has higher odds of answering correctly at the same ability — the item favors the reference group. Note: some published implementations invert this sign; PACER applies the convention stated above.

Chi-square with Yates continuity correction (1 df):

\[\chi^2_{\text{MH}}=\frac{\bigl(|\sum_s A_s-\sum_s E(A_s)|-0.5\bigr)^2}{\sum_s\mathrm{Var}(A_s)},\quad E(A_s)=\frac{n_{R,s}\,n_{1,s}}{n_s},\quad\mathrm{Var}(A_s)=\frac{n_{R,s}\,n_{F,s}\,n_{1,s}\,n_{2,s}}{n_s^2(n_s-1)}\]

The p-value is computed from the regularized incomplete gamma function via series and continued-fraction expansions (accurate to \(\approx10^{-10}\)).
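As an illustrative sketch (not PACER's internal routine), the pooled odds ratio, D-DIF, and Yates-corrected chi-square can be computed from the stratified counts with NumPy; all counts below are invented:

```python
import numpy as np

# Two strata of 2x2 tables: A,B = reference correct/incorrect; C,D = focal.
A = np.array([30.0, 45.0]); B = np.array([20.0, 15.0])
C = np.array([25.0, 35.0]); D = np.array([25.0, 25.0])
n = A + B + C + D

# Pooled MH odds ratio and ETS D-DIF (negative D-DIF favors the reference group).
alpha_mh = np.sum(A * D / n) / np.sum(B * C / n)
d_dif = -2.35 * np.log(alpha_mh)

# Yates-corrected MH chi-square (1 df).
nR, nF = A + B, C + D   # row totals
n1, n2 = A + C, B + D   # column totals (correct / incorrect)
E = nR * n1 / n
V = nR * nF * n1 * n2 / (n**2 * (n - 1))
chi2 = (abs(A.sum() - E.sum()) - 0.5) ** 2 / V.sum()
print(alpha_mh, d_dif, chi2)
```

With these counts the reference group has higher pooled odds of success, so \(\hat\alpha_{\text{MH}}>1\) and the D-DIF is negative, consistent with the sign convention stated above.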

Generalized MH Procedure (Polytomous Items)

For polytomous items with score values \(v_1<\cdots<v_M\), the Liu-Agresti generalization pools the cumulative dichotomizations of each stratum's \(2\times M\) table. Let \(n_{g,k}(v)\) be the number of group-\(g\) respondents in stratum \(k\) with score \(v\), \(n_{\cdot k}(v)\) the corresponding total across groups, and \(N_{g,k}(v)\) the cumulative count through category \(v\). The test statistic and its variance are

\[T=\sum_k\sum_v\frac{N_{R,k}(v)\,n_{F,k}-N_{F,k}(v)\,n_{R,k}}{n_k}\]
\[\mathrm{Var}(T)=\sum_k\frac{n_{R,k}\,n_{F,k}}{n_k^2(n_k-1)}\Bigl[n_k\!\sum_v n_{\cdot k}(v)\bigl(N_{\cdot k}(v)\bigr)^2-\Bigl(\sum_v n_{\cdot k}(v)\,N_{\cdot k}(v)\Bigr)^2\Bigr]\]

The test statistic \(T^2/\mathrm{Var}(T)\) is chi-squared with one degree of freedom.

Standardized Mean Difference (SMD)

\[\text{SMD}=\bar\mu_{\text{focal}}-\bar\mu_{\text{reference}}=\sum_k\frac{n_{F,k}}{N_F}\!\left(\bar X_{F,k}-\bar X_{R,k}\right)\]

Positive SMD means the focal group scores higher (the item favors the focal group); negative means the item favors the reference group. Some published implementations report only \(|\text{SMD}|\), which discards this directional information; PACER retains the raw signed value as defined above. The standardized SMD is

\[\tilde\delta=\frac{|\text{SMD}|}{s_{\text{pooled}}},\qquad s_{\text{pooled}}=\sqrt{\frac{\sum_k(n_k-1)s_k^2}{N-K}}\]
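A minimal NumPy sketch of the SMD computation (illustrative per-stratum means, counts, and SDs; not PACER's code):

```python
import numpy as np

# Three matched strata: focal/reference counts and item means per stratum.
nF = np.array([40, 60, 50])          # focal counts n_{F,k}
nR = np.array([50, 55, 45])          # reference counts n_{R,k}
xF = np.array([1.9, 2.4, 3.1])       # focal item means per stratum
xR = np.array([2.0, 2.6, 3.2])       # reference item means per stratum

# SMD: focal-weighted sum of within-stratum mean differences.
smd = np.sum(nF / nF.sum() * (xF - xR))

# Standardized SMD = |SMD| / s_pooled, with within-stratum SDs s_k (illustrative).
s_k = np.array([0.9, 1.0, 1.1])
n_k = nF + nR
N, K = n_k.sum(), len(n_k)
s_pooled = np.sqrt(np.sum((n_k - 1) * s_k**2) / (N - K))
delta = abs(smd) / s_pooled
print(smd, delta)
```

Here the focal means are lower in every stratum, so the signed SMD is negative (the item favors the reference group) while \(\tilde\delta\) carries only the magnitude used for the ETS classification.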

ETS DIF Classification

\[\begin{array}{c|c|c|l}
\text{Class}&\text{Binary (MH D-DIF)}&\text{Polytomous }(\tilde\delta)&\text{Meaning}\\\hline
\text{A}&p>.05\ \text{or}\ |\text{MH D-DIF}|<1.0&p>.05\ \text{or}\ \tilde\delta\le0.17&\text{Negligible DIF; item acceptable.}\\
\text{B}&\text{between A and C}&\text{between A and C}&\text{Moderate DIF; item warrants review.}\\
\text{C}&p<.05\ \text{and}\ |\text{MH D-DIF}|\ge1.5&p<.05\ \text{and}\ \tilde\delta\ge0.25&\text{Large DIF; item should likely be removed or revised.}\\
\end{array}\]

Implementation Notes

PACER uses signed MH D-DIF (negative values favor the reference group), the raw signed SMD for direction determination, and infers item score coding from actual observed values, so items coded 1–4 or any other range are handled correctly without recoding.

Numerical Methods — Quadrature

Gauss-Hermite Quadrature

All integrals over the latent-trait distribution are approximated by Gauss-Hermite quadrature with \(Q\) nodes (default \(Q=21\)):

\[\int_{-\infty}^{\infty}f(\theta)\,\phi(\theta)\,d\theta\approx\sum_{k=1}^{Q}w_k\,f(z_k)\]

where \(\phi\) is the standard normal density. Nodes \(\{z_k\}\) are the roots of the probabilists' Hermite polynomial \(\text{He}_Q(x)\). Because the rule integrates against the density \(\phi\) rather than the raw weight function \(e^{-\theta^2/2}\), the weights are normalized to sum to one: \(w_k=v_{k,1}^2\), where \(v_{k,1}\) is the first component of the \(k\)th normalized eigenvector of the Jacobi matrix (the \(\sqrt{2\pi}\) factor from the Golub-Welsch construction cancels against the density's normalizing constant). For common values of \(Q\in\{11,15,21,31,41,51\}\), PACER uses a pre-computed node-and-weight lookup table for speed. For arbitrary \(Q\), nodes and weights are computed on demand via the symmetric QL algorithm applied to the tridiagonal Jacobi matrix with off-diagonal elements \(\beta_i=\sqrt{i}\). For multigroup calibration, nodes are shifted and scaled, \(\theta_{gk}=\mu_g+\sigma_g z_k\); weights are unchanged because the Jacobian of the substitution and the density ratio cancel exactly.
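The Golub-Welsch construction can be sketched in a few lines of NumPy (using NumPy's symmetric eigensolver in place of the QL routine; this is an illustration, not PACER's lookup-table path):

```python
import numpy as np

def gauss_hermite_prob(Q):
    """Probabilists' Gauss-Hermite rule, normalized against the N(0,1) density."""
    beta = np.sqrt(np.arange(1.0, Q))         # off-diagonal elements sqrt(i)
    J = np.diag(beta, 1) + np.diag(beta, -1)  # symmetric tridiagonal Jacobi matrix
    nodes, vecs = np.linalg.eigh(J)           # eigenvalues are the nodes z_k
    weights = vecs[0] ** 2                    # squared first components; sum to 1
    return nodes, weights

z, w = gauss_hermite_prob(21)
# Sanity checks against standard-normal moments: E[1] = 1, E[theta] = 0, E[theta^2] = 1.
print(w.sum(), (w * z).sum(), (w * z**2).sum())
```

The rule is exact for polynomials up to degree \(2Q-1\), so the low-order normal moments are reproduced to machine precision, which is a convenient self-test for any node/weight table.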

Numerical Methods — Optimization

Optimization Routines

L-BFGS-B (Item Calibration)

Item parameters are estimated by L-BFGS-B. The two-loop recursion computes the quasi-Newton direction \(\bm{p}_t=-H_t^{-1}\nabla f_t\) from \(m=10\) curvature pairs without forming \(H_t^{-1}\) explicitly. Box constraints are enforced by projecting onto the active constraint set. Step length satisfies the strong Wolfe conditions via backtracking. Convergence: \(|\Delta f|/(0.1+|f|)<\tau\) (default \(\tau=10^{-8}\)).
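The full two-loop recursion is lengthy; as a hedged sketch of the same box-constrained setup, SciPy's L-BFGS-B implementation can calibrate a toy 2PL item (the thetas, responses, starting values, and bounds below are all illustrative, not PACER defaults):

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative data: known abilities and one item's 0/1 responses.
theta = np.array([-1.5, -0.5, 0.0, 0.5, 1.5])
x = np.array([0.0, 0.0, 1.0, 1.0, 1.0])

def negloglik(params):
    """Negative log-likelihood of a 2PL item with parameters (a, b)."""
    a, b = params
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    p = np.clip(p, 1e-10, 1 - 1e-10)  # numerical stabilization
    return -np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

# Box constraints keep a positive and b on a plausible range.
res = minimize(negloglik, x0=[1.0, 0.0], method="L-BFGS-B",
               bounds=[(0.1, 4.0), (-4.0, 4.0)])
print(res.x)
```

Because these responses are perfectly separated, the discrimination estimate is driven toward its upper bound, which is exactly the situation the box constraints are there to contain.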

Gradient Descent with Backtracking (Theta Estimation)

MLE and MAP theta estimation use gradient descent with backtracking on the negative log-likelihood. At each step, \(\theta_{\text{new}}=\theta-\alpha[-\ell'(\theta)]\), where the derivative is approximated by the central difference \(\ell'(\theta)\approx[\ell(\theta+h)-\ell(\theta-h)]/(2h)\) with \(h=10^{-6}\). The step size \(\alpha\) is multiplied by 1.2 after each accepted step and halved after each rejected one; the iteration converges when \(\alpha<10^{-10}\).
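The accept/reject loop can be sketched as follows, applied to a toy concave log-likelihood \(\ell(\theta)=-(\theta-1.5)^2\) (an illustration of the scheme, not PACER's scoring code):

```python
def maximize_ll(ell, theta=0.0, alpha=1.0, h=1e-6):
    """Backtracking gradient ascent on a scalar log-likelihood ell."""
    f = ell(theta)
    while alpha >= 1e-10:                 # converge when alpha < 1e-10
        # Central-difference approximation to ell'(theta).
        grad = (ell(theta + h) - ell(theta - h)) / (2 * h)
        cand = theta + alpha * grad       # ascent step = theta - alpha * [-ell']
        if ell(cand) > f:                 # progress: accept, grow step by 1.2x
            theta, f = cand, ell(cand)
            alpha *= 1.2
        else:                             # rejection: halve the step size
            alpha *= 0.5
    return theta

theta_hat = maximize_ll(lambda t: -(t - 1.5) ** 2)
print(theta_hat)  # close to the maximizer 1.5
```

The multiplicative grow/shrink schedule lets the step size adapt without a line search: it inflates while the surface cooperates and collapses geometrically once no further improvement is possible.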

Brent's Method (TCC Inversion)

TCC scoring inverts \(\tau(\theta)=X_{\text{obs}}\) on \([-8,8]\). Brent's method combines bisection (guaranteed progress), the secant method (linear interpolation), and inverse quadratic interpolation (a parabolic fit through three points). It falls back to bisection whenever the other two strategies would not make sufficient progress, guaranteeing convergence even when interpolation repeatedly fails. Tolerance: \(10^{-10}\).
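The inversion can be sketched with SciPy's `brentq` (an implementation of Brent's method, standing in for PACER's own routine) on an illustrative three-item 2PL test characteristic curve; all item parameters and the observed score are invented:

```python
import numpy as np
from scipy.optimize import brentq

# Illustrative 2PL item parameters for a three-item test.
a = np.array([1.0, 1.5, 0.8])
b = np.array([-0.5, 0.0, 1.0])

def tcc(theta):
    """Test characteristic curve: expected summed score at theta."""
    return np.sum(1.0 / (1.0 + np.exp(-a * (theta - b))))

x_obs = 2.0  # observed summed score to invert
# Root of tcc(theta) - x_obs on [-8, 8]; the TCC is monotone, so the root is unique.
theta_hat = brentq(lambda t: tcc(t) - x_obs, -8.0, 8.0, xtol=1e-10)
print(theta_hat, tcc(theta_hat))
```

The bracket \([-8,8]\) works because the TCC ranges from near 0 to near the maximum score over that interval, so any interior observed score produces a sign change for the root finder.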

Nelder-Mead Simplex (Equating Objectives)

The two-parameter SL and HB equating objectives are minimized over \((A,B)\) by Nelder-Mead simplex. The simplex of three vertices is updated by reflection, expansion, contraction, or shrinkage at each iteration. Initial simplex side: 0.5; maximum iterations: 2,000; convergence tolerance on objective range: \(10^{-10}\).

Documentation

References

Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43, 561–573.

Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46, 443–459.

Bock, R. D., & Zimowski, M. F. (1997). Multiple group IRT. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory. Springer.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334.

Doran, H. (2023). A collection of numerical recipes useful for building scalable psychometric applications. Journal of Educational and Behavioral Statistics. https://doi.org/10.3102/10769986221116905

Haebara, T. (1980). Equating logistic ability scales by a weighted least squares method. Japanese Psychological Research, 22, 144–149.

Holland, P. W., & Thayer, D. T. (1988). Differential item performance and the Mantel-Haenszel procedure. In H. Wainer & H. I. Braun (Eds.), Test validity. Lawrence Erlbaum.

Liu, I. M., & Agresti, A. (1996). Mantel-Haenszel-type inference for cumulative odds ratios with a stratified ordinal response. Biometrics, 52, 1223–1234.

Lord, F. M. (1980). Applications of item response theory to practical testing problems. Erlbaum.

Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149–174.

Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16, 159–176.

Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph Supplement, No. 17.

Stocking, M. L., & Lord, F. M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7, 201–210.