Question

(a) In a set of two-way classified data according to k levels of factor A and r levels of factor B, there is one observation in each cell. Show that the total number of error contrasts is (r – 1) (k – 1). (15 marks)
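Here is a quick numerical check of the count in (a). It is a sketch, not a proof. It assumes the additive model y_ij = μ + α_i + β_j + e_ij with no interaction term. Under that model, the number of linearly independent error contrasts (linear functions c′y with zero expectation for every parameter value) equals the number of observations minus the rank of the design matrix. The helper names `matrix_rank`, `design_matrix` and `error_contrasts` are illustrative. Rank is computed by exact Gaussian elimination over the rationals.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix via exact Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        # Look for a nonzero pivot in this column among the rows not yet used.
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every other row.
        for i in range(len(m)):
            if i != rank and m[i][col] != 0:
                factor = m[i][col] / m[rank][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank


def design_matrix(k, r):
    """One row per cell (i, j); columns are mu, alpha_1..alpha_k, beta_1..beta_r."""
    return [
        [1] + [int(a == i) for a in range(k)] + [int(b == j) for b in range(r)]
        for i in range(k)
        for j in range(r)
    ]


def error_contrasts(k, r):
    # Independent contrasts with zero expectation = (number of observations) - rank(X).
    return k * r - matrix_rank(design_matrix(k, r))


for k, r in [(2, 2), (3, 4), (5, 3)]:
    print(k, r, error_contrasts(k, r), (k - 1) * (r - 1))
```

The rank is k + r − 1 rather than k + r + 1. The μ column equals the sum of the α columns and also the sum of the β columns, which costs two dimensions. That leaves kr − (k + r − 1) = (r − 1)(k − 1) error contrasts.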

(b) Describe with examples the technique of two-stage sampling. Obtain the variance of the sample mean under two-stage sampling without replacement. Hence, deduce the variance of the sample mean under: (i) stratified random sampling, and (ii) cluster sampling. (20 marks)
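The variance formula for (b) can be checked by brute force on a toy population. The sketch assumes:

- N first-stage units, each of equal size M;
- SRSWOR at both stages (n first-stage units, then m second-stage units in each selected unit);
- the estimator ȳ = (1/n)Σ ȳ_i.

The population values in `POP` are invented, and `exact_variance` / `formula_variance` are hypothetical helper names. The code compares full enumeration with

V(ȳ) = (1 − f₁)S_b²/n + (1 − f₂)S_w²/(nm),

where f₁ = n/N, f₂ = m/M, S_b² = Σ(Ȳ_i − Ȳ)²/(N − 1) and S_w² = ΣΣ(y_ij − Ȳ_i)²/(N(M − 1)).

```python
from fractions import Fraction
from itertools import combinations, product

# Hypothetical toy population: N = 4 first-stage units (e.g. villages), each with
# M = 3 second-stage units (e.g. households). The values are invented for illustration.
POP = [[2, 4, 6], [5, 5, 8], [1, 3, 2], [7, 9, 8]]


def mean(xs):
    return Fraction(sum(xs), len(xs))


def exact_variance(pop, n, m):
    """V(ybar) by enumerating every equally likely two-stage SRSWOR sample."""
    grand_mean = mean([y for unit in pop for y in unit])
    total, count = Fraction(0), 0
    for psus in combinations(range(len(pop)), n):
        # Each selected first-stage unit draws its own SRSWOR of m second-stage units.
        draws_per_unit = [list(combinations(pop[i], m)) for i in psus]
        for draws in product(*draws_per_unit):
            ybar = mean([mean(d) for d in draws])
            total += (ybar - grand_mean) ** 2  # ybar is unbiased, so this averages to V(ybar)
            count += 1
    return total / count


def formula_variance(pop, n, m):
    """(1 - f1) S_b^2 / n + (1 - f2) S_w^2 / (n m), with f1 = n/N and f2 = m/M."""
    N, M = len(pop), len(pop[0])
    unit_means = [mean(u) for u in pop]
    grand_mean = mean(unit_means)
    s_b2 = sum((yi - grand_mean) ** 2 for yi in unit_means) / (N - 1)
    s_w2 = sum((y - yi) ** 2 for u, yi in zip(pop, unit_means) for y in u) / (N * (M - 1))
    f1, f2 = Fraction(n, N), Fraction(m, M)
    return (1 - f1) / n * s_b2 + (1 - f2) / (n * m) * s_w2


for n, m, label in [(2, 2, "two-stage"), (4, 2, "stratified, n = N"), (2, 3, "cluster, m = M")]:
    print(label, exact_variance(POP, n, m), formula_variance(POP, n, m))
```

The two special cases fall out of the general formula:

- **Stratified (n = N):** the first term vanishes, leaving (1 − f₂)S_w²/(Nm). This is the stratified-sampling variance with N equal strata and m units drawn per stratum.
- **Cluster (m = M):** the second term vanishes, leaving (1 − f₁)S_b²/n. This is the cluster-sampling variance of the mean per element.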

(c) (i) If X₁ = Y₁ + Y₂, X₂ = Y₂ + Y₃, X₃ = Y₃ + Y₁, where Y₁, Y₂ and Y₃ are uncorrelated random variables, each with zero mean and unit standard deviation, find the multiple correlation coefficient between X₃ and (X₁, X₂).
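For (c)(i), the covariances and R₃.₁₂ can be computed mechanically. The sketch takes Cov(Y) = I from the question, so Cov(X) = AAᵀ, where the rows of the illustrative matrix `A` express X₁, X₂, X₃ in terms of Y₁, Y₂, Y₃. It then applies R² = σ′Σ₁₁⁻¹σ / Var(X₃), with:

- Σ₁₁ = the covariance matrix of (X₁, X₂);
- σ = (Cov(X₃, X₁), Cov(X₃, X₂))′.

The 2 × 2 inverse is written out explicitly. Under these assumptions every variance is 2 and every covariance is 1.

```python
import math
from fractions import Fraction

# Rows give X1, X2, X3 as linear combinations of (Y1, Y2, Y3).
A = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1]]

# With Cov(Y) = I (uncorrelated, unit variance), Cov(X) = A A^T.
cov = [[sum(a * b for a, b in zip(ri, rj)) for rj in A] for ri in A]

# Partition: Sigma_11 is the (X1, X2) block; sigma = (Cov(X3, X1), Cov(X3, X2)).
s11, s12, s22 = cov[0][0], cov[0][1], cov[1][1]
c1, c2 = cov[2][0], cov[2][1]
det = s11 * s22 - s12 ** 2

# Regression coefficients beta = Sigma_11^{-1} sigma via the explicit 2x2 inverse.
b1 = Fraction(s22 * c1 - s12 * c2, det)
b2 = Fraction(s11 * c2 - s12 * c1, det)

# R^2 = (variance explained by the regression of X3 on X1, X2) / Var(X3).
r_squared = (b1 * c1 + b2 * c2) / cov[2][2]
R = math.sqrt(r_squared)
print(cov, r_squared, R)
```

The regression of X₃ on X₁ and X₂ has coefficients (1/3, 1/3). It explains 2/3 of Var(X₃) = 2, so R²₃.₁₂ = 1/3 and R₃.₁₂ = 1/√3 ≈ 0.577.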

(ii) Let X be a 3-dimensional random vector with dispersion matrix $\Sigma = \begin{pmatrix} 9 & 3 & 3 \\ 3 & 9 & 3 \\ 3 & 3 & 9 \end{pmatrix}$. Determine the first principal component and the proportion of the total variability that it explains. (7+8=15 marks)
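For (c)(ii), write Σ = 6I + 3J, where J is the all-ones matrix. Then (1, 1, 1)′ is an eigenvector with eigenvalue 6 + 3·3 = 15, and any vector orthogonal to it has eigenvalue 6. The sketch below recovers the leading eigenpair numerically by power iteration, a swapped-in numerical method rather than something the question requires. It then divides by trace(Σ) = 27 to get the proportion explained. The names `matvec` and `power_iteration` are illustrative.

```python
import math

SIGMA = [[9, 3, 3],
         [3, 9, 3],
         [3, 3, 9]]


def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def power_iteration(M, iters=200):
    """Leading eigenpair of a symmetric matrix by repeated multiplication."""
    v = [1.0, 0.0, 0.0]  # start vector; it has a nonzero component along (1, 1, 1)
    for _ in range(iters):
        w = matvec(M, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(x * y for x, y in zip(v, matvec(M, v)))  # Rayleigh quotient (v has unit length)
    return lam, v


lam1, v1 = power_iteration(SIGMA)
total = sum(SIGMA[i][i] for i in range(3))  # trace = sum of all eigenvalues

# Exact checks of the eigen-structure Sigma = 6I + 3J.
print(matvec(SIGMA, [1, 1, 1]))    # 15 * (1, 1, 1)
print(matvec(SIGMA, [1, -1, 0]))   # 6 * (1, -1, 0)
print(lam1, v1, lam1 / total)
```

The first principal component is therefore Z₁ = (X₁ + X₂ + X₃)/√3, with Var(Z₁) = 15. It explains 15/27 = 5/9 ≈ 55.6% of the total variability. Because the remaining eigenvalue 6 is repeated, the second and third components are not unique.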

UPSC Answer Check · Accepted Answer

Derive the required results systematically across all sub-parts. For (a), establish the linear model and count constraints; for (b), describe two-stage sampling with Indian census/NSSO examples, then derive variance formula and deduce special cases; for (c)(i), compute multiple correlation using matrix algebra; for (c)(ii), find eigenvalues and eigenvectors for PCA. Allocate approximately 30% time to (a), 40% to (b), 15% each to (c)(i) and (c)(ii), ensuring all derivations show complete steps with proper justification.
- For (a): Define the two-way ANOVA model y_ij = μ + α_i + β_j + e_ij with one observation per cell, identify the total contrasts (rk-1), subtract the main-effect contrasts (k-1 for factor A, r-1 for factor B), and show that the error contrasts number (rk-1) - (k-1) - (r-1) = (r-1)(k-1) using the degrees-of-freedom partition
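- For (b): Describe two-stage sampling with an NSSO household survey or agricultural census example; derive the variance of the sample mean under SRSWOR at both stages (n of N first-stage units, then m of M second-stage units in each selected unit); deduce the stratified random sampling variance by setting the first-stage sampling fraction n/N = 1 (every first-stage unit is selected, so first-stage units act as strata)
- For (b) continued: Deduce the cluster sampling variance by setting the second-stage sampling fraction m/M = 1 (every second-stage unit in a selected cluster is enumerated), showing how the general formula collapses to these known special cases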
- For (c)(i): Compute Var(X₁), Var(X₂), Cov(X₁,X₂), Cov(X₃,X₁), Cov(X₃,X₂); set up multiple regression of X₃ on X₁,X₂; calculate R² and multiple correlation coefficient R₃.₁₂
- For (c)(ii): Find the eigenvalues of Σ (15, 6, 6), identify the first principal component as (1/√3)(1,1,1)′X = (X₁+X₂+X₃)/√3 corresponding to λ=15, and compute the proportion of variability as 15/27 = 5/9 ≈ 0.556, or about 55.6%

Dimension	Weight	Max marks	Excellent	Average	Poor
Setup correctness	20%	10	Correctly specifies the linear model for (a) with proper identification of constraints; for (b) clearly defines population structure, sampling units and stages; for (c) correctly states distributional assumptions and matrix properties	Basic model stated but missing some constraints or assumptions; sampling description present but population structure unclear; some distributional properties stated but incomplete	Incorrect model specification or missing essential setup; confuses sampling stages or units; wrong assumptions about Y variables or dispersion matrix properties
Method choice	20%	10	Uses Cochran's theorem for (a); applies appropriate variance decomposition for two-stage sampling with correct conditional expectation approach; uses proper matrix inversion/eigenvalue methods for (c)	Correct general approach but suboptimal methods used; some shortcuts taken that skip important steps; eigenvalue calculation present but method not clearly justified	Wrong methodological approach (e.g., ignores finite population correction, uses independence incorrectly); attempts direct calculation without using matrix properties
Computation accuracy	20%	10	Accurate arithmetic throughout: correct degrees of freedom count (a); precise variance formula derivation with correct coefficients (b); exact multiple correlation value 1/√3 and eigenvalues 15, 6, 6 with correct eigenvector normalization (c)	Minor computational errors (sign errors, arithmetic slips) that don't fundamentally alter structure; correct final answers but some intermediate steps wrong	Major computational errors affecting conclusions; wrong variance formula; incorrect eigenvalues or failure to normalize eigenvectors; wrong correlation coefficient
Interpretation	20%	10	Explains why error contrasts equal interaction degrees of freedom (a); interprets variance components and explains efficiency comparisons between sampling schemes (b); interprets multiple correlation as strength of linear prediction and PCA as variance maximization (c)	Some interpretation present but superficial; mentions what results mean without explaining statistical significance; limited comparison of sampling methods	No interpretation provided; purely mechanical derivation without explaining what results signify; fails to connect special cases to general formula in (b)
Final answer & units	20%	10	All final answers clearly boxed/stated: (r-1)(k-1) for (a); complete variance formula with deductions for (b); R₃.₁₂ = 1/√3 ≈ 0.577 for (c)(i); first PC as (X₁+X₂+X₃)/√3 with 5/9 ≈ 55.6% explained variance for (c)(ii)	Final answers present but not clearly distinguished; some parts missing final numerical values; deductions in (b) incomplete	Missing final answers; answers buried in text without clear statement; wrong final answers despite correct method; no explicit deduction of special cases in (b)

Q6

Directive word: Derive

More from Statistics 2022 Paper I