Question

(a) For a multiple linear regression model with three covariates X₁, X₂ and X₃, let rᵢⱼ denote the correlation coefficient between Xᵢ and Xⱼ. For a data, it was found r₁₂ = 0·77, r₂₃ = 0·52, r₁₃ = 0·72.

(i) Check the consistency of the above data.

(ii) If r₁₃ is unknown, obtain the limits within which r₁₃ lies given the above values for r₁₂ and r₂₃. 20

(b) In cluster sampling with equal size clusters, obtain the unbiased estimate of population mean. Also obtain its sampling variance as

V(ȳ̄) = (1-f)(NM-1)S²{1+(M-1)ρcl}/[M²(N-1)n],

where notations have their usual meanings. 15

(c) Let Z₃ₓ₁ = (X₁ₓ₁, Y₂ₓ₁)ᵀ ~ N₃((0, 0, 1)ᵀ, [[1, 2, 1], [2, 5, 2], [1, 2, 2]]).

Show that conditional on X₁ₓ₁, the two components of Y₂ₓ₁ are independent but marginally they are not. 15

UPSC Answer Check · Accepted Answer

Derive the required mathematical results systematically across all three parts. For part (a)(i)-(ii), apply correlation matrix properties and determinant conditions first, then use partial correlation bounds. For part (b), build the cluster sampling theory from first principles with ANOVA decomposition. For part (c), partition the multivariate normal distribution and derive conditional distributions. Allocate approximately 40% of the time to part (a), given its 20 marks, and 30% each to parts (b) and (c). Structure the answer as direct derivations without lengthy introductions, clear theorem statements, step-by-step proofs, and boxed final expressions.
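- For (a)(i): Verify positive semi-definiteness of the correlation matrix R. Since every |rᵢⱼ| ≤ 1, the 2×2 principal minors 1 - rᵢⱼ² are automatically non-negative, so the check reduces to det(R) = 1 - r₁₂² - r₂₃² - r₁₃² + 2r₁₂r₂₃r₁₃ ≥ 0; here det(R) ≈ 0.195 > 0, so the data are consistent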
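- For (a)(ii): Derive the limits r₁₂r₂₃ - √[(1-r₁₂²)(1-r₂₃²)] ≤ r₁₃ ≤ r₁₂r₂₃ + √[(1-r₁₂²)(1-r₂₃²)] from det(R) ≥ 0 (equivalently, from |r₁₃.₂| ≤ 1); with r₁₂r₂₃ = 0.4004 and √(0.4071 × 0.7296) ≈ 0.545, obtain the numerical interval [-0.145, 0.945] or equivalent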
- For (b): Define cluster sampling estimator ȳ̄ = (1/nM)ΣᵢΣⱼ yᵢⱼ; prove unbiasedness E(ȳ̄) = Ȳ; derive variance via between-cluster and within-cluster SS decomposition
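- For (b): Express variance in ICC form using ρcl = [(N-1)S_b² - N·S_w²]/[(N-1)S_b² + N(M-1)S_w²], where S_b² = MΣᵢ(ȳᵢ - Ȳ)²/(N-1) and S_w² are the between- and within-cluster mean squares (for large N this is approximately (S_b² - S_w²)/(S_b² + (M-1)S_w²)), or an equivalent definition; manipulate to reach the target formula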
- For (c): Partition covariance matrix Σ = [[Σ_XX, Σ_XY], [Σ_YX, Σ_YY]]; derive conditional distribution Y|X ~ N(μ_Y + Σ_YXΣ_XX⁻¹(X-μ_X), Σ_YY - Σ_YXΣ_XX⁻¹Σ_XY)
- For (c): Show conditional covariance matrix is diagonal (implying independence given X₁) while marginal covariance Σ_YY is not diagonal
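
The key points for (a) and (c) can be checked numerically. The following is a minimal sketch, not part of the expected written answer: the helper names `corr_det`, `r13_limits` and `conditional_cov_given_first` are hypothetical, and the conditional-covariance helper assumes X is the scalar first coordinate, as in the question.

```python
import math

def corr_det(r12, r23, r13):
    """Determinant of the 3x3 correlation matrix with these off-diagonal entries."""
    return 1 - r12**2 - r23**2 - r13**2 + 2 * r12 * r23 * r13

def r13_limits(r12, r23):
    """Interval of r13 values that keeps det(R) >= 0 for fixed r12 and r23."""
    centre = r12 * r23
    half_width = math.sqrt((1 - r12**2) * (1 - r23**2))
    return centre - half_width, centre + half_width

def conditional_cov_given_first(sigma):
    """Cov(Y | X) for scalar X in position 0: S_YY - S_YX * S_XY / S_XX."""
    s_xx = sigma[0][0]
    p = len(sigma)
    return [[sigma[i][j] - sigma[i][0] * sigma[0][j] / s_xx
             for j in range(1, p)]
            for i in range(1, p)]

# Part (a): values given in the question.
det_r = corr_det(0.77, 0.52, 0.72)
lower, upper = r13_limits(0.77, 0.52)
print(round(det_r, 4), round(lower, 3), round(upper, 3))  # 0.1949 -0.145 0.945

# Part (c): covariance matrix given in the question.
sigma = [[1, 2, 1], [2, 5, 2], [1, 2, 2]]
cond = conditional_cov_given_first(sigma)
print(cond)         # [[1.0, 0.0], [0.0, 1.0]]  -> diagonal, so independent given X
print(sigma[1][2])  # 2 -> marginal covariance of the two Y components is non-zero
```

For jointly normal variables, zero covariance is equivalent to independence, which is why the diagonal conditional covariance settles part (c).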

Dimension	Weight	Max marks	Excellent	Average	Poor
Setup correctness	20%	10	Correctly identifies and states: for (a) the correlation matrix R and PSD condition, for (b) the cluster sampling framework with N clusters of size M and f = n/N, for (c) the partitioned MVN structure with proper matrix dimensions; all notation matches standard statistical literature	Basic setup present but with minor notational inconsistencies (e.g., confusing N and n, or misidentifying matrix blocks in part c); misses explicit statement of some assumptions	Incorrect setup: wrong matrix dimensions, confuses population and sample quantities, or fundamentally misunderstands the sampling design or distribution structure
Method choice	20%	10	Selects optimal methods: for (a) determinant test for consistency and partial correlation for bounds; for (b) ANOVA decomposition with ICC formulation; for (c) standard MVN conditional distribution formula; justifies why alternatives are inferior	Uses correct but suboptimal methods (e.g., numerical verification instead of analytical bounds in a-ii); or correct methods without justification; misses elegant shortcuts	Wrong methodological approach: attempts eigenvalue tests unnecessarily, uses simple random sampling formulas for cluster sampling, or attempts direct integration for conditional distribution
Computation accuracy	20%	10	Flawless calculations: for (a) correct determinant value and precise bounds; for (b) correct algebraic manipulation to target variance formula with proper handling of finite population correction; for (c) accurate matrix inversions and multiplications yielding diagonal conditional covariance	Minor arithmetic slips (e.g., sign errors in expansion, coefficient mistakes in variance formula, or off-by-one in matrix indices) that don't fundamentally derail the solution	Major computational errors: incorrect determinant calculation, wrong variance expression, non-diagonal conditional covariance claimed as diagonal, or algebraic impossibilities
Interpretation	20%	10	Interprets (a) consistency as feasibility of correlation structure and bounds as attainable limits; explains (b) variance inflation via ICC and design effect; clarifies (c) paradox of conditional vs marginal independence with insight into suppressor effects or similar phenomena	States results without deeper interpretation; or gives generic statements ('this shows correlation matters') without connecting to the specific statistical phenomenon demonstrated	Misinterprets results: claims inconsistent data is valid, misunderstands why variance increases with ρcl, or cannot explain why conditional independence doesn't imply marginal independence
Final answer & units	20%	10	All final answers boxed/highlighted: (a)(i) clear consistency verdict with numerical verification, (a)(ii) precise interval, (b) fully derived variance formula matching question format, (c) explicit conditional distribution with covariance matrix showing independence; proper dimensional analysis throughout	Final answers present but poorly formatted; or missing explicit statement of conditional distribution parameters in (c); or variance formula correct but not in required form	Missing final answers, wrong conclusions (inconsistent claimed consistent), incomplete derivations stopping before target expressions, or dimensional inconsistencies (variance with wrong scaling)
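
The computation-accuracy criterion for (b) can be sanity-checked by brute force. The sketch below enumerates every simple random sample (without replacement) of n clusters from a small population and compares the exact variance of ȳ̄ with the target formula. The population values are hypothetical, chosen only for illustration, and ρcl is taken as Σᵢ Σ_{j≠k} (yᵢⱼ - Ȳ)(yᵢₖ - Ȳ) / [(M-1)(NM-1)S²], one of the usual textbook definitions.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population: N = 4 clusters, each of M = 3 elements.
clusters = [[1, 2, 4], [3, 3, 6], [5, 8, 9], [2, 7, 7]]
N, M, n = len(clusters), len(clusters[0]), 2

values = [Fraction(y) for c in clusters for y in c]
grand_mean = sum(values) / (N * M)

# Exact sampling distribution: every SRSWOR sample of n clusters.
cluster_means = [Fraction(sum(c), M) for c in clusters]
estimates = [sum(s) / n for s in combinations(cluster_means, n)]
expected = sum(estimates) / len(estimates)
exact_var = sum((e - expected) ** 2 for e in estimates) / len(estimates)

# Closed form from the question.
S2 = sum((y - grand_mean) ** 2 for y in values) / (N * M - 1)
cross = sum((c[j] - grand_mean) * (c[k] - grand_mean)
            for c in clusters
            for j in range(M) for k in range(M) if j != k)
rho = cross / ((M - 1) * (N * M - 1) * S2)
f = Fraction(n, N)
formula_var = ((1 - f) * (N * M - 1) * S2 * (1 + (M - 1) * rho)
               / (M**2 * (N - 1) * n))

print(expected == grand_mean)    # unbiasedness: E(estimate) equals population mean
print(exact_var == formula_var)  # enumerated variance agrees with the formula
```

Exact rational arithmetic (`Fraction`) is used so that the two comparisons are equalities rather than floating-point approximations.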

Q6

Directive word: Derive

More from Statistics 2021 Paper I