Statistics 2024, Paper I · 50 marks · Compulsory · Derive

Q5

(a) How will you justify the usage of the principle of least squares in estimating the parameters of a linear regression model? With usual notations, for the regression model y = Xβ + ε, show that the least square estimator of β is β̂ = (X'X)⁻¹X'y. (10 marks)

(b) (i) If X̃ is distributed as N₃(μ̃, Σ), find the distribution of [X₁ - X₂; X₂ - X₃]. (5 marks)

(b) (ii) If X₁, X₂ and X₃ are three variables, obtain the expression for the partial correlation coefficient between X₁ and X₂ eliminating the effect of X₃, ρ₁₂·₃, in terms of simple correlation coefficients. (5 marks)

(c) X₁ and X₂ are independent data sets of order (n₁ × p) and (n₂ × p) respectively from Nₚ(μ̃, Σ). Show that (n₁n₂D²)/n is distributed as T²(p, n-2), where n = n₁ + n₂, and T² and D² represent the Hotelling's T² and Mahalanobis D² respectively. (10 marks)

(d) For the population U = {a, b, c, d, e}, consider the following sampling design: P({a, b, d}) = 1/6, P({a, b, e}) = 1/6, P({a, d, e}) = 1/6, P({b, c, d}) = 1/6, P({b, c, e}) = 1/6, P({c, d, e}) = 1/6. Calculate the first-order and second-order inclusion probabilities. Hence show that it is a matter of a stratified design. Identify the strata with their units. (10 marks)

(e) Let the incidence matrix of a design be N = [[1, 1, 1, 0], [1, 1, 0, 1], [1, 0, 1, 1], [0, 1, 1, 1]]. Show that: (i) the design is connected balanced; (ii) its efficiency factor is E = 8/9. (6+4=10 marks)

Directive word: Derive

This question asks you to derive: establish each result step by step from stated assumptions and definitions rather than quoting it. The directive word sets the depth of analysis expected and the structure of your answer.

How this answer will be evaluated

Approach

Derive requires rigorous, step-by-step mathematical proof across all sub-parts. The five parts carry 10 marks each, so allocate roughly 20% of your time to each: (a) least squares justification and derivation, (b) multivariate normal transformations and partial correlation, (c) the Hotelling's T² distribution, (d) inclusion probabilities and identification of the stratified design, and (e) connectedness, balance and the efficiency factor. Begin with a clear statement of assumptions, proceed through systematic derivations using matrix algebra where needed, and conclude by explicitly verifying the claimed properties.
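As an illustration of that pattern for part (a), the derivation could be laid out roughly as follows. This is a sketch assuming X is n × p with rank(X) = p, so that X'X is invertible; the dimension n and the second-derivative check are standard additions that the question itself does not spell out.

```latex
\begin{aligned}
S(\beta) &= (y - X\beta)'(y - X\beta) = y'y - 2\beta'X'y + \beta'X'X\beta, \\
\frac{\partial S}{\partial \beta} &= -2X'y + 2X'X\beta = 0
  \;\Longrightarrow\; X'X\hat{\beta} = X'y \quad \text{(normal equations)}, \\
\hat{\beta} &= (X'X)^{-1}X'y
  \quad \text{since } \operatorname{rank}(X) = p \text{ makes } X'X \text{ nonsingular}, \\
\frac{\partial^2 S}{\partial \beta\,\partial \beta'} &= 2X'X
  \ \text{(positive definite), so } \hat{\beta} \text{ minimises } S(\beta).
\end{aligned}
```

The last line is the verification step: without it, the stationary point has not been shown to be a minimum.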

Key points expected

  • Part (a): Justify least squares via Gauss-Markov theorem (BLUE property under Gauss-Markov assumptions) or via maximum likelihood under normality; derive β̂ = (X'X)⁻¹X'y by minimizing S(β) = ε'ε = (y-Xβ)'(y-Xβ) using matrix differentiation
  • Part (b)(i): Apply linear transformation property of multivariate normal; define transformation matrix A = [[1, -1, 0], [0, 1, -1]]; derive distribution as N₂(Aμ̃, AΣA') with explicit mean and covariance structure
  • Part (b)(ii): Derive ρ₁₂·₃ = (ρ₁₂ - ρ₁₃ρ₂₃)/√[(1-ρ₁₃²)(1-ρ₂₃²)] using residual correlation formula or partial covariance matrix inversion
  • Part (c): Define D² = (x̄₁ - x̄₂)'S⁻¹(x̄₁ - x̄₂); use independence of sample means and pooled covariance; apply Wishart and Hotelling's T² construction to show (n₁n₂/n)D² ~ T²(p, n-2)
  • Part (d): Calculate πᵢ = Σ_{s∋i} P(s) for first-order inclusion probabilities; calculate πᵢⱼ = Σ_{s∋i,j} P(s) for second-order; verify πᵢⱼ = πᵢπⱼ for units in different strata (independent selection across strata); identify the strata as {a, c} (one unit drawn, since π_ac = 0) and {b, d, e} (two units drawn) from the inclusion pattern
  • Part (e)(i): Verify connectedness via the rank of the C-matrix (rank C = v − 1) or graph connectivity; verify balance via constant λ = Σⱼ nᵢⱼnᵢ'ⱼ for all i ≠ i' pairs (here λ = 2, with r = k = 3)
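  • Part (e)(ii): Calculate the efficiency factor as E = λv/(rk) for a balanced incomplete block design, or equivalently as the harmonic mean of the non-zero eigenvalues of the C-matrix divided by r; with v = 4, r = k = 3 and λ = 2, show E = 8/9 explicitly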
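The inclusion-probability bookkeeping for part (d) can be checked with exact fractions. The sketch below is only a verification aid, not a model answer: the variable names (`design`, `pi`, `pi2`, `strata`, `allocation`) are illustrative, and the grouping rule (link two units whenever πᵢⱼ ≠ πᵢπⱼ) assumes that selection is independent across strata.

```python
from fractions import Fraction
from itertools import combinations

# Sampling design from part (d): each listed sample has probability 1/6.
design = {frozenset(s): Fraction(1, 6)
          for s in ("abd", "abe", "ade", "bcd", "bce", "cde")}
units = sorted(set().union(*design))

# First-order: pi_i = sum of P(s) over samples s containing unit i.
pi = {i: sum((p for s, p in design.items() if i in s), Fraction(0))
      for i in units}

# Second-order: pi_ij = sum of P(s) over samples s containing both i and j.
pi2 = {(i, j): sum((p for s, p in design.items() if i in s and j in s),
                   Fraction(0))
       for i, j in combinations(units, 2)}

# Assumption: units in different strata are selected independently, so a pair
# with pi_ij != pi_i * pi_j must share a stratum. Merge such pairs into groups.
strata = []
for u in units:
    linked = [g for g in strata
              if any(pi2[tuple(sorted((u, w)))] != pi[u] * pi[w] for w in g)]
    merged = {u}.union(*linked)
    strata = [g for g in strata if g not in linked] + [merged]
strata = sorted(strata, key=sorted)

# A stratified design draws a fixed number of units from each stratum,
# so every sample should meet a given stratum in the same number of units.
allocation = [sorted({len(s & g) for s in design}) for g in strata]

for i in units:
    print(f"pi_{i} = {pi[i]}")
for (i, j), p in pi2.items():
    print(f"pi_{i}{j} = {p}")
print("strata:", [sorted(g) for g in strata], "units drawn:", allocation)
```

Running it gives π_a = π_c = 1/2, π_b = π_d = π_e = 2/3, π_ac = 0 and every other πᵢⱼ = 1/3, which points to the strata {a, c} (one unit drawn) and {b, d, e} (two units drawn).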

Evaluation rubric

| Dimension | Weight | Max marks | Excellent | Average | Poor |
| --- | --- | --- | --- | --- | --- |
| Setup correctness | 20% | 10 | Correctly states all model assumptions for (a) including E(ε)=0, Var(ε)=σ²I, rank(X)=p; properly defines multivariate normal parameters in (b); correctly specifies independence conditions and dimensions in (c); accurately enumerates all samples in (d); correctly interprets incidence matrix structure in (e) | States most assumptions but misses some (e.g., omits rank condition in (a) or independence in (c)); minor errors in parameter dimensions; incomplete sample enumeration in (d) | Missing or incorrect assumptions; wrong model specification; fundamental misunderstanding of distribution parameters or design structure |
| Method choice | 20% | 10 | Selects optimal derivation paths: matrix calculus for (a), linear transformation theorem for (b)(i), residual/regression approach for (b)(ii), Wishart-Hotelling connection for (c), systematic inclusion probability calculation for (d), C-matrix analysis for (e); justifies method choices briefly | Correct methods chosen but without justification; occasionally suboptimal approach (e.g., direct integration instead of transformation theorem); some parts use valid but lengthy methods | Incorrect methods selected (e.g., univariate approach for multivariate problems); inappropriate use of formulas; confused or missing methodology |
| Computation accuracy | 20% | 10 | Flawless matrix algebra including correct handling of (X'X)⁻¹ existence; accurate determinant and inverse calculations; correct eigenvalue computation for efficiency factor; precise arithmetic for all inclusion probabilities (π_a = 3/6, π_b = 4/6, etc.); no calculation errors in T² derivation | Minor arithmetic slips (e.g., sign errors in covariance transformations, off-by-one in degrees of freedom); correct approach but some intermediate values wrong; final answers affected but method clear | Major computational errors (e.g., incorrect matrix multiplication, wrong inverse formula, miscalculated probabilities summing to ≠1); missing critical steps; illegible or incoherent calculations |
| Interpretation | 20% | 10 | Explains why least squares is BLUE in (a); interprets transformed variables in (b)(i) as contrasts; explains partial correlation as correlation of residuals; interprets T² as scaled Mahalanobis distance; clearly identifies stratification structure with reasoning in (d); explains connectedness and balance implications in (e) | Some interpretation present but superficial; misses connection between algebraic result and statistical meaning; limited explanation of design properties | Purely mechanical derivations with no interpretation; fails to explain what results mean statistically; missing identification of strata or design properties |
| Final answer & units | 20% | 10 | All final answers explicitly stated: β̂ formula in (a), complete N₂ distribution parameters in (b)(i), ρ₁₂·₃ formula in (b)(ii), T² distribution with correct parameters in (c), complete πᵢ and πᵢⱼ tables with strata identification in (d), verified connected balanced status and E=8/9 in (e); proper matrix/vector notation throughout | Most answers stated but some incomplete (e.g., missing covariance matrix in (b)(i), partial identification of strata); notation occasionally inconsistent | Missing final answers; incomplete or wrong distribution specifications; no summary of inclusion probabilities; efficiency factor not calculated or wrong |
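The part (e) quantities that the rubric expects to be computed and verified can be cross-checked in a few lines. This is a sketch under stated assumptions: rows of N are taken as treatments and columns as blocks (N here is symmetric, so the choice does not change the numbers), and the design is assumed equireplicate with equal block sizes, which the tuple-unpacking lines enforce. All variable names are illustrative.

```python
from fractions import Fraction

# Incidence matrix from part (e); assumed rows = treatments, columns = blocks.
N = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 1, 1],
]
v, b = len(N), len(N[0])

# Unpacking a one-element set fails loudly if replications or block sizes vary.
(r,) = {sum(row) for row in N}           # replications per treatment
(k,) = {sum(col) for col in zip(*N)}     # units per block

# Concurrence matrix NN': entry (i, h) counts blocks containing both i and h.
NNt = [[sum(x * y for x, y in zip(N[i], N[h])) for h in range(v)]
       for i in range(v)]
(lam,) = {NNt[i][h] for i in range(v) for h in range(v) if i != h}

# C-matrix of an equireplicate, equal-block-size design: C = r I - (1/k) NN'.
C = [[(r if i == h else 0) - Fraction(NNt[i][h], k) for h in range(v)]
     for i in range(v)]

# If C = theta (I - J/v), then C has rank v - 1 (connected) and one repeated
# non-zero eigenvalue theta (balanced).
theta = Fraction(lam * v, k)
target = [[theta * ((1 if i == h else 0) - Fraction(1, v)) for h in range(v)]
          for i in range(v)]
is_connected_balanced = (C == target)

# Efficiency factor: non-zero eigenvalue of C divided by r.
E = theta / r

print(f"v={v}, b={b}, r={r}, k={k}, lambda={lam}")
print("C = theta (I - J/v):", is_connected_balanced, "with theta =", theta)
print("E =", E)
```

Because C = θ(I − J/v) with θ = λv/k = 8/3, the C-matrix has rank v − 1 = 3 (connected) and a single repeated non-zero eigenvalue (balanced), and E = θ/r = 8/9.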
