Q5
(a) How will you justify the usage of the principle of least squares in estimating the parameters of a linear regression model? With usual notations, for the regression model y = Xβ + ε, show that the least square estimator of β is β̂ = (X'X)⁻¹X'y. (10 marks)
(b) (i) If X̃ is distributed as N₃(μ̃, Σ), find the distribution of [X₁ - X₂; X₂ - X₃]. (5 marks)
(b) (ii) If X₁, X₂ and X₃ are three variables, obtain the expression for the partial correlation coefficient between X₁ and X₂ eliminating the effect of X₃, ρ₁₂·₃, in terms of simple correlation coefficients. (5 marks)
(c) X₁ and X₂ are independent data sets of order (n₁ × p) and (n₂ × p) respectively from Nₚ(μ̃, Σ). Show that (n₁n₂D²)/n is distributed as T²(p, n-2), where n = n₁ + n₂, and T² and D² represent the Hotelling's T² and Mahalanobis D² respectively. (10 marks)
(d) For the population U = {a, b, c, d, e}, consider the following sampling design: P({a, b, d}) = 1/6, P({a, b, e}) = 1/6, P({a, d, e}) = 1/6, P({b, c, d}) = 1/6, P({b, c, e}) = 1/6, P({c, d, e}) = 1/6. Calculate the first-order and second-order inclusion probabilities. Hence show that it is a matter of a stratified design. Identify the strata with their units. (10 marks)
(e) Let the incidence matrix of a design be N = [[1, 1, 1, 0], [1, 1, 0, 1], [1, 0, 1, 1], [0, 1, 1, 1]]. Show that: (i) the design is connected balanced; (ii) its efficiency factor is E = 8/9. (6+4=10 marks)
Directive word: Derive
This question asks you to derive. The directive word signals the depth of analysis expected, the structure of your answer, and the weight of evidence you must bring.
See our UPSC directive words guide for a full breakdown of how to respond to each command word.
How this answer will be evaluated
Approach
A "derive" question requires rigorous, step-by-step mathematical proof in every sub-part. The five parts carry 10 marks each, so allocate roughly 20% of the time to each: part (a) on the least squares justification and derivation, part (b) on multivariate normal transformations and partial correlation, part (c) on the Hotelling's T² distribution, part (d) on inclusion probabilities and identification of the stratified design, and part (e) on connectedness, balance and the efficiency factor. Begin each part with a clear statement of assumptions, proceed through systematic derivations using matrix algebra where needed, and conclude by explicitly verifying the claimed property.
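As an illustration of the step-by-step matrix derivation described above, part (a) can be sketched as follows. This is a minimal outline, assuming X is an n × p matrix of full column rank p (the rank condition named in the rubric); it is one standard route, not the only acceptable one.

```latex
\begin{align*}
S(\beta) &= (y - X\beta)'(y - X\beta)
          = y'y - 2\beta'X'y + \beta'X'X\beta, \\
\frac{\partial S}{\partial \beta} &= -2X'y + 2X'X\beta = 0
          \;\Longrightarrow\; X'X\hat\beta = X'y, \\
\hat\beta &= (X'X)^{-1}X'y
          \quad\text{(} \operatorname{rank}(X) = p \text{ makes } X'X \text{ nonsingular)}, \\
\frac{\partial^2 S}{\partial \beta\,\partial \beta'} &= 2X'X \succ 0 .
\end{align*}
```

The last line confirms that the stationary point is a minimum, since X'X is positive definite under the rank assumption.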
Key points expected
- Part (a): Justify least squares via Gauss-Markov theorem (BLUE property under Gauss-Markov assumptions) or via maximum likelihood under normality; derive β̂ = (X'X)⁻¹X'y by minimizing S(β) = ε'ε = (y-Xβ)'(y-Xβ) using matrix differentiation
- Part (b)(i): Apply linear transformation property of multivariate normal; define transformation matrix A = [[1, -1, 0], [0, 1, -1]]; derive distribution as N₂(Aμ̃, AΣA') with explicit mean and covariance structure
- Part (b)(ii): Derive ρ₁₂·₃ = (ρ₁₂ - ρ₁₃ρ₂₃)/√[(1-ρ₁₃²)(1-ρ₂₃²)] using residual correlation formula or partial covariance matrix inversion
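- Part (c): Define D² = (x̄₁ - x̄₂)'S⁻¹(x̄₁ - x̄₂), where S is the pooled sample covariance matrix on n-2 degrees of freedom; since both samples come from the same population, x̄₁ - x̄₂ ~ Nₚ(0, (n/(n₁n₂))Σ), independently of (n-2)S ~ Wₚ(n-2, Σ); apply the Wishart and Hotelling's T² construction to show (n₁n₂/n)D² ~ T²(p, n-2)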
- Part (d): Calculate πᵢ = Σ_{s∋i} P(s) for first-order inclusion probabilities (π_a = π_c = 1/2, π_b = π_d = π_e = 2/3); calculate πᵢⱼ = Σ_{s∋i,j} P(s) for second-order (π_ac = 0, every other pair 1/3); verify πᵢⱼ = πᵢπⱼ for every pair of units in different strata; identify the strata as {a, c}, from which one unit is drawn, and {b, d, e}, from which two units are drawn
- Part (e)(i): Verify connectedness via rank(C) = v-1 for the C-matrix C = rI - NN'/k, or via graph connectivity (every pair of treatments shares a block); verify balance via constant λ = Σⱼ nᵢⱼnᵢ'ⱼ for all i ≠ i' pairs
- Part (e)(ii): With v = b = 4, r = k = 3 and λ = 2, calculate the efficiency factor as E = λv/(rk) = v(k-1)/[k(v-1)], or equivalently as the common non-zero eigenvalue of the C-matrix, λv/k = 8/3, divided by r; show E = 8/9 explicitly
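
The checks listed for part (e) can be sketched numerically. The snippet below is only a verification aid, not a substitute for the written proof; it assumes rows of N index treatments and columns index blocks, and the helper name `single` is hypothetical. It uses exact fractions from the Python standard library.

```python
from fractions import Fraction

# Incidence matrix from the question (assumed: rows = treatments, columns = blocks).
N = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 1, 1],
]
v, b = len(N), len(N[0])


def single(values, name):
    """Return the common value, or fail if the values are not all equal."""
    distinct = set(values)
    if len(distinct) != 1:
        raise ValueError(f"{name} is not constant: {sorted(distinct)}")
    return distinct.pop()


# Replications r, block size k, and pairwise concurrence lambda.
r = single((sum(row) for row in N), "r")
k = single((sum(N[i][j] for i in range(v)) for j in range(b)), "k")
lam = single(
    (sum(N[i][j] * N[h][j] for j in range(b))
     for i in range(v) for h in range(i + 1, v)),
    "lambda",
)

# C-matrix: C = r*I - N N' / k (equal block sizes).
C = [
    [(r if i == h else 0) - Fraction(sum(N[i][j] * N[h][j] for j in range(b)), k)
     for h in range(v)]
    for i in range(v)
]

# A connected balanced design has C = theta * (I - J/v) with theta > 0,
# so C has rank v-1 and a single non-zero eigenvalue theta.
theta = Fraction(lam * v, k)
expected = [
    [theta * ((1 if i == h else 0) - Fraction(1, v)) for h in range(v)]
    for i in range(v)
]
is_connected_balanced = theta > 0 and C == expected

# Efficiency factor: non-zero eigenvalue of C divided by r.
E = theta / r

print(v, b, r, k, lam, is_connected_balanced, E)
```

Because λ is constant and positive, every pair of treatments meets in some block, which is the graph-connectivity argument; the C-matrix comparison gives the same conclusion algebraically.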
Evaluation rubric
| Dimension | Weight | Max marks | Excellent | Average | Poor |
|---|---|---|---|---|---|
| Setup correctness | 20% | 10 | Correctly states all model assumptions for (a) including E(ε)=0, Var(ε)=σ²I, rank(X)=p; properly defines multivariate normal parameters in (b); correctly specifies independence conditions and dimensions in (c); accurately enumerates all samples in (d); correctly interprets incidence matrix structure in (e) | States most assumptions but misses some (e.g., omits rank condition in (a) or independence in (c)); minor errors in parameter dimensions; incomplete sample enumeration in (d) | Missing or incorrect assumptions; wrong model specification; fundamental misunderstanding of distribution parameters or design structure |
| Method choice | 20% | 10 | Selects optimal derivation paths: matrix calculus for (a), linear transformation theorem for (b)(i), residual/regression approach for (b)(ii), Wishart-Hotelling connection for (c), systematic inclusion probability calculation for (d), C-matrix analysis for (e); justifies method choices briefly | Correct methods chosen but without justification; occasionally suboptimal approach (e.g., direct integration instead of transformation theorem); some parts use valid but lengthy methods | Incorrect methods selected (e.g., univariate approach for multivariate problems); inappropriate use of formulas; confused or missing methodology |
| Computation accuracy | 20% | 10 | Flawless matrix algebra including correct handling of (X'X)⁻¹ existence; accurate determinant and inverse calculations; correct eigenvalue computation for efficiency factor; precise arithmetic for all inclusion probabilities (πₐ=3/6, π_b=4/6, etc.); no calculation errors in T² derivation | Minor arithmetic slips (e.g., sign errors in covariance transformations, off-by-one in degrees of freedom); correct approach but some intermediate values wrong; final answers affected but method clear | Major computational errors (e.g., incorrect matrix multiplication, wrong inverse formula, miscalculated probabilities summing to ≠1); missing critical steps; illegible or incoherent calculations |
| Interpretation | 20% | 10 | Explains why least squares is BLUE in (a); interprets transformed variables in (b)(i) as contrasts; explains partial correlation as correlation of residuals; interprets T² as scaled Mahalanobis distance; clearly identifies stratification structure with reasoning in (d); explains connectedness and balance implications in (e) | Some interpretation present but superficial; misses connection between algebraic result and statistical meaning; limited explanation of design properties | Purely mechanical derivations with no interpretation; fails to explain what results mean statistically; missing identification of strata or design properties |
| Final answer & units | 20% | 10 | All final answers explicitly stated: β̂ formula in (a), complete N₂ distribution parameters in (b)(i), ρ₁₂·₃ formula in (b)(ii), T² distribution with correct parameters in (c), complete πᵢ and πᵢⱼ tables with strata identification in (d), verified connected balanced status and E=8/9 in (e); proper matrix/vector notation throughout | Most answers stated but some incomplete (e.g., missing covariance matrix in (b)(i), partial identification of strata); notation occasionally inconsistent | Missing final answers; incomplete or wrong distribution specifications; no summary of inclusion probabilities; efficiency factor not calculated or wrong |
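
The inclusion-probability tables and strata identification that the rubric asks for in part (d) can be checked with a short script. This is a sketch under stated assumptions: the candidate strata {a, c} (one draw) and {b, d, e} (two draws) are read off from the fact that a and c never appear together, and the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import combinations, product

# Sampling design from the question: six samples, each with probability 1/6.
design = {
    frozenset(s): Fraction(1, 6)
    for s in ("abd", "abe", "ade", "bcd", "bce", "cde")
}
units = sorted(set().union(*design))

# First-order inclusion probabilities: pi_i = sum of P(s) over samples containing i.
pi1 = {
    u: sum((p for s, p in design.items() if u in s), Fraction(0))
    for u in units
}

# Second-order inclusion probabilities: pi_ij = sum of P(s) over samples containing i and j.
pi2 = {
    (i, j): sum((p for s, p in design.items() if i in s and j in s), Fraction(0))
    for i, j in combinations(units, 2)
}

# Candidate stratification (an assumption to be verified): (units, number drawn).
strata = [("ac", 1), ("bde", 2)]

# Independent simple random sampling without replacement in each stratum
# yields every combination of per-stratum subsets with equal probability.
per_stratum = [list(combinations(members, n_h)) for members, n_h in strata]
stratified_samples = {
    frozenset(u for part in parts for u in part)
    for parts in product(*per_stratum)
}
equal_prob = Fraction(1, len(stratified_samples))
is_stratified = (
    stratified_samples == set(design)
    and all(p == equal_prob for p in design.values())
)

for u in units:
    print(u, pi1[u])
for pair, p in pi2.items():
    print(pair, p)
print("stratified:", is_stratified)
```

The only zero among the second-order probabilities is the pair (a, c), which is what signals that exactly one of a and c enters every sample.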
More from Statistics 2024 Paper I
- Q1 (a) Two events A and B are such that P(A) = 1/3, P(B) = 1/4 and P(A|B) + P(B|A) = 2/3. Evaluate the following: (i) P(A^c ∪ B^c) (5 marks) (…
- Q2 (a) Let the joint probability density function of two random variables X and Y be f(x, y) = x/3, for 0 < 2x < 3y < 6; 0, otherwise. Compute…
- Q3 (a) Let moment generating function of random variable X exist in the neighbourhood of zero and if $$E(X^n) = \frac{1}{5} + (-1)^n \frac{2}{…
- Q4 (a) Find the most powerful test of size α(= 0·05) for testing H₀: μ = 0 vs. H₁: μ = 1, given a random sample of size 25 from N(μ, 16) popul…
- Q5 (a) How will you justify the usage of the principle of least squares in estimating the parameters of a linear regression model? With usual…
- Q6 (a) (X, Y) has bivariate normal distribution BN(μ₁, μ₂, σ₁², σ₂², ρ). (i) Show that X and Y are independent if and only if ρ = 0. (6 marks)…
- Q7 (a) A very big population is divided into two strata. The allocation of units of stratified random sample of size n for the two strata unde…
- Q8 (a) A 2²-factorial design was used to develop the yield of a crop. Two factors A and B were used at two levels: low (–1) and high (+1). The…