Statistics 2021, Paper I (50 marks; directive word: Derive)

Q6

(a) For a multiple linear regression model with three covariates X₁, X₂ and X₃, let rᵢⱼ denote the correlation coefficient between Xᵢ and Xⱼ. For a given data set, it was found that r₁₂ = 0.77, r₂₃ = 0.52 and r₁₃ = 0.72.
(i) Check the consistency of the above data.
(ii) If r₁₃ is unknown, obtain the limits within which r₁₃ lies, given the above values of r₁₂ and r₂₃. (20 marks)

(b) In cluster sampling with equal-sized clusters, obtain the unbiased estimate of the population mean. Also obtain its sampling variance as V(ȳ̄) = (1-f)(NM-1)S²{1+(M-1)ρcl}/[M²(N-1)n], where the notations have their usual meanings. (15 marks)

(c) Let Z₃ₓ₁ = (X₁ₓ₁, Y₂ₓ₁)ᵀ ~ N₃((0, 0, 1)ᵀ, [[1, 2, 1], [2, 5, 2], [1, 2, 2]]), where the subscripts give dimensions (Z is 3×1, X is 1×1, Y is 2×1). Show that, conditional on X₁ₓ₁, the two components of Y₂ₓ₁ are independent, but marginally they are not. (15 marks)


Directive word: Derive

This question asks you to derive. The directive word signals the depth of analysis expected, the structure of your answer, and the weight of evidence you must bring.

How this answer will be evaluated

Approach

Derive the required results systematically across all three parts. For part (a), first apply the positive semi-definiteness (determinant) condition on the correlation matrix to check consistency, then use the same condition, or equivalently the bound on a partial correlation, to obtain the limits on r₁₃. For part (b), build the cluster sampling theory from first principles using the between- and within-cluster decomposition of the sum of squares. For part (c), partition the multivariate normal distribution and derive the conditional distribution. Allocate roughly 40% of your time to part (a), which carries 20 of the 50 marks, and 30% each to parts (b) and (c). Structure the answer as direct derivations without lengthy introductions: clear statements of the results used, step-by-step proofs, and boxed final expressions.
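As a quick arithmetic check on part (a), the determinant condition and the resulting limits for r₁₃ can be evaluated directly. This is only a sketch for verifying the numbers, not a substitute for the written derivation; the function names `corr_determinant` and `r13_limits` are illustrative and do not come from the question.

```python
import math


def corr_determinant(r12, r23, r13):
    """Determinant of the 3x3 correlation matrix with these off-diagonals."""
    return 1 - r12**2 - r23**2 - r13**2 + 2 * r12 * r23 * r13


def r13_limits(r12, r23):
    """Values of r13 for which the determinant above is non-negative."""
    centre = r12 * r23
    half_width = math.sqrt((1 - r12**2) * (1 - r23**2))
    return centre - half_width, centre + half_width


# Values given in the question
r12, r23, r13 = 0.77, 0.52, 0.72

det_R = corr_determinant(r12, r23, r13)
lower, upper = r13_limits(r12, r23)

print(f"det(R) = {det_R:.4f}")                 # det(R) = 0.1949
print(f"r13 in [{lower:.3f}, {upper:.3f}]")    # r13 in [-0.145, 0.945]
```

Since det(R) is positive and every 1 - rᵢⱼ² is positive, the data are consistent, and the given r₁₃ = 0.72 falls inside the computed limits.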

Key points expected

  • For (a)(i): Verify positive semi-definiteness of the correlation matrix R by checking that all principal minors are non-negative; since each |rᵢⱼ| ≤ 1, this reduces to det(R) = 1 - r₁₂² - r₂₃² - r₁₃² + 2r₁₂r₂₃r₁₃ ≥ 0
  • For (a)(ii): Derive the bounds r₁₂r₂₃ - √[(1-r₁₂²)(1-r₂₃²)] ≤ r₁₃ ≤ r₁₂r₂₃ + √[(1-r₁₂²)(1-r₂₃²)] from det(R) ≥ 0; obtain the numerical interval [-0.145, 0.945] (to three decimals) or equivalent
  • For (b): Define cluster sampling estimator ȳ̄ = (1/nM)ΣᵢΣⱼ yᵢⱼ; prove unbiasedness E(ȳ̄) = Ȳ; derive variance via between-cluster and within-cluster SS decomposition
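  • For (b): Express the variance in ICC form, e.g. with ρcl ≈ (S_b² - S_w²)/(S_b² + (M-1)S_w²) for large N when S_b² and S_w² denote the between- and within-cluster mean squares, or an equivalent definition; manipulate to reach the target formula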
  • For (c): Partition covariance matrix Σ = [[Σ_XX, Σ_XY], [Σ_YX, Σ_YY]]; derive conditional distribution Y|X ~ N(μ_Y + Σ_YXΣ_XX⁻¹(X-μ_X), Σ_YY - Σ_YXΣ_XX⁻¹Σ_XY)
  • For (c): Show that the conditional covariance matrix is diagonal (which, under joint normality, implies the two components of Y are independent given X) while the marginal covariance Σ_YY is not diagonal
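The two part (c) points can be checked numerically with the partition X = first component, Y = last two components. The sketch below uses plain Python lists rather than a matrix library; the variable names (`s_xx`, `s_yx`, `s_yy`, `cond_cov`, `cond_mean`) are illustrative choices, not notation from the question.

```python
# Mean vector and covariance matrix of Z = (X, Y1, Y2), as given in the question
mu = [0.0, 0.0, 1.0]
sigma = [[1.0, 2.0, 1.0],
         [2.0, 5.0, 2.0],
         [1.0, 2.0, 2.0]]

# Partition: X is the first coordinate, Y the remaining two
s_xx = sigma[0][0]                       # Var(X), a scalar
s_yx = [sigma[1][0], sigma[2][0]]        # Cov(Y, X), a 2-vector
s_yy = [[sigma[1][1], sigma[1][2]],
        [sigma[2][1], sigma[2][2]]]      # Cov(Y), 2x2

# Conditional covariance: Sigma_YY - Sigma_YX * Sigma_XX^{-1} * Sigma_XY
cond_cov = [[s_yy[i][j] - s_yx[i] * s_yx[j] / s_xx for j in range(2)]
            for i in range(2)]


def cond_mean(x):
    """Conditional mean of Y given X = x."""
    return [mu[1 + i] + s_yx[i] * (x - mu[0]) / s_xx for i in range(2)]


print(cond_cov)        # [[1.0, 0.0], [0.0, 1.0]]
print(s_yy[0][1])      # 2.0, the non-zero marginal covariance of Y1 and Y2
print(cond_mean(1.0))  # [2.0, 2.0]
```

The conditional covariance is the identity, so Y₁ and Y₂ are independent given X, with Y | X = x ~ N₂((2x, 1 + x)ᵀ, I₂); marginally Cov(Y₁, Y₂) = 2, so they are dependent.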

Evaluation rubric

Each dimension below lists its weight and maximum marks, followed by the descriptors for Excellent, Average and Poor answers.

Setup correctness (weight 20%, max marks 10)
  • Excellent: Correctly identifies and states: for (a) the correlation matrix R and PSD condition, for (b) the cluster sampling framework with N clusters of size M and f = n/N, for (c) the partitioned MVN structure with proper matrix dimensions; all notation matches standard statistical literature
  • Average: Basic setup present but with minor notational inconsistencies (e.g., confusing N and n, or misidentifying matrix blocks in part c); misses explicit statement of some assumptions
  • Poor: Incorrect setup: wrong matrix dimensions, confuses population and sample quantities, or fundamentally misunderstands the sampling design or distribution structure

Method choice (weight 20%, max marks 10)
  • Excellent: Selects optimal methods: for (a) determinant test for consistency and partial correlation for bounds; for (b) ANOVA decomposition with ICC formulation; for (c) standard MVN conditional distribution formula; justifies why alternatives are inferior
  • Average: Uses correct but suboptimal methods (e.g., numerical verification instead of analytical bounds in a-ii); or correct methods without justification; misses elegant shortcuts
  • Poor: Wrong methodological approach: attempts eigenvalue tests unnecessarily, uses simple random sampling formulas for cluster sampling, or attempts direct integration for conditional distribution

Computation accuracy (weight 20%, max marks 10)
  • Excellent: Flawless calculations: for (a) correct determinant value and precise bounds; for (b) correct algebraic manipulation to target variance formula with proper handling of finite population correction; for (c) accurate matrix inversions and multiplications yielding diagonal conditional covariance
  • Average: Minor arithmetic slips (e.g., sign errors in expansion, coefficient mistakes in variance formula, or off-by-one in matrix indices) that don't fundamentally derail the solution
  • Poor: Major computational errors: incorrect determinant calculation, wrong variance expression, non-diagonal conditional covariance claimed as diagonal, or algebraic impossibilities

Interpretation (weight 20%, max marks 10)
  • Excellent: Interprets (a) consistency as feasibility of correlation structure and bounds as attainable limits; explains (b) variance inflation via ICC and design effect; clarifies (c) paradox of conditional vs marginal independence with insight into suppressor effects or similar phenomena
  • Average: States results without deeper interpretation; or gives generic statements ('this shows correlation matters') without connecting to the specific statistical phenomenon demonstrated
  • Poor: Misinterprets results: claims inconsistent data is valid, misunderstands why variance increases with ρcl, or cannot explain why conditional independence doesn't imply marginal independence

Final answer & units (weight 20%, max marks 10)
  • Excellent: All final answers boxed/highlighted: (a)(i) clear consistency verdict with numerical verification, (a)(ii) precise interval, (b) fully derived variance formula matching question format, (c) explicit conditional distribution with covariance matrix showing independence; proper dimensional analysis throughout
  • Average: Final answers present but poorly formatted; or missing explicit statement of conditional distribution parameters in (c); or variance formula correct but not in required form
  • Poor: Missing final answers, wrong conclusions (inconsistent claimed consistent), incomplete derivations stopping before target expressions, or dimensional inconsistencies (variance with wrong scaling)
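
For part (b), the rubric expects the variance in exactly the form given in the question. One route to that form is sketched below. It assumes the usual textbook conventions, which the question does not spell out: n clusters drawn by simple random sampling without replacement from N clusters of M elements each, f = n/N, S² the population mean square per element, and ρcl defined through within-cluster cross-products. Symbols such as \bar{y}_i and S_b^2 are introduced here for the sketch.

```latex
% Assumed notation: y_{ij} = j-th element of cluster i, \bar{y}_i = mean of cluster i,
% \bar{Y} = population mean per element, s = the sample of n clusters.
\begin{align*}
\bar{\bar{y}} &= \frac{1}{n}\sum_{i \in s} \bar{y}_i
  = \frac{1}{nM}\sum_{i \in s}\sum_{j=1}^{M} y_{ij},
  \qquad
  E(\bar{\bar{y}}) = \frac{1}{N}\sum_{i=1}^{N} \bar{y}_i = \bar{Y}, \\[4pt]
V(\bar{\bar{y}}) &= \frac{1-f}{n}\, S_b^2,
  \qquad
  S_b^2 = \frac{1}{N-1}\sum_{i=1}^{N} (\bar{y}_i - \bar{Y})^2, \\[4pt]
S^2 &= \frac{1}{NM-1}\sum_{i=1}^{N}\sum_{j=1}^{M} (y_{ij} - \bar{Y})^2,
  \qquad
  \rho_{cl} = \frac{\sum_{i=1}^{N}\sum_{j \neq k} (y_{ij} - \bar{Y})(y_{ik} - \bar{Y})}
                   {(M-1)(NM-1)\,S^2}, \\[4pt]
M^2 \sum_{i=1}^{N} (\bar{y}_i - \bar{Y})^2
  &= \sum_{i=1}^{N} \Bigl[\sum_{j=1}^{M} (y_{ij} - \bar{Y})\Bigr]^2
  = \sum_{i=1}^{N}\sum_{j=1}^{M} (y_{ij} - \bar{Y})^2
    + \sum_{i=1}^{N}\sum_{j \neq k} (y_{ij} - \bar{Y})(y_{ik} - \bar{Y}) \\
  &= (NM-1)\,S^2 \bigl\{1 + (M-1)\rho_{cl}\bigr\}, \\[4pt]
V(\bar{\bar{y}}) &= \frac{(1-f)\,(NM-1)\,S^2 \bigl\{1 + (M-1)\rho_{cl}\bigr\}}{M^2 (N-1)\, n}.
\end{align*}
```

The first line gives unbiasedness, because the sample mean of cluster means is unbiased for their population mean and the clusters are of equal size. The remaining lines substitute the cross-product expansion into S_b² to reach the target expression.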
