Q6
(a) For a multiple linear regression model with three covariates X₁, X₂ and X₃, let rᵢⱼ denote the correlation coefficient between Xᵢ and Xⱼ. For a data set, it was found that r₁₂ = 0·77, r₂₃ = 0·52, r₁₃ = 0·72.
(i) Check the consistency of the above data.
(ii) If r₁₃ is unknown, obtain the limits within which r₁₃ lies, given the above values of r₁₂ and r₂₃. (20 marks)

(b) In cluster sampling with equal-sized clusters, obtain the unbiased estimate of the population mean. Also obtain its sampling variance as V(ȳ̄) = (1-f)(NM-1)S²{1+(M-1)ρcl}/[M²(N-1)n], where the notations have their usual meanings. (15 marks)

(c) Let Z₃ₓ₁ = (X₁ₓ₁, Y₂ₓ₁)ᵀ ~ N₃((0, 0, 1)ᵀ, [[1, 2, 1], [2, 5, 2], [1, 2, 2]]). Show that, conditional on X₁ₓ₁, the two components of Y₂ₓ₁ are independent, but marginally they are not. (15 marks)
Directive word: Derive
This question asks you to derive. The directive word signals the depth of analysis expected, the structure of your answer, and the weight of evidence you must bring.
How this answer will be evaluated
Approach
Derive the required results systematically across all three parts. For part (a)(i)-(ii), apply the determinant condition on the correlation matrix first, then use the bound on the partial correlation coefficient. For part (b), build the cluster sampling theory from first principles using the between-cluster and within-cluster decomposition. For part (c), partition the multivariate normal distribution and derive the conditional distribution. Allocate approximately 40% of your time to part (a), given its 20 marks, and 30% each to parts (b) and (c). Structure the answer as direct derivations without lengthy introductions: clear theorem statements, step-by-step proofs, and boxed final expressions.
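As a hedged illustration of the part (a) steps named in the approach, the determinant check and the partial-correlation bound can be worked through with the values given in the question. The symbol r₁₃.₂ (written `r_{13\cdot 2}` below) for the partial correlation of X₁ and X₃ given X₂ is the usual textbook notation and is an assumption here, since the question does not name it. The figures are rounded to four decimal places.

```latex
\begin{align*}
|R| &= 1 - r_{12}^2 - r_{23}^2 - r_{13}^2 + 2\,r_{12}\,r_{23}\,r_{13} \\
    &= 1 - 0.5929 - 0.2704 - 0.5184 + 2(0.77)(0.52)(0.72) \\
    &\approx 0.1949 > 0 \quad\text{(the data are consistent)}, \\[4pt]
r_{13\cdot 2} &= \frac{r_{13} - r_{12}\,r_{23}}{\sqrt{(1-r_{12}^2)(1-r_{23}^2)}},
  \qquad r_{13\cdot 2}^2 \le 1 \\
\Rightarrow\quad r_{13} &\in r_{12}\,r_{23} \pm \sqrt{(1-r_{12}^2)(1-r_{23}^2)} \\
    &= 0.4004 \pm \sqrt{(0.4071)(0.7296)} \approx 0.4004 \pm 0.5450, \\
\text{so}\quad -0.145 &\le r_{13} \le 0.945 \quad\text{(approximately)}.
\end{align*}
```

The observed value r₁₃ = 0.72 falls inside this interval, which agrees with the positive determinant found in the first step.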
Key points expected
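- For (a)(i): Verify that the correlation matrix R is positive semi-definite. Every 2×2 principal minor 1 - rᵢⱼ² is automatically non-negative, so the check reduces to det(R) = 1 - r₁₂² - r₂₃² - r₁₃² + 2r₁₂r₂₃r₁₃ ≥ 0; here det(R) ≈ 0.195 > 0, so the data are consistent
- For (a)(ii): Derive the limits r₁₂r₂₃ ± √[(1-r₁₂²)(1-r₂₃²)] for r₁₃; with r₁₂ = 0.77 and r₂₃ = 0.52 this gives 0.4004 ± 0.5450, i.e. the interval [-0.145, 0.945] approximately, or equivalent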
- For (b): Define cluster sampling estimator ȳ̄ = (1/nM)ΣᵢΣⱼ yᵢⱼ; prove unbiasedness E(ȳ̄) = Ȳ; derive variance via between-cluster and within-cluster SS decomposition
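- For (b): Express the variance in ICC form using the standard definition ρcl = 2 Σᵢ Σ_{j<k} (yᵢⱼ - Ȳ)(yᵢₖ - Ȳ) / [(M-1)(NM-1)S²] or an equivalent one; the ANOVA form (S_b² - S_w²)/(S_b² + (M-1)S_w²), with S_b² and S_w² the between-cluster and within-cluster mean squares, agrees with it only approximately for large N; manipulate to reach the target formula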
- For (c): Partition covariance matrix Σ = [[Σ_XX, Σ_XY], [Σ_YX, Σ_YY]]; derive conditional distribution Y|X ~ N(μ_Y + Σ_YXΣ_XX⁻¹(X-μ_X), Σ_YY - Σ_YXΣ_XX⁻¹Σ_XY)
- For (c): Show conditional covariance matrix is diagonal (implying independence given X₁) while marginal covariance Σ_YY is not diagonal
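
The numerical parts of the key points above can be checked with a short script. This is a minimal sketch in plain Python (standard library only); the helper names `det3`, `r13_limits` and `conditional_cov` are hypothetical and introduced purely for illustration. It assumes X is the first, scalar component of Z, as in the question.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix (nested lists) by cofactor expansion."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def r13_limits(r12, r23):
    """Limits on r13 implied by the partial correlation bound |r13.2| <= 1."""
    centre = r12 * r23
    half_width = math.sqrt((1 - r12 ** 2) * (1 - r23 ** 2))
    return centre - half_width, centre + half_width


def conditional_cov(sigma):
    """Cov(Y | X) = S_YY - S_YX S_XX^{-1} S_XY for a symmetric sigma,
    assuming X is the first (scalar) component."""
    s_xx = sigma[0][0]
    s_xy = sigma[0][1:]
    size = len(s_xy)
    return [[sigma[i + 1][j + 1] - s_xy[i] * s_xy[j] / s_xx
             for j in range(size)] for i in range(size)]


# Part (a): values taken from the question.
r12, r23, r13 = 0.77, 0.52, 0.72
R = [[1, r12, r13], [r12, 1, r23], [r13, r23, 1]]
det_R = det3(R)
lower, upper = r13_limits(r12, r23)

# Part (c): covariance matrix taken from the question.
SIGMA = [[1, 2, 1], [2, 5, 2], [1, 2, 2]]
cond = conditional_cov(SIGMA)

print(f"det(R) = {det_R:.4f}")                      # det(R) = 0.1949
print(f"r13 limits: [{lower:.3f}, {upper:.3f}]")    # r13 limits: [-0.145, 0.945]
print("Cov(Y | X) =", cond)                         # [[1.0, 0.0], [0.0, 1.0]]
print("marginal Cov(Y1, Y2) =", SIGMA[1][2])        # 2, so not independent marginally
```

A script like this only confirms the arithmetic. The written answer still needs the analytical steps, since for a multivariate normal vector a zero conditional covariance is what justifies the claim of conditional independence.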
Evaluation rubric
| Dimension | Weight | Max marks | Excellent | Average | Poor |
|---|---|---|---|---|---|
| Setup correctness | 20% | 10 | Correctly identifies and states: for (a) the correlation matrix R and PSD condition, for (b) the cluster sampling framework with N clusters of size M and f = n/N, for (c) the partitioned MVN structure with proper matrix dimensions; all notation matches standard statistical literature | Basic setup present but with minor notational inconsistencies (e.g., confusing N and n, or misidentifying matrix blocks in part c); misses explicit statement of some assumptions | Incorrect setup: wrong matrix dimensions, confuses population and sample quantities, or fundamentally misunderstands the sampling design or distribution structure |
| Method choice | 20% | 10 | Selects optimal methods: for (a) determinant test for consistency and partial correlation for bounds; for (b) ANOVA decomposition with ICC formulation; for (c) standard MVN conditional distribution formula; justifies why alternatives are inferior | Uses correct but suboptimal methods (e.g., numerical verification instead of analytical bounds in a-ii); or correct methods without justification; misses elegant shortcuts | Wrong methodological approach: attempts eigenvalue tests unnecessarily, uses simple random sampling formulas for cluster sampling, or attempts direct integration for conditional distribution |
| Computation accuracy | 20% | 10 | Flawless calculations: for (a) correct determinant value and precise bounds; for (b) correct algebraic manipulation to target variance formula with proper handling of finite population correction; for (c) accurate matrix inversions and multiplications yielding diagonal conditional covariance | Minor arithmetic slips (e.g., sign errors in expansion, coefficient mistakes in variance formula, or off-by-one in matrix indices) that don't fundamentally derail the solution | Major computational errors: incorrect determinant calculation, wrong variance expression, non-diagonal conditional covariance claimed as diagonal, or algebraic impossibilities |
| Interpretation | 20% | 10 | Interprets (a) consistency as feasibility of correlation structure and bounds as attainable limits; explains (b) variance inflation via ICC and design effect; clarifies (c) paradox of conditional vs marginal independence with insight into suppressor effects or similar phenomena | States results without deeper interpretation; or gives generic statements ('this shows correlation matters') without connecting to the specific statistical phenomenon demonstrated | Misinterprets results: claims inconsistent data is valid, misunderstands why variance increases with ρcl, or cannot explain why conditional independence doesn't imply marginal independence |
| Final answer & units | 20% | 10 | All final answers boxed/highlighted: (a)(i) clear consistency verdict with numerical verification, (a)(ii) precise interval, (b) fully derived variance formula matching question format, (c) explicit conditional distribution with covariance matrix showing independence; proper dimensional analysis throughout | Final answers present but poorly formatted; or missing explicit statement of conditional distribution parameters in (c); or variance formula correct but not in required form | Missing final answers, wrong conclusions (inconsistent claimed consistent), incomplete derivations stopping before target expressions, or dimensional inconsistencies (variance with wrong scaling) |
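
For reference, the part (b) derivation that the rubric expects can be sketched as follows. The notation is an assumption based on the usual textbook setup, since the question only says the symbols have their usual meanings: n clusters are drawn by simple random sampling without replacement from N clusters of M elements each, f = n/N, Ȳᵢ is the mean of cluster i, Ȳ is the population mean per element, S² = ΣᵢΣⱼ(yᵢⱼ - Ȳ)²/(NM-1), and ρcl is the intraclass correlation defined through the cross-product sum shown in the last line.

```latex
\begin{align*}
\bar{\bar{y}} &= \frac{1}{n}\sum_{i=1}^{n}\bar{y}_i
  = \frac{1}{nM}\sum_{i=1}^{n}\sum_{j=1}^{M} y_{ij},
\qquad
E(\bar{\bar{y}}) = \frac{1}{N}\sum_{i=1}^{N}\bar{Y}_i = \bar{Y}, \\[4pt]
V(\bar{\bar{y}}) &= \frac{1-f}{n}\cdot\frac{1}{N-1}\sum_{i=1}^{N}(\bar{Y}_i-\bar{Y})^2, \\[4pt]
M^2\sum_{i=1}^{N}(\bar{Y}_i-\bar{Y})^2
  &= \sum_{i=1}^{N}\Bigl[\sum_{j=1}^{M}(y_{ij}-\bar{Y})\Bigr]^2 \\
  &= \sum_{i}\sum_{j}(y_{ij}-\bar{Y})^2
     + 2\sum_{i}\sum_{j<k}(y_{ij}-\bar{Y})(y_{ik}-\bar{Y}) \\
  &= (NM-1)S^2 + (M-1)(NM-1)S^2\rho_{cl}
   = (NM-1)S^2\{1+(M-1)\rho_{cl}\}, \\[4pt]
\Rightarrow\quad V(\bar{\bar{y}})
  &= \frac{(1-f)(NM-1)S^2\{1+(M-1)\rho_{cl}\}}{M^2(N-1)\,n}, \\[4pt]
\text{where}\quad \rho_{cl}
  &= \frac{2\sum_{i}\sum_{j<k}(y_{ij}-\bar{Y})(y_{ik}-\bar{Y})}{(M-1)(NM-1)S^2}.
\end{align*}
```

The second line is the ordinary simple-random-sampling variance applied to the N cluster means, which is why the finite population correction (1 - f) carries through unchanged to the final expression.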
More from Statistics 2021 Paper I
- Q1 (a) A production unit manufacturing surgical masks is concerned about the quality of their masks. A random sample of n masks are inspected…
- Q2 (a) Let Y₁, Y₂, Y₃, ... be independent and identical Poisson random variables with parameter 1. Use central limit theorem to establish $$n!…
- Q3 (a) Let X and Y be two independent random variables following exponential distribution with mean $\frac{1}{\lambda}$ and $\frac{1}{\mu}$ re…
- Q4 (a) Let X₁, X₂, ..., Xₙ be a random sample from Poisson distribution with mean λ > 0. Define a statistic W = (1 − 1/n)^T, T = Σᵢ₌₁ⁿ Xᵢ (i)…
- Q5 (a) For a simple linear regression model Y = β₀ + β₁Xᵢ + εᵢ, i = 1, ..., n (i) Derive the least square estimators of β₀ and β₁, clearly sta…
- Q6 (a) For a multiple linear regression model with three covariates X₁, X₂ and X₃, let rᵢⱼ denote the correlation coefficient between Xᵢ and X…
- Q7 (a) (i) What is confounding in factorial experiments ? (ii) A $2^6$ factorial experiment is conducted in blocks of size $2^3$. Write the co…
- Q8 (a) (i) In stratified sampling under optimum allocation, how will you proceed to select units from different strata, if one or more nᵢ's ha…