Q6
(a) In a set of two-way classified data according to k levels of factor A and r levels of factor B, there is one observation in each cell. Show that the total number of error contrasts is (r – 1)(k – 1). (15 marks)
(b) Describe with examples the technique of two-stage sampling. Obtain the variance of the sample mean under two-stage sampling without replacement. Hence, deduce the variance of the sample mean under:
(i) Stratified random sampling, and
(ii) Cluster sampling
(20 marks)
(c) (i) If X₁ = Y₁ + Y₂, X₂ = Y₂ + Y₃, X₃ = Y₃ + Y₁, where Y₁, Y₂ and Y₃ are uncorrelated random variables, each of which has zero mean and unit standard deviation, find the multiple correlation coefficient between X₃ and X₁, X₂.
(ii) Let X be a 3-dimensional random vector with dispersion matrix
$$\Sigma = \begin{pmatrix} 9 & 3 & 3 \\ 3 & 9 & 3 \\ 3 & 3 & 9 \end{pmatrix}.$$
Determine the first principal component and the proportion of the total variability that it explains. (7+8=15 marks)
Directive word: Derive
This question asks you to derive. The directive word signals the depth of analysis expected, the structure of your answer, and the weight of evidence you must bring.
How this answer will be evaluated
Approach
Derive the required results systematically across all sub-parts. For (a), establish the linear model and count constraints; for (b), describe two-stage sampling with Indian census/NSSO examples, then derive variance formula and deduce special cases; for (c)(i), compute multiple correlation using matrix algebra; for (c)(ii), find eigenvalues and eigenvectors for PCA. Allocate approximately 30% time to (a), 40% to (b), 15% each to (c)(i) and (c)(ii), ensuring all derivations show complete steps with proper justification.
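As a reference point for part (b), the result that the derivation should reach can be sketched as follows. This assumes a population of N first-stage units, each containing M second-stage units (equal sizes), with n first-stage units and then m second-stage units per selected unit drawn by SRSWOR. The symbols $f_1$, $f_2$, $S_b^2$ and $S_w^2$ are notation introduced here for illustration and do not appear in the question.

$$
V(\bar{y}) = \frac{1-f_1}{n}\,S_b^2 + \frac{1-f_2}{nm}\,S_w^2,
\qquad f_1 = \frac{n}{N},\quad f_2 = \frac{m}{M},
$$

$$
S_b^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(\bar{Y}_i-\bar{Y}\right)^2,
\qquad
S_w^2 = \frac{1}{N(M-1)}\sum_{i=1}^{N}\sum_{j=1}^{M}\left(y_{ij}-\bar{Y}_i\right)^2 .
$$

$$
\text{Stratified sampling } (n = N,\ f_1 = 1):\quad V(\bar{y}) = \frac{1-f_2}{Nm}\,S_w^2,
\qquad
\text{Cluster sampling } (m = M,\ f_2 = 1):\quad V(\bar{y}) = \frac{1-f_1}{n}\,S_b^2 .
$$

With $n = N$ every first-stage unit behaves as a stratum from which m units are drawn, and with $m = M$ every selected first-stage unit is enumerated completely, which is why each special case retains only one of the two variance components.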
Key points expected
- For (a): Define the two-way ANOVA model with one observation per cell, identify total contrasts (rk-1), subtract treatment contrasts (k-1 for factor A, r-1 for factor B), and show error contrasts = (r-1)(k-1) using degrees of freedom partition
- For (b): Describe two-stage sampling with NSSO household survey or agricultural census example; derive variance of sample mean under SRSWOR at both stages; deduce stratified random sampling variance by setting the first-stage sampling fraction to 1 (n = N, so every first-stage unit is selected and acts as a stratum)
- For (b) continued: Deduce cluster sampling variance by setting the second-stage sampling fraction to 1 (m = M, so every selected first-stage unit is completely enumerated), showing how the general formula collapses to known special cases
- For (c)(i): Compute Var(X₁), Var(X₂), Cov(X₁,X₂), Cov(X₃,X₁), Cov(X₃,X₂); set up multiple regression of X₃ on X₁,X₂; calculate R² and multiple correlation coefficient R₃.₁₂
- For (c)(ii): Find eigenvalues of Σ (15, 6, 6, which sum to the trace 27), identify first principal component as (X₁ + X₂ + X₃)/√3, with coefficient vector (1/√3)(1,1,1)′ corresponding to λ = 15, and compute proportion of variability as 15/27 = 5/9 ≈ 55.6%
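
The numerical parts of (c) can be cross-checked with a short script. The following is a minimal sketch in plain Python, not a substitute for the written derivation: it assumes Cov(Y) is the identity matrix (as the question states), and it uses power iteration as one convenient way to get the leading eigenpair. All variable names are illustrative.

```python
from fractions import Fraction
from math import sqrt

# (c)(i): X = A Y with Cov(Y) = I, so Cov(X) = A A^T.
A = [[1, 1, 0],   # X1 = Y1 + Y2
     [0, 1, 1],   # X2 = Y2 + Y3
     [1, 0, 1]]   # X3 = Y3 + Y1
cov_x = [[sum(Fraction(a * b) for a, b in zip(row_i, row_j)) for row_j in A]
         for row_i in A]

# R^2 = s' S^{-1} s / Var(X3), where S = Cov of (X1, X2) and
# s = vector of covariances of X3 with X1 and X2.
s11, s12, s22 = cov_x[0][0], cov_x[0][1], cov_x[1][1]
s31, s32, s33 = cov_x[2][0], cov_x[2][1], cov_x[2][2]
det = s11 * s22 - s12 * s12
quad_form = (s22 * s31 * s31 - 2 * s12 * s31 * s32 + s11 * s32 * s32) / det
r_squared = quad_form / s33          # Fraction(1, 3)
multiple_r = sqrt(r_squared)         # 1/sqrt(3), about 0.577

# (c)(ii): leading eigenpair of Sigma by power iteration.
Sigma = [[9, 3, 3],
         [3, 9, 3],
         [3, 3, 9]]

def matvec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

v = [1.0, 0.0, 0.0]                  # any start not orthogonal to (1, 1, 1)
for _ in range(200):
    w = matvec(Sigma, v)
    norm = sqrt(sum(x * x for x in w))
    v = [x / norm for x in w]

largest = sum(a * b for a, b in zip(v, matvec(Sigma, v)))  # Rayleigh quotient
total = sum(Sigma[i][i] for i in range(3))                 # trace = total variance
proportion = largest / total

print(r_squared, multiple_r)   # R^2 and R for X3 on (X1, X2)
print(largest, v, proportion)  # leading eigenvalue, its unit vector, share of variance
```

The quadratic form s′S⁻¹s equals 2/3 here, and dividing by Var(X₃) = 2 gives R² = 1/3; reporting √(2/3) would mean the division by Var(X₃) was skipped.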
Evaluation rubric
| Dimension | Weight | Max marks | Excellent | Average | Poor |
|---|---|---|---|---|---|
| Setup correctness | 20% | 10 | Correctly specifies the linear model for (a) with proper identification of constraints; for (b) clearly defines population structure, sampling units and stages; for (c) correctly states distributional assumptions and matrix properties | Basic model stated but missing some constraints or assumptions; sampling description present but population structure unclear; some distributional properties stated but incomplete | Incorrect model specification or missing essential setup; confuses sampling stages or units; wrong assumptions about Y variables or dispersion matrix properties |
| Method choice | 20% | 10 | Uses Cochran's theorem for (a); applies appropriate variance decomposition for two-stage sampling with correct conditional expectation approach; uses proper matrix inversion/eigenvalue methods for (c) | Correct general approach but suboptimal methods used; some shortcuts taken that skip important steps; eigenvalue calculation present but method not clearly justified | Wrong methodological approach (e.g., ignores finite population correction, uses independence incorrectly); attempts direct calculation without using matrix properties |
| Computation accuracy | 20% | 10 | Accurate arithmetic throughout: correct degrees of freedom count (a); precise variance formula derivation with correct coefficients (b); exact multiple correlation value 1/√3 (R² = 1/3) and eigenvalues 15, 6, 6 with correct eigenvector normalization (c) | Minor computational errors (sign errors, arithmetic slips) that don't fundamentally alter structure; correct final answers but some intermediate steps wrong | Major computational errors affecting conclusions; wrong variance formula; incorrect eigenvalues or failure to normalize eigenvectors; wrong correlation coefficient |
| Interpretation | 20% | 10 | Explains why error contrasts equal interaction degrees of freedom (a); interprets variance components and explains efficiency comparisons between sampling schemes (b); interprets multiple correlation as strength of linear prediction and PCA as variance maximization (c) | Some interpretation present but superficial; mentions what results mean without explaining statistical significance; limited comparison of sampling methods | No interpretation provided; purely mechanical derivation without explaining what results signify; fails to connect special cases to general formula in (b) |
| Final answer & units | 20% | 10 | All final answers clearly boxed/stated: (r-1)(k-1) for (a); complete variance formula with deductions for (b); R₃.₁₂ = 1/√3 for (c)(i); first PC as (X₁+X₂+X₃)/√3 with 15/27 ≈ 55.6% explained variance for (c)(ii) | Final answers present but not clearly distinguished; some parts missing final numerical values; deductions in (b) incomplete | Missing final answers; answers buried in text without clear statement; wrong final answers despite correct method; no explicit deduction of special cases in (b) |
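
The count in (a) can also be sanity-checked numerically. The sketch below assumes the usual additive model (general mean, k effects for A, r effects for B) and takes the number of linearly independent error contrasts to be the number of observations minus the rank of the design matrix. The function names `rank` and `error_contrast_count` are hypothetical helpers written for this illustration.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix via Gaussian elimination over exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank_count = 0
    for col in range(len(m[0])):
        # Find a row at or below the current pivot position with a nonzero entry.
        pivot = next((i for i in range(rank_count, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rank_count], m[pivot] = m[pivot], m[rank_count]
        # Eliminate this column from every other row.
        for i in range(len(m)):
            if i != rank_count and m[i][col] != 0:
                factor = m[i][col] / m[rank_count][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rank_count])]
        rank_count += 1
    return rank_count

def error_contrast_count(k, r):
    """Observations minus rank of the additive two-way design matrix."""
    design = []
    for i in range(k):            # level of factor A
        for j in range(r):        # level of factor B
            row = ([1]
                   + [int(i == a) for a in range(k)]
                   + [int(j == b) for b in range(r)])
            design.append(row)
    return k * r - rank(design)

for k, r in [(2, 2), (3, 4), (5, 3)]:
    print(k, r, error_contrast_count(k, r), (r - 1) * (k - 1))
```

The design matrix has rank k + r – 1 (one for the mean, k – 1 for A, r – 1 for B), so the difference is rk – k – r + 1 = (r – 1)(k – 1), matching the partition of rk – 1 total contrasts described in the key points.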
More from Statistics 2022 Paper I
- Q1 (a) Let X and Y be independent random variables with exponential distribution having respective means $\frac{1}{\lambda_1}$ and $\frac{1}{\…
- Q2 (a) Let a random variable X have exponential distribution with mean 1/θ, θ > 0. To test H₀ : θ = 3 against H₁ : θ = 2, construct sequential…
- Q3 (a) (i) How large a sample must be taken in order that the probability will be at least 0·90 that the sample mean will be within 0·4 – neig…
- Q4 (a) Consider Poisson distribution $$P_{\theta}(X = j) = \frac{e^{-\theta} \theta^{j}}{j!} = p_{j}, j = 0, 1, 2, ....$$ Let $f_{j}$ be the f…
- Q5 (a) Define general linear model with usual assumptions. If y₁ = β₁ + u₁, y₂ = –β₁ + β₂ + u₂, y₃ = –β₂ + u₃, where u₁, u₂, u₃ are mutually i…
- Q6 (a) In a set of two-way classified data according to k levels of factor A and r levels of factor B, there is one observation in each cell.…
- Q7 (a) Consider the following data given for a BIBD with v = b = 4, r = k = 3, λ = 2 and N = 12 : Analyse the design. [Given that : F₃,₅ (0·05…
- Q8 (a) (i) What are orthogonal polynomials ? How do you fit an orthogonal polynomial of degree 'p' ? (ii) For the model Y_(n×1) = X_(n×k) β_(k…