Statistics

UPSC Statistics 2022 — Paper I

All 8 questions from UPSC Civil Services Mains Statistics 2022 Paper I (400 marks total). Every stem reproduced in full, with directive-word analysis, marks, word limits, and answer-approach pointers.

8 questions · 400 total marks · Year 2022 · Paper I

Topics covered

Probability distributions and statistical inference (1)
Sequential probability ratio test and order statistics (1)
Probability theory and statistical inference (1)
Statistical estimation and hypothesis testing (1)
Linear models, multivariate normal, experimental design, sampling (1)
ANOVA, sampling techniques, multivariate analysis (1)
Design of experiments and multivariate analysis (1)
Regression, sampling and experimental design (1)

Section A

50 marks · Compulsory · Directive: prove · Topic: Probability distributions and statistical inference

(a) Let X and Y be independent random variables with exponential distribution having respective means $\frac{1}{\lambda_1}$ and $\frac{1}{\lambda_2}$, $\lambda_1 > 0, \lambda_2 > 0$. Find E [max (X, Y)]. (10 marks)
(b) Using Central Limit Theorem, show that $$\lim_{n \to \infty} e^{-n} \sum_{k=0}^{n} \frac{n^k}{k!} = \frac{1}{2}$$ (10 marks)
(c) An unbiased six-sided die is thrown twice. Let X denote the smaller of the scores obtained. Then show that the probability mass function (p.m.f.) of X is given by : $$p_X(x) = \begin{cases} \frac{13-2x}{36}, & x = 1, 2, \ldots, 6 \\ 0, & \text{otherwise.} \end{cases}$$ (10 marks)
(d) Let T₁ and T₂ be two unbiased estimators of θ with Var(T₁) = Var(T₂), then show that Corr(T₁, T₂) ≥ 2e – 1, where e is the efficiency of each estimator. (10 marks)
(e) An urn contains 5 marbles of which θ are white and the others black. In order to test null hypothesis H₀ : θ = 3 versus alternative hypothesis H₁ : θ = 4, two marbles are drawn at random. H₀ is rejected if both the marbles are white, otherwise H₀ is accepted. Show that probability of type I error in case of without replacement and with replacement schemes, both are less than 0·40, but power of the test under with replacement is higher than that of under without replacement scheme. (10 marks)

Answer approach & key points

Prove each of the five results in turn, dividing the available time equally across the five 10-mark parts. For (a), use the identity E[max(X,Y)] = E[X] + E[Y] − E[min(X,Y)] or direct integration; for (b), recognise the sum as a Poisson(n) distribution function and apply the CLT (a continuity correction is optional, since its effect vanishes in the limit); for (c), enumerate the favourable outcomes for each value of the minimum; for (d), compare the variance of the unbiased estimator (T₁+T₂)/2 with the minimum attainable variance and use the definition of efficiency; for (e), compute the rejection probabilities without replacement (hypergeometric) and with replacement (binomial). Present each proof with a clear statement of assumptions, a step-by-step derivation, and a boxed final result.

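(a) Correct setup using E[max(X,Y)] = ∫∫ max(x,y)f_X(x)f_Y(y)dxdy or the equivalent identity with min(X,Y) ~ Exp(λ₁+λ₂), giving E[max(X,Y)] = 1/λ₁ + 1/λ₂ − 1/(λ₁+λ₂)
(b) Identification of the sum as P(Sₙ ≤ n) for Sₙ ~ Poisson(n), written as a sum of n i.i.d. Poisson(1) variables, and application of the CLT to show P(Sₙ ≤ n) = P((Sₙ − n)/√n ≤ 0) → Φ(0) = 1/2
(c) Enumeration of outcomes where the minimum equals k: (k,k), (k,j) for j>k, (i,k) for i>k, yielding the count 1 + 2(6−k) = 13−2k out of 36 for each k = 1,...,6
(d) Use of the unbiased estimator T = (T₁+T₂)/2, whose variance ½Var(T₁)(1 + ρ) cannot fall below the minimum attainable variance V₀ (or the Cramér–Rao lower bound), together with the efficiency definition e = V₀/Var(T₁), to establish ρ = Corr(T₁,T₂) ≥ 2e−1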
(e) Type I error: P(reject|H₀) = C(3,2)/C(5,2) = 0.3 (without replacement) vs (3/5)² = 0.36 (with replacement); Power: P(reject|H₁) = C(4,2)/C(5,2) = 0.6 vs (4/5)² = 0.64, showing higher power with replacement
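
The counts in (c) and the four probabilities in (e) are easy to confirm by brute force. The following is a minimal sketch using exact fractions; the function names `min_pmf` and `reject_prob` are illustrative choices, not part of the question paper.

```python
from fractions import Fraction
from itertools import product
from math import comb


def min_pmf():
    """p.m.f. of the smaller score when a fair die is thrown twice,
    found by listing all 36 equally likely ordered outcomes."""
    counts = {x: 0 for x in range(1, 7)}
    for a, b in product(range(1, 7), repeat=2):
        counts[min(a, b)] += 1
    return {x: Fraction(c, 36) for x, c in counts.items()}


def reject_prob(white, with_replacement):
    """P(both drawn marbles are white) when `white` of the 5 marbles are white."""
    if with_replacement:
        return Fraction(white, 5) ** 2                # two independent draws
    return Fraction(comb(white, 2), comb(5, 2))       # hypergeometric draw of 2


pmf = min_pmf()
alpha_wor = reject_prob(3, with_replacement=False)    # type I error, theta = 3
alpha_wr = reject_prob(3, with_replacement=True)
power_wor = reject_prob(4, with_replacement=False)    # power, theta = 4
power_wr = reject_prob(4, with_replacement=True)

print(pmf)
print(alpha_wor, alpha_wr, power_wor, power_wr)
```

Working in `Fraction` keeps the comparison with (13 − 2x)/36 and with the bound 0·40 exact rather than subject to floating-point rounding.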

50 marks · Directive: construct · Topic: Sequential probability ratio test and order statistics

(a) Let a random variable X have exponential distribution with mean 1/θ, θ > 0. To test H₀ : θ = 3 against H₁ : θ = 2, construct sequential probability ratio test. Show that probability of terminating the test at the first stage when null hypothesis is true is 1 – 8/27 ((A–B)/AB), where B and A, B < A, are stopping bounds. (20 marks)
(b) Each Sunday a fisherman visits one of three possible locations near his home : he goes to the sea with probability 1/2, to a river with probability 1/4, or to a lake with probability 1/4. If he goes to the sea there is an 80% chance that he will catch fish; corresponding figures for the river and the lake are 40% and 60% respectively.
(i) Find the probability that, on a given Sunday, he catches fish.
(ii) If, on a particular Sunday, he comes home without catching anything, determine the most likely place that he has been to. (5+10=15 marks)
(c) Let X₁ < X₂ < X₃ be the order statistics from uniform population having probability density function f(x; θ) = 1/θ, 0 < x < θ. Show that 4X₁ is an unbiased estimator of θ. (15 marks)

Answer approach & key points

Construct the sequential probability ratio test for part (a) by deriving the likelihood ratio and identifying stopping bounds, allocating approximately 40% of effort given its 20 marks. For part (b), apply Bayes' theorem to solve the probability and posterior location problem, spending ~30% of time. For part (c), derive the distribution of the first order statistic and verify unbiasedness, using the remaining ~30%. Present derivations step-by-step with clear probabilistic reasoning throughout.

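Part (a): With density f(x; θ) = θe^(−θx) (mean 1/θ), derive the likelihood ratio λₙ = L(θ=2)/L(θ=3) = (2/3)ⁿ exp(∑Xᵢ); continue sampling while B < λₙ < A, accept H₀ when λₙ ≤ B and reject H₀ when λₙ ≥ A, so that termination at the first stage under H₀ has probability 1 − P(B < (2/3)exp(X₁) < A)
Part (a): Evaluate P(B < (2/3)exp(X₁) < A | H₀) = P(ln(3B/2) < X₁ < ln(3A/2)) = (3B/2)⁻³ − (3A/2)⁻³ using the Exp(3) CDF (assuming 3B/2 ≥ 1), so the termination probability is 1 − (8/27)(A³ − B³)/(A³B³); the form printed in the stem, 1 − (8/27)((A−B)/AB), matches this only if A and B in the bracket are read as A³ and B³, so state the exponents explicitly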
Part (b)(i): Apply total probability theorem: P(catch) = (1/2)(0.8) + (1/4)(0.4) + (1/4)(0.6) = 0.65
Part (b)(ii): Use Bayes' theorem to find P(sea|no catch) = 0.1/0.35, P(river|no catch) = 0.15/0.35, P(lake|no catch) = 0.1/0.35; identify river as most likely location
Part (c): Derive PDF of X₍₁₎ as f₍₁₎(x) = 3(θ−x)²/θ³ for 0 < x < θ, compute E(X₍₁₎) = θ/4, and conclude E(4X₍₁₎) = θ proving unbiasedness
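
The Bayes step in (b) and the first-stage continuation probability in (a) can both be checked numerically. This is a sketch under stated assumptions: the density is taken as θe^(−θx), the likelihood ratio as L(θ=2)/L(θ=3), and the lower bound is assumed to satisfy 3B/2 ≥ 1 so that the continuation interval lies on the positive axis. The helper names are hypothetical.

```python
from fractions import Fraction
from math import exp, log

# (b) Fisherman problem: prior over locations and P(catch | location).
prior = {"sea": Fraction(1, 2), "river": Fraction(1, 4), "lake": Fraction(1, 4)}
p_catch = {"sea": Fraction(4, 5), "river": Fraction(2, 5), "lake": Fraction(3, 5)}

total_catch = sum(prior[s] * p_catch[s] for s in prior)          # total probability
no_catch = 1 - total_catch
posterior = {s: prior[s] * (1 - p_catch[s]) / no_catch for s in prior}  # Bayes
most_likely = max(posterior, key=posterior.get)


# (a) SPRT, first observation, under H0 (rate 3).
def continue_prob_h0(A, B):
    """P(B < (2/3) * exp(X1) < A) for X1 ~ Exp(rate 3); assumes 1.5 * B >= 1."""
    lower, upper = log(1.5 * B), log(1.5 * A)

    def cdf(x):
        return 1.0 - exp(-3.0 * x)

    return cdf(upper) - cdf(lower)


def closed_form(A, B):
    """(8/27) * (A^3 - B^3) / (A^3 * B^3), the simplified continuation probability."""
    return (8 / 27) * (A**3 - B**3) / (A**3 * B**3)


print(total_catch, posterior, most_likely)
print(continue_prob_h0(4.0, 1.0), closed_form(4.0, 1.0))
```

The termination probability at the first stage is one minus the value returned by `continue_prob_h0`; the bounds A = 4 and B = 1 are arbitrary illustrative values.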

50 marks · Directive: solve · Topic: Probability theory and statistical inference

(a) (i) How large a sample must be taken in order that the probability will be at least 0·90 that the sample mean will be within 0·4 – neighbourhood of the population mean, provided the population standard deviation is 2 ? (8 marks)
(a) (ii) Examine whether the weak law of large numbers holds for the sequence {Xₖ} of independent random variables defined as follows : $$P(X_k = -1 - \frac{1}{k}) = \frac{1}{2}\left\{1 - \left(1 - \frac{1}{k^2}\right)^{1/2}\right\},$$ $$P(X_k = 1 + \frac{1}{k}) = \frac{1}{2}\left\{1 + \left(1 - \frac{1}{k^2}\right)^{1/2}\right\}.$$ (7 marks)
(b) Theoretical probabilities in the four cells of a multinomial distribution are $\frac{2+\theta}{4}$, $\frac{1-\theta}{4}$, $\frac{1-\theta}{4}$ and $\frac{\theta}{4}$, whereas the observed frequencies are 108, 27, 30 and 8 respectively, then estimate θ by maximum likelihood method. Also, obtain the standard error of the estimate. (20 marks)
(c) If X is a random variable with characteristic function $$\varphi(t) = \begin{cases} 1-|t|, & |t| \leq 1 \\ 0, & \text{otherwise}, \end{cases}$$ then obtain the corresponding probability density function. (15 marks)

Answer approach & key points

Solve this multi-part numerical problem by allocating approximately 15 minutes to part (a)(i) on sample size determination using CLT, 15 minutes to part (a)(ii) on verifying WLLN conditions, 25 minutes to part (b) on MLE estimation and standard error computation for multinomial data, and 20 minutes to part (c) on deriving PDF from characteristic function via Fourier inversion. Begin each part with clear statement of the statistical principle being applied, show all computational steps explicitly, and conclude with precise numerical answers or definitive conclusions.

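Part (a)(i): Apply the Central Limit Theorem with the two-sided 90% critical value z₀.₀₅ = 1.645 to obtain n ≥ (1.645 × 2/0.4)² = 67.65, so n = 68; if no normal approximation is assumed, Chebyshev's inequality gives the more conservative requirement σ²/(nε²) = 25/n ≤ 0.10, i.e. n ≥ 250
Part (a)(ii): Verify E(Xₖ) = (1 + 1/k)(1 − 1/k²)^(1/2) and Var(Xₖ) = (1 + 1/k)²/k² ≤ 4/k², so that (1/n²)∑Var(Xₖ) → 0 and the WLLN holds by Chebyshev's (Markov's) condition
Part (b): Formulate the multinomial likelihood L(θ) ∝ (2+θ)¹⁰⁸(1−θ)⁵⁷θ⁸, take the log-likelihood, and solve dℓ/dθ = 108/(2+θ) − 57/(1−θ) + 8/θ = 0, i.e. 173θ² + 14θ − 16 = 0, to get θ̂ ≈ 0.266; then use the Fisher information I(θ) = (n/4)[1/(2+θ) + 2/(1−θ) + 1/θ] with n = 173 to obtain SE(θ̂) = 1/√I(θ̂) ≈ 0.058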
Part (c): Apply Fourier inversion formula f(x) = (1/2π)∫₋₁¹ (1-|t|)e⁻ⁱᵗˣ dt, evaluate to obtain f(x) = (1/πx²)(1 - cos x) = (1/2π)sinc²(x/2) for x ≠ 0, with f(0) = 1/2π
Demonstrate understanding that characteristic function φ(t) = (1-|t|)₊ corresponds to triangular distribution on [-1,1] in frequency domain, yielding Fejér kernel/sinc² in density domain
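
For part (b), the arithmetic of the likelihood equation and the standard error can be reproduced in a few lines. This is a minimal sketch: it assumes the score equation is cleared of denominators into a quadratic in θ and that the standard error is taken from the expected (Fisher) information evaluated at the estimate. Variable names are illustrative.

```python
from math import sqrt

freq = (108, 27, 30, 8)          # observed cell frequencies
n = sum(freq)                    # 173
n1, n2, n3, n4 = freq

# Score equation: n1/(2+t) - (n2+n3)/(1-t) + n4/t = 0.
# Clearing denominators gives n*t^2 - (n1 - 2*(n2+n3) - n4)*t - 2*n4 = 0.
b = -(n1 - 2 * (n2 + n3) - n4)   # coefficient of t
c = -2 * n4                      # constant term
theta_hat = (-b + sqrt(b * b - 4 * n * c)) / (2 * n)   # root lying in (0, 1)


def fisher_information(theta, size):
    """Expected information for theta in the (2+t)/4, (1-t)/4, (1-t)/4, t/4 model."""
    return (size / 4) * (1 / (2 + theta) + 2 / (1 - theta) + 1 / theta)


se = 1 / sqrt(fisher_information(theta_hat, n))
score = n1 / (2 + theta_hat) - (n2 + n3) / (1 - theta_hat) + n4 / theta_hat

print(round(theta_hat, 4), round(se, 4))
```

The positive root is kept because the cell probabilities require 0 < θ < 1; the other root of the quadratic is negative.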

50 marks · Directive: discuss · Topic: Statistical estimation and hypothesis testing

(a) Consider Poisson distribution $$P_{\theta}(X = j) = \frac{e^{-\theta} \theta^{j}}{j!} = p_{j}, \quad j = 0, 1, 2, \ldots$$ Let $f_{j}$ be the frequency for X = j and $E(f_{j}) = m_{j} = np_{j}$. Discuss how you obtain minimum chi-square estimate for $\theta$. Does minimum chi-square method necessarily yield a sufficient statistic even if it exists ? (20 marks)
(b) (i) Let the joint probability density function of X and Y be $$f(x, y) = C \cdot \exp \{-(4x^{2} + 9y^{2} - xy)\},$$ where C is a constant. Find E(X), V(X), E(Y), V(Y) and the correlation coefficient between X and Y. (10 marks)
(b) (ii) If $X_{1}, X_{2}, ..., X_{6}$ are independent random variables such that $$P(X_{i} = -1) = P(X_{i} = 1) = \frac{1}{2}, \quad i = 1, 2, ..., 6,$$ then obtain the value of $$P\left[\sum_{i=1}^{6} X_{i} = 4\right].$$ (5 marks)
(c) The following data present the time (in minutes), that a commuter had to wait to catch a bus to reach his destination : Use the sign-test at 0·05 level of significance to test the claim of the bus operators that commuters do not have to wait for more than 15 minutes before the bus is made available to them. [Given Z₍₀.₀₂₅₎ = 1·96, Z₍₀.₀₅₎ = 1·645] (15 marks)

Answer approach & key points

The directive 'discuss' in part (a) requires a balanced analytical treatment with derivation and critical evaluation, while parts (b) and (c) are primarily computational. Allocate approximately 40% of effort to part (a) given its 20 marks and theoretical depth, 30% to part (b) covering both (i) bivariate normal properties and (ii) probability calculation, and 30% to part (c) for the non-parametric test. Structure as: brief theoretical exposition for (a), systematic derivations for (b), and complete hypothesis testing procedure for (c).

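For (a): Derivation of the minimum chi-square estimator for the Poisson parameter by minimizing χ² = Σ(f_j − np_j)²/(np_j) = Σf_j²/(np_j) − n with respect to θ; since ∂p_j/∂θ = p_j(j/θ − 1), this leads to the estimating equation Σ f_j²(j − θ)/p_j = 0, which has no closed form and must be solved iteratively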
For (a): Critical discussion that minimum chi-square method does NOT necessarily yield sufficient statistics—contrast with MLE which preserves sufficiency via factorization theorem; cite example where MCS estimator differs from sufficient statistic
For (b)(i): Recognition of the bivariate normal form with no linear terms, so μ_x = μ_y = 0; matching the exponent to −½ z′Σ⁻¹z gives Σ⁻¹ = [[8, −1], [−1, 18]], hence σ_x² = 18/143, σ_y² = 8/143, Cov(X,Y) = 1/143 and ρ = 1/12
For (b)(ii): Identification that ΣX_i follows distribution of (number of +1's) - (number of -1's), equivalent to 2×Binomial(6,½) - 6, yielding P(ΣX_i=4) = P(5 successes) = 6/64 = 3/32
For (c): Correct application of the sign test with null hypothesis H₀: median ≤ 15 vs H₁: median > 15, counting positive signs (values > 15) and discarding ties, using the normal approximation with continuity correction, and drawing the conclusion from the one-sided critical value Z₍₀.₀₅₎ = 1·645
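
The moments in (b)(i) follow from inverting a 2×2 matrix, and the probability in (b)(ii) from a 64-outcome enumeration. A small sketch of both, assuming the exponent is matched to −½ z′Pz so that P is the precision matrix (inverse covariance); the names `P`, `cov` and `prob_sum_4` are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

# (b)(i) -(4x^2 + 9y^2 - xy) = -1/2 * z' P z with z = (x, y)' gives
# P = [[8, -1], [-1, 18]]; the covariance matrix is the inverse of P.
P = [[Fraction(8), Fraction(-1)], [Fraction(-1), Fraction(18)]]
det = P[0][0] * P[1][1] - P[0][1] * P[1][0]            # 143
cov = [[P[1][1] / det, -P[0][1] / det],
       [-P[1][0] / det, P[0][0] / det]]                # 2x2 inverse by cofactors

var_x, var_y, cov_xy = cov[0][0], cov[1][1], cov[0][1]
rho = float(cov_xy) / sqrt(float(var_x * var_y))

# (b)(ii) Enumerate all 2^6 equally likely sign patterns and count sum == 4.
outcomes = list(product((-1, 1), repeat=6))
prob_sum_4 = Fraction(sum(1 for o in outcomes if sum(o) == 4), len(outcomes))

print(var_x, var_y, cov_xy, rho)
print(prob_sum_4)
```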

Section B

50 marks · Compulsory · Directive: solve · Topic: Linear models, multivariate normal, experimental design, sampling

(a) Define general linear model with usual assumptions. If y₁ = β₁ + u₁, y₂ = –β₁ + β₂ + u₂, y₃ = –β₂ + u₃, where u₁, u₂, u₃ are mutually independent random variables with mean zero and variance σ², then find the least square estimators of β₁ and β₂. (10 marks)
(b) Given X ~ N₃(μ, Σ), where μ = (2, 4, 3)' and $$\Sigma = \begin{pmatrix} 8 & 2 & 3 \\ 2 & 4 & 1 \\ 3 & 1 & 3 \end{pmatrix}$$ (i) find the regression function of X₁ on X₂ and X₃, and (ii) compute the conditional variance of X₁ given X₂ and X₃. (10 marks)
(c) What is a uniformity trial ? Explain how it can be used to determine optimum shape and size. (10 marks)
(d) In a 2⁶ – factorial experiment, the key block is given as : (1), ab, cd, ef, ace, abef, abcd, bce, cdef, acf, ade, abcdef, bde, bcf, adf, bdf. Identify the confounded effects. (10 marks)
(e) If the coefficients of variation of x and y are equal and the correlation coefficient between x and y is ρ = 2/3, compute the efficiency of ratio estimator relative to the mean of a simple random sample. (10 marks)

Answer approach & key points

This is a computational-cum-descriptive question requiring precise derivations and calculations across five sub-parts. Allocate approximately 20% time to part (a) for matrix formulation of GLM and LSE derivation, 20% to part (b) for multivariate normal conditional distributions, 15% to part (c) for explaining uniformity trials with agricultural field trial context, 25% to part (d) for systematic identification of confounded effects in 2⁶ factorial, and 20% to part (e) for ratio estimator efficiency computation. Begin each part with clear statement of method, show all computational steps, and conclude with boxed final answers.

Part (a): Correct matrix formulation of GLM y = Xβ + u with assumptions E(u)=0, Var(u)=σ²I; proper construction of the design matrix X with rows (1, 0), (−1, 1), (0, −1), so that X'X = [[2, −1], [−1, 2]] and X'y = (y₁ − y₂, y₂ − y₃)', and derivation of LSE β̂ = (X'X)⁻¹X'y yielding β̂₁ = (2y₁ − y₂ − y₃)/3 and β̂₂ = (y₁ + y₂ − 2y₃)/3
Part (b)(i): Correct partitioning of Σ into Σ₁₁ = 8, Σ₁₂ = (2, 3), Σ₂₁ = Σ₁₂', Σ₂₂ = [[4, 1], [1, 3]] and computation of the regression coefficients Σ₁₂Σ₂₂⁻¹ = (3/11, 10/11), giving E(X₁|X₂,X₃) = μ₁ + Σ₁₂Σ₂₂⁻¹(x₂−μ₂, x₃−μ₃)' = 2 + (3/11)(x₂ − 4) + (10/11)(x₃ − 3)
Part (b)(ii): Computation of conditional variance Var(X₁|X₂,X₃) = Σ₁₁ − Σ₁₂Σ₂₂⁻¹Σ₂₁ = 8 − 36/11 = 52/11 ≈ 4.73 using the Schur complement
Part (c): Definition of uniformity trial as trial with uniform treatment to assess field variability; explanation of how coefficient of variation and soil heterogeneity index guide selection of plot shape (long narrow for fertility gradient) and size (balancing variance reduction vs cost)
Part (d): Systematic identification of confounded effects: the key block has 16 = 2⁴ treatment combinations, so the 2⁶ experiment is laid out in 4 blocks and 3 effects are confounded (two independent effects and their generalized interaction); the effects having an even number of letters in common with every element of the key block are ABCD, CDEF and ABEF
Part (e): Application of the ratio estimator efficiency formula RE = V(ȳ)/V(ȳ_R) = 1/(1 + Cₓ²/Cᵧ² − 2ρCₓ/Cᵧ), which with Cₓ = Cᵧ reduces to 1/(2(1 − ρ)); for ρ = 2/3 the efficiency is 3/2, i.e. 150%
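
Two of the computations above lend themselves to a mechanical check: the confounded effects in (d), found by testing every factorial effect against the key block, and the conditional mean and variance in (b). The sketch below uses only the standard library; `confounded_effects` and the other names are hypothetical helpers, and the even-overlap rule is the usual criterion for an effect to be confounded with blocks.

```python
from fractions import Fraction
from itertools import combinations

# (d) An effect is confounded if it shares an even number of letters
# with every treatment combination in the key (principal) block.
FACTORS = "abcdef"
KEY_BLOCK = ["", "ab", "cd", "ef", "ace", "abef", "abcd", "bce",
             "cdef", "acf", "ade", "abcdef", "bde", "bcf", "adf", "bdf"]


def confounded_effects(key_block, factors):
    effects = []
    for size in range(1, len(factors) + 1):
        for letters in combinations(factors, size):
            if all(len(set(letters) & set(t)) % 2 == 0 for t in key_block):
                effects.append("".join(letters).upper())
    return effects


confounded = confounded_effects(KEY_BLOCK, FACTORS)

# (b) Conditional distribution of X1 given (X2, X3) for N3(mu, S).
mu = [Fraction(2), Fraction(4), Fraction(3)]
S = [[8, 2, 3], [2, 4, 1], [3, 1, 3]]
a11, a12, a21, a22 = S[1][1], S[1][2], S[2][1], S[2][2]   # Sigma_22
det22 = Fraction(a11 * a22 - a12 * a21)
# Solve Sigma_22 * (b2, b3)' = Sigma_21 by Cramer's rule.
b2 = (S[0][1] * a22 - a12 * S[0][2]) / det22
b3 = (a11 * S[0][2] - a21 * S[0][1]) / det22
intercept = mu[0] - b2 * mu[1] - b3 * mu[2]
cond_var = S[0][0] - (b2 * S[0][1] + b3 * S[0][2])

print(confounded)
print(intercept, b2, b3, cond_var)
```

The regression function is then `intercept + b2 * x2 + b3 * x3`, which is the same line as 2 + (3/11)(x₂ − 4) + (10/11)(x₃ − 3) written with the constant collected.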

50 marks · Directive: derive · Topic: ANOVA, sampling techniques, multivariate analysis

(a) In a set of two-way classified data according to k levels of factor A and r levels of factor B, there is one observation in each cell. Show that the total number of error contrasts is (r – 1) (k – 1). (15 marks)
(b) Describe with examples the technique of two-stage sampling. Obtain the variance of the sample mean under two-stage sampling without replacement. Hence, deduce the variance of the sample mean under : (i) Stratified random sampling, and (ii) Cluster sampling (20 marks)
(c) (i) If X₁ = Y₁ + Y₂, X₂ = Y₂ + Y₃, X₃ = Y₃ + Y₁, where Y₁, Y₂ and Y₃ are uncorrelated random variables and each of which has zero mean and unit standard deviation, find the multiple correlation coefficient between X₃ and X₁, X₂.
(c) (ii) Let X be a 3-dimensional random vector with dispersion matrix $$\Sigma = \begin{pmatrix} 9 & 3 & 3 \\ 3 & 9 & 3 \\ 3 & 3 & 9 \end{pmatrix}.$$ Determine the first principal component and the proportion of the total variability that it explains. (7+8=15 marks)

Answer approach & key points

Derive the required results systematically across all sub-parts. For (a), establish the linear model and count constraints; for (b), describe two-stage sampling with Indian census/NSSO examples, then derive variance formula and deduce special cases; for (c)(i), compute multiple correlation using matrix algebra; for (c)(ii), find eigenvalues and eigenvectors for PCA. Allocate approximately 30% time to (a), 40% to (b), 15% each to (c)(i) and (c)(ii), ensuring all derivations show complete steps with proper justification.

For (a): Define the two-way ANOVA model with one observation per cell, identify total contrasts (rk-1), subtract treatment contrasts (k-1 for factor A, r-1 for factor B), and show error contrasts = (r-1)(k-1) using degrees of freedom partition
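For (b): Describe two-stage sampling with NSSO household survey or agricultural census example; derive the variance of the sample mean under SRSWOR at both stages, which for N equal-sized first-stage units of M elements each (n units and m elements per unit sampled) is V(ȳ) = (1 − f₁)S_b²/n + (1 − f₂)S_w²/(nm) with f₁ = n/N and f₂ = m/M; deduce the stratified random sampling variance by setting n = N (first-stage sampling fraction equal to 1, so every first-stage unit acts as a stratum and only the within-unit term remains)
For (b) continued: Deduce the cluster sampling variance by setting m = M (second-stage sampling fraction equal to 1, so every element of a selected cluster is enumerated and only the between-unit term remains), showing how the general formula collapses to known special cases
For (c)(i): Compute Var(X₁) = Var(X₂) = Var(X₃) = 2 and Cov(X₁,X₂) = Cov(X₃,X₁) = Cov(X₃,X₂) = 1, so every pairwise correlation is 1/2; set up the multiple regression of X₃ on X₁,X₂; calculate R²₃.₁₂ = (r₃₁² + r₃₂² − 2r₃₁r₃₂r₁₂)/(1 − r₁₂²) = 1/3 and the multiple correlation coefficient R₃.₁₂ = 1/√3 ≈ 0.577
For (c)(ii): Find the eigenvalues of Σ (15, 6, 6; they sum to the trace, 27), identify the first principal component as (1/√3)(X₁ + X₂ + X₃), with coefficient vector (1/√3)(1,1,1)′ corresponding to λ = 15, and compute the proportion of variability as 15/27 = 5/9 ≈ 55.6%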
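
Both parts of (c) can be verified numerically without any linear-algebra library. The sketch below builds the covariance matrix of (X₁, X₂, X₃) from the coefficients on (Y₁, Y₂, Y₃), and uses plain power iteration (a stand-in for a full eigen-decomposition, chosen only because it needs no dependencies) to find the leading eigenpair of Σ.

```python
from math import sqrt

# (c)(i) X = A Y with uncorrelated unit-variance Y, so Cov(X) = A A'.
A = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
cov = [[sum(A[i][k] * A[j][k] for k in range(3)) for j in range(3)]
       for i in range(3)]
corr = [[cov[i][j] / sqrt(cov[i][i] * cov[j][j]) for j in range(3)]
        for i in range(3)]
r12, r31, r32 = corr[0][1], corr[2][0], corr[2][1]
R2 = (r31**2 + r32**2 - 2 * r31 * r32 * r12) / (1 - r12**2)
R = sqrt(R2)

# (c)(ii) Leading eigenvalue/eigenvector of SIGMA by power iteration.
SIGMA = [[9, 3, 3], [3, 9, 3], [3, 3, 9]]


def matvec(m, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


def power_iteration(m, steps=200):
    v = [1.0, 0.0, 0.0]                      # arbitrary non-orthogonal start
    for _ in range(steps):
        w = matvec(m, v)
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, matvec(m, v)))  # Rayleigh quotient
    return eigenvalue, v


lam1, pc1 = power_iteration(SIGMA)
proportion = lam1 / sum(SIGMA[i][i] for i in range(3))   # share of the trace

print(R2, R)
print(lam1, pc1, proportion)
```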

50 marks · Directive: analyse · Topic: Design of experiments and multivariate analysis

(a) Consider the following data given for a BIBD with v = b = 4, r = k = 3, λ = 2 and N = 12 : Analyse the design. [Given that : F₃,₅ (0·05) = 5·41] (15 marks)
(b) (i) The data matrix of a random sample of size n = 3 from a bivariate normal population BVN (μ₁, μ₂, σ₁², σ₂², ρ) is $$X = \begin{pmatrix} 6 & 10 \\ 10 & 6 \\ 8 & 2 \end{pmatrix}.$$ Test the null hypothesis H₀ : μ = μ₀ against H₁ : μ ≠ μ₀, where μ₀' = (8, 5), at 10% level of significance. [You are given : F₀.₁₀; ₂, ₁ = 49·5, F₀.₁₀; ₁, ₂ = 8·53]
(b) (ii) Suppose n₁ = 11 and n₂ = 12, observations are made on two random vectors X₁ and X₂ which are assumed to have bivariate normal distribution with a common covariance matrix Σ, but possibly different mean vectors μ₁ and μ₂. The sample mean vectors and pooled covariance matrix are X̄₁ = (-1, -1)', X̄₂ = (2, 1)', $$S_{\text{pooled}} = \begin{pmatrix} 7 & -1 \\ -1 & 5 \end{pmatrix}.$$ Obtain Mahalanobis sample distance D² and Fisher's linear discriminant function. Assign the observation X₀ = (0, 1)' to either population Π₁ or Π₂. (10+10=20 marks)
(c) A sample of size n is drawn with equal probability and without replacement from a population with size N. Let Ŷ_N = Σᵣ₌₁ⁿ aᵣ yᵣ be any linear estimate of the population mean Ȳ_N, where aᵣ are constants and yᵣ denotes the value of the unit included in the sample at the rᵗʰ draw.
(i) Show that Ŷ_N is an unbiased estimate of Ȳ_N if and only if Σᵣ₌₁ⁿ aᵣ = 1
(ii) Under above condition V(Ŷ_N) = (S²/N)[NΣᵣ₌₁ⁿ aᵣ² - 1]
(iii) If aᵣ = 1/n, for what value of n may this variance of the sample mean in simple random sampling without replacement be exactly half the variance of the mean of a random sample of the same size taken with replacement ? (15 marks)

Answer approach & key points

The directive 'analyse' demands systematic examination with computational rigour across all sub-parts. Allocate approximately 30% time to part (a) BIBD analysis, 40% to part (b) multivariate tests and discriminant analysis, and 30% to part (c) sampling theory proofs. Structure as: brief identification of appropriate statistical methods for each sub-part → step-by-step computational working with formulae stated → interpretation of results in context → final conclusions with statistical significance statements.

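Part (a): Verify that the BIBD parameters satisfy λ(v−1) = r(k−1), construct the intra-block ANOVA table with total SS, block SS (unadjusted), treatment SS adjusted for blocks, k∑Qᵢ²/(λv) with Qᵢ = Tᵢ − (1/k)(sum of totals of blocks containing treatment i), and error SS on 11 − 3 − 3 = 5 d.f.; compute the F-ratio and compare with the critical value 5.41 for treatment significance
Part (b)(i): Compute the sample mean vector x̄ = (8, 6)′, the sample covariance matrix S = [[4, −4], [−4, 16]] and Hotelling's T² = n(x̄−μ₀)′S⁻¹(x̄−μ₀) = 0.25; convert to an F-statistic using F = (n−p)/((n−1)p) × T² = 0.0625 with p = 2 on (2, 1) d.f., and compare with the critical value 49·5, so H₀ is not rejected
Part (b)(ii): Calculate a = S_pooled⁻¹(X̄₁−X̄₂) = (−1/2, −1/2)′ and Mahalanobis D² = (X̄₁−X̄₂)′S_pooled⁻¹(X̄₁−X̄₂) = 2.5; derive Fisher's linear discriminant function Z = a′X = −(x₁ + x₂)/2 with midpoint ½a′(X̄₁+X̄₂) = −0.25; the score at X₀ is −0.5, which is below the midpoint, so X₀ is assigned to Π₂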
Part (c)(i): Prove unbiasedness by showing E(Ŷ_N) = Ȳ_N requires Σaᵣ = 1 using linearity of expectation and equal probability sampling properties
Part (c)(ii): Derive variance expression using V(yᵣ) = σ² and Cov(yᵣ, yₛ) = -σ²/(N-1) for r≠s, expand V(Σaᵣyᵣ) and simplify
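Part (c)(iii): Set V(SRSWOR) = ½ V(SRSWR), where V(SRSWOR) = (N−n)S²/(Nn) and V(SRSWR) = σ²/n = (N−1)S²/(Nn); then (N−n)/(N−1) = ½, giving n = (N+1)/2 (which requires N odd; for large N this is approximately N/2)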
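
The statistics in part (b) can be reproduced exactly with rational arithmetic. This is a minimal sketch: `inv2` and `quad` are small hypothetical helpers for 2×2 matrices, and the classification rule assumed is the usual one (assign to Π₁ when the discriminant score is at least the midpoint between the two group means, otherwise to Π₂).

```python
from fractions import Fraction


def inv2(m):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]


def quad(u, m, v):
    """Bilinear form u' m v for length-2 vectors."""
    return sum(u[i] * m[i][j] * v[j] for i in range(2) for j in range(2))


# (b)(i) Hotelling's T^2 for H0: mu = (8, 5)'.
X = [(6, 10), (10, 6), (8, 2)]
mu0 = (8, 5)
n, p = len(X), 2
xbar = [Fraction(sum(row[j] for row in X), n) for j in range(p)]
S = [[sum((row[i] - xbar[i]) * (row[j] - xbar[j]) for row in X) / (n - 1)
      for j in range(p)] for i in range(p)]
d = [xbar[j] - mu0[j] for j in range(p)]
T2 = n * quad(d, inv2(S), d)
F_stat = Fraction(n - p, (n - 1) * p) * T2        # compare with F(2, 1)

# (b)(ii) Mahalanobis D^2 and Fisher's linear discriminant.
xbar1, xbar2 = (-1, -1), (2, 1)
Sp = [[7, -1], [-1, 5]]
diff = [xbar1[j] - xbar2[j] for j in range(2)]
Sp_inv = inv2(Sp)
a = [sum(Sp_inv[i][j] * diff[j] for j in range(2)) for i in range(2)]
D2 = sum(a[i] * diff[i] for i in range(2))
midpoint = sum(a[i] * (xbar1[i] + xbar2[i]) for i in range(2)) / 2
x0 = (0, 1)
score = sum(a[i] * x0[i] for i in range(2))
assigned = "Pi1" if score >= midpoint else "Pi2"

print(T2, F_stat)
print(a, D2, midpoint, score, assigned)
```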

50 marks · Directive: derive · Topic: Regression, sampling and experimental design

(a) (i) What are orthogonal polynomials ? How do you fit an orthogonal polynomial of degree 'p' ?
(a) (ii) For the model Y_(n×1) = X_(n×k) β_(k×1) + u_(n×1), E(uu') = σ² I_n, where X_(n×k) is a matrix of rank k (k < n), find out the value of E[Y'(I_n - X(X'X)⁻¹X')Y]. (10+10=20 marks)
(b) Consider an artificial population of three farms. Their selection probabilities and the wheat production (in '000 tons) are as follows :

| Farm unit (i) | 1 | 2 | 3 |
|---|---|---|---|
| Selection probability (pᵢ) | 0·3 | 0·2 | 0·5 |
| Wheat production (yᵢ) | 11 | 6 | 25 |

Draw all possible samples of size 2 with replacement (order is to be considered). Show that Horvitz-Thompson estimator of total wheat production is unbiased. (15 marks)
(c) What is a missing plot technique ? Derive the missing value formula for a Latin Square Design. How would you proceed to analyse such a design ? (15 marks)

Answer approach & key points

Begin with (a)(i) defining orthogonal polynomials with the orthogonality condition Σφᵢ(x)φⱼ(x)=0 for i≠j, then describe the recurrence relation method for fitting degree p. For (a)(ii), recognize the residual sum of squares form and apply E[u'Mu]=σ²tr(M) to obtain (n-k)σ². In (b), enumerate all 9 ordered samples with replacement, compute πᵢ=1-(1-pᵢ)² for inclusion probabilities, verify E[Ŷ_HT]=Y. For (c), derive the Latin Square missing value formula ŷ=(t(R+C+T)-2G)/((t-1)(t-2)) and outline the adjusted ANOVA procedure. Allocate ~40% time to (a), ~30% each to (b) and (c).

(a)(i) Definition: orthogonal polynomials satisfy Σφᵢ(x)φⱼ(x)=0 for i≠j over the point set; fitting uses recurrence φᵣ₊₁(x)=(x-aᵣ)φᵣ(x)-bᵣφᵣ₋₁(x) with specific coefficient formulas
(a)(ii) Recognition that Iₙ-X(X'X)⁻¹X' is the residual maker matrix M; E[Y'MY]=E[u'Mu]=σ²tr(M)=σ²(n-k) using tr(Iₙ)=n and tr(X(X'X)⁻¹X')=k
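(b) Enumeration of 9 ordered samples: (1,1),(1,2),(1,3),(2,1),(2,2),(2,3),(3,1),(3,2),(3,3) with their probabilities pᵢpⱼ; calculation of first-order inclusion probabilities πᵢ=1-(1-pᵢ)², i.e. π₁ = 0·51, π₂ = 0·36, π₃ = 0·75; verification that E[Ŷ_HT] = Σₛ P(s)Ŷ_HT(s) = Σᵢ (yᵢ/πᵢ)·πᵢ = Σᵢ yᵢ = 42, where Ŷ_HT(s) sums yᵢ/πᵢ over the distinct units in sample s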
(c) Missing plot technique: Yates' method for estimating missing observations by minimizing error sum of squares; derivation using ∂SSE/∂y=0 for Latin Square layout
(c) Latin Square missing value formula: ŷ = (tRᵢ + tCⱼ + tTₖ - 2G) / [(t-1)(t-2)] where R,C,T are respective totals and G is grand total; analysis proceeds with reduced degrees of freedom and bias correction in treatment SS
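
The unbiasedness check in (b) amounts to a nine-row table, which can be generated and summed exactly. A minimal sketch, assuming the Horvitz-Thompson estimator sums yᵢ/πᵢ over the distinct units of each ordered sample; the dictionary names are illustrative.

```python
from fractions import Fraction
from itertools import product

p = {1: Fraction(3, 10), 2: Fraction(1, 5), 3: Fraction(1, 2)}   # draw probabilities
y = {1: 11, 2: 6, 3: 25}                                         # wheat production

# First-order inclusion probability for two independent draws.
pi = {i: 1 - (1 - p[i]) ** 2 for i in p}

rows = []
for s in product(p, repeat=2):                   # 9 ordered samples
    prob = p[s[0]] * p[s[1]]                     # probability of this ordered sample
    estimate = sum(Fraction(y[i]) / pi[i] for i in set(s))   # distinct units only
    rows.append((s, prob, estimate))

expected = sum(prob * est for _, prob, est in rows)
true_total = sum(y.values())

for s, prob, est in rows:
    print(s, prob, est)
print(expected, true_total)
```

Using `set(s)` ensures that a unit drawn twice, as in (1,1), contributes only once to the estimate, which is what makes the expectation collapse to Σ yᵢ.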
