Statistics

UPSC Statistics 2024 — Paper I

All 8 questions from UPSC Civil Services Mains Statistics 2024 Paper I (400 marks total). Every stem reproduced in full, with directive-word analysis, marks, word limits, and answer-approach pointers.

Questions: 8
Total marks: 400
Year: 2024
Paper: I

Topics covered

Probability theory and statistical inference (1)
Joint distributions and limiting distributions (1)
Statistical inference and estimation theory (1)
Hypothesis testing and non-parametric methods (1)
Linear regression, multivariate normal distribution, sampling design (1)
Bivariate normal distribution, principal components, linear regression estimation (1)
Stratified sampling and BIBD (1)
Factorial design and ANOVA (1)

Section A

50M Compulsory solve Probability theory and statistical inference

(a) Two events A and B are such that P(A) = 1/3, P(B) = 1/4 and P(A|B) + P(B|A) = 2/3. Evaluate the following: (i) P(A^c ∪ B^c) (5 marks) (ii) P(A|B^c) + P(B|A^c) (5 marks)
(b) Suppose the joint probability function of two random variables X and Y is f(x, y) = (xy^(x-1))/3; x = 1, 2, 3 and 0 < y < 1. Compute the following: (i) P(X ≥ 2 and Y ≥ 1/2) (5 marks) (ii) P(X ≥ 2) (5 marks)
(c) Let X₁, X₂, ... is a sequence of independent and identically distributed random variables with mean (μ) and variance (σ²) < ∞, and assume Sₙ = X₁ + X₂ + ... + Xₙ. Show that WLLN does not hold for sequence ⟨Sₙ⟩ of random variables. (10 marks)
(d) Write the criterion of a good estimator. Let X₁, X₂ be iid P(λ) random variables, then show that T = X₁ + X₂ is sufficient while T = X₁ + 2X₂ is not sufficient for estimating λ. (10 marks)
(e) For testing H₀: μ = 100 vs. H₁: μ ≠ 100, a random sample of size 50 is drawn from a normal population with unknown mean μ and variance 200. If α = 0.05, then obtain the critical region. (10 marks)

Answer approach & key points

Solve this multi-part numerical problem by allocating time proportionally to marks: approximately 20% for (a), 20% for (b), 20% for (c), 20% for (d), and 20% for (e). Begin each sub-part by stating the relevant formula or theorem, show complete step-by-step working, and conclude with clearly boxed final answers. For (c) and (d), include brief theoretical justification before numerical demonstration.

For (a)(i): Write x = P(A∩B); then x/P(B) + x/P(A) = 4x + 3x = 2/3 gives P(A∩B) = 2/21 (so A and B are not independent, since P(A)P(B) = 1/12). By De Morgan's law, P(A^c ∪ B^c) = 1 - P(A∩B) = 19/21
For (a)(ii): P(A|B^c) = [P(A) - P(A∩B)]/P(B^c) = (5/21)/(3/4) = 20/63 and P(B|A^c) = [P(B) - P(A∩B)]/P(A^c) = (13/84)/(2/3) = 13/56, giving 20/63 + 13/56 = 277/504
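For (b): Since ∫₀¹ x y^(x-1) dy = 1, each value of x has marginal probability 1/3. For (i), sum over x = 2, 3 of (1/3)[1 - (1/2)^x] = (1/3)(3/4 + 7/8) = 13/24; for (ii), marginalize over y to get P(X ≥ 2) = 2/3
For (c): S_n/n → μ in probability is just the usual WLLN for ⟨Xₙ⟩, so the point is the sequence Yₙ = Sₙ. Since Y₁ + ... + Yₙ = Σᵢ (n - i + 1)Xᵢ, Bₙ = Var(Y₁ + ... + Yₙ) = σ² n(n+1)(2n+1)/6 and Bₙ/n² → ∞, so the condition Bₙ/n² → 0 fails; moreover, by the Lindeberg CLT the centred average (Y₁ + ... + Yₙ - E(Y₁ + ... + Yₙ))/n is approximately normal with standard deviation about σ√(n/3) → ∞, so it cannot converge in probability to 0
For (d): Criteria of a good estimator are unbiasedness, consistency, efficiency and sufficiency. By the factorization theorem, the joint pmf e^(-2λ)λ^(x₁+x₂)/(x₁!x₂!) depends on the data through t = x₁ + x₂ only (and T ∼ P(2λ)), so T = X₁ + X₂ is sufficient. For T = X₁ + 2X₂, T = 3 arises from (x₁, x₂) = (3, 0) or (1, 1), and P(X₁ = 1, X₂ = 1 | T = 3) = λ²/(λ² + λ³/6) = 6/(6 + λ) depends on λ, so it is not sufficient
For (e): Construct the two-sided z-test with σ² = 200, n = 50, so σ/√n = 2; the critical region |z| > 1.96 becomes |x̄ - 100| > 3.92, i.e. x̄ < 96.08 or x̄ > 103.92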
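
A minimal Python sketch (standard library only) for checking the arithmetic in parts (a), (b), (d) and (e) with exact fractions. The helper names (`poisson_pmf`, `cond_prob_11_given_t3`) are illustrative, not part of the question; the (d) function evaluates P(X₁ = 1, X₂ = 1 | X₁ + 2X₂ = 3) at two values of λ to show it changes with λ.

```python
from fractions import Fraction as F
from math import exp, factorial, sqrt

# (a) With x = P(A ∩ B): x/P(B) + x/P(A) = 2/3, i.e. 7x = 2/3
pA, pB = F(1, 3), F(1, 4)
x = F(2, 3) / (1 / pB + 1 / pA)
p_union_complements = 1 - x                                  # De Morgan: 1 - P(A ∩ B)
p_cond_sum = (pA - x) / (1 - pB) + (pB - x) / (1 - pA)       # P(A|B^c) + P(B|A^c)

# (b) ∫_c^1 x*y^(x-1) dy = 1 - c^x, and each x carries marginal mass 1/3
p_b_i = sum(F(1, 3) * (1 - F(1, 2) ** k) for k in (2, 3))   # P(X >= 2, Y >= 1/2)
p_b_ii = F(1, 3) * 2                                         # P(X >= 2)

# (d) if T = X1 + 2*X2 were sufficient, this conditional probability would not depend on lambda
def poisson_pmf(k, lam):
    return exp(-lam) * lam ** k / factorial(k)

def cond_prob_11_given_t3(lam):
    p11 = poisson_pmf(1, lam) * poisson_pmf(1, lam)   # (x1, x2) = (1, 1)
    p30 = poisson_pmf(3, lam) * poisson_pmf(0, lam)   # (x1, x2) = (3, 0)
    return p11 / (p11 + p30)                           # equals 6 / (6 + lambda)

# (e) two-sided z-test, sigma^2 = 200, n = 50, alpha = 0.05
se = sqrt(200 / 50)
lower, upper = 100 - 1.96 * se, 100 + 1.96 * se

print(x, p_union_complements, p_cond_sum)   # 2/21 19/21 277/504
print(p_b_i, p_b_ii)                        # 13/24 2/3
print(round(cond_prob_11_given_t3(1), 4), round(cond_prob_11_given_t3(2), 4))
print(round(lower, 2), round(upper, 2))     # 96.08 103.92
```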

50M calculate Joint distributions and limiting distributions

(a) Let the joint probability density function of two random variables X and Y be f(x, y) = x/3, for 0 < 2x < 3y < 6; 0, otherwise. Compute the following: (i) E(Y|X = x) (10 marks) (ii) E(var(Y|X = x)) (10 marks)
(b) Find the distribution function of random variable X, for which the characteristic function is φ(t) = e^(-t²), -∞ < t < ∞. Also compute P(X > 2√2) in terms of Φ(z), where Φ(z) = ∫_{-∞}^{z} (1/√(2π)) e^(-θ²/2) dθ. (15 marks)
(c) Let X₁, X₂, ..., X₂ₙ be iid N(0, 1) variates. Find the limiting distribution of [(X₁/X₂) + (X₃/X₄) + ... + (X₂ₙ₋₁/X₂ₙ)] / [X₁² + X₂² + ... + Xₙ²]. (15 marks)

Answer approach & key points

Calculate the required quantities systematically across all parts: spend approximately 40% time on part (a) covering conditional expectation and conditional variance with proper region identification; 30% on part (b) for characteristic function inversion and normal probability computation; and 30% on part (c) for establishing the limiting distribution using Slutsky's theorem and properties of ratio distributions. Begin with clear region sketches for (a), apply Fourier inversion for (b), and justify convergence arguments for (c).

For (a)(i): Correctly identify the region as 0 < x < 3, 2x/3 < y < 2, derive the marginal f_X(x) = (x/3)(2 - 2x/3) = 2x(3-x)/9 for 0 < x < 3, and note that Y | X = x is uniform on (2x/3, 2), so E(Y|X=x) = 1 + x/3 = (x+3)/3
For (a)(ii): Var(Y|X=x) = (2 - 2x/3)²/12 = (3-x)²/27; integrating against f_X(x) gives E[Var(Y|X)] = 1/10 (cross-check via the law of total variance: Var(Y) - Var(E(Y|X)) = 3/20 - 1/20 = 1/10)
For (b): Recognize φ(t) = e^{-t²} as the characteristic function of N(0, 2) (match it with e^{-σ²t²/2}, giving σ² = 2, or use the inversion formula), so the distribution function is F(x) = Φ(x/√2); then P(X > 2√2) = 1 - Φ(2)
For (c): Each ratio X_{2k-1}/X_{2k} is standard Cauchy and the n ratios are independent, so the numerator Uₙ satisfies Uₙ/n ∼ Cauchy(0, 1) exactly for every n (E(Uₙ) does not exist, so no law of large numbers applies to it)
For (c), denominator and limit: Vₙ = X₁² + ... + Xₙ² ∼ χ²ₙ, so Vₙ/n → 1 in probability by the WLLN; writing the ratio as (Uₙ/n)/(Vₙ/n), Slutsky's theorem gives the limiting distribution Cauchy(0, 1). Numerator and denominator share X₁, ..., Xₙ and are therefore not independent, but Slutsky's theorem does not require independence
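
A standard-library Python sketch, offered as a numerical sanity check rather than a proof: it integrates the conditional moments in (a) with a midpoint rule, evaluates 1 - Φ(2) for (b) through `math.erf`, and simulates the ratio in (c) to compare its sample quartiles with the standard Cauchy quartiles ±1. The sample size n = 100, the 4000 replications and the seed are arbitrary choices.

```python
import math
import random

# (a) Support from 0 < 2x < 3y < 6: 0 < x < 3 and 2x/3 < y < 2; joint density x/3
def f_x(x):
    return 2 * x * (3 - x) / 9            # marginal density of X

def cond_mean(x):
    return 1 + x / 3                      # Y | X = x is uniform on (2x/3, 2)

def cond_var(x):
    return (3 - x) ** 2 / 27              # (interval length)^2 / 12

def integrate(g, a, b, n=100_000):
    # composite midpoint rule; ample accuracy for these low-degree polynomials
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

total_mass = integrate(f_x, 0, 3)
e_y = integrate(lambda x: cond_mean(x) * f_x(x), 0, 3)          # E(E(Y|X)) = E(Y)
e_cond_var = integrate(lambda x: cond_var(x) * f_x(x), 0, 3)

# (b) exp(-t^2) = exp(-sigma^2 t^2 / 2) with sigma^2 = 2, so X ~ N(0, 2)
def std_normal_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

p_tail = 1 - std_normal_cdf(2 * math.sqrt(2) / math.sqrt(2))   # 1 - Phi(2)

# (c) Monte Carlo look at the ratio for moderately large n
def simulate_ratio(n, rng):
    z = [rng.gauss(0, 1) for _ in range(2 * n)]                 # z[0] is X1, z[1] is X2, ...
    numerator = sum(z[2 * k] / z[2 * k + 1] for k in range(n))  # X1/X2 + ... + X_{2n-1}/X_{2n}
    denominator = sum(v * v for v in z[:n])                     # X1^2 + ... + Xn^2
    return numerator / denominator

rng = random.Random(2024)
draws = sorted(simulate_ratio(100, rng) for _ in range(4000))
q25, q75 = draws[len(draws) // 4], draws[3 * len(draws) // 4]   # Cauchy(0, 1) quartiles: -1, +1
print(round(total_mass, 6), round(e_y, 6), round(e_cond_var, 6), round(p_tail, 5))
print(round(q25, 2), round(q75, 2))
```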

50M solve Statistical inference and estimation theory

(a) Let moment generating function of random variable X exist in the neighbourhood of zero and if $$E(X^n) = \frac{1}{5} + (-1)^n \frac{2}{5} + \frac{2^{n+1}}{5}; \quad n = 1, 2, 3, \cdots$$ then find the values of the following: (i) $P(|X - 0.75| \leq 1.5)$ (10 marks) (ii) $P(|X - \mu| < \sigma)$; $\mu = E(X)$ and $\sigma^2 = \text{var}(X)$ (10 marks) [Use $\sqrt{1.84} = 1.36$]
(b) (i) Write the importance of Cramer-Rao inequality and Rao-Blackwell theorem. (5 marks) (ii) Let $X \sim B(1, \theta)$, then find the uniformly minimum variance unbiased estimator (UMVUE) of $\theta(1-\theta)$. (10 marks)
(c) Obtain the maximum likelihood estimates of $\alpha$ and $\beta$ for a random sample from the exponential population $$f(x; \alpha, \beta) = Ce^{-\beta(x-\alpha)}, \alpha \leq x < \infty, \beta > 0$$ (15 marks)

Answer approach & key points

Solve this multi-part numerical problem by first identifying the probability distribution from the given moment pattern in part (a), then applying appropriate estimation theory for parts (b) and (c). Allocate approximately 40% of the time to part (a) (20 marks), 30% to part (b) (15 marks) and 30% to part (c) (15 marks). Structure as: distribution identification → probability calculations → theoretical exposition → UMVUE derivation → MLE derivation with likelihood analysis.

For (a): Write E(X^n) = (1/5)·1^n + (2/5)·(-1)^n + (2/5)·2^n and match it with Σ xⁿP(X=x): X is a three-point distribution with P(X=1) = 1/5, P(X=-1) = 2/5, P(X=2) = 2/5. Because the MGF exists near zero, these moments determine the distribution uniquely (MGF = (1/5)e^t + (2/5)e^{-t} + (2/5)e^{2t})
For (a)(i): P(|X-0.75| ≤ 1.5) = P(-0.75 ≤ X ≤ 2.25) = P(X=1) + P(X=2) = 3/5, since X = -1 lies outside the interval
For (a)(ii): Compute μ = E(X) = 3/5 = 0.6, E(X²) = 11/5 and σ² = Var(X) = 1.84, so σ = 1.36; then P(|X-0.6| < 1.36) = P(-0.76 < X < 1.96) = P(X=1) = 1/5
For (b)(i): Explain Cramer-Rao inequality provides variance lower bound for unbiased estimators enabling efficiency comparison; Rao-Blackwell theorem enables improvement of unbiased estimators via conditioning on sufficient statistics
For (b)(ii): With a single observation no unbiased estimator of θ(1-θ) exists (E[g(X)] = g(0)(1-θ) + g(1)θ is linear in θ), so take a random sample X₁, ..., Xₙ with n ≥ 2. T = ΣXᵢ is complete sufficient, and E[T(n-T)] = n(n-1)θ(1-θ), so by the Lehmann-Scheffé theorem the UMVUE is T(n-T)/[n(n-1)]
For (c): The density integrates to 1 only if C = β. Write the likelihood L(α,β) = βⁿ exp[-βΣ(xᵢ-α)] subject to α ≤ x_(1); L increases with α, so α̂ = X_(1); then maximize with respect to β to get β̂ = n/Σ(Xᵢ - X_(1)) = 1/(x̄ - X_(1))
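
A small Python sketch (standard library only) that checks the three-point distribution against the given moment formula, recomputes the two probabilities in (a), confirms by exact enumeration that T(n-T)/[n(n-1)] is unbiased for θ(1-θ) under a Binomial(n, θ) total, and implements the MLEs in (c). The sample passed to `shifted_exp_mle` in the test is made-up data for illustration.

```python
from fractions import Fraction as F
from math import comb

# (a) matching E(X^n) = (1/5)*1^n + (2/5)*(-1)^n + (2/5)*2^n gives a three-point pmf
pmf = {1: F(1, 5), -1: F(2, 5), 2: F(2, 5)}

def moment(k):
    return sum(p * x ** k for x, p in pmf.items())

matches_formula = all(
    moment(k) == F(1, 5) + (-1) ** k * F(2, 5) + F(2 ** (k + 1), 5) for k in range(1, 10)
)
mu = moment(1)                      # 3/5
var = moment(2) - mu ** 2           # 46/25 = 1.84
sigma = F(136, 100)                 # the question's value for sqrt(1.84)
p_i = sum(p for x, p in pmf.items() if abs(x - F(3, 4)) <= F(3, 2))
p_ii = sum(p for x, p in pmf.items() if abs(x - mu) < sigma)

# (b)(ii) expectation of T(n - T) / (n(n - 1)) when T ~ Binomial(n, theta)
def expected_umvue(n, theta):
    return sum(
        comb(n, t) * theta ** t * (1 - theta) ** (n - t) * F(t * (n - t), n * (n - 1))
        for t in range(n + 1)
    )

# (c) MLEs for the density beta * exp(-beta (x - alpha)), x >= alpha
def shifted_exp_mle(sample):
    alpha_hat = min(sample)                                   # likelihood increases in alpha up to x_(1)
    beta_hat = len(sample) / sum(x - alpha_hat for x in sample)
    return alpha_hat, beta_hat

print(matches_formula, mu, var, p_i, p_ii)
```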

50M solve Hypothesis testing and non-parametric methods

(a) Find the most powerful test of size α(= 0·05) for testing H₀: μ = 0 vs. H₁: μ = 1, given a random sample of size 25 from N(μ, 16) population. (20 marks)
(b) A lot consists of some defective items. A random sample of 25 items has 6 defective items with probability p₁ = θ and 19 non-defective items with probability p₂ = 1 – θ. Then estimate θ using the following: (i) MLE method (ii) Minimum χ²-method (iii) Modified minimum χ²-method (15 marks)
(c) Differentiate between Mann-Whitney U-test and Wilcoxon sign test. The following data pertain to APGAR scores of 15 pregnant women in two care programmes A and B:
Programme A : 8 7 6 2 5 8 7 3
Programme B : 9 9 7 8 10 9 6
Is there a significant difference in APGAR scores of pregnant women under the two care programmes? [Given, U₍₀.₀₅₎ = 10] (15 marks)

Answer approach & key points

Solve this multi-part numerical problem by allocating approximately 40% time to part (a) given its 20 marks, 30% to part (b) covering three estimation methods, and 30% to part (c) involving both differentiation and non-parametric testing. Structure as: (a) apply the Neyman-Pearson lemma to derive the critical region, (b) present three estimation approaches with clear derivations, (c) tabulate differences then perform Mann-Whitney U-test with ranking and decision.

Part (a): Apply the Neyman-Pearson lemma; the likelihood ratio is increasing in x̄, so the critical region is x̄ > k. With σ/√n = 4/5 = 0.8, k = 0 + 1.645 × 0.8 = 1.316, so reject H₀ if x̄ > 1.316; the power is P(Z > (1.316 - 1)/0.8) = P(Z > 0.395) ≈ 0.35
Part (b)(i): Derive MLE as θ̂ = 6/25 = 0.24 using binomial likelihood L(θ) = θ⁶(1-θ)¹⁹ and log-likelihood differentiation
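Part (b)(ii): Minimize χ² = Σ(Oᵢ-Eᵢ)²/Eᵢ with expected frequencies 25θ and 25(1-θ); since 19 - 25(1-θ) = 25θ - 6, this reduces to (6-25θ)²/[25θ(1-θ)], which is minimized (at the value 0) by θ̂ = 0.24, the same as the MLE
Part (b)(iii): Apply Neyman's modified minimum χ², which uses observed instead of expected frequencies in the denominators: (6-25θ)²/6 + (25θ-6)²/19, again minimized at θ̂ = 0.24; with two cells and one parameter the fit is exact, so all three methods agree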
Part (c): Differentiate Mann-Whitney (two independent samples, ordinal/rank data) vs Wilcoxon signed-rank (paired/matched samples); correctly identify independent samples scenario here
Part (c) computation: Rank the pooled data (1-15) using mid-ranks for ties; R_A = 44.5 (n₁ = 8) and R_B = 75.5 (n₂ = 7), with R_A + R_B = 120 as a check. Then U_A = n₁n₂ + n₁(n₁+1)/2 - R_A = 56 + 36 - 44.5 = 47.5 and U_B = 56 - 47.5 = 8.5. Since U = min(U_A, U_B) = 8.5 ≤ U₀.₀₅ = 10, reject H₀: the APGAR scores differ significantly between the two programmes
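
A standard-library Python sketch for checking the numbers in all three parts: it uses `statistics.NormalDist` for the most powerful test and its power, a grid search over θ for the two χ² criteria, and a hand-written Mann-Whitney U with mid-ranks. The grid step of 0.0001 is an arbitrary choice; the function names are illustrative.

```python
from statistics import NormalDist

# (a) MP test for H0: mu = 0 vs H1: mu = 1, n = 25, sigma = 4
se = 4 / 25 ** 0.5                           # 0.8
z95 = NormalDist().inv_cdf(0.95)             # about 1.645
k = 0 + z95 * se                             # reject H0 when xbar > k
power = 1 - NormalDist(mu=1, sigma=se).cdf(k)

# (b) estimators of theta from 6 defectives in 25 items, by grid search on (0, 1)
def chi_sq(t):           # Pearson: expected frequencies in the denominators
    return (6 - 25 * t) ** 2 / (25 * t) + (19 - 25 * (1 - t)) ** 2 / (25 * (1 - t))

def modified_chi_sq(t):  # Neyman: observed frequencies in the denominators
    return (6 - 25 * t) ** 2 / 6 + (19 - 25 * (1 - t)) ** 2 / 19

grid = [i / 10_000 for i in range(1, 10_000)]
theta_mle = 6 / 25
theta_min_chi = min(grid, key=chi_sq)
theta_mod_chi = min(grid, key=modified_chi_sq)

# (c) Mann-Whitney U with mid-ranks for ties
def mann_whitney(a, b):
    pooled = sorted(a + b)
    rank = {}
    for v in set(pooled):
        positions = [i + 1 for i, w in enumerate(pooled) if w == v]
        rank[v] = sum(positions) / len(positions)    # average rank for tied values
    r_a = sum(rank[v] for v in a)
    n1, n2 = len(a), len(b)
    u_a = n1 * n2 + n1 * (n1 + 1) / 2 - r_a
    return r_a, min(u_a, n1 * n2 - u_a)

programme_a = [8, 7, 6, 2, 5, 8, 7, 3]
programme_b = [9, 9, 7, 8, 10, 9, 6]
r_a, u = mann_whitney(programme_a, programme_b)
significant = u <= 10                            # tabulated critical value from the question
print(round(k, 3), round(power, 3), theta_min_chi, theta_mod_chi, r_a, u, significant)
```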

Section B

50M Compulsory derive Linear regression, multivariate normal distribution, sampling design

(a) How will you justify the usage of the principle of least squares in estimating the parameters of a linear regression model? With usual notations, for the regression model y = Xβ + ε, show that the least square estimator of β is β̂ = (X'X)⁻¹X'y (10 marks)
(b) (i) If X̃ is distributed as N₃(μ̃, Σ), find the distribution of [X₁ - X₂; X₂ - X₃]. (5 marks) (ii) If X₁, X₂ and X₃ are three variables, obtain the expression for the partial correlation coefficient between X₁ and X₂ eliminating the effect of X₃, ρ₁₂·₃, in terms of simple correlation coefficients. (5 marks)
(c) X₁ and X₂ are independent data sets of order (n₁ × p) and (n₂ × p) respectively from Nₚ(μ̃, Σ). Show that (n₁n₂D²)/n is distributed as T²(p, n-2), where n = n₁ + n₂, and T² and D² represent the Hotelling's T² and Mahalanobis D² respectively. (10 marks)
(d) For the population U = {a, b, c, d, e}, consider the following sampling design: P({a, b, d}) = 1/6, P({a, b, e}) = 1/6, P({a, d, e}) = 1/6, P({b, c, d}) = 1/6, P({b, c, e}) = 1/6, P({c, d, e}) = 1/6. Calculate the first-order and second-order inclusion probabilities. Hence show that it is a matter of a stratified design. Identify the strata with their units. (10 marks)
(e) Let the incidence matrix of a design be N = [[1, 1, 1, 0], [1, 1, 0, 1], [1, 0, 1, 1], [0, 1, 1, 1]]. Show that— (i) the design is connected balanced; (ii) its efficiency factor is E = 8/9. (6+4=10 marks)

Answer approach & key points

The directive "derive" calls for rigorous mathematical proof and step-by-step derivation across all sub-parts. Allocate approximately 20% time to part (a) on least squares justification and derivation, 20% to part (b) on multivariate normal transformations and partial correlation, 20% to part (c) on Hotelling's T² distribution, 20% to part (d) on inclusion probabilities and stratified design identification, and 20% to part (e) on connectedness, balance and efficiency factor. Begin with clear statement of assumptions, proceed through systematic derivations with matrix algebra where needed, and conclude with explicit verification of claimed properties.

Part (a): Justify least squares via Gauss-Markov theorem (BLUE property under Gauss-Markov assumptions) or via maximum likelihood under normality; derive β̂ = (X'X)⁻¹X'y by minimizing S(β) = ε'ε = (y-Xβ)'(y-Xβ) using matrix differentiation
Part (b)(i): Apply linear transformation property of multivariate normal; define transformation matrix A = [[1, -1, 0], [0, 1, -1]]; derive distribution as N₂(Aμ̃, AΣA') with explicit mean and covariance structure
Part (b)(ii): Derive ρ₁₂·₃ = (ρ₁₂ - ρ₁₃ρ₂₃)/√[(1-ρ₁₃²)(1-ρ₂₃²)] using residual correlation formula or partial covariance matrix inversion
Part (c): Define D² = (x̄₁ - x̄₂)'S⁻¹(x̄₁ - x̄₂); use independence of sample means and pooled covariance; apply Wishart and Hotelling's T² construction to show (n₁n₂/n)D² ~ T²(p, n-2)
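Part (d): Calculate πᵢ = Σ_{s∋i} P(s): π_a = π_c = 1/2 and π_b = π_d = π_e = 2/3 (they sum to 3, the sample size). Calculate πᵢⱼ = Σ_{s∋i,j} P(s): π_ac = 0 and every other πᵢⱼ = 1/3. Every sample contains exactly one unit of {a, c} and two of {b, d, e}, and all 2 × 3 = 6 such combinations occur with probability 1/6, so this is stratified SRSWOR with strata U₁ = {a, c} (n₁ = 1) and U₂ = {b, d, e} (n₂ = 2); consistently, πᵢⱼ = πᵢπⱼ for units in different strata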
Part (e)(i): Verify connectedness via incidence matrix rank or graph connectivity; verify balance via constant λ = Σⱼ nᵢⱼnᵢ'ⱼ for all i ≠ i' pairs
Part (e)(ii): Here v = b = 4, r = k = 3 and λ = 2. The C-matrix C = rI - NN'/k = (λv/k)(I - J/v) has the single non-zero eigenvalue λv/k = 8/3 with multiplicity v - 1, so E = (8/3)/r = λv/(rk) = 8/9
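
A standard-library Python sketch that enumerates the sampling design in (d) to get the inclusion probabilities and test the stratified structure, then reads v, b, r, k and λ off the incidence matrix in (e) and evaluates E = λv/(rk). Treating rows of N as treatments and columns as blocks is an assumed convention (the matrix here is symmetric, so the result is the same either way).

```python
from fractions import Fraction as F
from itertools import combinations

# (d) sampling design: six samples of size 3, each with probability 1/6
design = [frozenset(s) for s in ("abd", "abe", "ade", "bcd", "bce", "cde")]
p = F(1, 6)
units = "abcde"
pi1 = {u: sum((p for s in design if u in s), F(0)) for u in units}
pi2 = {u + v: sum((p for s in design if u in s and v in s), F(0))
       for u, v in combinations(units, 2)}

# stratification check: one unit from {a, c}, two from {b, d, e}, every combination exactly once
stratum1, stratum2 = "ac", "bde"
stratified_srswor = set(design) == {
    frozenset(x + "".join(pair)) for x in stratum1 for pair in combinations(stratum2, 2)
}

# (e) incidence matrix: rows = treatments, columns = blocks
N = [[1, 1, 1, 0], [1, 1, 0, 1], [1, 0, 1, 1], [0, 1, 1, 1]]
v, b = len(N), len(N[0])
r_values = {sum(row) for row in N}
k_values = {sum(N[i][j] for i in range(v)) for j in range(b)}
lambda_values = {sum(N[i][j] * N[h][j] for j in range(b)) for i, h in combinations(range(v), 2)}
(r,), (k,), (lam,) = r_values, k_values, lambda_values   # unpacking fails unless each is constant
efficiency = F(lam * v, r * k)

print(pi1, pi2["ac"], stratified_srswor, (v, b, r, k, lam), efficiency)
```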

50M prove Bivariate normal distribution, principal components, linear regression estimation

(a) (X, Y) has bivariate normal distribution BN(μ₁, μ₂, σ₁², σ₂², ρ). (i) Show that X and Y are independent if and only if ρ = 0. (6 marks) (ii) If (X, Y) follows BN(3, 1, 16, 25, 3/5), obtain P(3 < Y < 8 | X = 7), given Φ(2) = 0.9772 and Φ(-0.25) = 0.4017, and Φ(x) represents the area under the standard normal curve from -∞ to x. (6 marks) (iii) If (X, Y) follows BN(0, 0, 1, 1, 0), what will be the distribution of Z = Y/X? (4 marks) (iv) State the multivariate extension of (i) when X̃ follows Nₚ(μ̃, Σ). (4 marks)
(b) Define principal components and canonical correlation. How can one attain data reduction using principal components? If (X₁, X₂) has covariance matrix Σ = [[1, ρ], [ρ, 1]], then find the principal components. (15 marks)
(c) For the simple linear regression model y = β₀ + β₁x + ε, where β₀ and β₁ are parameters and ε has zero mean and an unknown variance σ², find the estimates of β₀ and β₁ by the principle of least squares as well as the method of maximum likelihood. Examine whether they are identical. (15 marks)

Answer approach & key points

Prove the independence condition in (a)(i) using factorization of joint density; for (a)(ii)-(iv), calculate conditional distributions and identify the Cauchy distribution; for (b), define concepts then derive eigenvalues/eigenvectors for PC extraction; for (c), derive both estimators and compare. Allocate ~40% time to part (a) [20 marks], ~30% each to (b) and (c) [15 marks each], with explicit theorem statements and step-by-step derivations throughout.

(a)(i) Prove ρ=0 ⇔ independence by showing joint density factorizes into marginal densities, using the bivariate normal PDF structure
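(a)(ii) Compute the conditional distribution Y|X=x ~ N(μ₂ + ρ(σ₂/σ₁)(x-μ₁), σ₂²(1-ρ²)); at x = 7 this is N(4, 16), so P(3 < Y < 8 | X = 7) = Φ(1) - Φ(-0.25) ≈ 0.8413 - 0.4017 = 0.4396 (the standardized upper limit is (8-4)/4 = 1, so the table value Φ(1) is needed rather than the supplied Φ(2))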
(a)(iii) Identify Z=Y/X as ratio of independent N(0,1) variables, hence standard Cauchy distribution
(a)(iv) State multivariate extension: X̃ ~ Nₚ(μ̃, Σ) has independent components iff Σ is diagonal
(b) Define PCs as uncorrelated linear combinations maximizing variance; define canonical correlation as correlation between linear combinations of two variable sets; data reduction by retaining top k PCs; derive eigenvalues (1±ρ) and eigenvectors for given Σ
(c) Derive LSE by minimizing Σ(yᵢ-β₀-β₁xᵢ)²; derive MLE using normal error assumption; show identical estimators but different variance estimators
Compare LSE (distribution-free) vs MLE (requires normality) and note σ̂²_MLE = SSE/n vs the unbiased estimator s² = SSE/(n-2) used with least squares
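
A standard-library Python sketch for the computational pieces: the conditional probability in (a)(ii) via `statistics.NormalDist`, a check that (1, 1)/√2 and (1, -1)/√2 are eigenvectors of [[1, ρ], [ρ, 1]] with eigenvalues 1 ± ρ, and the least squares slope and intercept on a small hypothetical data set (the values of `xs` and `ys` are invented for illustration) together with the two error-variance estimators, whose ratio is (n-2)/n.

```python
import math
from statistics import NormalDist

# (a)(ii) BN(3, 1, 16, 25, 3/5): conditional law of Y given X = 7
mu1, mu2, s1, s2, rho = 3, 1, 4, 5, 3 / 5
cond_mean = mu2 + rho * (s2 / s1) * (7 - mu1)   # 4
cond_sd = s2 * math.sqrt(1 - rho ** 2)          # 4
cond = NormalDist(cond_mean, cond_sd)
prob = cond.cdf(8) - cond.cdf(3)                # Phi(1) - Phi(-0.25)

# (b) principal components of [[1, rho], [rho, 1]] in closed form
def principal_components(r):
    c = 1 / math.sqrt(2)
    return [(1 + r, (c, c)), (1 - r, (c, -c))]

def is_eigenpair(r, value, vec, tol=1e-12):
    x, y = vec
    image = (x + r * y, r * x + y)               # Sigma applied to vec
    return all(abs(a - value * b) < tol for a, b in zip(image, vec))

# (c) simple regression on hypothetical data: same betas, different variance estimators
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
sxx = sum((x - xbar) ** 2 for x in xs)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
b1 = sxy / sxx
b0 = ybar - b1 * xbar
sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
sigma2_mle, s2_unbiased = sse / n, sse / (n - 2)
print(round(prob, 4), round(b0, 4), round(b1, 4), sigma2_mle, s2_unbiased)
```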

50M prove Stratified sampling and BIBD

(a) A very big population is divided into two strata. The allocation of units of stratified random sample of size n for the two strata under Neyman allocation are n'_1 and n'_2, and under other type of allocation are n_1 and n_2. Define r = n'_1/n'_2 and μ = n_1/(rn_2). Then prove that the efficiency of stratified random sampling with respect to stratified random sampling under Neyman allocation is given by e = μ(r+1)²/((μr + 1)(μ + r)). (20 marks)
(b) A bank has 40000 clients in its computer files, divided into 4000 branches, each managing exactly 10 clients. To estimate the proportion of clients for whom the bank has granted loan, a simple random sample of 40 branches is selected. From the selected sample, for each branch i, a list of clients (A_i) having a loan is prepared; i = 1, 2, ..., 40. The data observed from the selected sample are Σ(i=1 to 40) A_i = 200 and Σ(i=1 to 40) A_i² = 1156. (i) What type of sampling is this? (3 marks) (ii) State the expression of the parameter to estimate and obtain its unbiased estimate. (6 marks) (iii) Estimate the variance of the unbiased estimator obtained in part (ii). (6 marks)
(c) (i) Verify whether the following BIBD are possible: (1) v = b = 22, r = k = 7, λ = 2; (2) v = 10, b = 18, r = 9, k = 5, λ = 4. Given that the design is resolvable. (ii) Given below is the incidence matrix (N) of a block design. Find the degrees of freedom associated with the adjusted treatment sum of squares and the degrees of freedom for the error sum of squares.

Answer approach & key points

This question demands rigorous mathematical derivation and proof for part (a), followed by applied numerical analysis for parts (b) and (c). Spend approximately 40% of time on part (a) given its 20 marks and proof complexity; allocate 30% to part (b) covering cluster sampling identification, unbiased estimation and variance calculation; and 30% to part (c) on BIBD verification and degrees of freedom computation. Structure as: (a) state assumptions and derive efficiency ratio step-by-step; (b) identify single-stage cluster sampling, construct appropriate estimators using given sums; (c) verify necessary conditions for BIBD existence and compute rank of C-matrix for degrees of freedom.

Part (a): Ignoring finite population corrections (very large population), Var(ȳ_st) = Σ W_h²S_h²/n_h. Neyman allocation gives n'_h = nW_hS_h/(W₁S₁ + W₂S₂), so r = W₁S₁/(W₂S₂) and Var_N = (W₁S₁ + W₂S₂)²/n = (r+1)²W₂²S₂²/n. The other allocation has n₁ = μrn₂, so n₂ = n/(μr+1), n₁ = μrn/(μr+1) and Var = (μr+1)(μ+r)W₂²S₂²/(μn); the ratio Var_N/Var is the stated efficiency e
Part (b)(i): Identify this as single-stage cluster sampling with equal cluster sizes: the 4000 branches are clusters selected by SRSWOR, and all M = 10 clients of each selected branch are examined
Part (b)(ii): The parameter is the population proportion P = ΣA_i/(NM), summed over all N = 4000 branches, with M = 10; the unbiased estimator is p̂ = (ΣA_i)/(nM) = 200/(40×10) = 0.5, where n = 40 is the number of sampled branches
Part (b)(iii): The between-cluster mean square is s_b² = [ΣA_i² - (ΣA_i)²/n]/(n-1) = [1156 - 1000]/39 = 4, so v(p̂) = (1-f)s_b²/(nM²) = (N-n)s_b²/(NnM²) = 0.99 × 4/4000 = 0.00099, with finite population correction f = n/N = 0.01
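Part (c)(i): Check vr = bk, λ(v-1) = r(k-1) and Fisher's inequality b ≥ v. Design (1) satisfies these (154 = 154, 42 = 42), but it is symmetric with even v = 22, so r - λ = 5 would have to be a perfect square; it is not, so design (1) is impossible (if resolvability is also required, k = 7 does not divide v = 22 either). Design (2) satisfies 10×9 = 18×5 = 90, λ(v-1) = 36 = r(k-1), k | v and Bose's inequality b ≥ v + r - 1 = 18 with equality; equality forces affine resolvability, in which blocks from different replicates meet in k²/v = 2.5 treatments, not an integer, so a resolvable (10, 18, 9, 5, 4) design cannot exist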
Part (c)(ii): For the given incidence matrix N (not reproduced in the stem above), compute C = R - NK⁻¹N' with R = diag(rᵢ) and K = diag(kⱼ); the adjusted treatment SS has rank(C) d.f. (v - 1 if the design is connected) and the error SS has n - b - rank(C) d.f., where n is the number of plots (n - b - v + 1 for a connected design)
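
A standard-library Python sketch covering the three computable pieces. For (a) it checks the efficiency identity on hypothetical numbers (W₁S₁ = 3, W₂S₂ = 1, n = 100 and a 60/40 split, chosen only for illustration); for (b) it evaluates the cluster-sampling estimate and its variance estimate with exact fractions; for (c)(i) `bibd_checks` is a hypothetical helper that applies only the necessary conditions named in the pointer above, which are not sufficient for existence in general.

```python
from fractions import Fraction as F
import math

# (a) numeric check of e = mu (r+1)^2 / ((mu r + 1)(mu + r)), fpc ignored
def var_st(a1, a2, n1, n2):
    # a_h = W_h * S_h, so V(ybar_st) = a1^2/n1 + a2^2/n2
    return F(a1 ** 2, n1) + F(a2 ** 2, n2)

a1, a2 = 3, 1                              # hypothetical W_h S_h values, total n = 100
n1_ney, n2_ney = 75, 25                    # Neyman: n_h proportional to a_h
n1, n2 = 60, 40                            # some other allocation
r_ratio = F(n1_ney, n2_ney)
mu = F(n1) / (r_ratio * n2)
efficiency_direct = var_st(a1, a2, n1_ney, n2_ney) / var_st(a1, a2, n1, n2)
efficiency_formula = mu * (r_ratio + 1) ** 2 / ((mu * r_ratio + 1) * (mu + r_ratio))

# (b) single-stage cluster sampling: N = 4000 branches of M = 10 clients, n = 40 sampled
N_cl, M, n_cl, sum_a, sum_a2 = 4000, 10, 40, 200, 1156
p_hat = F(sum_a, n_cl * M)
s_b2 = (sum_a2 - F(sum_a ** 2, n_cl)) / (n_cl - 1)
var_p_hat = (1 - F(n_cl, N_cl)) * s_b2 / (n_cl * M ** 2)

# (c)(i) necessary conditions only (passing them does not guarantee existence)
def bibd_checks(v, b, r, k, lam, resolvable=False):
    checks = {
        "vr = bk": v * r == b * k,
        "lam(v-1) = r(k-1)": lam * (v - 1) == r * (k - 1),
        "Fisher b >= v": b >= v,
    }
    if v == b and v % 2 == 0:
        # symmetric design with even v: r - lam must be a perfect square
        checks["r - lam square"] = math.isqrt(r - lam) ** 2 == r - lam
    if resolvable:
        checks["k divides v"] = v % k == 0
        checks["Bose b >= v + r - 1"] = b >= v + r - 1
        if b == v + r - 1:
            # equality means affine resolvable: blocks of different classes share k^2/v points
            checks["k^2/v integer"] = (k * k) % v == 0
    return checks

print(efficiency_direct, efficiency_formula, p_hat, s_b2, var_p_hat)
print(bibd_checks(22, 22, 7, 7, 2))
print(bibd_checks(10, 18, 9, 5, 4, resolvable=True))
```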

50M solve Factorial design and ANOVA

(a) A 2²-factorial design was used to develop the yield of a crop. Two factors A and B were used at two levels: low (–1) and high (+1). The experiment was replicated two times with completely randomized way. The data obtained are as follows:
| Factor A | Factor B | Estimated Average Effect |
| – | – | |
| + | – | 8 |
| – | + | –5 |
| + | + | 2 |
The sum of squares of all the yields = 510.5
The grand total of all the yields = 50.00
(i) Analyze the data and identify the significant factors. (12 marks) (ii) Develop the regression model and predict the yield when A and B both are at low level (–1). (8 marks) [Given, F₍₁, ₄, ₀.₀₅₎ = 7.71]
(b) To estimate the population mean Ȳ of a characteristic Y, a simple random sample of size 1000 was selected from a population of size 1000000 by without replacement. The population mean of an auxiliary character X is X̄ = 15. The other results are given below: s²ᵧ = 20, s²ₓ = 25, sₓᵧ = 15, x̄ = 14, ȳ = 10. (i) Estimate Ȳ using difference, ratio and regression estimators. (6 marks) (ii) Estimate the MSE of these estimators. Which estimator should we choose to estimate Ȳ? (9 marks)
(c) Write down the model used in the analysis of a two-way classification with interactions, stating the assumptions. What are the hypotheses tested in this scenario? Obtain the expression for the sum of squares and complete the ANOVA. (15 marks)

Answer approach & key points

Solve this multi-part numerical problem by allocating approximately 40% time to part (a) [20 marks], 30% to part (b) [15 marks], and 30% to part (c) [15 marks]. Begin with clear model specification and ANOVA table construction for the 2² factorial in (a), followed by systematic calculation of difference, ratio and regression estimators in (b), and complete theoretical derivation of two-way ANOVA with interaction in (c). Present all computational steps in tabular format with explicit F-test conclusions and MSE comparisons.

For (a)(i): Calculate main effects A and B, interaction effect AB, construct complete ANOVA table with 3 d.f. for treatments and 4 d.f. for error, compare F-calculated with F-critical=7.71 to identify significant factors
For (a)(ii): Develop regression equation Y = β₀ + β₁A + β₂B + β₁₂AB with coded variables, substitute A=-1, B=-1 to predict yield at low-low combination
For (b)(i): Compute difference estimator ȳ_D = ȳ + (X̄ - x̄), ratio estimator ȳ_R = ȳ(X̄/x̄), and regression estimator ȳ_lr = ȳ + b(X̄ - x̄) where b = s_xy/s_x²
For (b)(ii): Calculate MSE for each estimator using appropriate formulas (MSE(ȳ_D), approximate MSE for ratio, and MSE(ȳ_lr) = (1-f)(s_y²(1-ρ²))/n), select estimator with minimum MSE
For (c): State model y_ijk = μ + α_i + β_j + (αβ)_ij + ε_ijk with assumptions (normality, independence, homoscedasticity, Σα_i=Σβ_j=Σ(αβ)_ij=0), hypotheses H₀: all α_i=0, all β_j=0, all (αβ)_ij=0, derive SSA, SSB, SSAB, SSE with degrees of freedom and complete ANOVA table
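
A standard-library Python sketch of the arithmetic in (a) and (b). It rests on an assumption about the flattened table in the stem: that the rows (+, –), (–, +) and (+, +) give the estimated effects A = 8, B = –5 and AB = 2, with the (–, –) row blank. That reading is a guess, although it does yield a consistent ANOVA with a positive error sum of squares. It uses SS = n × effect² for a 2² design with n replicates, takes regression coefficients as half the effects in coded units, and uses the usual large-sample MSE approximations for the three estimators in (b).

```python
# (a) ASSUMPTION: the flattened table lists effects A = 8, B = -5, AB = 2
effects = {"A": 8.0, "B": -5.0, "AB": 2.0}
reps, runs = 2, 8
grand_total, sum_sq = 50.0, 510.5
ss_total = sum_sq - grand_total ** 2 / runs               # 510.5 - 312.5 = 198 on 7 df
ss = {name: reps * e ** 2 for name, e in effects.items()}  # SS = n * effect^2 in a 2^2 design
ss_error = ss_total - sum(ss.values())                    # 198 - 186 = 12 on 4 df
ms_error = ss_error / 4
f_ratio = {name: s / ms_error for name, s in ss.items()}
significant = [name for name, f in f_ratio.items() if f > 7.71]

b0 = grand_total / runs                                   # intercept = grand mean in coded units

def predict(a_level, b_level, with_interaction=False):
    # coefficients are half the effects because coded levels are -1 and +1
    y = b0 + effects["A"] / 2 * a_level + effects["B"] / 2 * b_level
    if with_interaction:
        y += effects["AB"] / 2 * a_level * b_level
    return y

# (b) difference, ratio and regression estimators under SRSWOR
N, n = 1_000_000, 1000
Xbar, xbar, ybar = 15, 14, 10
sy2, sx2, sxy = 20, 25, 15
c = (1 - n / N) / n                                       # (1 - f)/n
y_diff = ybar + (Xbar - xbar)
y_ratio = ybar * Xbar / xbar
b = sxy / sx2
y_reg = ybar + b * (Xbar - xbar)
R = ybar / xbar
mse = {
    "difference": c * (sy2 + sx2 - 2 * sxy),
    "ratio": c * (sy2 + R ** 2 * sx2 - 2 * R * sxy),      # first-order approximation
    "regression": c * sy2 * (1 - sxy ** 2 / (sx2 * sy2)), # (1 - f)/n * s_y^2 (1 - r^2)
}
best = min(mse, key=mse.get)
print(ss, ss_error, f_ratio, significant, predict(-1, -1), predict(-1, -1, True))
print(y_diff, round(y_ratio, 4), y_reg, mse, best)
```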
