UPSC Statistics 2025 — Paper I

All 8 questions from UPSC Civil Services Mains Statistics 2025 Paper I (400 marks total). Every stem reproduced in full, with directive-word analysis, marks, word limits, and answer-approach pointers.

Questions: 8
Total marks: 400
Year: 2025
Paper: Paper I

Topics covered

  • Probability theory and distributions (1)
  • Random variables and distributions (1)
  • Probability distributions and estimation theory (1)
  • Sequential probability ratio test and non-parametric tests (1)
  • Linear regression, multivariate normal, sample statistics, sample size, experimental design (1)
  • Bivariate normal, joint distributions, conditional expectation, generalized least squares (1)
  • Latin square design and factorial experiments (1)
  • Principal components and missing value analysis (1)

Section A

Q1
50 marks | Compulsory | Directive: prove | Topic: Probability theory and distributions

(a) Let E, F and G be three pairwise independent events such that P(E∩F) = 0·1 and P(F∩G) = 0·3. Prove that P(Eᶜ∪G) ≥ 11/12. 10 marks
(b) If X and Y are non-negative independent random variables and their joint moment generating function is given by M_{X,Y}(t₁,t₂) = e^{e^{t₁}+2e^{t₂}-3}; t₁ > 0, t₂ > 0, then show that 2P(X+Y=2) = 9P(X+Y=0). 10 marks
(c) If X₁, X₂, ... be a sequence of i.i.d. U(0, 1) random variables, then find the value of Lim_{n→∞} P(∑_{i=1}^{n} X_i ≤ n/2 + √(n/144)) [use √3 = 1·74, Φ(0·29) = 0·6141, Φ(1) = 0·8413]. 10 marks
(d) Let X₁, X₂, ..., Xₙ be a random sample from normal N(μ, σ²) distribution. Obtain sufficient statistic for parameters (μ, σ²) when both the parameters are unknown. If σ² is known, what will be sufficient statistic for parameter μ ? 10 marks
(e) A random variable X has the following distribution under H₀ and H₁ :

x     : 1     2     3     4     5     6
f₀(x) : 0·01  0·01  0·01  0·01  0·01  0·95
f₁(x) : 0·05  0·04  0·03  0·02  0·01  0·85

Find the best test of size 0·03 and its probability of type-II error for testing H₀ : f = f₀ versus H₁ : f = f₁. Is it unbiased test ? Why ? 10 marks

Answer approach & key points

This question demands rigorous proofs and derivations across five sub-parts. For (a), establish the bound using pairwise independence; for (b), factor the joint MGF into marginal MGFs to identify the distributions and compute the probabilities; for (c), apply the CLT with Var(Xᵢ) = 1/12; for (d), use the factorization theorem for sufficient statistics; for (e), construct the Neyman-Pearson test from the likelihood ratios. Each sub-part carries 10 marks, so allocate time roughly equally across the five.

  • For (a): Write P(Eᶜ∪G) = 1 - P(E∩Gᶜ); with p = P(E), pairwise independence gives P(F) = 0.1/p and P(G) = 3p, so P(E∩Gᶜ) = p(1 - 3p) ≤ 1/12 (maximum at p = 1/6), yielding the inequality
  • For (b): Factor the joint MGF into marginal MGFs to identify X~Poisson(1) and Y~Poisson(2), so X+Y~Poisson(3); then P(X+Y=2) = 9e⁻³/2 and P(X+Y=0) = e⁻³
  • For (c): Apply CLT with E[X_i]=1/2, Var(X_i)=1/12; the standardized cutoff is √(n/144)/√(n/12) = 1/(2√3) ≈ 0.29, so the limit is Φ(0.29)=0.6141
  • For (d): State factorization theorem, show (X̄, S²) is sufficient for (μ,σ²), and X̄ alone when σ² known
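  • For (e): Compute likelihood ratios f₁/f₀ (5, 4, 3, 2, 1 and about 0.89 for x = 1, ..., 6), build the critical region from the largest ratios to get {1, 2, 3} with size 0.03, compute power 0.05+0.04+0.03 = 0.12 so β = 0.88, and note the test is unbiased because power 0.12 ≥ size 0.03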
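The pointers for (a) and (e) can be checked numerically. The following is a minimal sketch, not a model answer: `np_region` is a hypothetical helper name, and the grid scan for (a) assumes p = P(E) ranges over [1/10, 1/3], which follows from P(F) = 0.1/p ≤ 1 and P(G) = 3p ≤ 1.

```python
from fractions import Fraction as F

# Part (e): null and alternative pmfs from the question, as exact fractions.
f0 = {1: F(1, 100), 2: F(1, 100), 3: F(1, 100),
      4: F(1, 100), 5: F(1, 100), 6: F(95, 100)}
f1 = {1: F(5, 100), 2: F(4, 100), 3: F(3, 100),
      4: F(2, 100), 5: F(1, 100), 6: F(85, 100)}


def np_region(f0, f1, alpha):
    """Hypothetical helper: add points in decreasing f1/f0 order while the
    accumulated size under f0 stays within alpha (no randomization)."""
    order = sorted(f0, key=lambda x: f1[x] / f0[x], reverse=True)
    region, size = [], F(0)
    for x in order:
        if size + f0[x] > alpha:
            break
        region.append(x)
        size += f0[x]
    power = sum(f1[x] for x in region)
    return region, size, power


region, size, power = np_region(f0, f1, F(3, 100))
beta = 1 - power          # probability of type-II error
unbiased = power >= size  # unbiased if power is at least the size
print(region, size, beta, unbiased)

# Part (a): P(E ∩ Gᶜ) = p(1 - 3p) with p = P(E); scan p = k/600 over [1/10, 1/3].
worst = max(F(k, 600) * (1 - 3 * F(k, 600)) for k in range(60, 201))
print(worst)  # largest value of P(E ∩ Gᶜ) on the grid
```

Exact fractions avoid floating-point noise when comparing the accumulated size against 0.03.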
Q2
50 marks | Directive: solve | Topic: Random variables and distributions

(a) Let X be a continuous random variable having probability density function f(x) = { (2/25)(x+2), -2 ≤ x ≤ 3; 0, otherwise }. Find the cumulative distribution function of Y = X² and hence find probability density function of Y. 20 marks
(b) The joint probability mass function of two random variables (X, Y) be P(X = x, Y = y) = { (x+1)Cᵧ · ¹⁶Cₓ · (1/6)ʸ · (5/6)ˣ⁺¹⁻ʸ · (1/2)¹⁶, y = 0, 1, 2,..., x+1; x = 0, 1, 2,..., 16; 0, otherwise }. Evaluate the following: (i) E(X), Var.(X); (ii) E(Y), Var.(Y); (iii) Cov. (X, Y). 5+5+5=15 marks
(c) Let the joint probability density function of (X, Y) be f(x,y) = { 2e^{-(x+y)}, 0 < x < y < ∞; 0, otherwise }. Compute the following: (i) P(Y<1); (ii) P(λX<Y), λ>1; (iii) P(Y>3X | Y>2X). 5+5+5=15 marks

Answer approach & key points

Solve this multi-part numerical problem by allocating approximately 40% time to part (a) given its 20 marks weightage, and 30% each to parts (b) and (c). Begin with clear identification of the distribution type for each part, show complete derivation steps for transformations in (a), recognize the compound/binomial structure in (b), and carefully handle the constrained region of integration in (c). Conclude each sub-part with boxed final answers and appropriate probability statements.

  • Part (a): Correctly identify support of Y=X² as [0,9] with special handling for Y∈[0,4) where two X-values map to same Y; derive piecewise CDF F_Y(y) = P(X²≤y) by splitting into X∈[-√y,√y] and account for f(x)=0 for x<-2
  • Part (a): Differentiate CDF to obtain PDF f_Y(y) with proper case distinction for y∈[0,4) and y∈[4,9], verifying total probability equals 1
  • Part (b): Recognize X~Binomial(16, 1/2) and Y|X=x~Binomial(x+1, 1/6), then apply laws of total/conditional expectation and variance including E(Y)=E[E(Y|X)] and Var(Y)=E[Var(Y|X)]+Var[E(Y|X)]
  • Part (b): Compute Cov(X,Y)=E[XY]-E[X]E[Y] using E[XY]=E[X·E(Y|X)]=E[X(X+1)]/6 with proper summation or known binomial moments
  • Part (c): Set up correct double integrals over the region 0<x<y<∞ (no change of variables is needed); for P(Y<1) integrate x from 0 to y, then y from 0 to 1
  • Part (c): For P(λX<Y) with λ>1, identify region as 0<x<y/λ<y and evaluate; for conditional probability P(Y>3X|Y>2X) use definition P(Y>3X)/P(Y>2X) with proper region identification
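The moments in part (b) can be cross-checked by summing the joint pmf directly. This is a sketch under the stated pmf only; `pmf`, `expect` and `p_y_exceeds` are hypothetical names. The closed form 2/(1+λ) used for part (c) is an assumption of this sketch, obtained by integrating 2e^{-(x+y)} over y > λx for λ ≥ 1.

```python
from fractions import Fraction as F
from math import comb


def pmf(x, y):
    """Joint pmf from part (b): X ~ Bin(16, 1/2), Y | X = x ~ Bin(x + 1, 1/6)."""
    return (comb(x + 1, y) * comb(16, x)
            * F(1, 6) ** y * F(5, 6) ** (x + 1 - y) * F(1, 2) ** 16)


support = [(x, y) for x in range(17) for y in range(x + 2)]


def expect(g):
    """Exact expectation of g(X, Y) by brute-force summation."""
    return sum(g(x, y) * pmf(x, y) for x, y in support)


total = expect(lambda x, y: 1)            # should equal 1
ex, ey = expect(lambda x, y: x), expect(lambda x, y: y)
var_x = expect(lambda x, y: x * x) - ex ** 2
var_y = expect(lambda x, y: y * y) - ey ** 2
cov_xy = expect(lambda x, y: x * y) - ex * ey
print(total, ex, var_x, ey, var_y, cov_xy)


def p_y_exceeds(lam):
    """Part (c): P(Y > lam * X) = 2 / (1 + lam) for lam >= 1 (derived closed form)."""
    return F(2, 1 + lam)


cond = p_y_exceeds(3) / p_y_exceeds(2)    # P(Y > 3X | Y > 2X)
print(cond)
```

The same values follow by hand from E(Y) = E[(X+1)/6] and Var(Y) = E[Var(Y|X)] + Var[E(Y|X)].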
Q3
50 marks | Directive: calculate | Topic: Probability distributions and estimation theory

(a) Let probability of obtaining Head on a biased coin be 4/5 and X be the number of heads obtained in a sequence of 25 independent tosses of the coin. The same coin is tossed again X number of times independently and we obtain Y heads. Compute Var.(X+25Y). (20 marks)
(b)(i) Let {6, –8, 3, 2, 7, 5, 4, 9} be a random sample from a population with probability density function f(x, θ) = ½ exp(–|x–θ|), –∞<x, θ<∞. Obtain maximum likelihood estimate of θ. (5 marks)
(b)(ii) Let X₁, X₂, ..., Xₙ be a random sample from Bernoulli distribution b(1, θ), 0<θ<1. Find the lower bound for the variance of an unbiased estimator of θ based on this data. Find uniformly minimum variance unbiased estimator of θ and show that it attains Cramer-Rao lower bound. (10 marks)
(c) Let X₁, X₂, ..., Xₙ be a random sample from beta distribution of first kind β(1, θ), θ>0. Find consistent estimator of θ, and its variance also. (15 marks)

Answer approach & key points

Calculate the required quantities systematically across all three parts. For part (a), identify distributions and apply variance decomposition for compound random variables; for (b)(i), derive the MLE using the Laplace distribution's median property; for (b)(ii), establish the Cramer-Rao bound and verify attainment; for (c), use method of moments or MLE for consistency. Allocate approximately 40% time to part (a) given its 20 marks, 30% to part (c) for 15 marks, and 30% to part (b) combining 5+10 marks. Present derivations stepwise with clear notation before substituting numerical values.

  • Part (a): Correctly identify X ~ Binomial(25, 4/5) and Y|X=x ~ Binomial(x, 4/5), then apply the law of total variance to the whole sum conditioned on X: Var(X+25Y) = E[Var(X+25Y|X)] + Var(E[X+25Y|X]) = 625·E[Var(Y|X)] + Var(X + 25·E[Y|X])
  • Part (b)(i): Recognize f(x,θ) as the Laplace density with location parameter θ, so the MLE of θ is the sample median; with n = 8 any value in [4, 5] maximizes the likelihood, conventionally reported as 4.5
  • Part (b)(ii): Derive Cramer-Rao lower bound as θ(1-θ)/n, identify sample mean as UMVUE, and prove it attains the bound by showing equality in Cauchy-Schwarz
  • Part (c): For Beta(1,θ), derive method of moments estimator θ̂ = (1-X̄)/X̄ or MLE, prove consistency via weak law of large numbers, and compute asymptotic variance
  • Correct application of variance formulas: Var(X) = np(1-p) for binomial, and careful handling of the 25Y scaling factor in part (a)
  • Proper justification of why sample mean is UMVUE in (b)(ii) using completeness and sufficiency of T = ΣXi, or direct variance calculation
  • Verification that estimator in (c) is consistent by showing plim(θ̂) = θ as n → ∞, with explicit variance expression involving θ and n
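Parts (a) and (b)(i) lend themselves to a quick numerical check. The sketch below enumerates the joint distribution of (X, Y) exactly rather than using the variance decomposition, so it serves as an independent check; `binom` is a hypothetical helper name.

```python
from fractions import Fraction as F
from math import comb
from statistics import median

p = F(4, 5)  # probability of a head


def binom(n, k):
    """Binomial(n, p) probability of exactly k successes."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


# Part (a): X ~ Bin(25, p), Y | X = x ~ Bin(x, p); accumulate moments of X + 25Y.
mean = second = F(0)
for x in range(26):
    for y in range(x + 1):
        weight = binom(25, x) * binom(x, y)
        t = x + 25 * y
        mean += weight * t
        second += weight * t * t
variance = second - mean ** 2
print(mean, variance)

# Part (b)(i): the Laplace log-likelihood is -sum|x_i - theta| + const,
# which is maximized at a sample median.
sample = [6, -8, 3, 2, 7, 5, 4, 9]
theta_hat = median(sample)  # midpoint of the two middle order statistics
print(theta_hat)
```

By hand, the decomposition gives 625·(4/25)·E[X] + Var(21X) = 2000 + 1764, which should agree with the enumerated variance.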
Q4
50 marks | Directive: derive | Topic: Sequential probability ratio test and non-parametric tests

(a) Let X₁, X₂, ... be a sequence of random variables from Bernoulli distribution with mean θ, 0<θ<1. Derive SPRT for testing H₀ : θ = θ₀ versus H₁ : θ = θ₁ = 1 – θ₀, 0<θ₀<1. Also obtain expressions for OC function and ASN function. (20 marks)
(b) A random sample of size n is taken from the exponential distribution with mean θ>0. Given that n₁ observations out of n observations are less than 'a'. Show that minimum Chi-square estimate and maximum likelihood estimate of θ are same. (15 marks)
(c) The life of 6 items of brand-A and 6 items of brand-B are given below:

A : 40 62 55 35 48 88
B : 50 70 65 30 45 92

Using Kolmogorov-Smirnov test, test whether the distribution of life of both the brands are same or not at 5% level of significance. [Given that D₍₆, ₆, ₀.₀₅₎ = 2/3] (15 marks)

Answer approach & key points

Derive the SPRT for the Bernoulli model in part (a) with a full development of the likelihood ratio, spending ~40% of the time on this 20-mark component; for (b), prove that the minimum chi-square estimate and the MLE coincide by differentiating the chi-square and likelihood functions (~30%); for (c), apply the K-S test with correctly constructed empirical CDFs and compare against the critical value D₍₆,₆,₀.₀₅₎ = 2/3 (~30%). Structure: state hypotheses → show derivations/computations → conclude with statistical decisions.

  • Part (a): Derive Wald's SPRT using likelihood ratio Λₙ = (θ₁/θ₀)^ΣXᵢ · ((1-θ₁)/(1-θ₀))^(n-ΣXᵢ) with continuation region B < Λₙ < A, where A ≈ (1-β)/α and B ≈ β/(1-α); with θ₁ = 1-θ₀ this simplifies to Λₙ = ((1-θ₀)/θ₀)^(2ΣXᵢ-n)
  • Part (a): Obtain OC function L(θ) ≈ (A^(h(θ))-1)/(A^(h(θ))-B^(h(θ))), where h(θ) ≠ 0 solves Eθ[e^(h(θ)Z)] = 1, and ASN Eθ(N) ≈ [L(θ)lnB + (1-L(θ))lnA] / Eθ(Z), where Z = ln[f₁(X)/f₀(X)] and Eθ(Z) ≠ 0
  • Part (b): Set up Pearson's chi-square with cells [0,a) and [a,∞), minimize Σ(Oᵢ-Eᵢ)²/Eᵢ to get the minimum chi-square estimate, and show it satisfies the same equation as the MLE score equation
  • Part (c): Construct ordered empirical distribution functions F₆(x) and G₆(x) for both brands, compute D₆,₆ = sup|F₆(x)-G₆(x)|
  • Part (c): Compare calculated D statistic with critical value 2/3, conclude whether to reject H₀ of identical distributions at 5% level
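The two-sample K-S computation in part (c) can be sketched as follows. The helper names `ecdf` and `ks_two_sample` are hypothetical, and the decision rule "reject when D exceeds the tabulated value" is an assumption about the table's convention; with these data the conclusion does not depend on whether the inequality is strict.

```python
from fractions import Fraction as F

brand_a = [40, 62, 55, 35, 48, 88]
brand_b = [50, 70, 65, 30, 45, 92]


def ecdf(sample, x):
    """Empirical CDF: fraction of observations less than or equal to x."""
    return F(sum(v <= x for v in sample), len(sample))


def ks_two_sample(a, b):
    """Largest absolute gap between the two empirical CDFs; it is enough
    to check the observed points, where the step functions jump."""
    points = sorted(set(a) | set(b))
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in points)


d_stat = ks_two_sample(brand_a, brand_b)
critical = F(2, 3)                 # D(6, 6, 0.05) as given in the question
reject_h0 = d_stat > critical
print(d_stat, reject_h0)
```

The largest gap occurs at x = 62, where five of the brand-A values but only three of the brand-B values have been passed.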

Section B

Q5
50 marks | Compulsory | Directive: derive | Topic: Linear regression, multivariate normal, sample statistics, sample size, experimental design

(a) For a two variable linear regression model Yᵢ = a + bXᵢ + eᵢ, where E(eᵢ) = 0, Var(eᵢ) = σ²ₑ, Cov(eᵢ, eⱼ) = 0 for i ≠ j, (i,j) ∈ {1, 2, ..., n}, if â and b̂ are least square estimators of a and b respectively, derive expressions for Var(â), Var(b̂) and Cov(â, b̂). 10 marks
(b) Let X = (X₁ X₂ X₃)' ~ N₃(μ, Σ), where μ = (1 2 1)' and Σ = $\begin{pmatrix} 9 & 2 & 2 \\ 2 & 3 & 0 \\ 2 & 0 & 2 \end{pmatrix}$. Find the joint distribution of Y₁ = X₁ + X₂ + X₃ and Y₂ = X₂ - X₃. 10 marks
(c) If X₁, X₂, ..., Xₙ is a random sample from a standard normal population, then using quadratic forms show that the sample mean X̄ = (1/n)∑ⱼ₌₁ⁿ Xⱼ and sample variance S² = [1/(n-1)]∑ⱼ₌₁ⁿ(Xⱼ - X̄)² are stochastically independent. 10 marks
(d) Assume that in a population of very large number of items, proportion of defective items is 0·30. What should be the size of the sample, if a simple random sample is to be drawn from this population to estimate the percent defective within 2 percent of the true value with 95·5 percent probability? [Given P(0 ≤ Z ≤ 1·96) = 0·475; and P(0 ≤ Z ≤ 2·005) = 0·4775]. 10 marks
(e) How do the size and shape of plots and blocks affect the results of field experiments? 10 marks

Answer approach & key points

This question demands rigorous derivation and calculation across five statistical sub-parts. Allocate approximately 20% time each: for (a) derive Var(â), Var(b̂), Cov(â,b̂) using matrix or summation approach; for (b) apply linear transformation of multivariate normal; for (c) use quadratic forms and Cochran's theorem; for (d) solve the sample size formula for proportions; for (e) discuss experimental design principles with Indian agricultural examples like IARI field trials. Present each part distinctly with clear labeling.

  • Part (a): Derivation of Var(â) = σ²ₑ[∑Xᵢ²/(n∑(Xᵢ-X̄)²)], Var(b̂) = σ²ₑ/∑(Xᵢ-X̄)², and Cov(â,b̂) = -σ²ₑX̄/∑(Xᵢ-X̄)² using normal equations or matrix approach
  • Part (b): Application of linear transformation Y = AX where A = [[1,1,1],[0,1,-1]] to obtain Y ~ N₂(Aμ, AΣA') with computed mean (4,1)' and covariance matrix [[22,1],[1,5]]
  • Part (c): Decomposition of total sum of squares using idempotent matrices, showing Q₁ = nX̄² and Q₂ = (n-1)S² are independent via rank additivity and Cochran's theorem
  • Part (d): Sample size calculation n = Z²ₐ/₂ p(1-p)/d² with p=0.30, d=0.02, Z=2.005 yielding n ≈ 2110.5, rounded up to 2111
  • Part (e): Discussion of plot size effects on soil heterogeneity control, shape effects on border bias, and block arrangement for local control with reference to Indian agricultural experiments like varietal trials at IARI
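The arithmetic in parts (b) and (d) is easy to get wrong by hand, so a short check may help. This sketch uses plain nested lists; `matmul` and `transpose` are hypothetical helper names, and reading "within 2 percent" as an absolute margin d = 0.02 is an assumption carried over from the pointers above.

```python
import math


def matmul(p, q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*q)] for row in p]


def transpose(m):
    return [list(row) for row in zip(*m)]


# Part (b): Y = AX with Y1 = X1 + X2 + X3 and Y2 = X2 - X3.
A = [[1, 1, 1], [0, 1, -1]]
mu = [[1], [2], [1]]
Sigma = [[9, 2, 2], [2, 3, 0], [2, 0, 2]]

mean_y = matmul(A, mu)                            # A mu
cov_y = matmul(matmul(A, Sigma), transpose(A))    # A Sigma A'
print(mean_y, cov_y)

# Part (d): n = z^2 p (1 - p) / d^2, rounded up to the next whole unit.
z, p, d = 2.005, 0.30, 0.02
n_exact = z ** 2 * p * (1 - p) / d ** 2
n_required = math.ceil(n_exact)
print(round(n_exact, 1), n_required)
```

As a hand check, Var(Y₁) = 9 + 3 + 2 + 2(2 + 2 + 0) = 22, which matches the top-left entry of AΣA'.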
Q6
50 marks | Directive: derive | Topic: Bivariate normal, joint distributions, conditional expectation, generalized least squares

(a)(i) If (X, Y) follows bivariate normal BN(μ₁, μ₂, σ₁², σ₂², ρ), then obtain (A) E(e^X) (B) E(e^(X+Y)) (C) Var(e^X) and (D) Correlation between e^X and e^Y. 3+3+3+3=12 marks
(a)(ii) If (X, Y) have the joint probability density function g(x,y) = y e^(-y(x+1)), for x ≥ 0, y ≥ 0; 0 elsewhere, then find the regression curve of X on Y and comment on the nature of the curve. 8 marks
(b) Let X = (X₁, X₂, X₃)' ~ N₃(μ, Σ), in which μ = (2 1 3)' and Σ = $\begin{pmatrix} 9 & 2 & -2 \\ 2 & 2 & -3 \\ -2 & -3 & 9 \end{pmatrix}$. Obtain (i) E{X₁ | X₂ = x₂, X₃ = x₃} and (ii) Var{X₁ | X₂ = x₂, X₃ = x₃}. 15 marks
(c) Consider the model: Y = X θ + ε, where ε is an n×1 vector of unobservable random variables such that E(ε) = 0 and D(ε) = σ²Ω, σ>0 unknown, Ω is a positive definite matrix of known constants and rank(X) = k<n. Then (i) Derive least square estimator of θ and (ii) Derive an unbiased estimator of σ². 9+6=15 marks

Answer approach & key points

Derive all required quantities systematically across three parts: spend ~35% time on (a) covering MGF technique for lognormal moments and regression curve derivation; ~30% on (b) for conditional multivariate normal using Schur complement; and ~35% on (c) for GLS estimator derivation via Aitken transformation and unbiased variance estimation. Begin each part with appropriate distribution assumptions, show complete derivation steps, and conclude with explicit final expressions.

  • Part (a)(i): Use MGF of bivariate normal to derive E(e^X)=exp(μ₁+σ₁²/2), E(e^(X+Y))=exp(μ₁+μ₂+(σ₁²+σ₂²+2ρσ₁σ₂)/2), Var(e^X), and Corr(e^X,e^Y) using lognormal properties
  • Part (a)(ii): Obtain marginal of Y, conditional density of X|Y, derive E(X|Y=y)=1/y showing hyperbolic regression curve with negative association
  • Part (b): Apply conditional multivariate normal formula with Σ₂₂ partition, compute Σ₁₂Σ₂₂⁻¹ for conditional mean and Σ₁₁-Σ₁₂Σ₂₂⁻¹Σ₂₁ for conditional variance
  • Part (c)(i): Derive GLS estimator θ̂=(X'Ω⁻¹X)⁻¹X'Ω⁻¹Y via Aitken transformation or direct minimization of generalized sum of squares
  • Part (c)(ii): Derive unbiased estimator σ̂²=(Y-Xθ̂)'Ω⁻¹(Y-Xθ̂)/(n-k) using trace properties and idempotent matrix arguments
  • Correct handling of positive definiteness conditions for Ω and invertibility requirements throughout
  • Proper verification that E(θ̂)=θ (unbiasedness) and E(σ̂²)=σ² in part (c)
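For part (b), the partitioned-matrix formulas can be evaluated exactly. The sketch below is one way to do it under the question's μ and Σ; it inverts the 2×2 block Σ₂₂ by the adjugate formula, and `cond_mean` is a hypothetical helper name.

```python
from fractions import Fraction as F

mu = [F(2), F(1), F(3)]
S = [[F(9), F(2), F(-2)],
     [F(2), F(2), F(-3)],
     [F(-2), F(-3), F(9)]]

# Partition: X1 versus (X2, X3).
s11 = S[0][0]
s12 = [S[0][1], S[0][2]]
a, b, c, d = S[1][1], S[1][2], S[2][1], S[2][2]

# Inverse of the 2x2 block Sigma_22 via the adjugate formula.
det22 = a * d - b * c
inv22 = [[d / det22, -b / det22], [-c / det22, a / det22]]

# Regression coefficients Sigma_12 Sigma_22^{-1}.
coef = [s12[0] * inv22[0][0] + s12[1] * inv22[1][0],
        s12[0] * inv22[0][1] + s12[1] * inv22[1][1]]

# Conditional variance Sigma_11 - Sigma_12 Sigma_22^{-1} Sigma_21.
cond_var = s11 - (coef[0] * s12[0] + coef[1] * s12[1])


def cond_mean(x2, x3):
    """E[X1 | X2 = x2, X3 = x3] = mu1 + coef . ((x2, x3) - (mu2, mu3))."""
    return mu[0] + coef[0] * (x2 - mu[1]) + coef[1] * (x3 - mu[2])


print(coef, cond_var, cond_mean(F(1), F(3)))
```

The conditional variance does not depend on (x₂, x₃), which is a general property of the multivariate normal.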
Q7
50 marks | Directive: analyse | Topic: Latin square design and factorial experiments

(a) Analyse and interpret the following data concerning output of wheat per field obtained as a result of experiment conducted to test four varieties of wheat A, B, C and D under a Latin square design at 5% level of significance. [Given F(3, 6) = 4·76; F(4, 7) = 4·12] (20 marks)
(b)(i) Explain the need of factorial experiments with an example from pharmaceutical study. (6 marks)
(b)(ii) Divide the 16 treatments of 2⁴ factorial experiment into 4 blocks of 4 treatments each, confounding the interaction effect AB and CD completely with blocks. Which other interaction is automatically confounded in this design ? (9 marks)
(c) Define Horvitz-Thompson estimator for estimating the population total, and show that it is unbiased for probability proportional to size sampling without replacement. Also find its sampling variance. (15 marks)

Answer approach & key points

Begin with the directive 'analyse' by breaking down the Latin square data in part (a) systematically: set up the ANOVA table, compute the F-statistic, and compare it with the critical value. Allocate approximately 40% of effort to part (a) (20 marks), 30% to part (b) combining theoretical explanation of factorial experiments with pharmaceutical example and confounding construction (15 marks), and 30% to part (c) for rigorous derivation of Horvitz-Thompson estimator properties (15 marks). Structure as: (a) complete ANOVA with hypothesis testing, (b)(i) conceptual explanation with Indian pharmaceutical context like drug efficacy trials, (b)(ii) systematic block construction using confounding pattern, (c) formal definition followed by unbiasedness proof and variance derivation.

  • For (a): Correct ANOVA setup for 4×4 Latin square with rows, columns, treatments; proper calculation of correction factor, total SS, row SS, column SS, treatment SS, error SS; correct F-test for varieties with df (3,6); comparison with given critical value F(3,6)=4.76; clear conclusion on significance
  • For (b)(i): Explanation of factorial experiments need—simultaneous study of multiple factors, detection of interactions, efficiency over single-factor experiments; pharmaceutical example such as 2² factorial on drug dosage and administration timing effects on patient recovery in Indian clinical trials
  • For (b)(ii): Construction of 2⁴ factorial in 4 blocks using AB and CD as confounded effects; identification of generalized interaction AB×CD = ABCD as automatically confounded; systematic block composition using even-odd rule or modulo 2 arithmetic on defining contrasts
  • For (c): Formal definition of Horvitz-Thompson estimator as Σ(yᵢ/πᵢ) over sampled units, where πᵢ is the inclusion probability; proof of unbiasedness, E(Ŷ_HT) = Y, via sample-inclusion indicators with E(Iᵢ) = πᵢ > 0 (πᵢ = npᵢ holds only for designs constructed so that inclusion probabilities are proportional to size); derivation of the variance in terms of πᵢ and πᵢⱼ, with the Yates-Grundy-Sen form for fixed sample size
  • Cross-cutting: Appropriate use of statistical notation, clear statement of assumptions, and logical flow connecting theoretical derivations to practical experimental contexts
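The block construction in (b)(ii) can be sketched with modulo-2 arithmetic on the factor levels. The code below is illustrative; `label` is a hypothetical helper, and treatment combinations follow the usual lowercase convention with "(1)" for all factors at the low level.

```python
from itertools import product

FACTORS = "abcd"


def label(levels):
    """Treatment name: letters of the factors at the high level, or '(1)'."""
    name = "".join(f for f, on in zip(FACTORS, levels) if on)
    return name or "(1)"


# Assign each of the 16 combinations to a block by the parities of the
# AB contrast (a + b mod 2) and the CD contrast (c + d mod 2).
blocks = {}
for levels in product((0, 1), repeat=4):
    a, b, c, d = levels
    key = ((a + b) % 2, (c + d) % 2)
    blocks.setdefault(key, []).append(levels)

for key, members in sorted(blocks.items()):
    print(key, [label(m) for m in members])

# ABCD is confounded too: a + b + c + d (mod 2) is constant within every block.
abcd_constant = all(
    len({sum(levels) % 2 for levels in members}) == 1
    for members in blocks.values()
)
print(abcd_constant)
```

The block with both parities zero is the principal block; the other three are obtained by multiplying it by any treatment outside it.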
Q8
50 marks | Directive: solve | Topic: Principal components and missing value analysis

(a)(i) What are principal components ? Show that the principal components are uncorrelated. (10 marks)
(a)(ii) Obtain the principal components and the amount of variation explained by each principal component associated with the following dispersion matrix : Σ = $\begin{pmatrix} 4 & 2 & 1 \\ 2 & 3 & 1 \\ 1 & 1 & 2 \end{pmatrix}$ Comment on the results. (10 marks)
(b) For the given data, the yield of the treatment B in the second block is missing and is denoted as 'y'. Estimate the missing value, and analyse the data by assuming the level of significance = 0·05. [Given that F(3, 4) = 6·59; and F(2, 3) = 9·55] (20 marks)
(c) Distinguish between Sampling and Non-sampling Errors. What are their sources ? How these errors can be controlled ? (10 marks)

Answer approach & key points

This is a multi-part numerical and theoretical question requiring proof, computation, and analysis. Allocate approximately 40% time to part (a) covering PCA theory and computation, 40% to part (b) for missing value estimation and ANOVA analysis, and 20% to part (c) for conceptual comparison of errors. Begin with definitions and proofs in (a), proceed to systematic eigenvalue computation, then handle missing value estimation using Yates' method followed by complete ANOVA, and conclude with structured comparison for (c).

  • Part (a)(i): Define principal components as linear combinations maximizing variance; prove uncorrelatedness using orthogonal transformation property (Z = Γ'X where Γ is eigenvector matrix)
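  • Part (a)(ii): Compute eigenvalues of Σ (characteristic equation: -λ³ + 9λ² - 20λ + 13 = 0, with trace 9, sum of principal 2×2 minors 20 and determinant 13), obtain eigenvectors, calculate proportion of variance explained by each PC, comment on dimensionality reduction
  • Part (b): Estimate missing value y using Yates' formula for RBD: y = (rB + tT - G)/((r-1)(t-1)), where r is the number of blocks, t the number of treatments, B and T the totals of the known yields in the affected block and treatment, and G the grand total of known yields; reconstruct the ANOVA table with error and total degrees of freedom each reduced by one, and compare calculated F with given critical values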
  • Part (c): Distinguish sampling error (random, measurable, decreases with n) vs non-sampling error (systematic, non-measurable); list sources (coverage, non-response, measurement, processing, frame errors); control methods (probability sampling, pre-testing, training, validation, imputation techniques)
  • Correct application of spectral decomposition theorem and verification that trace equals sum of eigenvalues
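The characteristic equation and the variance shares in (a)(ii) can be verified without a linear-algebra library. This is a rough sketch: `det2`, `char_poly` and `bisect_root` are hypothetical helpers, and the bracketing intervals [0, 1.5], [1.5, 2] and [2, 7] are assumptions chosen by inspecting sign changes of the cubic for this particular Σ.

```python
S = [[4, 2, 1], [2, 3, 1], [1, 1, 2]]


def det2(a, b, c, d):
    """Determinant of the 2x2 matrix [[a, b], [c, d]]."""
    return a * d - b * c


trace = sum(S[i][i] for i in range(3))
minors = (det2(S[0][0], S[0][1], S[1][0], S[1][1])
          + det2(S[0][0], S[0][2], S[2][0], S[2][2])
          + det2(S[1][1], S[1][2], S[2][1], S[2][2]))
det = (S[0][0] * det2(S[1][1], S[1][2], S[2][1], S[2][2])
       - S[0][1] * det2(S[1][0], S[1][2], S[2][0], S[2][2])
       + S[0][2] * det2(S[1][0], S[1][1], S[2][0], S[2][1]))


def char_poly(lam):
    """lam^3 - trace*lam^2 + minors*lam - det, i.e. det(lam*I - S)."""
    return lam ** 3 - trace * lam ** 2 + minors * lam - det


def bisect_root(lo, hi, steps=100):
    """Bisection on an interval where char_poly changes sign."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if (char_poly(lo) < 0) == (char_poly(mid) < 0):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


eigenvalues = sorted(
    (bisect_root(0, 1.5), bisect_root(1.5, 2), bisect_root(2, 7)), reverse=True
)
shares = [lam / trace for lam in eigenvalues]  # proportion of total variance
print(trace, minors, det)
print([round(lam, 3) for lam in eigenvalues], [round(s, 3) for s in shares])
```

The eigenvalues should sum to the trace and multiply to the determinant, which gives a quick sanity check on any hand computation.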
