Statistics 2025 Paper I, 50 marks, Compulsory, Directive: Derive

Q5

(a) For a two-variable linear regression model Yᵢ = a + bXᵢ + eᵢ, where E(eᵢ) = 0, Var(eᵢ) = σ²ₑ, Cov(eᵢ, eⱼ) = 0 for i ≠ j, (i,j) ∈ {1, 2, ..., n}, if â and b̂ are the least squares estimators of a and b respectively, derive expressions for Var(â), Var(b̂) and Cov(â, b̂). 10 marks

(b) Let X = (X₁ X₂ X₃)' ~ N₃(μ, Σ), where μ = (1 2 1)' and Σ = [[9,2,2],[2,3,0],[2,0,2]]. Find the joint distribution of Y₁ = X₁ + X₂ + X₃ and Y₂ = X₂ - X₃. 10 marks

(c) If X₁, X₂, ..., Xₙ is a random sample from a standard normal population, then using quadratic forms show that the sample mean X̄ = (1/n)∑ⱼ₌₁ⁿ Xⱼ and sample variance S² = [1/(n-1)]∑ⱼ₌₁ⁿ(Xⱼ - X̄)² are stochastically independent. 10 marks

(d) Assume that in a population with a very large number of items, the proportion of defective items is 0·30. What should be the size of the sample, if a simple random sample is to be drawn from this population to estimate the percent defective within 2 percent of the true value with 95·5 percent probability? [Given P(0 ≤ Z ≤ 1·96) = 0·475; and P(0 ≤ Z ≤ 2·005) = 0·4775]. 10 marks

(e) How do the size and shape of plots and blocks affect the results of field experiments? 10 marks


Directive word: Derive

This question asks you to derive. The directive word signals the depth of analysis expected, the structure of your answer, and the weight of evidence you must bring.


How this answer will be evaluated

Approach

This question demands rigorous derivation and calculation across five statistical sub-parts. Allocate approximately 20% time each: for (a) derive Var(â), Var(b̂), Cov(â,b̂) using matrix or summation approach; for (b) apply linear transformation of multivariate normal; for (c) use quadratic forms and Cochran's theorem; for (d) solve the sample size formula for proportions; for (e) discuss experimental design principles with Indian agricultural examples like IARI field trials. Present each part distinctly with clear labeling.
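For part (a), the summation approach can be sketched as follows. This is an outline under the usual assumption that the Xᵢ are fixed (non-stochastic); the symbols S_{xx} and k_i are shorthand introduced here and do not appear in the question.

```latex
\begin{align*}
S_{xx} &= \sum_{i=1}^{n}(X_i-\bar X)^2, \qquad
k_i = \frac{X_i-\bar X}{S_{xx}}, \qquad
\sum_i k_i = 0, \quad \sum_i k_i^2 = \frac{1}{S_{xx}}, \\
\hat b &= \sum_i k_i Y_i, \qquad
\hat a = \bar Y - \hat b\,\bar X = \sum_i \Big(\tfrac{1}{n} - \bar X k_i\Big) Y_i, \\
\operatorname{Var}(\hat b) &= \sigma_e^2 \sum_i k_i^2 = \frac{\sigma_e^2}{S_{xx}}, \\
\operatorname{Var}(\hat a) &= \sigma_e^2 \sum_i \Big(\tfrac{1}{n} - \bar X k_i\Big)^2
  = \sigma_e^2\Big(\frac{1}{n} + \frac{\bar X^2}{S_{xx}}\Big)
  = \frac{\sigma_e^2 \sum_i X_i^2}{n\,S_{xx}}, \\
\operatorname{Cov}(\hat a,\hat b) &= \sigma_e^2 \sum_i \Big(\tfrac{1}{n} - \bar X k_i\Big) k_i
  = -\frac{\sigma_e^2\,\bar X}{S_{xx}}.
\end{align*}
```

The last step for Var(â) uses the identity S_{xx} + nX̄² = ∑Xᵢ², and every variance step relies on the errors being uncorrelated with common variance σ²ₑ.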

Key points expected

  • Part (a): Derivation of Var(â) = σ²ₑ[∑Xᵢ²/(n∑(Xᵢ-X̄)²)], Var(b̂) = σ²ₑ/∑(Xᵢ-X̄)², and Cov(â,b̂) = -σ²ₑX̄/∑(Xᵢ-X̄)² using normal equations or matrix approach
  • Part (b): Application of linear transformation Y = AX where A = [[1,1,1],[0,1,-1]] to obtain Y ~ N₂(Aμ, AΣA') with computed mean (4,1)' and covariance matrix [[22,1],[1,5]]
  • Part (c): Decomposition of total sum of squares using idempotent matrices, showing Q₁ = nX̄² and Q₂ = (n-1)S² are independent via rank additivity and Cochran's theorem
  • Part (d): Sample size calculation n = (Z_{α/2})² p(1-p)/d² with p=0.30, d=0.02, Z=2.005 yielding n ≈ 2110.5, rounded up to n = 2111
  • Part (e): Discussion of plot size effects on soil heterogeneity control, shape effects on border bias, and block arrangement for local control with reference to Indian agricultural experiments like varietal trials at IARI
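The arithmetic behind parts (b) and (d) can be checked with a short script. This is a minimal sketch in plain Python; the helper names `matmul`, `transpose` and `sample_size` are illustrative and not part of the question. It assumes the margin of error is an absolute d = 0.02 and takes Z = 2.005 from the table value supplied in the question.

```python
import math


def matmul(P, Q):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
        for i in range(len(P))
    ]


def transpose(M):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*M)]


# Part (b): Y = AX with X ~ N3(mu, Sigma) gives Y ~ N2(A mu, A Sigma A').
mu = [[1], [2], [1]]
Sigma = [[9, 2, 2], [2, 3, 0], [2, 0, 2]]
A = [[1, 1, 1], [0, 1, -1]]

mean_Y = matmul(A, mu)
cov_Y = matmul(matmul(A, Sigma), transpose(A))


# Part (d): n = z^2 * p * (1 - p) / d^2, rounded up to the next integer.
def sample_size(p, d, z):
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)


n_required = sample_size(p=0.30, d=0.02, z=2.005)

print("mean of Y:", mean_Y)
print("covariance of Y:", cov_Y)
print("required sample size:", n_required)
```

The script reports mean (4, 1)', covariance matrix [[22, 1], [1, 5]] and a required sample size of 2111, which are the figures a hand computation should reproduce.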

Evaluation rubric

Dimension | Weight | Max marks | Excellent | Average | Poor
Setup correctness | 20% | 10 | Correctly states all model assumptions for (a), identifies transformation matrix for (b), specifies standard normal setup for (c), identifies parameters p=0.30, d=0.02, Z=2.005 for (d), and defines plot/block terminology for (e) | States most assumptions correctly but misses the uncorrelated-errors assumption in (a) or uses wrong Z-value in (d); vague on experimental design terms in (e) | Wrong model specification, confuses parameters, or fundamentally misunderstands the statistical setup in multiple parts
Method choice | 20% | 10 | Uses optimal methods: normal equations/matrix algebra for (a), linear transformation property of MVN for (b), Cochran's theorem/quadratic forms for (c), standard sample size formula for (d), and principles of local control/randomization for (e) | Correct general approach but inefficient or partially correct methods; e.g., direct differentiation instead of matrix approach in (a), or missing Cochran's theorem in (c) | Wrong methodological approach such as treating (b) as univariate, ignoring quadratic forms in (c), or using descriptive statistics instead of design principles in (e)
Computation accuracy | 20% | 10 | All algebraic manipulations correct: proper handling of ∑(Xᵢ-X̄)² terms in (a), accurate matrix multiplication AΣA' in (b), correct idempotent matrix ranks in (c), precise calculation n = 2110.51 → 2111 in (d) | Minor computational errors like arithmetic mistakes in matrix multiplication or rounding errors in sample size, but methodologically sound | Major computational errors: wrong variance expressions, incorrect covariance matrix elements, or sample size off by factor of 10 or more
Interpretation | 20% | 10 | Interprets negative covariance in (a) as precision trade-off, explains why Y₁,Y₂ are correlated in (b), clarifies why independence requires normality in (c), discusses practical implications of large sample in (d), and relates plot size to fertility gradients in Indian soils for (e) | Basic interpretation without deeper insight; mentions correlation exists but doesn't explain geometric intuition or practical consequences | No interpretation provided or completely wrong interpretation of statistical results across parts
Final answer & units | 20% | 10 | All final answers clearly stated: explicit variance-covariance formulas in (a), complete N₂ distribution specification in (b), clear independence demonstration in (c), sample size n=2111 with justification in (d), and actionable recommendations for field experiment design in (e) | Final answers present but incomplete or poorly formatted; missing distribution parameters or vague recommendations | Missing final answers, wrong units (e.g., percent vs proportion confusion in d), or incomplete distribution specifications
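The matrix conditions that part (c) relies on can be verified numerically. The sketch below is a check for one illustrative sample size, not a proof: n = 5 and the data vector `x` are arbitrary choices, and the names `B1`, `B2`, `matmul`, `trace` and `quad_form` are introduced here for illustration. It builds B1 = J/n and B2 = I - J/n, for which X'B1X = nX̄² and X'B2X = (n-1)S², and confirms that both are idempotent, that B1·B2 = 0, and that the ranks (equal to the traces for idempotent matrices) are 1 and n-1.

```python
from fractions import Fraction


def matmul(P, Q):
    """Multiply two square matrices given as lists of rows."""
    size = len(P)
    return [
        [sum(P[i][k] * Q[k][j] for k in range(size)) for j in range(size)]
        for i in range(size)
    ]


def trace(M):
    """Sum of diagonal entries; equals the rank when M is idempotent."""
    return sum(M[i][i] for i in range(len(M)))


def quad_form(x, M):
    """Evaluate the quadratic form x' M x."""
    size = len(x)
    return sum(x[i] * M[i][j] * x[j] for i in range(size) for j in range(size))


n = 5  # illustrative sample size (assumption)

# B1 = J/n averages the observations; B2 = I - J/n centres them.
B1 = [[Fraction(1, n) for _ in range(n)] for _ in range(n)]
B2 = [[(1 if i == j else 0) - B1[i][j] for j in range(n)] for i in range(n)]
zero = [[0] * n for _ in range(n)]

b1_idempotent = matmul(B1, B1) == B1
b2_idempotent = matmul(B2, B2) == B2
product_is_zero = matmul(B1, B2) == zero
rank_B1, rank_B2 = trace(B1), trace(B2)

# Illustrative data (assumption) to confirm the two quadratic forms.
x = [Fraction(v) for v in (1, 2, 3, 4, 10)]
x_bar = sum(x) / n
q1 = quad_form(x, B1)  # should equal n * x_bar^2
q2 = quad_form(x, B2)  # should equal sum of squared deviations

print(b1_idempotent, b2_idempotent, product_is_zero, rank_B1, rank_B2)
print(q1 == n * x_bar ** 2, q2 == sum((v - x_bar) ** 2 for v in x))
```

Since I = B1 + B2 with ranks 1 and n-1 adding to n, Cochran's theorem gives the independence of nX̄² and (n-1)S² for a standard normal sample; exact fractions are used so the equality checks are not affected by rounding.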
