Q5
(a) For a simple linear regression model Yᵢ = β₀ + β₁Xᵢ + εᵢ, i = 1, ..., n:

(i) Derive the least squares estimators of β₀ and β₁, clearly stating the conditions assumed.

(ii) For eᵢ = Yᵢ - Ŷᵢ, where Ŷᵢ is the fitted value, show that:

1. Σᵢ₌₁ⁿ eᵢ = 0
2. Σᵢ₌₁ⁿ Yᵢ = Σᵢ₌₁ⁿ Ŷᵢ
3. Σᵢ₌₁ⁿ Xᵢeᵢ = 0
4. Σᵢ₌₁ⁿ Ŷᵢeᵢ = 0
5. The regression line passes through (X̄, Ȳ).

(5+5 marks)

(b) In usual notation, if v, b, r, k and λ are the parameters of a Balanced Incomplete Block Design, show that:

(i) b ≥ r + 1 ≥ λ + 2

(ii) v ≤ b ≤ (r² - 1)/λ

(10 marks)

(c) For the multiple linear regression model with two predictor variables X₁ and X₂, show that the estimate of the regression coefficient of X₁ is unchanged when X₂ is added to the regression model, whenever X₁ and X₂ are uncorrelated. (10 marks)

(d) A sample of size n is drawn from a population of N units by simple random sampling without replacement. A sub-sample of n₁ units is drawn from the n units by simple random sampling without replacement. Let ȳ₁ denote the mean based on the n₁ units and ȳ₂ the mean based on the remaining n₂ = n - n₁ units. Consider the estimator of the population mean Ȳₙ given by Ŷₙ = wȳ₁ + (1-w)ȳ₂, 0 < w < 1. Show that E(Ŷₙ) = Ȳₙ, and obtain its variance. (10 marks)

(e) How is the efficiency of a design measured? Derive the expression to measure the efficiency of a Randomised Block Design over a Completely Randomised Design. (10 marks)
Directive word: Derive
This question asks you to derive. The directive word signals the depth of analysis expected, the structure of your answer, and the weight of evidence you must bring.
How this answer will be evaluated
Approach
The directive "derive" calls for rigorous step-by-step mathematical proofs with clear logical progression. Allocate time in proportion to the marks: ~20% for (a)(i)-(ii) on SLR properties, ~20% for (b) on BIBD inequalities, ~20% for (c) on multiple regression orthogonality, ~20% for (d) on two-phase sampling variance, and ~20% for (e) on design efficiency. Begin each sub-part by stating the assumptions, proceed with a systematic derivation, and conclude with the required result clearly boxed.
Key points expected
- (a)(i) Correct setup of normal equations minimizing Σ(Yᵢ - β₀ - β₁Xᵢ)²; explicit statement of Gauss-Markov conditions (E(εᵢ)=0, Var(εᵢ)=σ², Cov(εᵢ,εⱼ)=0)
- (a)(ii) All five residual properties proved using normal equations: Σeᵢ=0 from first normal equation; ΣXᵢeᵢ=0 from second; ΣŶᵢeᵢ=0 via substitution; (X̄,Ȳ) on regression line verified
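- (b) BIBD parameter relationships: bk=vr and r(k-1)=λ(v-1) with k<v give b>r and r>λ, hence b≥r+1≥λ+2; v≤b is Fisher's inequality (equivalently r≥k); b≤(r²-1)/λ follows from λb = r² - r(r-λ)/k together with r≥k and r-λ≥1
- (c) Multiple regression: β̂₁ = (S₁₀S₂₂ - S₁₂S₂₀)/(S₁₁S₂₂ - S₁₂²) or equivalent, where Sᵢⱼ is the corrected sum of products of Xᵢ and Xⱼ and Sᵢ₀ that of Xᵢ and Y; when S₁₂=0, β̂₁ reduces to S₁₀/S₁₁ = simple regression coefficient
- (d) Two-phase sampling: E(ȳ₁)=Ȳ, E(ȳ₂)=Ȳ shown; E(Ŷₙ)=wȲ+(1-w)Ȳ=Ȳ; variance derived using Var(ȳ₁)=(1/n₁ - 1/N)S², Var(ȳ₂)=(1/n₂ - 1/N)S² and Cov(ȳ₁,ȳ₂)=-S²/N (the two disjoint sub-samples are not independent), giving Var(Ŷₙ)=S²[w²/n₁ + (1-w)²/n₂ - 1/N], where S² is the population mean square with divisor N-1
- (e) Efficiency defined as the ratio of error variances (or precisions) on the same experimental material: E = Var(CRD)/Var(RBD), estimated by [(r-1)MSB + r(t-1)MSE]/[(rt-1)MSE] for r blocks and t treatments, where MSB and MSE are the block and error mean squares of the RBD analysis; derivation using the expected mean squares of both designs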
- Proper mathematical notation throughout: summation limits, subscripts, expectation and variance operators clearly distinguished
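
The regression key points for (a) and (c) can be checked numerically. The following is a minimal sketch, not part of the model answer: the data values and the helper names `sp`, `simple_fit` and `slope_x1_given_x2` are hypothetical, chosen so that X₁ and X₂ are exactly uncorrelated. Exact fractions are used so that the identities hold without rounding error.

```python
from fractions import Fraction as F


def sp(u, v):
    """Corrected sum of products: sum of (u_i - mean(u)) * (v_i - mean(v))."""
    n = len(u)
    ubar, vbar = sum(u) / n, sum(v) / n
    return sum((a - ubar) * (b - vbar) for a, b in zip(u, v))


def simple_fit(x, y):
    """Least squares intercept and slope for y = b0 + b1 * x."""
    b1 = sp(x, y) / sp(x, x)
    b0 = sum(y) / len(y) - b1 * sum(x) / len(x)
    return b0, b1


def slope_x1_given_x2(x1, x2, y):
    """Coefficient of x1 when y is regressed on x1 and x2 with an intercept."""
    s11, s22, s12 = sp(x1, x1), sp(x2, x2), sp(x1, x2)
    s10, s20 = sp(x1, y), sp(x2, y)
    return (s10 * s22 - s12 * s20) / (s11 * s22 - s12 ** 2)


# Hypothetical illustrative data: x1 and x2 have zero corrected cross-product.
x1 = [F(v) for v in (-1, -1, 1, 1)]
x2 = [F(v) for v in (-1, 1, -1, 1)]
y = [F(v) for v in (2, 3, 5, 9)]

b0, b1 = simple_fit(x1, y)
fitted = [b0 + b1 * xi for xi in x1]
resid = [yi - fi for yi, fi in zip(y, fitted)]

# Residual properties from part (a)(ii).
print("sum of residuals:", sum(resid))
print("sum of Y minus sum of fitted:", sum(y) - sum(fitted))
print("sum of X * residual:", sum(xi * ei for xi, ei in zip(x1, resid)))
print("sum of fitted * residual:", sum(fi * ei for fi, ei in zip(fitted, resid)))
print("line at mean of X minus mean of Y:", b0 + b1 * sum(x1) / len(x1) - sum(y) / len(y))

# Part (c): adding an uncorrelated x2 leaves the coefficient of x1 unchanged.
b1_two = slope_x1_given_x2(x1, x2, y)
print("simple slope:", b1, "two-predictor slope:", b1_two)
```

Each of the five printed differences should be zero, and the two slopes should coincide because the corrected cross-product of `x1` and `x2` is zero in this data set.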
Evaluation rubric
| Dimension | Weight | Max marks | Excellent | Average | Poor |
|---|---|---|---|---|---|
| Setup correctness | 20% | 10 | All sub-parts open with correct model specification: SLR assumptions explicit in (a), BIBD defining relations bk=vr and r(k-1)=λ(v-1) stated in (b), multiple regression matrix/algebraic form in (c), SRSWOR scheme noted in (d), and linear model with block effects in (e). | Most setups correct but some assumptions missing (e.g., Gauss-Markov conditions omitted in (a), or sampling scheme not explicitly stated in (d)). | Fundamental setup errors: wrong objective function for least squares, incorrect BIBD relations, or failure to specify with/without replacement in sampling. |
| Method choice | 20% | 10 | Optimal methods selected: normal equations for (a), algebraic manipulation of the BIBD parameter relations for (b), partitioned regression (Frisch-Waugh) for (c), conditional expectation argument for (d), and expected mean square decomposition for (e). | Correct but inefficient methods (e.g., direct expansion instead of matrix results in (c), or variance in (d) derived from first principles rather than from standard SRSWOR variance and covariance results). | Inappropriate methods: using calculus of variations where simple differentiation suffices, or ignoring design structure in (b)-(e). |
| Computation accuracy | 20% | 10 | All derivations algebraically flawless: normal equations solved correctly, inequalities in (b) rigorously chained, orthogonal case simplification exact in (c), two-phase variance accounts for the covariance between ȳ₁ and ȳ₂ correctly, efficiency ratio reduced to the mean-square form [(r-1)MSB + r(t-1)MSE]/[(rt-1)MSE] for r blocks and t treatments. | Minor algebraic slips (sign errors, lost terms in summation) that don't affect final conclusions, or correct final answers with gaps in intermediate steps. | Serious computational errors: incorrect solutions to normal equations, wrong variance formula in (d) treating dependent samples as independent, or efficiency expression dimensionally inconsistent. |
| Interpretation | 20% | 10 | Each result interpreted: residual properties show least squares geometry; BIBD inequalities reveal combinatorial constraints; orthogonality in (c) explains additive model; two-phase estimator shows weighting by precision; efficiency quantifies blocking gain. | Some interpretations provided but uneven—mechanical derivation without explaining why results matter (e.g., noting b≥v without calling it Fisher's inequality). | Pure symbol manipulation with zero interpretation; or misinterpretation (e.g., claiming efficiency >1 always, or confusing biased/unbiased estimators). |
| Final answer & units | 20% | 10 | All required results explicitly stated and boxed: β̂₀, β̂₁ formulas in (a)(i); five numbered properties in (a)(ii); both inequalities proved in (b); β̂₁ from the two-predictor fit shown equal to β̂₁ from the simple regression in (c); E(Ŷₙ)=Ȳ and Var(Ŷₙ) formula in (d); efficiency E = [(r-1)MSB + r(t-1)MSE]/[(rt-1)MSE] (r blocks, t treatments) or equivalent in (e). | Final answers present but poorly organized: scattered through text, or some parts complete while others trail off without conclusion. | Missing final answers: derivations without stating what was to be proved, or answers to wrong quantities (e.g., deriving covariance when variance asked). |
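
The rubric's warning about treating the two sub-sample means in (d) as independent can be illustrated by brute force. The sketch below is an assumption-laden check rather than a derivation: the population values, the weight `w`, and the function names `exact_moments` and `formula_variance` are hypothetical. It enumerates every equally likely split of an SRSWOR sample into a sub-sample and its remainder, and compares the exact variance of wȳ₁ + (1-w)ȳ₂ with the closed form S²[w²/n₁ + (1-w)²/n₂ - 1/N], which follows from Cov(ȳ₁, ȳ₂) = -S²/N.

```python
from fractions import Fraction as F
from itertools import combinations


def exact_moments(pop, n, n1, w):
    """Exact mean and variance of w*ybar1 + (1-w)*ybar2 over all equally likely splits."""
    values = []
    for sample in combinations(range(len(pop)), n):
        for sub in combinations(sample, n1):
            rest = [i for i in sample if i not in sub]
            ybar1 = sum(pop[i] for i in sub) / n1
            ybar2 = sum(pop[i] for i in rest) / len(rest)
            values.append(w * ybar1 + (1 - w) * ybar2)
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


def formula_variance(pop, n, n1, w):
    """Closed form S^2 * [w^2/n1 + (1-w)^2/n2 - 1/N], with S^2 using divisor N-1."""
    N = len(pop)
    n2 = n - n1
    ybar = sum(pop) / N
    s2 = sum((y - ybar) ** 2 for y in pop) / (N - 1)
    return s2 * (w ** 2 / n1 + (1 - w) ** 2 / n2 - F(1, N))


# Hypothetical population and design sizes, chosen only for illustration.
pop = [F(v) for v in (1, 2, 4, 7, 11)]
n, n1, w = 3, 2, F(1, 3)

mean, var = exact_moments(pop, n, n1, w)
print("population mean:", sum(pop) / len(pop), "E(estimator):", mean)
print("exact variance:", var, "formula:", formula_variance(pop, n, n1, w))
```

Because the enumeration is exact, the estimator's mean should equal the population mean and the two variance values should agree for any admissible choice of `n`, `n1` and `w` with n₂ = n - n₁ ≥ 1.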
More from Statistics 2021 Paper I
- Q1 (a) A production unit manufacturing surgical masks is concerned about the quality of their masks. A random sample of n masks are inspected…
- Q2 (a) Let Y₁, Y₂, Y₃, ... be independent and identical Poisson random variables with parameter 1. Use central limit theorem to establish $$n!…
- Q3 (a) Let X and Y be two independent random variables following exponential distribution with mean $\frac{1}{\lambda}$ and $\frac{1}{\mu}$ re…
- Q4 (a) Let X₁, X₂, ..., Xₙ be a random sample from Poisson distribution with mean λ > 0. Define a statistic W = (1 − 1/n)^T, T = Σᵢ₌₁ⁿ Xᵢ (i)…
- Q5 (a) For a simple linear regression model Y = β₀ + β₁Xᵢ + εᵢ, i = 1, ..., n (i) Derive the least square estimators of β₀ and β₁, clearly sta…
- Q6 (a) For a multiple linear regression model with three covariates X₁, X₂ and X₃, let rᵢⱼ denote the correlation coefficient between Xᵢ and X…
- Q7 (a) (i) What is confounding in factorial experiments ? (ii) A $2^6$ factorial experiment is conducted in blocks of size $2^3$. Write the co…
- Q8 (a) (i) In stratified sampling under optimum allocation, how will you proceed to select units from different strata, if one or more nᵢ's ha…