Question

(a) For a simple linear regression model Yᵢ = β₀ + β₁Xᵢ + εᵢ, i = 1, ..., n

(i) Derive the least squares estimators of β₀ and β₁, clearly stating the conditions assumed.

(ii) For eᵢ = Yᵢ - Ŷᵢ where Ŷᵢ is the fitted value, show that
1. Σᵢ₌₁ⁿ eᵢ = 0
2. Σᵢ₌₁ⁿ Yᵢ = Σᵢ₌₁ⁿ Ŷᵢ
3. Σᵢ₌₁ⁿ Xᵢeᵢ = 0
4. Σᵢ₌₁ⁿ Ŷᵢeᵢ = 0
5. The regression line passes through (X̄, Ȳ).
5+5

(b) In the usual notation, if v, b, r, k and λ are the parameters of a Balanced Incomplete Block Design, then show that:
(i) b ≥ r + 1 ≥ λ + 2
(ii) v ≤ b ≤ (r² - 1)/λ
10

(c) For the multiple linear regression model with two predictor variables X₁ and X₂, show that the estimate of regression coefficient of X₁ is unchanged when X₂ is added to the regression model, whenever X₁ and X₂ are uncorrelated.
10

(d) A sample of size n is drawn from a population having N units by simple random sampling without replacement. A sub-sample of n₁ units is drawn from the n units by simple random sampling without replacement. Let ȳ₁ denote the mean based on the n₁ units and ȳ₂ the mean based on the remaining n₂ = n - n₁ units. Consider the estimator of the population mean Ȳₙ given by:
Ŷₙ = wȳ₁ + (1-w)ȳ₂ ; 0 < w < 1
Show that E(Ŷₙ) = Ȳₙ, and obtain its variance.
10

(e) How is the efficiency of a design measured? Derive the expression to measure the efficiency of a Randomised Block Design over a Completely Randomised Design.
10

UPSC Answer Check · Accepted Answer

The directive "Derive" calls for rigorous, step-by-step mathematical proofs with a clear logical progression. The five parts carry equal marks, so allocate about 20% of the time to each: (a)(i)-(ii) on SLR estimators and residual properties, (b) on BIBD inequalities, (c) on multiple regression with uncorrelated predictors, (d) on the sub-sampling estimator and its variance, and (e) on design efficiency. Begin each sub-part by stating the assumptions, proceed with a systematic derivation, and conclude with the required result clearly boxed.
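As an illustration of that assumptions, derivation, boxed-result structure, part (a)(i) might be laid out as below. This is a sketch, not a model answer; the symbol S for the sum of squares is introduced here for convenience, and the condition Σ(Xᵢ - X̄)² > 0 is assumed so that the solution is unique.

```latex
% Assumptions: E(\varepsilon_i)=0,\ \mathrm{Var}(\varepsilon_i)=\sigma^2,\
% \mathrm{Cov}(\varepsilon_i,\varepsilon_j)=0\ (i\neq j),\ X_i \text{ fixed and not all equal.}
\begin{align*}
S(\beta_0,\beta_1) &= \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2 \\
\frac{\partial S}{\partial \beta_0} = 0 &\;\Rightarrow\; \sum_{i=1}^{n} (Y_i - \hat\beta_0 - \hat\beta_1 X_i) = 0 \\
\frac{\partial S}{\partial \beta_1} = 0 &\;\Rightarrow\; \sum_{i=1}^{n} X_i (Y_i - \hat\beta_0 - \hat\beta_1 X_i) = 0 \\
\hat\beta_1 &= \frac{\sum_{i=1}^{n} (X_i - \bar X)(Y_i - \bar Y)}{\sum_{i=1}^{n} (X_i - \bar X)^2},
\qquad
\hat\beta_0 = \bar Y - \hat\beta_1 \bar X
\end{align*}
```

The two first-order conditions are the normal equations; written in terms of eᵢ they read Σeᵢ = 0 and ΣXᵢeᵢ = 0, which is what part (a)(ii) builds on.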
- (a)(i) Correct setup of normal equations minimizing Σ(Yᵢ - β₀ - β₁Xᵢ)²; explicit statement of Gauss-Markov conditions (E(εᵢ)=0, Var(εᵢ)=σ², Cov(εᵢ,εⱼ)=0)
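- (a)(ii) All five residual properties proved using normal equations: Σeᵢ=0 from the first normal equation; ΣYᵢ=ΣŶᵢ as an immediate consequence of Σeᵢ=0; ΣXᵢeᵢ=0 from the second normal equation; ΣŶᵢeᵢ=0 by substituting Ŷᵢ=β̂₀+β̂₁Xᵢ; (X̄,Ȳ) on the regression line verified from β̂₀=Ȳ-β̂₁X̄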
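- (b) BIBD parameter relationships: bk=vr and r(k-1)=λ(v-1) with k<v give b>r and r>λ, hence b≥r+1≥λ+2; b≥v is Fisher's inequality (equivalently r≥k); the upper bound b≤(r²-1)/λ reduces, after substituting v=1+r(k-1)/λ into b=vr/k, to r(r-λ)≥k, which follows from r≥k and r-λ≥1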
- (c) Multiple regression: β̂₁ = (S₁₀S₂₂ - S₁₂S₂₀)/(S₁₁S₂₂ - S₁₂²) or equivalent, where the subscript 0 refers to Y; when S₁₂=0, β̂₁ reduces to S₁₀/S₁₁ = simple regression coefficient
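- (d) Two-phase sampling: E(ȳ₁)=Ȳ, E(ȳ₂)=Ȳ shown; E(Ŷₙ)=wȲ+(1-w)Ȳ=Ȳ; variance derived using Var(ȳ₁)=(1/n₁-1/N)S², Var(ȳ₂)=(1/n₂-1/N)S² and Cov(ȳ₁,ȳ₂)=-S²/N (the two means are not independent under SRSWOR), giving Var(Ŷₙ)=[w²/n₁+(1-w)²/n₂-1/N]S², where S² is the population mean square with divisor N-1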
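- (e) Efficiency defined as ratio of error variances (or precisions) of the two designs; E = Var(CRD)/Var(RBD), estimated by [(r-1)MSB + r(t-1)MSE]/[(rt-1)MSE] for t treatments in r blocks, where MSB and MSE are the block and error mean squares of the RBD; derivation using expected mean squares for both designs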
- Proper mathematical notation throughout: summation limits, subscripts, expectation and variance operators clearly distinguished
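
The key points for (a)(ii) and (c) can be checked numerically. The sketch below uses simulated data; the seed, sample size, coefficients and variable names are all assumptions made for illustration. The second predictor is constructed to have exactly zero sample correlation with the first, which is the condition in (c).

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, illustration only
n = 50

# Predictor x1, and a second predictor x2 made exactly uncorrelated with x1
# in-sample by centring and removing its projection on centred x1.
x1 = rng.normal(size=n)
z = rng.normal(size=n)
x1c = x1 - x1.mean()
zc = z - z.mean()
x2 = zc - (zc @ x1c) / (x1c @ x1c) * x1c

# Hypothetical data-generating model (coefficients are made up).
y = 2.0 + 1.5 * x1 - 0.7 * x2 + rng.normal(size=n)

# Simple regression of y on x1: closed-form least squares estimators.
b1 = (x1c @ (y - y.mean())) / (x1c @ x1c)
b0 = y.mean() - b1 * x1.mean()
fitted = b0 + b1 * x1
resid = y - fitted

# The five properties in (a)(ii); each value should be zero up to rounding.
checks = {
    "sum_e": resid.sum(),
    "sum_y_minus_sum_fitted": y.sum() - fitted.sum(),
    "sum_x_e": (x1 * resid).sum(),
    "sum_fitted_e": (fitted * resid).sum(),
    "line_through_means": b0 + b1 * x1.mean() - y.mean(),
}

# Regression of y on x1 and x2: the coefficient of x1 should match b1.
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
b1_multiple = beta[1]

for name, value in checks.items():
    print(f"{name}: {value:.2e}")
print(f"b1 (simple) = {b1:.6f}, b1 (with x2) = {b1_multiple:.6f}")
```

A numerical check of this kind is not a proof; it only confirms that the algebraic identities hold on one data set.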

Dimension	Weight	Max marks	Excellent	Average	Poor
Setup correctness	20%	10	All sub-parts open with correct model specification: SLR assumptions explicit in (a), BIBD defining relations bk=vr and r(k-1)=λ(v-1) stated in (b), multiple regression matrix/algebraic form in (c), SRSWOR at both sampling stages noted in (d), and linear model with block effects in (e).	Most setups correct but some assumptions missing (e.g., Gauss-Markov conditions omitted in (a), or sampling scheme not explicitly stated in (d)).	Fundamental setup errors: wrong objective function for least squares, incorrect BIBD relations, or failure to specify with/without replacement in sampling.
Method choice	20%	10	Optimal methods selected: normal equations for (a), algebraic manipulation of the BIBD relations for (b), partitioned regression (Frisch-Waugh) for (c), conditional expectation law for (d), and expected mean square decomposition for (e).	Correct but inefficient methods (e.g., direct expansion instead of matrix results in (c), or variance in (d) obtained by lengthy expansion rather than from Var(ȳ₁), Var(ȳ₂) and Cov(ȳ₁,ȳ₂)).	Inappropriate methods: using calculus of variations where simple differentiation suffices, or ignoring design structure in (b)-(e).
Computation accuracy	20%	10	All derivations algebraically flawless: normal equations solved correctly, inequalities in (b) rigorously chained, orthogonal case simplification exact in (c), two-phase variance combines within and between components correctly, efficiency ratio simplified to (1+ρ(k-1)) form.	Minor algebraic slips (sign errors, lost terms in summation) that don't affect final conclusions, or correct final answers with gaps in intermediate steps.	Serious computational errors: incorrect solutions to normal equations, wrong variance formula in (d) treating dependent samples as independent, or efficiency expression dimensionally inconsistent.
Interpretation	20%	10	Each result interpreted: residual properties show least squares geometry; BIBD inequalities reveal combinatorial constraints; orthogonality in (c) explains additive model; two-phase estimator shows weighting by precision; efficiency quantifies blocking gain.	Some interpretations provided but uneven—mechanical derivation without explaining why results matter (e.g., noting b≥v without calling it Fisher's inequality).	Pure symbol manipulation with zero interpretation; or misinterpretation (e.g., claiming efficiency >1 always, or confusing biased/unbiased estimators).
Final answer & units	20%	10	All required results explicitly stated and boxed: β̂₀, β̂₁ formulas in (a)(i); five numbered properties in (a)(ii); both inequalities proved in (b); β̂₁ with X₂ in the model shown equal to β̂₁ from the simple regression in (c); E(Ŷₙ)=Ȳ and Var(Ŷₙ) formula in (d); efficiency E = (1+ρ(k-1)) or equivalent in (e).	Final answers present but poorly organized: scattered through text, or some parts complete while others trail off without conclusion.	Missing final answers: derivations without stating what was to be proved, or answers to wrong quantities (e.g., deriving covariance when variance asked).
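
For part (d), the two sub-sample means come from the same SRSWOR sample, so they are negatively correlated rather than independent. The sketch below enumerates every sample and sub-sample from a small made-up population (the values, N, n, n₁ and w are all assumptions) and compares the exact mean and variance of the estimator with the closed form [w²/n₁ + (1-w)²/n₂ - 1/N]S², which follows from Cov(ȳ₁, ȳ₂) = -S²/N.

```python
from itertools import combinations

# Hypothetical population; any distinct values would do.
population = [3.0, 7.0, 8.0, 12.0, 15.0, 21.0]
N, n, n1 = len(population), 4, 2
n2 = n - n1
w = 0.3  # arbitrary weight in (0, 1)

pop_mean = sum(population) / N
# Population mean square with divisor N - 1.
S2 = sum((y - pop_mean) ** 2 for y in population) / (N - 1)

# Every (sample, sub-sample) pair is equally likely under the scheme in (d).
estimates = []
for sample in combinations(range(N), n):
    for sub in combinations(sample, n1):
        rest = [i for i in sample if i not in sub]
        ybar1 = sum(population[i] for i in sub) / n1
        ybar2 = sum(population[i] for i in rest) / n2
        estimates.append(w * ybar1 + (1 - w) * ybar2)

# Exact expectation and variance over the sampling distribution.
expectation = sum(estimates) / len(estimates)
variance = sum((e - expectation) ** 2 for e in estimates) / len(estimates)

# Closed form that accounts for the negative covariance of the two means.
variance_formula = (w ** 2 / n1 + (1 - w) ** 2 / n2 - 1 / N) * S2

print(f"E = {expectation:.6f}, population mean = {pop_mean:.6f}")
print(f"Var (enumerated) = {variance:.6f}, Var (formula) = {variance_formula:.6f}")
```

Setting w = n₁/n turns the estimator into the ordinary sample mean, and the formula then reduces to (1/n - 1/N)S², which is a quick sanity check on the derivation.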

Q5

Directive word: Derive

More from Statistics 2021 Paper I