Question

(a) A very big population is divided into two strata. In a stratified random sample of size n, the numbers of units allocated to the two strata are n'_1 and n'_2 under Neyman allocation and n_1 and n_2 under some other allocation. Define r = n'_1/n'_2 and μ = n_1/(rn_2). Then prove that the efficiency of stratified random sampling under the other allocation with respect to stratified random sampling under Neyman allocation is given by e = μ(r+1)²/((μr + 1)(μ + r)). (20 marks)
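A sketch of the derivation follows. It assumes the finite population correction is ignored (the population is "very big"). W_h = N_h/N denotes the stratum weight, S_h² the stratum variance, and a_h = W_hS_h is a shorthand introduced here.

```latex
\begin{aligned}
V(\bar y_{st}) &= \frac{W_1^2S_1^2}{n_1} + \frac{W_2^2S_2^2}{n_2} = \frac{a_1^2}{n_1} + \frac{a_2^2}{n_2}, \qquad a_h = W_hS_h,\\
\text{Neyman: } n'_h &= \frac{n\,a_h}{a_1+a_2} \;\Rightarrow\; r = \frac{n'_1}{n'_2} = \frac{a_1}{a_2},
\qquad V_N = \frac{(a_1+a_2)^2}{n} = \frac{a_2^2(r+1)^2}{n},\\
\text{Other: } n_1 &= \mu r n_2,\; n_1 + n_2 = n \;\Rightarrow\; n_2 = \frac{n}{\mu r+1},\; n_1 = \frac{n\mu r}{\mu r+1},\\
V_O &= \frac{a_1^2}{n_1} + \frac{a_2^2}{n_2}
     = \frac{\mu r+1}{n}\left(\frac{r^2a_2^2}{\mu r} + a_2^2\right)
     = \frac{a_2^2(\mu r+1)(\mu+r)}{n\mu},\\
e &= \frac{V_N}{V_O} = \frac{\mu(r+1)^2}{(\mu r+1)(\mu+r)}.
\end{aligned}
```

Since (μr+1)(μ+r) - μ(r+1)² = r(μ-1)² ≥ 0, it follows that e ≤ 1. Equality holds only when μ = 1, that is, when the other allocation coincides with Neyman allocation.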

(b) A bank has 40000 clients in its computer files, divided among 4000 branches, each managing exactly 10 clients. To estimate the proportion of clients to whom the bank has granted a loan, a simple random sample of 40 branches is selected. For each selected branch i, a list of the clients having a loan is prepared, and A_i denotes the number of such clients; i = 1, 2, ..., 40. The data observed from the selected sample are Σ(i=1 to 40) A_i = 200 and Σ(i=1 to 40) A_i² = 1156.
(i) What type of sampling is this? (3 marks)
(ii) State the expression of the parameter to estimate and obtain its unbiased estimate. (6 marks)
(iii) Estimate the variance of the unbiased estimator obtained in part (ii). (6 marks)
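The arithmetic for (ii) and (iii) can be reproduced with a short sketch. It assumes simple random sampling without replacement of equal-sized clusters, which is the usual reading of this design. The helper name `cluster_proportion_estimates` is hypothetical. Only the two summary sums from the question are needed.

```python
# Minimal sketch: cluster sampling with equal-sized clusters drawn by SRSWOR.
# Assumes N = 4000 branches (clusters), M = 10 clients per branch and
# m = 40 sampled branches, as stated in the question.

def cluster_proportion_estimates(sum_a, sum_a_sq, N, M, m):
    """Return (p_hat, s_b2, v_p_hat) from the summary sums of the A_i.

    sum_a    : sum of A_i over the m sampled clusters
    sum_a_sq : sum of A_i^2 over the m sampled clusters
    """
    # Unbiased estimate of P = (sum of all N cluster totals) / (N * M)
    p_hat = sum_a / (m * M)
    # Between-cluster mean square of the cluster totals A_i
    s_b2 = (sum_a_sq - sum_a ** 2 / m) / (m - 1)
    # Estimated variance of p_hat, with finite population correction (N - m)/N;
    # dividing by M^2 converts the totals' mean square to proportions
    v_p_hat = (N - m) / N * s_b2 / (m * M ** 2)
    return p_hat, s_b2, v_p_hat


p_hat, s_b2, v = cluster_proportion_estimates(200, 1156, N=4000, M=10, m=40)
print(p_hat, s_b2, v)
```

With the given sums this yields p̂ = 0.5, s_b² = 156/39 = 4 and v(p̂) = 0.99 × 4/4000 = 0.00099.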

(c) (i) Verify whether the following BIBDs are possible, given that each design is resolvable: (1) v = b = 22, r = k = 7, λ = 2; (2) v = 10, b = 18, r = 9, k = 5, λ = 4.
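The standard necessary-condition checks for (c)(i) can be sketched as follows. The helper `check_bibd` is hypothetical. The Bruck-Ryser-Chowla test is implemented only for the even-v case. Passing every check would not by itself prove that a design exists.

```python
import math


def check_bibd(v, b, r, k, lam, resolvable=False):
    """Necessary-condition checks for a BIBD(v, b, r, k, lambda).

    Returns a dict mapping each condition to True/False. These are
    necessary conditions only; passing them all does not prove existence.
    """
    checks = {
        "vr == bk": v * r == b * k,
        "lambda(v-1) == r(k-1)": lam * (v - 1) == r * (k - 1),
        "Fisher b >= v": b >= v,
    }
    # Bruck-Ryser-Chowla, even-v case only: a symmetric BIBD (v == b)
    # with v even requires r - lambda to be a perfect square.
    if v == b and v % 2 == 0:
        d = r - lam
        checks["BRC r - lambda square"] = d >= 0 and math.isqrt(d) ** 2 == d
    if resolvable:
        # The blocks of each replicate partition the v treatments.
        checks["k divides v"] = v % k == 0
        # Bose's inequality for resolvable BIBDs.
        checks["Bose b >= v + r - 1"] = b >= v + r - 1
        # Equality forces affine resolvability, where k^2 / v must be an integer.
        if b == v + r - 1:
            checks["affine k^2/v integer"] = (k * k) % v == 0
    return checks


for params in [(22, 22, 7, 7, 2), (10, 18, 9, 5, 4)]:
    result = check_bibd(*params, resolvable=True)
    verdict = "possible" if all(result.values()) else "not possible"
    print(params, verdict)
    for name, ok in result.items():
        print("   ", name, ok)
```

For design (1), the counting conditions hold but three checks fail:
- k does not divide v.
- b = 22 < 28, which violates Bose's inequality.
- r - λ = 5 is not a perfect square.

For design (2), all counting conditions hold and b = v + r - 1 = 18. However, an affine resolvable design needs k²/v = 2.5 to be an integer, so the resolvable design cannot exist.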
(ii) Given below is the incidence matrix (N) of a block design. Find the degrees of freedom associated with the adjusted treatment sum of squares and the degrees of freedom for the error sum of squares.
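The incidence matrix for (c)(ii) is not reproduced here, so the sketch below uses a hypothetical matrix to show the computation. It builds C = R - NK⁻¹N' with exact fractions. The rank of C gives the adjusted treatment df, and the error df is n - b - rank(C). Function names are illustrative, and every block is assumed to be non-empty.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix of Fractions by Gaussian elimination."""
    m = [list(row) for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below `rank` with a non-zero entry in this column
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every other row
        for i in range(len(m)):
            if i != rank and m[i][col] != 0:
                factor = m[i][col] / m[rank][col]
                m[i] = [x - factor * y for x, y in zip(m[i], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank


def block_design_df(N):
    """(df adjusted treatment SS, df error SS) for a v x b incidence matrix N."""
    v, b = len(N), len(N[0])
    reps = [sum(N[i]) for i in range(v)]                        # r_i
    sizes = [sum(N[i][j] for i in range(v)) for j in range(b)]  # k_j, assumed > 0
    n = sum(reps)                                               # total plots
    # Information matrix C = R - N K^{-1} N'
    C = [[Fraction(reps[i] if i == h else 0)
          - sum(Fraction(N[i][j] * N[h][j], sizes[j]) for j in range(b))
          for h in range(v)]
         for i in range(v)]
    df_treat = matrix_rank(C)
    return df_treat, n - b - df_treat


# Hypothetical example: all 6 pairs of 4 treatments (a connected BIBD)
N_pairs = [[1, 1, 1, 0, 0, 0],
           [1, 0, 0, 1, 1, 0],
           [0, 1, 0, 1, 0, 1],
           [0, 0, 1, 0, 1, 1]]
print(block_design_df(N_pairs))
```

For this design rank(C) = 3 = v - 1, and the error df is 12 - 6 - 3 = 3. A disconnected design would give rank(C) < v - 1.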

UPSC Answer Check · Accepted Answer

This question demands a rigorous mathematical derivation and proof for part (a), followed by applied numerical analysis for parts (b) and (c). Spend approximately 35% of the time on part (a), given its 20 marks and the complexity of the proof; allocate 25% to part (b), covering identification of cluster sampling, unbiased estimation and variance calculation; and 40% to part (c), on BIBD verification and degrees-of-freedom computation. Structure the answer as: (a) state assumptions and derive the efficiency ratio step by step; (b) identify cluster sampling and construct the appropriate estimators from the given sums; (c) verify the necessary conditions for BIBD existence and compute the rank of the C-matrix for the degrees of freedom.
- Part (a): Let W_h = N_h/N and S_h² be the stratum weights and variances. Under Neyman allocation n'_h = nW_hS_h/(W₁S₁+W₂S₂), so r = W₁S₁/(W₂S₂), which reduces to S₁/S₂ only when N₁ = N₂. Ignoring the fpc, write V(ȳ_st) = W₁²S₁²/n₁ + W₂²S₂²/n₂ under both allocations and take their ratio to obtain the efficiency formula
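- Part (b)(i): This is cluster sampling (single-stage, equal-sized clusters). The 4000 branches are clusters of M = 10 clients each, a simple random sample of m = 40 clusters is drawn, and every client in a selected branch is enumerated; it is not two-stage sampling, because clients are not subsampled within branches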
- Part (b)(ii): The parameter is the population proportion P = Σ(i=1 to N) A_i/(NM), where M = 10 and N = 4000; its unbiased estimator is p̂ = Σ(i=1 to m) A_i/(mM) = 200/(40×10) = 0.5, where m = 40
- Part (b)(iii): The between-cluster mean square of the A_i is s_b² = [ΣA_i² - (ΣA_i)²/m]/(m-1) = (1156 - 1000)/39 = 4; with the finite population correction, v(p̂) = (N-m)s_b²/(NmM²) = (3960 × 4)/(4000 × 40 × 100) = 0.00099
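- Part (c)(i): Check the BIBD conditions vr = bk, λ(v-1) = r(k-1) and b ≥ v; for a resolvable design, k must also divide v and Bose's inequality b ≥ v + r - 1 must hold. Design (1): vr = bk = 154 and λ(v-1) = r(k-1) = 42 hold, but 7 does not divide 22 and b = 22 < v + r - 1 = 28, so it cannot be resolvable (the symmetric design is also ruled out by the Bruck-Ryser-Chowla theorem, since v is even and r - λ = 5 is not a perfect square). Design (2): vr = bk = 90, λ(v-1) = r(k-1) = 36, k = 5 divides v = 10 and b = 18 = v + r - 1; equality in Bose's inequality makes the design affine resolvable, which requires k²/v to be an integer, but k²/v = 25/10 = 2.5, so this resolvable design is not possible either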
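- Part (c)(ii): For the given incidence matrix N, compute the information matrix C = R - NK⁻¹N', which reduces to rI_v - (1/k)NN' when all replications and block sizes are equal. The adjusted treatment SS has rank(C) degrees of freedom (v-1 for a connected design), and the error SS has n - b - rank(C) degrees of freedom (n - b - v + 1 when connected), where n is the total number of plots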
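The closed-form efficiency in part (a) can be sanity-checked numerically against a direct ratio of the two variances. The sketch below makes three assumptions:
- the fpc is negligible;
- sample sizes are treated as continuous (no rounding);
- the values of W_h, S_h and μ are arbitrary illustrative choices.

```python
# Sketch: compare the closed-form efficiency with a direct variance ratio.
# Assumptions: fpc ignored, sample sizes treated as continuous (unrounded).

def variance_st(a, n_alloc):
    """V(ybar_st) = sum of a_h^2 / n_h, where a_h = W_h * S_h."""
    return sum(ah ** 2 / nh for ah, nh in zip(a, n_alloc))


def efficiency_formula(mu, r):
    """Closed form e = mu (r + 1)^2 / ((mu r + 1)(mu + r))."""
    return mu * (r + 1) ** 2 / ((mu * r + 1) * (mu + r))


def efficiency_direct(W, S, n, mu):
    """Return (V_Neyman / V_other, r) for two strata."""
    a = [W[0] * S[0], W[1] * S[1]]
    r = a[0] / a[1]  # r = n'_1 / n'_2 under Neyman allocation
    neyman = [n * a[0] / (a[0] + a[1]), n * a[1] / (a[0] + a[1])]
    n2 = n / (mu * r + 1)  # from n_1 = mu * r * n_2 and n_1 + n_2 = n
    other = [mu * r * n2, n2]
    return variance_st(a, neyman) / variance_st(a, other), r


for mu in (0.5, 1.0, 1.7):
    e, r = efficiency_direct(W=(0.3, 0.7), S=(5.0, 2.0), n=100, mu=mu)
    print(f"mu={mu}: direct={e:.6f}, formula={efficiency_formula(mu, r):.6f}")
```

The two columns should agree for every μ, and e reaches 1 only at μ = 1.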

Dimension	Weight	Max marks	Excellent	Average	Poor
Setup correctness	20%	12	Correctly defines all notation for part (a): stratum sizes N₁, N₂ and variances S₁², S₂², and establishes the Neyman allocation formula n'₁ = nN₁S₁/(N₁S₁+N₂S₂), hence r = N₁S₁/(N₂S₂); for (b) correctly identifies the cluster sampling structure with M=10, N=4000, m=40; for (c) states all BIBD necessary conditions and the resolvability criterion	Defines most quantities correctly but misses the finite population correction factor in (b) or confuses the stratum allocation formulas in (a); states some BIBD conditions in (c) but omits the resolvability check	Incorrect setup: treats (b) as simple random sampling of clients, confuses Neyman with proportional allocation in (a), or fails to identify design parameters in (c)
Method choice	20%	12	For (a) chooses the correct approach: expresses Var(ȳ_st | Neyman) and Var(ȳ_st | other) using the given ratio μ, then manipulates them algebraically into the required efficiency form; for (b) applies the cluster sampling estimator (SRSWOR of equal-sized clusters) with the between-cluster variance; for (c) uses the standard BIBD parametric relations and computes the C-matrix rank properly	Uses correct general methods but applies the wrong variance formula for cluster sampling (e.g., uses the element-level simple random sampling variance) or takes algebraic shortcuts in (a) that skip critical steps	Wrong methodological approach: attempts a direct proof without defining variance expressions, uses cluster sampling formulas for the stratified problem, or attempts to verify the BIBD by inspection without parametric conditions
Computation accuracy	20%	12	Part (a): flawless algebraic derivation showing Var_Neyman = (W₁S₁+W₂S₂)²/n and Var_other after substitution, leading to the final e formula; Part (b): accurate calculation of p̂=0.5, s_b²=4 and the variance estimate v(p̂)=0.00099 with FPC; Part (c): precise arithmetic verification of all BIBD conditions and correct rank computation	Minor computational slips: arithmetic error in the s_b² calculation (e.g., gets 156/39=4.15 instead of 4), or algebraic manipulation in (a) that nearly reaches the required form but has sign or coefficient errors; BIBD verification with one condition unchecked	Major computational errors: incorrect p̂, a wrong variance formula yielding a negative variance, or fundamental errors in BIBD parameter verification
Interpretation	20%	12	Interprets the efficiency e in (a), showing that e ≤ 1 with equality only when μ = 1 (so Neyman allocation is never less efficient), and discusses the boundary cases μ=1 and r=1; for (b) interprets the sampling design as a cost-efficient alternative to SRS given the administrative structure; for (c) explains why resolvability imposes additional constraints and interprets the degrees of freedom in the ANOVA context	Provides minimal interpretation: states the efficiency formula without discussing its implications, or identifies the sampling type without explaining why cluster sampling suits the bank's branch structure	No interpretation: leaves results as bare numbers without explaining what efficiency means, why the particular sampling design was used, or the significance of the BIBD verification outcome
Final answer & units	20%	12	All final answers boxed or clearly stated: efficiency formula e=μ(r+1)²/((μr+1)(μ+r)) exactly as required; (b)(i) 'Cluster sampling', (ii) p̂=0.5 or 50%, (iii) variance estimate with proper expression; (c)(i) clear verdict on each design's possibility with reasons, (ii) specific df values; all proportions dimensionless, variances properly scaled	Final answers present but poorly formatted: efficiency formula correct but not simplified to the required form, or variance estimate missing the finite population correction factor	Missing or wrong final answers: incomplete efficiency expression, wrong sampling type identification, or no degrees of freedom values provided for (c)(ii)

Q7

Directive word: Prove

Statistics 2024 Paper I