Q7
(a) A very big population is divided into two strata. The allocations of a stratified random sample of size n to the two strata are n'_1 and n'_2 under Neyman allocation, and n_1 and n_2 under some other allocation. Define r = n'_1/n'_2 and μ = n_1/(rn_2). Prove that the efficiency of stratified random sampling under the other allocation with respect to stratified random sampling under Neyman allocation is e = μ(r+1)²/((μr + 1)(μ + r)). (20 marks)
(b) A bank has 40000 clients in its computer files, divided among 4000 branches, each managing exactly 10 clients. To estimate the proportion of clients to whom the bank has granted a loan, a simple random sample of 40 branches is selected. For each selected branch i, a list of clients having a loan is prepared, and A_i denotes the number of such clients; i = 1, 2, ..., 40. The sample data give Σ(i=1 to 40) A_i = 200 and Σ(i=1 to 40) A_i² = 1156.
(i) What type of sampling is this? (3 marks)
(ii) State the expression for the parameter to be estimated and obtain its unbiased estimate. (6 marks)
(iii) Estimate the variance of the unbiased estimator obtained in part (ii). (6 marks)
(c) (i) Verify whether the following BIBDs are possible: (1) v = b = 22, r = k = 7, λ = 2; (2) v = 10, b = 18, r = 9, k = 5, λ = 4, given that the design is resolvable.
(ii) Given below is the incidence matrix (N) of a block design. Find the degrees of freedom associated with the adjusted treatment sum of squares and the degrees of freedom for the error sum of squares.
Directive word: Prove
This question asks you to prove. The directive word signals the depth of analysis expected, the structure of your answer, and the weight of evidence you must bring.
How this answer will be evaluated
Approach
This question demands a rigorous derivation and proof for part (a), followed by applied numerical work for parts (b) and (c). Spend approximately 35% of the time on part (a), given its 20 marks and the complexity of the proof. Allocate 25% to part (b), covering identification of cluster sampling, unbiased estimation and the variance calculation, and 40% to part (c), on BIBD verification and the degrees-of-freedom computation. Structure the answer as follows: (a) state the assumptions and derive the efficiency ratio step by step; (b) identify the design as cluster sampling with branches as clusters, and construct the estimators from the given sums; (c) verify the necessary conditions for BIBD existence, including those implied by resolvability, and compute the rank of the C-matrix to obtain the degrees of freedom.
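A condensed version of the part (a) derivation is sketched below. It assumes, as the question's "very big population" suggests, that finite population corrections can be ignored. W_h = N_h/N and S_h² are the usual stratum weight and variance. V_N and V_O are labels introduced here for the variance under Neyman allocation and under the other allocation.

```latex
\begin{aligned}
&V(\bar y_{st}) = \frac{W_1^2 S_1^2}{n_1} + \frac{W_2^2 S_2^2}{n_2}
  && \text{fpc ignored (very big population)}\\
&n'_h = \frac{n W_h S_h}{W_1 S_1 + W_2 S_2}
  \;\Rightarrow\; r = \frac{n'_1}{n'_2} = \frac{W_1 S_1}{W_2 S_2}
  && \text{Neyman allocation}\\
&V_N = \frac{(W_1 S_1 + W_2 S_2)^2}{n} = \frac{W_2^2 S_2^2\,(r+1)^2}{n}
  && \text{using } W_1 S_1 = r\,W_2 S_2\\
&n_1 = \mu r\, n_2,\ n_1 + n_2 = n
  \;\Rightarrow\; n_2 = \frac{n}{\mu r + 1},\ n_1 = \frac{n \mu r}{\mu r + 1}
  && \text{other allocation}\\
&V_O = \frac{r^2 W_2^2 S_2^2 (\mu r + 1)}{n \mu r} + \frac{W_2^2 S_2^2 (\mu r + 1)}{n}
     = \frac{W_2^2 S_2^2 (\mu r + 1)(\mu + r)}{n \mu}\\
&e = \frac{V_N}{V_O} = \frac{\mu (r+1)^2}{(\mu r + 1)(\mu + r)},
  \qquad (\mu r + 1)(\mu + r) - \mu (r+1)^2 = r(\mu - 1)^2 \ge 0 \;\Rightarrow\; e \le 1
\end{aligned}
```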
Key points expected
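- Part (a): Define stratum weights W_h = N_h/N and variances S_h² (h = 1, 2). Neyman allocation gives n'_h = nN_hS_h/(N₁S₁+N₂S₂), so r = N₁S₁/(N₂S₂); this reduces to n'_h = nS_h/(S₁+S₂) and r = S₁/S₂ only when N₁ = N₂. Ignoring the fpc for a very big population, derive Var(ȳ_st) = W₁²S₁²/n₁ + W₂²S₂²/n₂ under both allocations, using n₁ = μrn₂ for the other allocation, and take the ratio to obtain the efficiency formula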
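- Part (b)(i): Identify this as (single-stage) cluster sampling with equal cluster sizes. Branches are the clusters (N = 4000), each containing M = 10 clients, and m = 40 clusters are drawn by simple random sampling without replacement. Every client in a selected branch is examined, so there is no second-stage subsampling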
- Part (b)(ii): The parameter is the population proportion P = Σ(i=1 to N) A_i/(NM), where M = 10, N = 4000 and the sum runs over all branches. Its unbiased estimator is p̂ = Σ(i=1 to m) A_i/(mM) = 200/(40×10) = 0.5, where m = 40
- Part (b)(iii): The variance estimator uses the sample variance of the cluster totals, s_b² = [ΣA_i² − (ΣA_i)²/m]/(m−1) = [1156 − 1000]/39 = 4. With the finite population correction, v(p̂) = (N−m)s_b²/(NmM²) = (3960 × 4)/(4000 × 40 × 100) = 0.00099, giving a standard error of about 0.031
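- Part (c)(i): Check the BIBD conditions vr = bk, λ(v−1) = r(k−1) and b ≥ v (Fisher). For a resolvable design, also check that k divides v and that b ≥ v + r − 1 (Bose).
  - Design (1): vr = bk = 154 and λ(v−1) = r(k−1) = 42 both hold. However, the design is symmetric (v = b) with v even, so k − λ = 5 would have to be a perfect square (Bruck-Ryser-Chowla). It is not, so the design is impossible. It also could not be resolvable, since 7 does not divide 22 and b = 22 < v + r − 1 = 28.
  - Design (2): vr = bk = 90, λ(v−1) = r(k−1) = 36, b = 18 ≥ v = 10, and k = 5 divides v = 10. Bose's inequality b = v + r − 1 = 18 holds with equality, which makes a resolvable design affine resolvable. That requires k²/v = 25/10 = 2.5 to be an integer, so the resolvable design is not possible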
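- Part (c)(ii): From the given incidence matrix N, compute the information matrix C = R − NK⁻¹N', where R = diag(r_1, ..., r_v) and K = diag(k_1, ..., k_b). The adjusted treatment SS has rank(C) df, which equals v − 1 if and only if the design is connected. The error SS has n − b − rank(C) df, where n is the total number of plots (the sum of the entries of N); for a connected design this is n − b − v + 1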
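The arithmetic for parts (b)(ii) and (b)(iii) can be reproduced with a short Python sketch. It assumes, as the question describes, that every client in a selected branch is counted. That makes this single-stage cluster sampling with clusters of equal size M drawn by SRSWOR, so p̂ is the mean cluster total divided by M. The function name is illustrative.

```python
# Sketch: cluster-sampling estimates for Q7(b), assuming SRSWOR of equal-sized clusters.
# N = branches in the population, M = clients per branch, m = sampled branches.
N, M, m = 4000, 10, 40
sum_a, sum_a2 = 200, 1156  # sum of A_i and sum of A_i^2 over the sampled branches


def cluster_proportion_estimates(N, M, m, sum_a, sum_a2):
    """Return (p_hat, s_b2, v_p_hat) for the proportion of clients with a loan."""
    p_hat = sum_a / (m * M)                      # unbiased estimate of P
    s_b2 = (sum_a2 - sum_a ** 2 / m) / (m - 1)   # sample variance of the cluster totals A_i
    fpc = (N - m) / N                            # finite population correction 1 - m/N
    v_p_hat = fpc * s_b2 / (m * M ** 2)          # estimated variance of p_hat
    return p_hat, s_b2, v_p_hat


p_hat, s_b2, v_p_hat = cluster_proportion_estimates(N, M, m, sum_a, sum_a2)
print(f"p_hat = {p_hat}, s_b^2 = {s_b2}, v(p_hat) = {v_p_hat:.5f}, SE = {v_p_hat ** 0.5:.4f}")
```

With the given sums this yields p̂ = 0.5, s_b² = 4 and v(p̂) ≈ 0.00099, for a standard error of about 0.031.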
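The existence checks for part (c)(i) can be mechanised as shown below. This is a sketch of standard necessary conditions only, so passing them does not prove that a design exists. It tests:

- vr = bk and λ(v−1) = r(k−1);
- Fisher's inequality;
- the Bruck-Ryser-Chowla condition for symmetric designs with even v;
- for resolvable designs, k | v, Bose's inequality b ≥ v + r − 1, and the integrality of k²/v in the affine resolvable case.

The helper `check_bibd` is hypothetical. The resolvability flag is applied to design (2) only, on the reading that the question's "given that the design is resolvable" refers to it.

```python
import math


def is_perfect_square(x: int) -> bool:
    return x >= 0 and math.isqrt(x) ** 2 == x


def check_bibd(v, b, r, k, lam, resolvable=False):
    """Return (condition, holds) pairs for standard necessary BIBD conditions."""
    checks = [
        ("vr = bk", v * r == b * k),
        ("lambda(v-1) = r(k-1)", lam * (v - 1) == r * (k - 1)),
        ("Fisher: b >= v", b >= v),
    ]
    if v == b and v % 2 == 0:
        # Symmetric design with even v: k - lambda must be a perfect square (Bruck-Ryser-Chowla).
        checks.append(("k - lambda is a perfect square", is_perfect_square(k - lam)))
    if resolvable:
        checks.append(("k divides v", v % k == 0))
        checks.append(("Bose: b >= v + r - 1", b >= v + r - 1))
        if b == v + r - 1:
            # Affine resolvable case: blocks from different replicates share k^2/v treatments.
            checks.append(("k^2/v is an integer", (k * k) % v == 0))
    return checks


for params, resolvable in [((22, 22, 7, 7, 2), False), ((10, 18, 9, 5, 4), True)]:
    checks = check_bibd(*params, resolvable=resolvable)
    verdict = "possible" if all(ok for _, ok in checks) else "not possible"
    print(params, verdict, [name for name, ok in checks if not ok])
```

Both designs fail. Design (1) fails because k − λ = 5 is not a perfect square, and design (2) fails because k²/v = 2.5 is not an integer.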
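The incidence matrix for part (c)(ii) is not reproduced in this text, so the sketch below uses two hypothetical matrices to show the procedure. It forms C = R − NK⁻¹N', takes rank(C) as the adjusted-treatment df, and obtains the error df as n − b − rank(C). Exact rational arithmetic (`fractions.Fraction`) keeps the rank free of rounding error. The function names are illustrative, and the code assumes there are no empty blocks.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Exact rank via Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    n_cols = len(a[0]) if a else 0
    for col in range(n_cols):
        pivot = next((i for i in range(rank, len(a)) if a[i][col] != 0), None)
        if pivot is None:
            continue
        a[rank], a[pivot] = a[pivot], a[rank]
        for i in range(len(a)):
            if i != rank and a[i][col] != 0:
                factor = a[i][col] / a[rank][col]
                a[i] = [x - factor * y for x, y in zip(a[i], a[rank])]
        rank += 1
    return rank


def block_design_df(N):
    """N is the v x b incidence matrix (treatments x blocks); returns (adj. treatment df, error df)."""
    v, b = len(N), len(N[0])
    r = [sum(row) for row in N]                                # replications r_i
    k = [sum(N[i][j] for i in range(v)) for j in range(b)]     # block sizes k_j (assumed > 0)
    n = sum(r)                                                 # total number of plots
    # Information matrix C = R - N K^{-1} N'
    C = [[(r[i] if i == p else 0) - sum(Fraction(N[i][j] * N[p][j], k[j]) for j in range(b))
          for p in range(v)] for i in range(v)]
    rank_c = matrix_rank(C)
    return rank_c, n - b - rank_c


# Hypothetical examples (not the exam's matrix).
disconnected = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
bibd = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]
print(block_design_df(disconnected), block_design_df(bibd))
```

The disconnected design gives only 2 adjusted-treatment df instead of v − 1 = 3. That is why rank(C), rather than v − 1, is the general answer.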
Evaluation rubric
| Dimension | Weight | Max marks | Excellent | Average | Poor |
|---|---|---|---|---|---|
| Setup correctness | 20% | 12 | Correctly defines all notation for part (a): stratum sizes N₁, N₂, variances S₁², S₂², and the Neyman allocation n'₁ = nN₁S₁/(N₁S₁+N₂S₂), giving r = N₁S₁/(N₂S₂); for (b) correctly identifies the cluster-sampling structure with M=10, N=4000, m=40; for (c) states all BIBD necessary conditions and the resolvability criteria | Defines most quantities correctly but misses the finite population correction factor in (b) or confuses the stratum allocation formulas in (a); states some BIBD conditions in (c) but omits the resolvability check | Incorrect setup: treats (b) as simple random sampling of individual clients, confuses Neyman with proportional allocation in (a), or fails to identify the design parameters in (c) |
| Method choice | 20% | 12 | For (a) chooses the correct approach: expresses Var(ȳ_st\|Neyman) and Var(ȳ_st\|other) in terms of the given ratios r and μ, then manipulates them algebraically into the required efficiency form; for (b) applies the unbiased estimator for simple random sampling of equal-sized clusters, with variance based on the between-cluster variation of the totals A_i; for (c) uses the standard BIBD parametric relations and computes the rank of the C-matrix properly | Uses correct general methods but applies the wrong variance formula for cluster sampling (e.g., treats the 400 sampled clients as a simple random sample of individuals) or takes algebraic shortcuts in (a) that skip critical steps | Wrong methodological approach: attempts a direct proof without defining the variance expressions, uses cluster-sampling formulas for the stratified problem, or attempts to verify the BIBDs by inspection without the parametric conditions |
| Computation accuracy | 20% | 12 | Part (a): flawless algebraic derivation showing Var_Neyman = (W₁S₁+W₂S₂)²/n and Var_other, with substitution leading to the final e formula; Part (b): accurate calculation of p̂ = 0.5, s_b² = 4, and the correct variance estimate v(p̂) = 0.00099 with FPC; Part (c): precise arithmetic verification of all BIBD conditions and correct rank computation | Minor computational slips: arithmetic error in the s_b² calculation (e.g., gets 156/39 = 4.15 instead of 4), or algebraic manipulation in (a) that nearly reaches the required form but with sign/coefficient errors; BIBD verification with one condition unchecked | Major computational errors: incorrect p̂, a wrong variance formula yielding a negative variance, or fundamental errors in verifying the BIBD parameters |
| Interpretation | 20% | 12 | Interprets the efficiency e in (a), showing that e ≤ 1 because (μr+1)(μ+r) − μ(r+1)² = r(μ−1)² ≥ 0, so Neyman allocation is never worse and e = 1 only when μ = 1, and discusses the special case r = 1, where e = 4μ/(μ+1)²; for (b) interprets the design as a cost-efficient alternative to SRS of clients given the branch structure; for (c) explains why resolvability imposes additional constraints and interprets the degrees of freedom in the ANOVA context | Provides minimal interpretation: states the efficiency formula without discussing its implications, or identifies the sampling type without explaining why cluster sampling suits the bank's branch structure | No interpretation: leaves results as bare numbers without explaining what efficiency means, why this sampling design was used, or the significance of the BIBD verification outcome |
| Final answer & units | 20% | 12 | All final answers boxed or clearly stated: efficiency formula e = μ(r+1)²/((μr+1)(μ+r)) exactly as required; (b)(i) 'Cluster sampling (branches as clusters)', (ii) p̂ = 0.5 or 50%, (iii) v(p̂) = 0.00099 with the full expression; (c)(i) a clear verdict on each design's possibility with reasons, (ii) specific df values; all proportions dimensionless and variances properly scaled | Final answers present but poorly formatted: efficiency formula correct but not simplified to the required form, or variance estimate missing the finite population correction factor | Missing or wrong final answers: incomplete efficiency expression, wrong identification of the sampling type, or no degrees-of-freedom values for (c)(ii) |
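As a numerical sanity check on the part (a) algebra, and on the interpretation that e ≤ 1 with equality at μ = 1, the sketch below compares the closed-form e with a direct ratio of the two stratified-sampling variances. It ignores the fpc and treats n_1 and n_2 as continuous, with no rounding to integers. The notation a_h = W_hS_h and the function names are introduced here for illustration.

```python
def var_st(a1, a2, n1, n2):
    """V(ybar_st) with fpc ignored, where a_h = W_h * S_h."""
    return a1 ** 2 / n1 + a2 ** 2 / n2


def efficiency_formula(mu, r):
    """Closed form e = mu (r+1)^2 / ((mu r + 1)(mu + r))."""
    return mu * (r + 1) ** 2 / ((mu * r + 1) * (mu + r))


def efficiency_direct(a1, a2, n, mu):
    """V(Neyman) / V(other), computing both variances from the allocations."""
    r = a1 / a2                                  # equals n'_1 / n'_2 under Neyman allocation
    n1_ney, n2_ney = n * a1 / (a1 + a2), n * a2 / (a1 + a2)
    n2 = n / (mu * r + 1)                        # other allocation: n_1 = mu * r * n_2
    n1 = n - n2
    return var_st(a1, a2, n1_ney, n2_ney) / var_st(a1, a2, n1, n2)


for a1, a2, mu in [(3.0, 1.5, 0.5), (2.0, 2.0, 1.0), (1.0, 4.0, 3.0)]:
    r = a1 / a2
    e_direct = efficiency_direct(a1, a2, n=100, mu=mu)
    e_formula = efficiency_formula(mu, r)
    print(f"r={r:.2f} mu={mu}: direct={e_direct:.6f} formula={e_formula:.6f}")
```

The two columns agree for every case, and the value reaches 1 only in the μ = 1 row.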
More from Statistics 2024 Paper I
- Q1 (a) Two events A and B are such that P(A) = 1/3, P(B) = 1/4 and P(A|B) + P(B|A) = 2/3. Evaluate the following: (i) P(A^c ∪ B^c) (5 marks) (…
- Q2 (a) Let the joint probability density function of two random variables X and Y be f(x, y) = x/3, for 0 < 2x < 3y < 6; 0, otherwise. Compute…
- Q3 (a) Let moment generating function of random variable X exist in the neighbourhood of zero and if $$E(X^n) = \frac{1}{5} + (-1)^n \frac{2}{…
- Q4 (a) Find the most powerful test of size α(= 0·05) for testing H₀: μ = 0 vs. H₁: μ = 1, given a random sample of size 25 from N(μ, 16) popul…
- Q5 (a) How will you justify the usage of the principle of least squares in estimating the parameters of a linear regression model? With usual…
- Q6 (a) (X, Y) has bivariate normal distribution BN(μ₁, μ₂, σ₁², σ₂², ρ). (i) Show that X and Y are independent if and only if ρ = 0. (6 marks)…
- Q7 (a) A very big population is divided into two strata. The allocation of units of stratified random sample of size n for the two strata unde…
- Q8 (a) A 2²-factorial design was used to develop the yield of a crop. Two factors A and B were used at two levels: low (–1) and high (+1). The…