Statistics 2021 Paper I 50 marks Solve

Q8

(a) (i) In stratified sampling under optimum allocation, how will you proceed to select units from different strata, if one or more nᵢ's happens to be greater than Nᵢ (i ≥ 2)?

(ii) A sample survey was conducted in a certain district of Himachal Pradesh. Four strata A, B, C and D of villages were formed according to the acreage of fruit trees as obtained from revenue records. A random sample of villages was selected from each stratum and the number of apple orchards in each selected village was noted. The data are shown below:

| Stratum | Total number of villages (Nᵢ) | Number of villages in sample (nᵢ) | Number of orchards in the selected villages |
|---------|------------------------------|-----------------------------------|---------------------------------------------|
| A (0 – 3 acres) | 275 | 15 | 2, 5, 1, 9, 6, 7, 0, 4, 7, 0, 5, 0, 0, 3, 0 |
| B (3 – 6 acres) | 146 | 10 | 21, 11, 7, 5, 6, 19, 5, 24, 30, 24 |
| C (6 – 15 acres) | 93 | 12 | 3, 10, 4, 11, 38, 11, 4, 46, 4, 18, 1, 39 |
| D (15 acres and above) | 62 | 11 | 30, 42, 20, 38, 29, 22, 31, 28, 66, 14, 15 |

Estimate the number of orchards in the district.

(b) (i) For a second order polynomial model with one predictor variable, derive the least squares normal equations clearly stating the conditions assumed. How will you interpret the parameters in this model?

(ii) Describe why it is recommended to work with predictor variables centred around the mean. Comment on fitted values of the response variable in this case. Prove your claim.

(c) What are split-plot designs? When do you recommend the use of such designs? If e₁ and e₂ are the main plot and sub-plot errors respectively, both estimated in units of a single sub-plot, explain why e₁ is expected to be larger than e₂.


Directive word: Solve

This question asks you to solve. The directive word signals the depth of analysis expected, the structure of your answer, and the weight of evidence you must bring.


How this answer will be evaluated

Approach

This multi-part question demands solving numerical problems alongside theoretical derivations and explanations. Allocate approximately 35% effort to part (a) combining optimum allocation adjustment and stratified estimation with Himachal Pradesh data; 35% to part (b) covering polynomial regression derivation, centering benefits, and proof; and 30% to part (c) explaining split-plot designs with error comparison. Structure as: brief theoretical setup → step-by-step calculations/derivations → interpretation of results in context.
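The optimum allocation adjustment for part (a)(i) can be sketched in Python. This is a minimal illustration, assuming Neyman allocation (nᵢ proportional to NᵢSᵢ, equal cost per unit) and strictly positive stratum standard deviations; the function name `optimum_allocation` and the example stratum sizes and standard deviations are hypothetical and are not taken from the question.

```python
def optimum_allocation(N, S, n):
    """Neyman allocation n_i ∝ N_i * S_i, capping n_i at N_i and re-allocating.

    N: stratum sizes, S: stratum standard deviations (assumed > 0),
    n: total sample size. Returns real-valued n_i (rounding is left to the user).
    """
    if n > sum(N):
        raise ValueError("total sample size exceeds population size")
    alloc = [0.0] * len(N)
    active = set(range(len(N)))   # strata not yet fixed at n_i = N_i
    remaining = float(n)          # sample size still to be distributed
    while active:
        total_weight = sum(N[i] * S[i] for i in active)
        trial = {i: remaining * N[i] * S[i] / total_weight for i in active}
        over = [i for i in active if trial[i] > N[i]]
        if not over:
            # Every trial allocation is feasible, so accept it and stop.
            for i in active:
                alloc[i] = trial[i]
            break
        # Take the over-allocated strata completely, then re-allocate the rest.
        for i in over:
            alloc[i] = float(N[i])
            remaining -= N[i]
            active.remove(i)
    return alloc


# Hypothetical strata: the first is small but highly variable.
example = optimum_allocation(N=[10, 200, 300], S=[50, 5, 2], n=60)
print(example)
```

In this hypothetical case the first pass assigns about 14.3 units to a stratum of only 10, so that stratum is enumerated completely and the remaining 50 units are shared between the other two strata in proportion to NᵢSᵢ.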

Key points expected

  • For (a)(i): Explain the iterative adjustment procedure when nᵢ > Nᵢ in optimum allocation—set nᵢ = Nᵢ for such strata, recompute allocation for remaining strata using revised formula, and repeat until all nᵢ ≤ Nᵢ
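  • For (a)(ii): Calculate each stratum's sample mean ȳᵢ, expand it by the stratum size Nᵢ, and sum to obtain the stratified estimate of the total, Ŷ = ΣNᵢȳᵢ (about 6,470 orchards for these data); the allocation scheme determines only how the nᵢ are chosen, not the form of the estimator. Report the estimate with its standard error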
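  • For (b)(i): Take Yᵢ = β₀ + β₁Xᵢ + β₂Xᵢ² + εᵢ with zero-mean, uncorrelated, constant-variance errors and at least three distinct X values; derive the normal equations by setting the partial derivatives of Σ(Yᵢ - β₀ - β₁Xᵢ - β₂Xᵢ²)² with respect to β₀, β₁ and β₂ to zero; interpret β₀ as the mean response at X = 0, β₁ as the slope of the curve at X = 0, and β₂ as the curvature (the slope changes at the rate 2β₂)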
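  • For (b)(ii): Explain that centering (x = X - X̄) sharply reduces the correlation between the linear and quadratic terms (it vanishes when the X values are symmetric about X̄), which improves the conditioning of X′X and stabilizes the coefficient estimates; prove that the fitted values remain identical by expanding the centred model in powers of X and showing that both forms describe the same set of fitted curves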
  • For (c): Define split-plot designs as experiments with two sizes of experimental units where whole plots receive one factor and sub-plots receive another; recommend when one factor is harder/costlier to change; explain e₁ > e₂ due to additional whole-plot error component from main plot-to-main plot variation
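The computation for part (a)(ii) can be checked with a short Python sketch. It assumes simple random sampling without replacement within each stratum, so the variance of the estimated total is taken as ΣNᵢ(Nᵢ − nᵢ)sᵢ²/nᵢ; the names `strata` and `stratified_total` are illustrative only.

```python
from math import sqrt
from statistics import mean, variance

# Stratum label -> (N_i, orchard counts in the sampled villages), from the question.
strata = {
    "A": (275, [2, 5, 1, 9, 6, 7, 0, 4, 7, 0, 5, 0, 0, 3, 0]),
    "B": (146, [21, 11, 7, 5, 6, 19, 5, 24, 30, 24]),
    "C": (93, [3, 10, 4, 11, 38, 11, 4, 46, 4, 18, 1, 39]),
    "D": (62, [30, 42, 20, 38, 29, 22, 31, 28, 66, 14, 15]),
}


def stratified_total(strata):
    """Return (estimated population total, its estimated standard error)."""
    total = 0.0
    var = 0.0
    for N_i, y in strata.values():
        n_i = len(y)
        total += N_i * mean(y)  # N_i * ybar_i
        # Variance with finite population correction; variance() uses n_i - 1.
        var += N_i * (N_i - n_i) * variance(y) / n_i
    return total, sqrt(var)


total_hat, se_hat = stratified_total(strata)
for label, (N_i, y) in strata.items():
    print(f"Stratum {label}: mean = {mean(y):.2f}")
print(f"Estimated total: {total_hat:.0f} orchards (SE {se_hat:.0f})")
```

The stratum means come out as 3.27, 15.20, 15.75 and 30.45, and the estimated total is about 6,470 orchards for the district.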

Evaluation rubric

| Dimension | Weight | Max marks | Excellent | Average | Poor |
|-----------|--------|-----------|-----------|---------|------|
| Setup correctness | 20% | 10 | Correctly identifies all components: for (a) recognizes the condition under which optimum allocation breaks down (nᵢ > Nᵢ) and applies the iterative adjustment; for (b) properly specifies the polynomial model assumptions (zero-mean, independent, homoscedastic errors, with normality needed only for inference); for (c) accurately distinguishes whole-plot and sub-plot error structures | Basic identification of components with minor errors in stating conditions or one missing assumption; partial recognition of the nᵢ > Nᵢ problem without clear resolution steps | Misidentifies key elements: confuses optimum with proportional allocation, omits essential assumptions for least squares, or fails to distinguish error types in the split-plot design |
| Method choice | 20% | 10 | Selects appropriate methodology throughout: iterative Neyman allocation adjustment for (a)(i), stratified estimation of the total with correct weighting for (a)(ii), matrix or algebraic derivation for (b), and clear experimental design principles for (c) | Generally correct methods with suboptimal choices, such as using proportional instead of optimum allocation, or a descriptive rather than rigorous proof for centering | Incorrect methods selected: simple random sampling formulas for stratified data, ordinary regression without polynomial terms, or a confused explanation of the error structure |
| Computation accuracy | 20% | 10 | Precise calculations: correct stratum means (A: 3.27, B: 15.2, C: 15.75, D: 30.45), accurate estimate of the total, Ŷ = ΣNᵢȳᵢ ≈ 6,470 orchards, correct normal equation derivation with proper partial derivatives, and a valid algebraic proof of invariance under centering | Minor computational slips: arithmetic errors in stratum totals, slightly incorrect weights, or algebraic omissions in the derivation that do not invalidate the core logic | Major computational errors: wrong stratum means, incorrect application of the finite population correction, fundamentally flawed normal equations, or an invalid proof structure |
| Interpretation | 20% | 10 | Rich contextual interpretation: explains why iterative allocation preserves optimality, interprets the Himachal Pradesh estimate in an agricultural planning context, clarifies the practical meaning of the polynomial parameters (turning points, rates), and relates e₁ > e₂ to precision trade-offs in agricultural experiments | Adequate interpretation with limited context: mechanical parameter definitions without practical insight, or generic statements about split-plots without experimental relevance | Missing or incorrect interpretation: fails to explain what the parameters mean for policy or management, or gives no practical significance for the error difference |
| Final answer & units | 20% | 10 | Complete, labeled answers: explicit final orchard estimate with standard error and confidence interpretation for (a); clear statement that fitted values are invariant under centering for (b); precise recommendation conditions and justification of the error inequality for (c) | Present but incomplete answers: numerical estimate without an uncertainty measure, invariance stated without a completed proof, or a partial error comparison | Missing final answers, wrong units (orchards vs. orchards per village), or completely unjustified claims about error relationships |
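The invariance of fitted values under centering, asked for in part (b)(ii), can be shown with a short expansion. The following is one possible sketch of the argument; the primed symbols b₀′, b₁′, b₂′ for the centred-model estimates are notation introduced here for illustration, not notation from the question.

```latex
% Centred model with x_i = X_i - \bar{X}; primes denote centred-model coefficients.
\begin{align*}
\hat{Y}_i &= b_0' + b_1'(X_i - \bar{X}) + b_2'(X_i - \bar{X})^2 \\
          &= \underbrace{\bigl(b_0' - b_1'\bar{X} + b_2'\bar{X}^2\bigr)}_{b_0}
           + \underbrace{\bigl(b_1' - 2 b_2'\bar{X}\bigr)}_{b_1} X_i
           + \underbrace{b_2'}_{b_2} X_i^2 .
\end{align*}
% The map (b_0', b_1', b_2') -> (b_0, b_1, b_2) is one-to-one, so both forms
% describe the same family of quadratic curves. Minimising
% \sum_i (Y_i - \hat{Y}_i)^2 over either parameter set therefore selects the
% same curve, and the fitted values \hat{Y}_i coincide.
```

Only the coefficients change: b₂ = b₂′, b₁ = b₁′ − 2b₂′X̄ and b₀ = b₀′ − b₁′X̄ + b₂′X̄², while the fitted values and the residual sum of squares are the same in both forms.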
