Statistics 2022, Paper I (50 marks, directive: Derive)

Q8

(a) (i) What are orthogonal polynomials? How do you fit an orthogonal polynomial of degree 'p'?
(ii) For the model Y_(n×1) = X_(n×k) β_(k×1) + u_(n×1), E(uu') = σ² I_n, where X_(n×k) is a matrix of rank k (k < n), find the value of E[Y'(I_n - X(X'X)⁻¹X')Y]. (10+10=20)

(b) Consider an artificial population of three farms. Their selection probabilities and wheat production (in '000 tons) are as follows:

Farm unit (i)                 1     2     3
Selection probability (pᵢ)    0.3   0.2   0.5
Wheat production (yᵢ)         11    6     25

Draw all possible samples of size 2 with replacement (order is to be considered). Show that the Horvitz-Thompson estimator of total wheat production is unbiased. (15)

(c) What is a missing plot technique? Derive the missing value formula for a Latin Square Design. How would you proceed to analyse such a design? (15)

Directive word: Derive

This question asks you to derive. The directive word signals the depth of analysis expected, the structure of your answer, and the weight of evidence you must bring.

How this answer will be evaluated

Approach

Begin (a)(i) by defining orthogonal polynomials through the orthogonality condition Σφᵢ(x)φⱼ(x)=0 for i≠j, then describe the recurrence-relation method for fitting a polynomial of degree p. For (a)(ii), recognize the residual sum of squares form and apply E[u'Mu]=σ²tr(M) to obtain (n-k)σ². In (b), enumerate all 9 ordered samples drawn with replacement, compute the inclusion probabilities πᵢ=1-(1-pᵢ)², and verify that E[Ŷ_HT]=Y. For (c), derive the Latin Square missing value formula ŷ=[t(R+C+T)-2G]/((t-1)(t-2)) and outline the adjusted ANOVA procedure. Allocate about 40% of your time to (a) and about 30% each to (b) and (c).
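
For part (a)(ii), the trace argument can be written out as a short derivation. This is a sketch that assumes X is fixed (non-random), which the question does not state explicitly; M denotes the matrix Iₙ-X(X'X)⁻¹X'.

```latex
\begin{align*}
M &= I_n - X(X'X)^{-1}X', \qquad MX = 0, \quad M' = M \\
Y'MY &= (X\beta + u)'M(X\beta + u) = u'Mu \\
E[u'Mu] &= E[\operatorname{tr}(u'Mu)] = E[\operatorname{tr}(Muu')]
         = \operatorname{tr}\big(M\,E[uu']\big) = \sigma^2 \operatorname{tr}(M) \\
\operatorname{tr}(M) &= \operatorname{tr}(I_n) - \operatorname{tr}\big((X'X)^{-1}X'X\big) = n - k \\
E[Y'MY] &= (n-k)\,\sigma^2
\end{align*}
```

The step tr(X(X'X)⁻¹X') = tr((X'X)⁻¹X'X) = tr(I_k) uses the cyclic property tr(AB) = tr(BA), and it is the rank condition on X that guarantees (X'X)⁻¹ exists.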

Key points expected

  • (a)(i) Definition: orthogonal polynomials satisfy Σφᵢ(x)φⱼ(x)=0 for i≠j, summed over the observed x values; fitting uses the recurrence φᵣ₊₁(x)=(x-aᵣ)φᵣ(x)-bᵣφᵣ₋₁(x) with explicit formulas for aᵣ and bᵣ, and the coefficient of each φᵣ is estimated independently as Σyφᵣ(x)/Σφᵣ(x)²
  • (a)(ii) Recognition that Iₙ-X(X'X)⁻¹X' is the residual maker matrix M; since MX=0, Y'MY=u'Mu, so E[Y'MY]=E[u'Mu]=σ²tr(M)=σ²(n-k) using tr(Iₙ)=n and tr(X(X'X)⁻¹X')=k
  • (b) Enumeration of 9 ordered samples: (1,1),(1,2),(1,3),(2,1),(2,2),(2,3),(3,1),(3,2),(3,3) with their probabilities; calculation of first-order inclusion probabilities πᵢ=1-(1-pᵢ)²; verification that E[Ŷ_HT]=Σᵢ(yᵢ/πᵢ)·πᵢ=Σᵢyᵢ=Y=42
  • (c) Missing plot technique: Yates' method for estimating missing observations by minimizing error sum of squares; derivation using ∂SSE/∂y=0 for Latin Square layout
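  • (c) Latin Square missing value formula: ŷ = (tRᵢ + tCⱼ + tTₖ - 2G) / [(t-1)(t-2)], where Rᵢ, Cⱼ and Tₖ are the totals of the known observations in the row, column and treatment containing the missing plot and G is the grand total of the known observations; the analysis proceeds with the error degrees of freedom reduced by one and a bias correction to the treatment SS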
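
The enumeration and unbiasedness check for part (b) can be reproduced numerically. The following is a minimal Python sketch, not part of the original answer key; the names `p`, `y`, `pi` and `ht_estimate` are illustrative choices, and exact fractions are used so that the expectation is not blurred by floating-point rounding.

```python
from fractions import Fraction
from itertools import product

# Data from the question: selection probabilities and wheat production ('000 tons).
p = {1: Fraction(3, 10), 2: Fraction(2, 10), 3: Fraction(5, 10)}
y = {1: 11, 2: 6, 3: 25}
n = 2  # sample size, drawn with replacement

# First-order inclusion probabilities: pi_i = 1 - (1 - p_i)^n.
pi = {i: 1 - (1 - p[i]) ** n for i in p}

def ht_estimate(sample):
    """Horvitz-Thompson estimate of the total: sum of y_i / pi_i over DISTINCT units."""
    return sum(Fraction(y[i]) / pi[i] for i in set(sample))

# All ordered samples of size 2 with replacement: (1,1), (1,2), ..., (3,3).
samples = list(product(p, repeat=n))

expected = Fraction(0)
total_prob = Fraction(0)
for s in samples:
    prob = p[s[0]] * p[s[1]]      # draws are independent
    est = ht_estimate(s)
    expected += prob * est
    total_prob += prob
    print(f"sample {s}: P = {float(prob):.2f}, HT estimate = {float(est):.4f}")

true_total = sum(y.values())
print("E[HT estimate] =", expected, "| true total =", true_total)
```

Summing over distinct units matters here: in a sample such as (1,1), unit 1 contributes y₁/π₁ only once, which is what makes the expectation collapse to Σᵢ(yᵢ/πᵢ)·πᵢ.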

Evaluation rubric

Dimension | Weight | Max marks | Excellent | Average | Poor
Setup correctness | 20% | 10 | Correctly defines orthogonal polynomials with orthogonality condition in (a)(i); properly identifies M=I-X(X'X)⁻¹X' as idempotent projection matrix in (a)(ii); accurately sets up 9 ordered samples with correct probability structure in (b); correctly states Latin Square model with row, column, treatment effects in (c) | Defines orthogonal polynomials vaguely without orthogonality condition; recognizes residual form but misidentifies matrix properties; lists samples but misses some combinations or probability calculations; mentions Latin Square but confuses it with RCBD | Confuses orthogonal polynomials with simple polynomial regression; fails to identify the matrix form entirely; fundamental errors in sample enumeration; no understanding of missing plot purpose
Method choice | 20% | 10 | Uses recurrence relation method for orthogonal polynomial fitting; applies trace operator property tr(AB)=tr(BA) for (a)(ii); employs Horvitz-Thompson estimator with correct inclusion probabilities in (b); uses calculus-based minimization of SSE for missing value derivation in (c) | Describes fitting without clear recurrence; attempts direct expansion for expectation; uses wrong estimator or confuses with SRS; states formula without derivation | No systematic fitting method; brute force matrix multiplication without simplification; completely wrong estimator choice; no derivation attempt for missing value
Computation accuracy | 20% | 10 | Accurate trace calculation yielding (n-k)σ²; correct π₁=0.51, π₂=0.36, π₃=0.75 and verification E[Ŷ_HT]=42; precise algebraic manipulation leading to ŷ=[t(R+C+T)-2G]/((t-1)(t-2)) with correct denominator (t-1)(t-2) | Minor arithmetic errors in trace calculation; small errors in inclusion probabilities or verification; algebraic slips in missing value derivation but correct structure | Major computational errors in expectation; probabilities that do not sum correctly; fundamentally wrong missing value formula
Interpretation | 20% | 10 | Explains why orthogonal polynomials reduce multicollinearity and computational instability; interprets (n-k)σ² as expected residual SS with n-k degrees of freedom; explains why H-T estimator is design-unbiased for varying probabilities; clearly distinguishes between error df reduction and treatment SS adjustment in missing plot analysis | States computational advantages without explaining multicollinearity; notes answer without degrees of freedom interpretation; states unbiasedness without design-based reasoning; mentions df reduction without explaining why | No interpretation of computational benefits; no statistical meaning attached to results; no understanding of design-based inference; confused about analysis adjustments
Final answer & units | 20% | 10 | Clear final answers: (a)(ii) E[·]=(n-k)σ²; (b) verified Ŷ_HT unbiased with total production 42 ('000 tons); (c) complete missing value formula with analysis steps including adjusted treatment SS and error df reduced by 1 | Correct answers but missing units or incomplete final expressions; partial analysis description; missing some components of the final answer | Missing or wrong final answers; no units where relevant; incomplete or incorrect analysis procedure
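
The missing value formula for a t × t Latin square that the rubric refers to can be derived by minimizing the error sum of squares, as sketched below. The notation is an assumption made for this sketch: x is the missing observation, R, C and T are the totals of the known observations in the row, column and treatment containing it, and G is the grand total of all known observations.

```latex
\begin{align*}
\mathrm{SSE}(x) &= x^2 - \frac{(R+x)^2}{t} - \frac{(C+x)^2}{t} - \frac{(T+x)^2}{t}
                 + \frac{2(G+x)^2}{t^2} + \text{terms free of } x \\
\frac{d\,\mathrm{SSE}}{dx} &= 2x - \frac{2(R+x)}{t} - \frac{2(C+x)}{t} - \frac{2(T+x)}{t}
                 + \frac{4(G+x)}{t^2} = 0 \\
x\,(t^2 - 3t + 2) &= t(R + C + T) - 2G \\
\hat{x} &= \frac{t(R + C + T) - 2G}{(t-1)(t-2)}
\end{align*}
```

With the estimate inserted in place of the missing plot, the usual Latin square ANOVA is carried out with the total and error degrees of freedom each reduced by one, so the error df becomes (t-1)(t-2)-1.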
