Statistics

UPSC Statistics 2023 — Paper II

All 8 questions from UPSC Civil Services Mains Statistics 2023 Paper II (400 marks total). Every stem reproduced in full, with directive-word analysis, marks, word limits, and answer-approach pointers.

Questions: 8

Total marks: 400

Year: 2023

Paper: Paper II

Topics covered

Statistical Quality Control, Reliability, Linear Programming, Game Theory, Markov Chains
Control Charts for Fraction Defective and Replacement Policy
Linear programming, assignment problem, EOQ model
Transportation problem, normal distribution, queuing theory
Psychometrics, population growth, epidemiological rates
Time series analysis, least squares, heteroscedasticity
Life table functions and standard scores
Agricultural statistics, fertility rates, and econometric model identification

Section A

Q1. Statistical Quality Control, Reliability, Linear Programming, Game Theory, Markov Chains (50 marks, compulsory; directive: solve)

(a) What do you understand by Statistical Quality Control (SQC)? Discuss briefly its need and utility in Industry. Discuss the causes of variation in quality. (10 marks)

(b) Consider an item with failure rate $Z(t) = \frac{t}{t+1}$. Write down the survivor function $R(t)$ and hence evaluate Mean Time To Failure (MTTF). Also obtain the conditional survival function and Mean Residual Life (MRL). (10 marks)

(c) Solve the following linear programming problem by using graphical approach:

Minimize $4x_1 + 5x_2 + 6x_3$

Subject to $x_1 + x_2 \geq 11$, $x_1 - x_2 \leq 5$, $x_3 - x_1 - x_2 = 0$, $7x_1 + 12x_2 \geq 35$, $x_1 \geq 0, x_2 \geq 0, x_3 \geq 0$ (10 marks)

(d) In a two-person zero-sum game, write the payoff matrix in general notation. Consider the two-person zero-sum game where each player tosses an unbiased coin simultaneously. Player B pays ₹7 to A if {H, H} occurs or {T, T} occurs otherwise player A pays ₹3 to B. Write down A's payoff matrix. Explain the Max Min criterion for player A and hence define the saddle point. (10 marks)

(e) Let Xₜ be the state of a flea at time t Find the transition Matrix P. Also obtain Pᵣ[X₂ = 3 | X₀ = 1]. (10 marks)

हिंदी में पढ़ें

(a) सांख्यिकी गुणवत्ता नियंत्रण (एस. क्यू. सी.) से आप क्या समझते हैं ? उद्योग में इसकी आवश्यकता एवं उपयोगिता पर संक्षेप में चर्चा कीजिए । गुणवत्ता में परिवर्तन के कारणों पर चर्चा कीजिए । (10 अंक) (b) विफलता दर $Z(t) = \frac{t}{t+1}$ वाले किसी वस्तु (आइटम) पर विचार कीजिए । उत्तरजीविता फलन $R(t)$ लिखिए और इस तरह विफलता तक माध्य काल (एम.टी.टी.एफ.) ज्ञात कीजिए । सप्रतिबन्ध उत्तरजीविता फलन एवं औसत अवशिष्ट जीवन (एम.आर.एल.) भी ज्ञात कीजिए । (10 अंक) (c) निम्नलिखित रैखिक प्रोग्रामन समस्या को ग्राफी विधि का उपयोग करके हल कीजिए : न्यूनतमीकरण $4x_1 + 5x_2 + 6x_3$ निम्न प्रतिबन्धों के अन्तर्गत $x_1 + x_2 \geq 11$ $x_1 - x_2 \leq 5$ $x_3 - x_1 - x_2 = 0$ $7x_1 + 12x_2 \geq 35$ $x_1 \geq 0, x_2 \geq 0, x_3 \geq 0$ (10 अंक) (d) द्वि-व्यक्ति शून्य-योगी खेल में, सामान्य संकेतन में भुगतान आव्यूह लिखिए । द्वि-व्यक्ति शून्य-योगी खेल पर विचार करें जहाँ प्रत्येक खिलाड़ी एक साथ ही एक निष्पक्ष सिक्का उछालता है । खिलाड़ी $B$, $A$ को 7 रुपये का भुगतान करता है यदि $\{H, H\}$ घटित होता है या $\{T, T\}$ घटित होता है अन्यथा खिलाड़ी $A$, $B$ को 3 रुपये का भुगतान करता है । $A$ का भुगतान आव्यूह लिखिए । खिलाड़ी $A$ के लिए अधिकतम-न्यूनतम (मैक्स मिन) निकष की व्याख्या कीजिए और इस तरह पल्यायन बिन्दु को परिभाषित कीजिए । (10 अंक) (e) मान लीजिए कि समय t पर Xₜ एक पिस्सू की अवस्था है। संक्रमण आव्यूह P ज्ञात कीजिए। Pᵣ[X₂ = 3 | X₀ = 1] भी प्राप्त कीजिए। (10 अंक)

Answer approach & key points

This multi-part question requires solving five distinct problems: (a) discuss SQC concepts with industrial applications, (b) derive reliability functions from given failure rate, (c) solve LP graphically, (d) construct and analyze game matrix, and (e) compute Markov chain probabilities. Each part carries 10 marks, so divide the available time roughly equally, leaving a little extra for (c) and (e), where computational errors are common. Label each part clearly, show all derivation steps, and conclude with boxed final answers.

Part (a): Define SQC as statistical methods for maintaining quality standards; explain need (mass production complexity, cost reduction, customer satisfaction) and utility (process control, acceptance sampling); classify variations into chance causes (random, inherent) and assignable causes (identifiable, correctable)
Part (b): Derive R(t) = exp(-∫₀ᵗ Z(u)du) = (t+1)e^(-t); compute MTTF = ∫₀^∞ R(t)dt = 2; obtain conditional survival R(x|t) = R(t+x)/R(t) = (t+x+1)e^(-x)/(t+1) and MRL(t) = ∫₀^∞ R(x|t)dx = (t+2)/(t+1)
Part (c): Use constraint x₃ = x₁ + x₂ to reduce to 2-variable problem; minimize 10x₁ + 11x₂; identify feasible region vertices from x₁+x₂≥11, x₁-x₂≤5, 7x₁+12x₂≥35 (the last constraint is redundant); the vertices are (0,11) with value 121 and (8,3) with value 113, so the optimal solution is x₁ = 8, x₂ = 3, x₃ = 11 with minimum value 113
Part (d): General payoff matrix [aᵢⱼ] where i=1,...,m strategies for A, j=1,...,n for B; specific matrix with entries +7 for (H,H) and (T,T), -3 otherwise; apply Max Min: maximize minimum row payoff; saddle point exists if Max Min = Min Max (here Max Min = -3 and Min Max = 7, so this matrix has no saddle point in pure strategies)
Part (e): Construct transition matrix P from flea movement probabilities (typically given in diagram); compute P² and extract P[X₂=3|X₀=1] = (P²)₁₃ using Chapman-Kolmogorov equations
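
The graphical solution of part (c) can be cross-checked numerically. The sketch below is only an illustration, not part of the official solution: it enumerates pairwise intersections of the constraint boundaries of the reduced two-variable problem, keeps the feasible ones, and evaluates the original objective there. The helper names (`feasible`, `corner_points`, `cost`) are made up for this example. The feasible region is unbounded, but every objective coefficient is positive, so the minimum is attained at a corner point.

```python
from itertools import combinations

# Reduced problem after substituting x3 = x1 + x2.
# Each constraint is stored as (a, b, c) meaning a*x1 + b*x2 >= c.
constraints = [
    (1, 1, 11),    # x1 + x2 >= 11
    (-1, 1, -5),   # x1 - x2 <= 5, rewritten as -x1 + x2 >= -5
    (7, 12, 35),   # 7*x1 + 12*x2 >= 35
    (1, 0, 0),     # x1 >= 0
    (0, 1, 0),     # x2 >= 0
]

def feasible(x1, x2, tol=1e-9):
    """True if the point satisfies every constraint."""
    return all(a * x1 + b * x2 >= c - tol for a, b, c in constraints)

def corner_points():
    """Feasible intersections of pairs of constraint boundaries."""
    points = set()
    for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel boundaries never meet
        x1 = (c1 * b2 - c2 * b1) / det   # Cramer's rule
        x2 = (a1 * c2 - a2 * c1) / det
        if feasible(x1, x2):
            # adding 0.0 turns a possible -0.0 into 0.0
            points.add((round(x1, 6) + 0.0, round(x2, 6) + 0.0))
    return sorted(points)

def cost(x1, x2):
    """Original objective 4*x1 + 5*x2 + 6*x3 with x3 = x1 + x2."""
    return 4 * x1 + 5 * x2 + 6 * (x1 + x2)

vertices = corner_points()
best = min(vertices, key=lambda point: cost(*point))
print(vertices, best, cost(*best))
```

In an exam answer the same comparison would be shown on the graph by evaluating the objective at each corner of the shaded region.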


Q2. Control Charts for Fraction Defective and Replacement Policy (50 marks; directive: explain)

(a) What do you understand by control chart for fraction defective? Explain its construction. Give the theoretical distribution on which the control limits are based. (15 marks)

(b) Each day a sample of 50 items from the production process was examined. The number of defectives found in each sample was as follows:

| Day | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|-----|---|---|---|---|---|---|---|---|---|----|----|----|
| No. of Defectives | 6 | 2 | 5 | 1 | 2 | 2 | 3 | 5 | 3 | 4 | 12 | 4 |

| Day | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 |
|-----|----|----|----|----|----|----|----|----|----|----|----|----|
| No. of Defectives | 4 | 1 | 3 | 5 | 4 | 1 | 4 | 3 | 5 | 4 | 2 | 3 |

Draw a suitable control chart and check for control. What control limits would you suggest for subsequent use? (15 marks)

(c) A factory has 1000 bulbs installed. Cost of individual replacement is US \$3 while cost of that of group replacement is US \$1 per bulb respectively. It is decided to replace all the bulbs simultaneously at fixed interval and also to replace the individual bulbs that fail in between. Determine the optimum replacement policy. Failure probability are given below:

| Week | 1 | 2 | 3 | 4 | 5 |
|------|-----|------|------|------|------|
| Failure probability(p) | 0·10 | 0·25 | 0·50 | 0·70 | 1·00 |

(20 marks)

हिंदी में पढ़ें

(a) दुष्पितानुपात के लिए नियंत्रण सांचित्र से आप क्या समझते हैं? इसके निर्माण की व्याख्या करें। सैद्धांतिक बंटन को बताइए जिस पर नियंत्रण सीमाएं आधारित होती हैं। (15 अंक) (b) प्रत्येक दिन उत्पादन प्रक्रिया से 50 वस्तुओं के प्रतिदर्श की जांच की गई। प्रत्येक प्रतिदर्श में दोषपूर्ण उत्पाद की संख्या निम्नांकित पाई गई: | दिन | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | | दोषपूर्ण की संख्या | 6 | 2 | 5 | 1 | 2 | 2 | 3 | 5 | 3 | 4 | 12 | 4 | | दिन | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | | दोषपूर्ण की संख्या | 4 | 1 | 3 | 5 | 4 | 1 | 4 | 3 | 5 | 4 | 2 | 3 | एक उपयुक्त नियंत्रण सांचित्र बनाइए और नियंत्रण के लिए जांच कीजिए। कौन सी नियंत्रण सीमाएं आप पर्वर्ती उपयोग के लिए सुझाएंगे? (15 अंक) (c) एक फैक्ट्री में 1000 बल्ब लगे हैं । व्यक्तिगत प्रतिस्थापन की लागत अमरीकी डालर $3 है जबकि समूह प्रतिस्थापन की लागत अमरीकी डालर $1 प्रति बल्ब है । निश्चित अंतराल पर सभी बल्बों को एक साथ बदलने का निर्णय लिया गया और इसके अतिरिक्त बीच में फ्यूज होने वाले अलग-अलग बल्बों को बदलने के लिए भी निर्णय लिया गया । इष्टतम प्रतिस्थापन नीति निर्धारित कीजिए । विफलता प्रायिकता नीचे दी गई है : | सप्ताह | 1 | 2 | 3 | 4 | 5 | |--------|-----|------|------|------|------| | विफलता प्रायिकता(p) | 0·10 | 0·25 | 0·50 | 0·70 | 1·00 | (20 अंक)

Answer approach & key points

Explain the theoretical foundations of p-charts in part (a), then solve the numerical problems in (b) and (c) with systematic working. Allocate approximately 25-30% time to (a) as it requires conceptual elaboration, 30-35% to (b) for control chart construction and interpretation, and 35-40% to (c) as it carries the highest marks and involves multi-step replacement policy optimization. Present calculations in tabular format where possible and conclude with clear managerial recommendations.
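
As a rough check on the arithmetic in part (b), the 3-sigma p-chart limits can be computed directly from the data in the question. This is a minimal sketch: the function name `p_chart_limits` is illustrative, and dropping a flagged day before revising the limits assumes that an assignable cause was actually found for it.

```python
from math import sqrt

n = 50  # items inspected each day
defectives = [6, 2, 5, 1, 2, 2, 3, 5, 3, 4, 12, 4,
              4, 1, 3, 5, 4, 1, 4, 3, 5, 4, 2, 3]

def p_chart_limits(counts, n):
    """Centre line and 3-sigma limits for a p-chart with constant sample size."""
    p_bar = sum(counts) / (len(counts) * n)
    sigma = sqrt(p_bar * (1 - p_bar) / n)
    lcl = max(0.0, p_bar - 3 * sigma)  # a negative lower limit is set to 0
    ucl = p_bar + 3 * sigma
    return p_bar, lcl, ucl

p_bar, lcl, ucl = p_chart_limits(defectives, n)

# Days whose sample fraction defective falls outside the trial limits
out_days = [day for day, d in enumerate(defectives, start=1)
            if not lcl <= d / n <= ucl]

# Revised limits for subsequent use, assuming the flagged days are discarded
kept = [d for day, d in enumerate(defectives, start=1) if day not in out_days]
p_rev, lcl_rev, ucl_rev = p_chart_limits(kept, n)

print(round(p_bar, 4), round(ucl, 4), out_days)
print(round(p_rev, 4), round(ucl_rev, 4))
```

After the revision, all remaining sample fractions should be checked against the new limits before the limits are recommended for future use.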

Part (a): Definition of control chart for fraction defective (p-chart), construction steps using sample proportion p̂ = d/n, and identification of Binomial distribution as the theoretical basis with Normal approximation for large samples
Part (b): Calculation of center line (CL = p̄), control limits UCL/LCL = p̄ ± 3√[p̄(1-p̄)/n], plotting of 24 sample points, identification of Day 11 as out-of-control, and revised limits after removing assignable cause
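Part (c): Conversion of the cumulative failure probabilities to weekly ones (0.10, 0.15, 0.25, 0.20, 0.30); computation of expected failures per week by the recursion N₁ = N₀p₁, N₂ = N₀p₂ + N₁p₁, and so on; individual replacement cost, group replacement cost for each policy period, and determination of optimal replacement interval at minimum average cost per week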
Correct handling of variable control limits when sample sizes differ (though here n=50 constant), and recognition that p-chart is appropriate for attribute data with varying sample sizes
Economic interpretation: trade-off between individual replacement flexibility and group replacement economies of scale, with explicit cost comparison across weeks 1-5
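
The cost comparison for part (c) can be tabulated with a few lines of code. The sketch below rests on two assumptions that should be stated in a written answer as well: the table of failure probabilities is read as cumulative (it rises to 1.00), and a group replacement at the end of week t costs the group charge for all bulbs plus the individual charge for every failure up to and including week t. Variable names are illustrative only.

```python
N = 1000             # bulbs installed
C_INDIVIDUAL = 3.0   # cost of replacing one failed bulb individually
C_GROUP = 1.0        # cost per bulb in a group replacement
cumulative = [0.10, 0.25, 0.50, 0.70, 1.00]  # assumed: P(failed by end of week k)

# Probability of failing during week k (difference of cumulative values)
p = [c - prev for c, prev in zip(cumulative, [0.0] + cumulative[:-1])]

# Expected replacements each week: original bulbs failing in that week
# plus earlier replacements failing at the corresponding age.
failures = []
for k in range(len(p)):
    n_k = N * p[k] + sum(failures[j] * p[k - 1 - j] for j in range(k))
    failures.append(n_k)

# Pure individual replacement: N / (mean life) failures per week in steady state
mean_life = sum((k + 1) * pk for k, pk in enumerate(p))
individual_cost_per_week = C_INDIVIDUAL * N / mean_life

# Average weekly cost if all bulbs are group-replaced every k weeks
avg_group_cost = []
total_failures = 0.0
for k, n_k in enumerate(failures, start=1):
    total_failures += n_k
    avg_group_cost.append((C_GROUP * N + C_INDIVIDUAL * total_failures) / k)

best_week = min(range(1, len(avg_group_cost) + 1),
                key=lambda k: avg_group_cost[k - 1])
print([round(c, 2) for c in avg_group_cost], best_week,
      round(individual_cost_per_week, 2))
```

Under these assumptions the two policies differ by only a few dollars per week, so the rounding convention used for expected failures should be kept consistent throughout the table.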


Q3. Linear programming, assignment problem, EOQ model (50 marks; directive: solve)

(a) Solve the following Linear Programming problem using Two Phase method :

Maximize Z = 3x₁ - x₂

Subject to 2x₁ + x₂ ≥ 2, x₁ + 3x₂ ≤ 2, x₂ ≤ 4, x₁ ≥ 0, x₂ ≥ 0

(b)(i) Solve the above assignment problem. Cell values represent cost of assigning job A, B, C and D to the machines I, II, III and IV.

(b)(ii) Write down the dual for the given primal problem.

Max Z = 6x₁ - 5x₂ + 7x₃ + x₄

Subject to 2x₁ + 4x₂ - x₃ + x₄ ≤ 4, x₁ - x₂ + 6x₃ + 7x₄ ≥ 5, 2x₁ + 2x₂ + 4x₃ + 5x₄ = 6, x₁ + 8x₂ + x₃ = 7; x₁ and x₄ unrestricted, x₂ ≥ 0, x₃ ≥ 0

(c) What is a basic Economic Order Quantity (EOQ) model in Inventory Control and state the assumption made. A Company estimates that it will sell 12000 units of its products for the forthcoming year. The ordering cost is ₹100 per order and the carrying cost per year is 20% of the purchase price per unit. The purchase price per unit is ₹50. Find (i) EOQ (ii) Number of orders per year (iii) Time between successive orders.

हिंदी में पढ़ें

(a) द्विप्रावस्था विधि का उपयोग करके निम्नलिखित रैखिक प्रोग्रामन समस्या को हल कीजिए : अधिकतमीकरण Z = 3x₁ - x₂ निम्न प्रतिबंधों के अंतर्गत 2x₁ + x₂ ≥ 2, x₁ + 3x₂ ≤ 2, x₂ ≤ 4, x₁ ≥ 0, x₂ ≥ 0 (b)(i) निम्नलिखित नियत समस्या को हल कीजिए । प्रत्येक मान मशीनों I, II, III और IV को कार्य A, B, C और D सौंपने की लागतों को दर्शाता है । (b)(ii) दी गई आद्य समस्या के लिए द्वैति लिखिए : अधिकतमीकरण Z = 6x₁ - 5x₂ + 7x₃ + x₄ निम्न प्रतिबंधों के अंतर्गत 2x₁ + 4x₂ - x₃ + x₄ ≤ 4, x₁ - x₂ + 6x₃ + 7x₄ ≥ 5, 2x₁ + 2x₂ + 4x₃ + 5x₄ = 6, x₁ + 8x₂ + x₃ = 7; x₁ और x₄ अप्रतिबंधित, x₂ ≥ 0, x₃ ≥ 0 (c) तालिका नियंत्रण में मूलभूत आर्थिक आदेश मात्रा (ई.ओ.क्यू.) मॉडल क्या है और इसमें ली गई अभिधारणा को बताइए । एक कंपनी का अनुमान है कि वह अपने उत्पादों की 12000 इकाइयाँ आगामी वर्ष में बेचेगी । आदेश लागत 100 रुपये प्रति आदेश है और प्रति वर्ष ले जाने की लागत खरीद मूल्य का 20 प्रतिशत प्रति इकाई है । खरीद मूल्य 50 रुपये प्रति इकाई है । ज्ञात कीजिए (i) आर्थिक आदेश मात्रा (ई.ओ.क्यू.) (ii) प्रति वर्ष आदेशों (ऑर्डरों) की संख्या (iii) क्रमिक आदेशों के बीच का समय ।

Answer approach & key points

Solve this multi-part numerical problem by allocating approximately 40% time to part (a) Two-Phase method as it requires extensive tableau iterations, 35% to part (c) EOQ calculations with clear formula application, and 25% to part (b) covering both assignment problem and dual formulation. Begin with clear problem identification for each part, show complete step-by-step working with proper tableaus for (a), cost matrix reduction for (b)(i), systematic dual conversion rules for (b)(ii), and standard EOQ model derivation followed by substitution for (c). Conclude each part with boxed final answers and appropriate units.
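
For part (b)(ii), one way to write the dual is sketched below. The dual variables $y_1, \dots, y_4$ are notation introduced here, one per primal constraint. The sign conventions follow the usual rules for a maximization primal: a ≤ constraint gives a non-negative dual variable, a ≥ constraint a non-positive one, an equality an unrestricted one, and an unrestricted primal variable gives an equality in the dual. Other equivalent forms (for example, first multiplying the ≥ constraint by −1) are equally acceptable.

$$
\begin{aligned}
\text{Minimize } W &= 4y_1 + 5y_2 + 6y_3 + 7y_4 \\
\text{subject to } \quad 2y_1 + y_2 + 2y_3 + y_4 &= 6 \\
4y_1 - y_2 + 2y_3 + 8y_4 &\geq -5 \\
-y_1 + 6y_2 + 4y_3 + y_4 &\geq 7 \\
y_1 + 7y_2 + 5y_3 &= 1 \\
y_1 \geq 0,\ y_2 \leq 0,&\ y_3, y_4 \text{ unrestricted}
\end{aligned}
$$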

Part (a): Convert LPP to standard form by introducing surplus, slack and artificial variables; set up Phase I objective to minimize artificial variable; execute simplex iterations till feasibility; proceed to Phase II with original objective; identify optimal solution at x₁ = 2, x₂ = 0, Z = 6
Part (b)(i): Apply Hungarian algorithm to 4×4 cost matrix — row reduction, column reduction, minimum lines to cover zeros, adjust matrix, make optimal assignments; state final assignment with minimum total cost
Part (b)(ii): Convert primal to dual by transforming maximization to minimization, reversing inequality signs for ≥ constraints, handling equality with unrestricted dual variables, and noting primal unrestricted variables become dual equality constraints
Part (c): Define EOQ as optimal order quantity minimizing total inventory cost; list assumptions: constant demand, instantaneous replenishment, no stockouts, fixed ordering cost, constant carrying cost percentage
Part (c): Calculate EOQ = √(2DS/H) = √(2×12000×100/(0.20×50)) = √240000 ≈ 490 units, where the carrying cost is H = 0.20 × 50 = ₹10 per unit per year; number of orders = 12000/490 ≈ 24.5 per year; time between orders = 365/24.5 ≈ 14.9 days
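
The EOQ arithmetic can be verified in a few lines. This is a minimal sketch of the basic (Wilson) formula; the symbols `D`, `S` and `H` are labels chosen here, and the 365-day year used for the cycle time is an assumption (some texts use 360 days or express the gap in months).

```python
from math import sqrt

D = 12_000       # annual demand (units)
S = 100.0        # ordering cost per order (rupees)
H = 0.20 * 50    # carrying cost per unit per year: 20% of the Rs 50 price

eoq = sqrt(2 * D * S / H)             # economic order quantity
orders_per_year = D / eoq             # number of orders placed per year
days_between_orders = 365 / orders_per_year  # assumes a 365-day year

print(round(eoq), round(orders_per_year, 1), round(days_between_orders, 1))
```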


Q4. Transportation problem, normal distribution, queuing theory (50 marks; directive: solve)

(a) A Company ships truckloads of grain from three silos to four mills. The supply (in truckloads) and the demand (also in truckloads) together with the unit transportation costs per truckload on the different routes are summarized in the following table : Purpose is to find the minimum-cost shipping schedule between the silos and the mills. Use any method. Obtain the starting basic feasible solution.

(b)(i) Suppose that the life in hours of an electric Gadget manufactured by a certain process is normally distributed with parameters μ = 160 hours and some σ. What would be the maximum allowable value of σ if the life X of the gadget is to have a probability 0.80 of being between 120 hours and 200 hours ? (Normal distribution Table is given at the end).

(b)(ii) Let the compressive strength X of concrete be log-normally distributed with parameters $\mu_Y = 3$ MPa and $\sigma_Y = 0.2$ MPa where $Y = \log_e X$. What is the probability that the strength is less than or equal to 10 MPa ? (Normal distribution Table is given at the end)

(c) A departmental store operates with three checkout counters. To determine the number of counters in operation based on the number of customers, the manager uses the following schedule :

| Number of customers in store | Number of counters in operation |
|---|---|
| 1 to 3 | 1 |
| 4 to 6 | 2 |
| More than 6 | 3 |

Customers arrive in the counter(s) according to a Poisson distribution with a mean rate of 10 customers/hour. The average checkout time per customer is exponential with mean 12 minutes. Determine the steady state probability $p_n$ of n customers in the checkout area.

हिंदी में पढ़ें

(a) एक कंपनी अनाज से भरे ट्रकों को 3 भूमिगत कक्षों (सिलोस) से 4 फैक्ट्रियों (मिल्स) को जहाजों से भेजती है। आपूर्ति (भरे ट्रकों में) और मांग (भी भरे ट्रकों में), विभिन्न मांगों पर इकाई परिवहन लागत प्रति भरा ट्रक के साथ, निम्नलिखित सारणी में संक्षेप में दिये गये हैं : उद्देश्य यह है कि न्यूनतम लागत शिपिंग अनुसूची भूमिगत कक्षों (सिलोस) और फैक्ट्रियों (मिल्स) के बीच में ज्ञात कीजिए। कोई भी विधि का उपयोग करें। प्रारंभिक आधारी सुसंगत हल प्राप्त कीजिए। (b)(i) मान लीजिए कि एक निश्चित प्रक्रिया द्वारा निर्मित एक इलेक्ट्रिक गैजेट का जीवनकाल (घंटों में) प्रसामान्यतः बंटित है, जिसके प्राचल μ = 160 घंटे और σ कोई एक मान है। σ का अधिकतम स्वीकार्य मान क्या होगा यदि गैजेट के जीवनकाल X के 120 घंटे और 200 घंटे के बीच होने की प्रायिकता 0.80 है ? (प्रसामान्य बंटन सारणी पृष्ठ के अंत में दी गई है) (b)(ii) मान लीजिए X कंक्रीट की संपीडक शक्ति है जो लघुगणकीय प्रसामान्यतः बंटित है जिसके प्राचल μY = 3 MPa और σY = 0.2 MPa है जबकि Y = logeX है। क्षमता (शक्ति) 10 MPa से कम हो या इसके बराबर हो की प्रायिकता क्या है ? (प्रसामान्य बंटन के लिये सारणी आखरी पृष्ठ में दी है) (c) एक डिपार्टमेंटल स्टोर तीन चेकआउट काउंटरों के साथ संचालित होता है। ग्राहकों की संख्या के आधार पर संचालन में काउंटरों की संख्या निर्धारित करने के लिए, प्रबंधक निम्नलिखित अनुसूची का उपयोग करता है : | भंडार में ग्राहकों की संख्या | संचालन में ग्राहकों की संख्या | |---|---| | 1 से 3 | 1 | | 4 से 6 | 2 | | 6 से अधिक | 3 | प्वासों बंटन के अनुसार ग्राहक काउंटर पर पहुंचते हैं जिसका माध्य दर 10 ग्राहक प्रति घंटा है। औसत चेकआउट समय प्रति ग्राहक एक चरघातीय बंटन है जिसका माध्य 12 मिनट है। चेक आउट क्षेत्र में n ग्राहकों की स्थायी अवस्था प्रायिकता pn ज्ञात कीजिए।

Answer approach & key points

Solve this multi-part numerical problem by allocating approximately 40% time to part (a) transportation problem as it requires complete solution methodology, 35% to part (b) probability calculations involving normal and log-normal distributions, and 25% to part (c) queuing theory steady-state probabilities. Begin with clear problem setup for each part, show all computational steps with proper formulae, and conclude with interpreted final answers in correct units.

For (a): Correctly set up the balanced transportation problem (check if supply equals demand, add dummy if needed), apply Vogel's Approximation Method or Least Cost Method to obtain degenerate/non-degenerate basic feasible solution with (m+n-1) allocations
For (b)(i): Set up P(120 < X < 200) = 0.80, convert to standard normal Z-scores, use symmetry property to find z₀ such that P(-z₀ < Z < z₀) = 0.80, hence Φ(z₀) = 0.90, interpolate from table to find z₀ ≈ 1.28, then solve σ = 40/1.28 = 31.25 hours
For (b)(ii): Transform log-normal to normal: P(X ≤ 10) = P(Y ≤ ln10) = P(Y ≤ 2.3026), calculate Z = (2.3026-3)/0.2 = -3.487, use table to find Φ(-3.49) ≈ 0.0002 or precise interpolation
For (c): Model this as a birth-death queue with state-dependent service rate: λ = 10/hr throughout, and with μ = 5/hr per counter the total service rate is μ for n = 1 to 3, 2μ for n = 4 to 6, and 3μ for n ≥ 7 (as set by the manager's schedule); solve the balance equations to get pₙ = 2ⁿp₀ for n ≤ 3, pₙ = 8p₀ for 4 ≤ n ≤ 6, and pₙ = 8(2/3)ⁿ⁻⁶p₀ for n ≥ 7, then normalise to obtain p₀ = 1/55
Verify all calculations: check transportation cost arithmetic, confirm normal table reading with interpolation, validate queuing traffic intensity ρ < 1 for steady state existence, and ensure probability sum equals 1
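
The steady-state probabilities in part (c) follow from the birth-death balance equations pₙ = pₙ₋₁ · λ/μₙ, where μₙ is the total service rate with n customers present. The sketch below works in exact fractions; the function names are illustrative, and the infinite tail for n ≥ 7 is summed as a geometric series with ratio λ/(3μ).

```python
from fractions import Fraction

lam = Fraction(10)  # arrivals per hour
mu = Fraction(5)    # service rate per open counter (mean service time 12 min)

def counters_open(n):
    """Counters in operation when n customers are present (manager's schedule)."""
    if n <= 3:
        return 1
    if n <= 6:
        return 2
    return 3

def weight(n):
    """p_n / p_0 from the balance equations p_k = p_{k-1} * lam / (c_k * mu)."""
    w = Fraction(1)
    for k in range(1, n + 1):
        w *= lam / (counters_open(k) * mu)
    return w

# States 0..6 are summed explicitly; for n >= 7 the weights are geometric.
head = sum(weight(n) for n in range(7))
ratio = lam / (3 * mu)                 # 2/3 < 1, so the tail converges
tail = weight(6) * ratio / (1 - ratio)
p0 = 1 / (head + tail)

def p(n):
    """Steady-state probability of n customers in the checkout area."""
    return p0 * weight(n)

print(p0, [p(n) for n in range(1, 8)])
```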


Section B

Q5. Psychometrics, population growth, epidemiological rates (50 marks, compulsory; directive: explain)

(c) What do you mean by reliability and validity of tests ? What is the difference between reliability and validity of a test ? If the reliability of a test is raised from 0·80 to 0·90 by lengthening the test, a validity coefficient of 0·60 for this test would be expected to increase to what value ? (10 marks)

(d) The rate of increase of a population at time t is r(t) = 0·01 + 0·0001 t². If the population totals 1,000,000 at time t = 0, what is the population at t = 30 ? (10 marks)

(e) Suggest which of the two measures : Morbidity Incidence rate (MIR) and Morbidity Prevalence rate (MPR) should be used to decide on the amount of medicine to be sent to a Malaria affected area. Cite an example where the other rate can be useful. (10 marks)

हिंदी में पढ़ें

(c) परीक्षणों की विश्वसनीयता और वैधता से आप क्या समझते हैं ? एक परीक्षण की विश्वसनीयता और वैधता के बीच क्या अंतर है ? यदि एक परीक्षण की विश्वसनीयता को, परीक्षण को लंबा करके, 0·80 से 0·90 तक बढ़ाया गया, तो इस परीक्षण के लिए वैधता गुणांक 0·60 से बढ़कर कितने मान तक जाना प्रत्याशित होगा ? 10 (d) जनसंख्या वृद्धि दर समय t पर है r(t) = 0·01 + 0·0001 t² यदि समय t = 0 पर कुल जनसंख्या 1,000,000 है, तो समय t = 30 पर जनसंख्या क्या होगी ? 10 (e) दोनों में से कौन-सा उपाय सुझाएं : रुग्णता घटना दर (एम.आई.आर.) और रुग्णता व्यापकता दर (एम.पी.आर.), दवा की मात्रा तय करने के लिए, इसका उपयोग किया जाना चाहिए जो एक मलेरिया प्रभावित क्षेत्र में भेजी जाएगी । एक उदाहरण उद्धृत करें जहाँ अन्य दर उपयोगी हो सकता है । 10

Answer approach & key points

This question requires explaining three distinct statistical concepts across psychometrics, demography, and epidemiology. Allocate approximately 35% time to part (c) covering reliability, validity definitions, their distinction, and the Spearman-Brown prophecy formula application; 35% to part (d) setting up and solving the differential equation for population growth; and 30% to part (e) comparing MIR and MPR with practical Indian public health examples. Begin with clear conceptual definitions, proceed to mathematical derivations where required, and conclude with contextual interpretations.

Part (c): Define reliability (consistency/stability of test scores) and validity (extent to which a test measures what it claims to measure); distinguish the two by noting that reliability is necessary but not sufficient for validity
Part (c): For the numerical part, the validity coefficient scales with the square root of the reliability ratio: r_new = r_old × √(0.90/0.80) = 0.60 × 1.0607 ≈ 0.636, i.e. about 0.64 (the same value follows from the Spearman-Brown lengthening factor n = 2.25)
Part (d): Set up differential equation dP/dt = P×r(t) = P(0.01 + 0.0001t²); integrate ln(P) = ∫(0.01 + 0.0001t²)dt = 0.01t + 0.0001t³/3 + C
Part (d): Apply initial condition P(0) = 1,000,000 to find C = ln(10⁶); compute P(30) = 10⁶ × exp[0.01(30) + 0.0001(27000)/3] = 10⁶ × e^1.2 ≈ 3,320,117
Part (e): Recommend MIR (incidence rate) for medicine allocation as it measures new cases over time, directly indicating current disease burden and transmission dynamics requiring immediate intervention
Part (e): Cite MPR usefulness for chronic disease planning like diabetes or hypertension prevalence studies in India where total existing cases matter for long-term healthcare infrastructure and resource allocation
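
The two numerical results in parts (c) and (d) can be reproduced as below. This is a small check, not a substitute for the derivation: the function name `population` is illustrative, and r(t) is read as the instantaneous relative growth rate, as in the differential equation above.

```python
from math import exp, sqrt

def population(t, initial=1_000_000):
    """P(t) = P(0) * exp(integral of r), with r(s) = 0.01 + 0.0001 s^2.

    The integral from 0 to t is 0.01 t + 0.0001 t^3 / 3.
    """
    return initial * exp(0.01 * t + 0.0001 * t ** 3 / 3)

pop_30 = population(30)

# Part (c): validity after reliability rises from 0.80 to 0.90
new_validity = 0.60 * sqrt(0.90 / 0.80)

print(round(pop_30), round(new_validity, 3))
```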


Q6. Time series analysis, least squares, heteroscedasticity (50 marks; directive: explain)

(a) Explain the principle of least squares. How it is used in fitting trend in time series analysis ? Explain the fitting of trend for the curve $y=ab^tc^{t^2}$. (15 marks)

(b) Define stationary time series. How would you test the stationarity of the given time series ? Write the importance of stationary time series. Check the following time series for stationarity.

(i) $Y_t = Y_{t-1} + U_t$

(ii) $Y_t = \delta + Y_{t-1} + U_t$

(iii) $Y_t = \delta Y_{t-1} + U_t$ ; $-1 \leq \delta \leq 1$

(15 marks)

(c) State the different methods of detecting the presence of heteroscedasticity. Explain in brief the Goldfeld-Quandt Test for detecting the presence of heteroscedasticity. Also write the assumption required to apply this test. For a data on consumption expenditure in relation to income for a cross section of 30 families, after dropping the middle 4 observations, the OLS regression based on the first 13 and the last 13 observations and their associated residual sum of squares are as follows :

Regression based on the first 13 observations : $\hat{Y}_i = 3.4094 + 0.6968 X_i$ $(r^2 = 0.8887, RSS_1 = 377.17, df = 11)$

Regression based on the last 13 observations : $\hat{Y}_i = -28.0272 + 0.7941 X_i$ $(r^2 = 0.7681, RSS_2 = 1536.8, df = 11)$

Check the presence of heteroscedasticity for the above given results and write your conclusion. $(F_{(11, 11, 5\%)} = 2.82, F_{(11, 11, 1\%)} = 4.46, F_{(13, 13, 5\%)} = 2.53, F_{(13, 13, 1\%)} = 3.82)$

(20 marks)

हिंदी में पढ़ें

(a) न्यूनतम वर्ग के सिद्धांत को समझाइये । काल श्रेणी विश्लेषण में इसका उपयोग प्रवृत्ति समंजन में कैसे किया जाता है ? वक्र $y=ab^tc^{t^2}$ के लिए प्रवृत्ति के समंजन को समझाइए । 15 (b) अनुपन्न काल श्रेणी को परिभाषित कीजिए । एक दी हुई काल श्रेणी की स्थावरता की जाँच (परीक्षण) कैसे करेंगे ? अनुपन्न काल श्रेणी के महत्व को लिखिए । निम्नलिखित काल श्रेणियों की स्थावरता की जाँच कीजिए । (i) $Y_t = Y_{t-1} + U_t$ (ii) $Y_t = \delta + Y_{t-1} + U_t$ (iii) $Y_t = \delta Y_{t-1} + U_t$ ; $-1 \leq \delta \leq 1$ 15 (c) विषम विचलितता (हैट्रोसिडास्टिसिटी) की उपस्थिति का पता लगाने की विभिन्न विधियों को बताइए । विषम विचलितता की उपस्थिति पता लगाने के लिए गोल्डफेल्ड-क्वांड्ट (Goldfeld-Quandt) के परीक्षण को संक्षेप में समझाइए । इस परीक्षण को लागू करने के लिए आवश्यक अभिधारणा भी लिखें । उपभोग व्यय पर डेटा के लिए, जो 30 परिवारों के क्रॉस-सेक्शन की आय से संबंधित है, बीच में 4 अवलोकनों को हटाने के बाद, प्रथम 13 प्रेक्षणों और अंतिम 13 प्रेक्षणों के आधार पर साधारण न्यूनतम वर्ग (ओ.एल.एस.) समाश्रयण और उनके संबद्ध वर्गों का अवशिष्ट योग निम्नांकित है : पहले 13 प्रेक्षणों के आधार पर समाश्रयण : $\hat{Y}_i = 3.4094 + 0.6968 X_i$ $(r^2 = 0.8887, RSS_1 = 377.17, df = \text{स्वतंत्रकोटि} = 11)$ पिछले (या बाद के) 13 प्रेक्षणों के आधार पर समाश्रयण : $\hat{Y}_i = -28.0272 + 0.7941 X_i$ $(r^2 = 0.7681, RSS_2 = 1536.8, \text{स्वतंत्रकोटि (df)} = 11)$ उपरोक्त दिये गये परिणामों के लिए विषम विचलितता की उपस्थिति की जाँच करें और अपना निष्कर्ष लिखें । $(F_{(11, 11, 5\%)} = 2.82, F_{(11, 11, 1\%)} = 4.46, F_{(13, 13, 5\%)} = 2.53, F_{(13, 13, 1\%)} = 3.82)$ 20

Answer approach & key points

Explain the theoretical foundations first, then demonstrate computational application. Allocate ~30% time to part (a) on least squares and trend fitting, ~30% to part (b) on stationarity concepts and testing the three given models, and ~40% to part (c) on heteroscedasticity detection with complete Goldfeld-Quandt test execution. Structure as: theoretical exposition → mathematical derivation → numerical computation → statistical inference.

Part (a): Principle of least squares (minimizing sum of squared residuals), its application in linear and non-linear trend fitting, and complete working for y=ab^tc^{t^2} using logarithmic transformation to linear form
Part (b): Formal definition of weak/strong stationarity (constant mean, variance, autocovariance), Dickey-Fuller or graphical methods for testing, importance for valid inference, and classification of (i) random walk (non-stationary), (ii) random walk with drift (non-stationary), (iii) AR(1) process (stationary when |δ|<1, non-stationary at the boundary values δ = ±1)
Part (c): Listing detection methods (graphical, Park test, Glejser test, White test, Goldfeld-Quandt test), complete Goldfeld-Quandt procedure with assumptions (normality, homoscedasticity under null, increasing/decreasing variance pattern)
Correct computation of F-statistic = RSS2/RSS1 = 1536.8/377.17 = 4.075 with proper degrees of freedom (11,11)
Proper hypothesis testing conclusion: F_calculated (4.075) > F_critical at 5% (2.82), so reject the null of homoscedasticity at the 5% level; since 4.075 < 4.46, the null is not rejected at the 1% level
Recognition that RSS2 > RSS1 indicates increasing variance with income, confirming heteroscedasticity in consumption expenditure data
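
The Goldfeld-Quandt statistic and the two decisions can be reproduced directly from the figures in the question. The sketch below uses the critical values quoted in the paper for (11, 11) degrees of freedom; variable names are illustrative.

```python
rss1, df1 = 377.17, 11   # first 13 observations (lower incomes)
rss2, df2 = 1536.8, 11   # last 13 observations (higher incomes)

# Ratio of the two residual variances; F-distributed with (df2, df1)
# degrees of freedom under the null of homoscedasticity.
f_stat = (rss2 / df2) / (rss1 / df1)

critical = {0.05: 2.82, 0.01: 4.46}  # F(11, 11) values given in the question
reject_null = {alpha: f_stat > value for alpha, value in critical.items()}

print(round(f_stat, 4), reject_null)
```

The F(13, 13) values supplied in the paper are distractors: each sub-regression estimates two parameters from 13 observations, leaving 11 degrees of freedom.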


Q7. Life table functions and standard scores (50 marks; directive: derive)

7.(a) Derive, by starting from a suitable functional form for $l_x$, the formula

(i) $L_x = \dfrac{l_x + l_{x+1}}{2}$ and

(ii) $L_x = \dfrac{l_x - l_{x+1}}{(\log l_x - \log l_{x+1})} = -\dfrac{d_x}{\log p_x}$

(iii) $e_x^0 = \dfrac{1}{2} + \sum\limits_{i=1}^{\infty} \dfrac{i d_{x+i}}{l_x}$

where

$l_x$ = members of the cohort alive at age $x$

$L_x$ = number of years lived, in the aggregate, by the cohort of $l_0$ persons between age $x$ and $(x+1)$

$d_x$ = number of persons dying between age $x$ and $(x+1)$ $= l_x - l_{x+1}$

$p_x$ = probability that a person of age $x$ will survive till age $(x+1)$

$e_x^0$ = expectation of life at age $x$

7.(b) (i) 400 students are given a test. The average is 60 and the standard deviation is 12. Obtain the Z-score and the standard scores equivalent to raw scores. The raw scores are given by

| Raw scores | 84 | 78 | 72 | 66 | 60 | 54 | 48 | 42 | 36 |
|---|---|---|---|---|---|---|---|---|---|

(ii) Convert the ten scores 1, 2, ..., 10 into standard scores with mean 50 and standard deviation 10.

7.(c) On the life table with $l_x = \dfrac{100-x}{190}$, $5 \leq x \leq 100$, Find

(i) the chance that a child who has reached age 5 will live to age 60.

(ii) the chance that a man of age 30 will live until age 80.

(iii) the probability of dying within 5 years for a man aged 40.

(iv) the expectation of life at age 40.

(v) the chance that of the three men aged 30 at least one survives till age 80.

हिंदी में पढ़ें

7.(a) $l_x$ के लिए एक उपयुक्त फलनिक रूप से शुरू करके निम्नलिखित सूत्र को व्युत्पन्न कीजिए (i) $L_x = (l_x + l_{x+1})/2$ और (ii) $L_x = (l_x – l_{x+1})/(\log l_x – \log l_{x+1}) = – dx/\log p_x$ (iii) $e°_x = 1/2 + \sum_{i=1}^\infty (i d_{x+i})/l_x$ जहाँ $l_x$ = जत्था (कोहोर्ट) के सदस्य जो आयु $x$ तक जीवित हैं $L_x$ = जितने वर्ष जीवित रहे, सकल में $l_0$ व्यक्तियों के जत्थों द्वारा आयु $x$ और आयु $(x+1)$ के बीच $d_x$ = व्यक्तियों की संख्या जिनकी मृत्यु आयु $x$ और $(x+1)$ के बीच में होती है $= l_x - l_{x+1}$ $p_x$ = आयु $x$ के एक व्यक्ति के आयु $(x+1)$ तक जीवित रहने की प्रायिकता है $e_x^0$ = आयु $x$ पर जीवन की प्रत्याशा 7.(b) (i) 400 विद्यार्थियों ने एक परीक्षा दी है। औसत 60 है और मानक विचलन 12 है। Z-समंक और मानक समंकों को प्राप्त कीजिए जो कि यथाप्रास समंकों के तुल्य हैं। यथाप्रास समंक नीचे दिये गये हैं। | यथाप्रास समंक | 84 | 78 | 72 | 66 | 60 | 54 | 48 | 42 | 36 | (ii) दस समंकों 1, 2, ..., 10 को मानक समंकों में बदलो जिनका माध्य 50 और मानक विचलन 10 है। 7.(c) वय-सारणी में $l_x = \dfrac{100-x}{190}$ के साथ, $5 \leq x \leq 100$, ज्ञात कीजिए (i) प्रायिकता कि एक बच्चा जो आयु 5 पर पहुँच गया है, वह आयु 60 तक जीवित रहेगा। (ii) प्रायिकता कि एक व्यक्ति जिसकी आयु 30 वर्ष है वह आयु 80 तक जीवित रहेगा। (iii) प्रायिकता कि एक व्यक्ति जिसकी आयु 40 वर्ष है, वह 5 वर्ष के अन्दर मर जायेगा। (iv) आयु 40 पर जीवन की प्रत्याशा। (v) प्रायिकता कि 30 वर्ष की आयु वाले तीन व्यक्तियों में से कमसे कम एक आयु 80 तक जीवित रहे।

Answer approach & key points

Begin with clear statement of assumptions for each derivation in 7(a), showing step-by-step integration for L_x formulas and summation manipulation for e_x^0. For 7(b), apply Z-score formula z = (X-μ)/σ systematically, then demonstrate linear transformation for standard scores with mean 50, SD 10. For 7(c), substitute given l_x = (100-x)/190 into survival probabilities, death probabilities, and life expectancy formulas, computing each numerical value with proper fraction handling. Allocate approximately 35% time to derivations in (a), 25% to standard score calculations in (b), and 40% to life table computations in (c) given its five sub-parts.
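
The conversions in 7(b) can be sketched as a single helper. Two assumptions are made here and should be stated in an answer: the standard scores in part (i) use the same mean-50, SD-10 scale named in part (ii), and the ten scores in part (ii) are treated as a complete population (divisor n, via `statistics.pstdev`). The function name `standard_scores` is illustrative.

```python
from statistics import mean, pstdev

def standard_scores(raw, mu=None, sigma=None, new_mean=50, new_sd=10):
    """Return (z-scores, standard scores) for a list of raw scores.

    If mu or sigma is not supplied, it is computed from the raw scores
    themselves, using the population standard deviation.
    """
    mu = mean(raw) if mu is None else mu
    sigma = pstdev(raw) if sigma is None else sigma
    z = [(x - mu) / sigma for x in raw]
    return z, [new_mean + new_sd * zi for zi in z]

# Part (i): mean 60 and SD 12 are given for the 400 students
raw_scores = [84, 78, 72, 66, 60, 54, 48, 42, 36]
z_i, t_i = standard_scores(raw_scores, mu=60, sigma=12)

# Part (ii): mean and SD come from the ten scores 1..10 themselves
z_ii, t_ii = standard_scores(list(range(1, 11)))

print(z_i, t_i)
print([round(t, 2) for t in t_ii])
```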

7(a)(i): Assume l_x linear in [x, x+1], integrate l(t)dt from x to x+1 to obtain (l_x + l_{x+1})/2
7(a)(ii): Assume l_x exponential over [x, x+1] (constant force of mortality μ), use l_{x+t} = l_x·e^{-μt} with μ = -log p_x, and integrate over t from 0 to 1 to obtain the logarithmic-mean form L_x = d_x/μ = -d_x/log(p_x)
7(a)(iii): Express T_x = Σ L_{x+i} using linear assumption, substitute L_x = (l_x + l_{x+1})/2, rearrange to obtain e_x^0 = 1/2 + Σ i·d_{x+i}/l_x
7(b): Calculate Z-scores as (X-60)/12 for each raw score, then apply linear transformation 50 + 10·z for standard scores; for 1-10 scores, first find mean=5.5, SD=√8.25, then transform
7(c): Compute survival probabilities as l_60/l_5, l_80/l_30; death probability as 1-l_45/l_40; e_40^0 = T_40/l_40 with T_40 = Σ L_{40+i}; binomial probability for at least one survivor among three men
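
The five answers in 7(c) can be verified with exact fractions. The sketch below follows the pointers above; the helper names `l` and `survive` are illustrative, and the three men in part (v) are assumed to survive independently of one another.

```python
from fractions import Fraction

def l(x):
    """Life table function l_x = (100 - x) / 190 for 5 <= x <= 100."""
    return Fraction(100 - x, 190)

def survive(x, y):
    """Probability that a person aged x lives to age y."""
    return l(y) / l(x)

p_5_to_60 = survive(5, 60)        # part (i)
p_30_to_80 = survive(30, 80)      # part (ii)
q_40_5yr = 1 - survive(40, 45)    # part (iii): dies within 5 years

# Part (iv): T_40 = sum of L_x with L_x = (l_x + l_{x+1}) / 2, exact here
# because l_x is linear; the table ends at age 100.
T_40 = sum((l(x) + l(x + 1)) / 2 for x in range(40, 100))
e_40 = T_40 / l(40)

# Part (v): at least one of three independent lives aged 30 reaches 80
at_least_one = 1 - (1 - p_30_to_80) ** 3

print(p_5_to_60, p_30_to_80, q_40_5yr, e_40, at_least_one)
```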


Q8. Agricultural statistics, fertility rates, and econometric model identification (50 marks; directive: explain)

8.(a) Explain the method of collection of agriculture data. Describe the (i) official publications for data collection and (ii) statistics collected by the various official agencies pertaining to agriculture production.

8.(b) Distinguish between GFR and TFR. What is meant by TFR = 3.29 ? Discuss the merits and demerits of TFR. Construct the relationship between GRR and TFR. Interpret GRR when GRR >1, <1 or =1.

8.(c) State the order and rank conditions to check the identifiability of the given system of simultaneous equations. Consider the following extended Keynesian model of income determination :

Consumption function : $C_t = \beta_1 + \beta_2 Y_t - \beta_3 T_t + U_{1t}$

Investment function : $I_t = \alpha_0 + \alpha_1 Y_{t-1} + U_{2t}$

Taxation function : $T_t = \gamma_0 + \gamma_1 Y_t + U_{3t}$

Income Identity : $Y_t = C_t + I_t + G_t$

Where C = Consumption expenditure, Y = Income, I = Investment, T = Taxes, G = Government expenditure and U's = the disturbance terms.

In the model the endogenous variables are C, I, T and Y and the predetermined variables are G and $Y_{t-1}$. By applying the order condition, check the identifiability of each of the equations in the system and of the system as a whole. Write your conclusion.

Answer approach & key points

The question demands explanation across three distinct domains: agricultural data systems, fertility measures, and econometric identification. Allocate approximately 35% (15-18 marks) to part (a) covering data collection methods and official publications; 35% (15-18 marks) to part (b) distinguishing GFR/TFR with mathematical relationships and demographic interpretation; and 30% (12-15 marks) to part (c) applying order/rank conditions to the Keynesian model. Structure with clear sectional headings, begin each part with definitions, proceed to methodological details, and conclude with synthesis—ensuring the econometric section explicitly shows matrix calculations for identifiability.

Part (a): Enumeration of agricultural data collection methods (census, sample surveys, administrative records) with specific Indian examples—Land Records, Agricultural Census, NSSO rounds; identification of official publications (Agricultural Statistics at a Glance, State Statistical Abstracts, FAO reports, DES publications)
Part (a)(ii): Classification of statistics by agency—MoA&FW (crop area, yield, production), CSO (national income from agriculture), RBI (agricultural credit), NABARD (rural credit, WDRA data), State Directorates of Economics and Statistics
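Part (b): Precise distinction between GFR (live births in a year per 1000 women of reproductive age, usually 15-49) and TFR (average number of children a woman would bear over her reproductive span if she experienced the current age-specific fertility rates throughout); interpretation of TFR = 3.29 as about 3.29 children per woman on average, which is above the replacement level of roughly 2.1; merits/demerits covering data requirements, period sensitivity, and cross-population comparability
Part (b): Relationship TFR = 5 × Σ ASFR for 5-year age groups (TFR = Σ ASFR for single-year ages), with ASFR expressed per woman; GRR-TFR relationship via the sex ratio at birth, GRR ≈ TFR × (female births / total births); GRR interpretation, ignoring mortality: GRR>1 (mothers more than replaced by daughters, growing), GRR<1 (declining), GRR=1 (exactly replaced, stationary)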
Part (c): Correct statement of the order condition (K ≥ M-1, where K = number of predetermined variables excluded from the equation and M = number of endogenous variables included in it) and of the rank condition; systematic application to the four-equation Keynesian model with endogenous variables (C, I, T, Y) and predetermined variables (G, Y_{t-1}); construction of the coefficient matrix and an explicit identifiability verdict for each equation
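
The order-condition check in part (c) can be sketched as a simple count. This is a minimal illustration, not a full identification test: the variable lists are read off the equations in the question, the names `Y_lag` (for Y_{t-1}), `EQUATIONS` and `order_condition` are hypothetical, and the income identity is left out because it has no parameters to identify.

```python
# Order condition: (predetermined variables excluded from the equation)
# compared with (endogenous variables included in the equation) - 1.

ENDOGENOUS = {"C", "I", "T", "Y"}
PREDETERMINED = {"G", "Y_lag"}  # Y_lag stands for Y_{t-1}

# Variables appearing in each behavioural equation
# (intercepts and disturbance terms are omitted).
EQUATIONS = {
    "consumption": {"C", "Y", "T"},
    "investment": {"I", "Y_lag"},
    "taxation": {"T", "Y"},
}

def order_condition(variables):
    """Classify one equation by the order condition alone."""
    m = len(variables & ENDOGENOUS)        # endogenous variables included
    k = len(variables & PREDETERMINED)     # predetermined variables included
    excluded = len(PREDETERMINED) - k      # predetermined variables excluded
    if excluded < m - 1:
        return "under-identified"
    if excluded == m - 1:
        return "exactly identified"
    return "over-identified"

verdicts = {name: order_condition(v) for name, v in EQUATIONS.items()}
for name, verdict in verdicts.items():
    print(f"{name}: {verdict}")
```

The order condition is necessary but not sufficient, so each verdict would still need to be confirmed with the rank condition before the final conclusion is written.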
