Statistics

UPSC Statistics 2025

All 16 questions from the 2025 Civil Services Mains Statistics examination, across 2 papers and 800 marks in total. Each question comes with a detailed evaluation rubric, directive word analysis, and model answer points.

16 Questions
800 Total marks
2 Papers
2025 Exam year

Paper I

8 questions · 400 marks
Q1
50M · Compulsory · prove · Probability theory and distributions

(a) Let E, F and G be three pairwise independent events such that P(E∩F) = 0·1 and P(F∩G) = 0·3. Prove that P(Eᶜ∪G) ≥ 11/12. 10 marks (b) If X and Y are non-negative independent random variables and their joint moment generating function is given by M_{X,Y}(t₁,t₂) = e^{e^{t₁}+2e^{t₂}-3}; t₁ > 0, t₂ > 0, then show that 2P(X+Y=2) = 9P(X+Y=0). 10 marks (c) If X₁, X₂, ... be a sequence of i.i.d. U(0, 1) random variables, then find the value of Lim_{n→∞} P(∑_{i=1}^{n} X_i ≤ n/2 + √(n/144)) [use √3 = 1·74, Φ(0·29) = 0·6141, Φ(1) = 0·8413]. 10 marks (d) Let X₁, X₂, ..., Xₙ be a random sample from normal N(μ, σ²) distribution. Obtain sufficient statistic for parameters (μ, σ²) when both the parameters are unknown. If σ² is known, what will be sufficient statistic for parameter μ ? 10 marks (e) A random variable X has the following distribution under H₀ and H₁ :

| x | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| f₀(x) | 0·01 | 0·01 | 0·01 | 0·01 | 0·01 | 0·95 |
| f₁(x) | 0·05 | 0·04 | 0·03 | 0·02 | 0·01 | 0·85 |

Find the best test of size 0·03 and its probability of type-II error for testing H₀ : f = f₀ versus H₁ : f = f₁. Is it unbiased test ? Why ? 10 marks

Answer approach & key points

This question demands rigorous mathematical proofs and derivations across five sub-parts. Begin with (a) by establishing bounds using pairwise independence and probability inequalities; for (b) extract marginal MGFs to identify distributions and compute probabilities; for (c) apply CLT with given variance 1/12; for (d) use factorization theorem for sufficient statistics; for (e) construct Neyman-Pearson test via likelihood ratios. Allocate time proportionally: ~18 min each for (a), (b), (c) and ~12 min each for (d), (e).

  • For (a): Apply P(Eᶜ∪G) = 1 - P(E∩Gᶜ) and use pairwise independence, writing P(E) = 0.1/P(F) and P(G) = 0.3/P(F), to bound P(E∩Gᶜ) = P(E)(1 - P(G)) ≤ 1/12, yielding the inequality
  • For (b): Factor joint MGF to identify X~Poisson(1) and Y~Poisson(2), then compute P(X+Y=2) and P(X+Y=0) using sum of Poissons
  • For (c): Apply CLT with E[X_i]=1/2, Var(X_i)=1/12, standardize to get Φ(0.29)=0.6141 as limit
  • For (d): State factorization theorem, show (X̄, S²) is sufficient for (μ,σ²), and X̄ alone when σ² known
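  • For (e): Compute likelihood ratios f₁/f₀ (5, 4, 3, 2, 1 and about 0.89 for x = 1, ..., 6), build the critical region from the largest ratios, obtain the size 0.03 test that rejects H₀ for x ∈ {1, 2, 3}, compute power 0.12 and β = 0.88, and verify unbiasedness since power 0.12 ≥ size 0.03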
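
The Neyman-Pearson construction in part (e) can be checked with a short Python sketch. The helper name `best_critical_region` is hypothetical, and the sketch assumes the target size is attained exactly, so no randomised test is needed.

```python
from fractions import Fraction

# Distributions from part (e), stored as exact fractions (hundredths).
xs = [1, 2, 3, 4, 5, 6]
f0 = [Fraction(p, 100) for p in (1, 1, 1, 1, 1, 95)]
f1 = [Fraction(p, 100) for p in (5, 4, 3, 2, 1, 85)]


def best_critical_region(alpha):
    """Add points in decreasing order of f1/f0 until the size reaches alpha.

    Assumption: alpha is hit exactly, so randomisation is not required.
    """
    order = sorted(range(len(xs)), key=lambda i: f1[i] / f0[i], reverse=True)
    region, size = [], Fraction(0)
    for i in order:
        if size + f0[i] > alpha:
            break
        region.append(xs[i])
        size += f0[i]
    return sorted(region), size


region, size = best_critical_region(Fraction(3, 100))
power = sum(f1[xs.index(x)] for x in region)
beta = 1 - power  # probability of type-II error

print(region, float(size), float(power), float(beta))
```

The test is unbiased when `power >= size`, which is the comparison the last part of (e) asks for.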
Q2
50M · solve · Random variables and distributions

(a) Let X be a continuous random variable having probability density function f(x) = { (2/25)(x+2), -2 ≤ x ≤ 3; 0, otherwise. Find the cumulative distribution function of Y = X² and hence find probability density function of Y. 20 marks (b) The joint probability mass function of two random variables (X, Y) be P(X = x, Y = y) = { (x+1)Cᵧ · ¹⁶Cₓ · (1/6)ʸ · (5/6)ˣ⁺¹⁻ʸ · (1/2)¹⁶, y = 0, 1, 2,..., x+1; x = 0, 1, 2,..., 16; 0, otherwise. Evaluate the following: (i) E(X), Var.(X); (ii) E(Y), Var.(Y); (iii) Cov. (X, Y). 5+5+5=15 marks (c) Let the joint probability density function of (X, Y) be f(x,y) = { 2e^{-(x+y)}, 0 < x < y < ∞; 0, otherwise. Compute the following: (i) P(Y<1); (ii) P(λX<Y), λ>1; (iii) P(Y>3X | Y>2X). 5+5+5=15 marks

Answer approach & key points

Solve this multi-part numerical problem by allocating approximately 40% time to part (a) given its 20 marks weightage, and 30% each to parts (b) and (c). Begin with clear identification of the distribution type for each part, show complete derivation steps for transformations in (a), recognize the compound/binomial structure in (b), and carefully handle the constrained region of integration in (c). Conclude each sub-part with boxed final answers and appropriate probability statements.

  • Part (a): Correctly identify support of Y=X² as [0,9] with special handling for Y∈[0,4) where two X-values map to same Y; derive piecewise CDF F_Y(y) = P(X²≤y) by splitting into X∈[-√y,√y] and account for f(x)=0 for x<-2
  • Part (a): Differentiate CDF to obtain PDF f_Y(y) with proper case distinction for y∈[0,4) and y∈[4,9], verifying total probability equals 1
  • Part (b): Recognize X~Binomial(16, 1/2) and Y|X=x~Binomial(x+1, 1/6), then apply laws of total/conditional expectation and variance including E(Y)=E[E(Y|X)] and Var(Y)=E[Var(Y|X)]+Var[E(Y|X)]
  • Part (b): Compute Cov(X,Y)=E[XY]-E[X]E[Y] using E[XY]=E[X·E(Y|X)]=E[X(X+1)/6] with proper summation or known binomial moments
  • Part (c): Set up correct double integrals over the region 0<x<y<∞ (no change of variables is needed); for P(Y<1) integrate x from 0 to y, then y from 0 to 1
  • Part (c): For P(λX<Y) with λ>1, identify region as 0<x<y/λ<y and evaluate; for conditional probability P(Y>3X|Y>2X) use definition P(Y>3X)/P(Y>2X) with proper region identification
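
The moments in part (b) can be cross-checked by exact enumeration of the joint pmf. This is a verification sketch only; the helper names `pmf` and `expect` are illustrative and not part of the question.

```python
from fractions import Fraction
from math import comb


def pmf(x, y):
    """Joint pmf from part (b): X ~ Bin(16, 1/2), Y | X = x ~ Bin(x + 1, 1/6)."""
    p_x = comb(16, x) * Fraction(1, 2) ** 16
    p_y_given_x = comb(x + 1, y) * Fraction(1, 6) ** y * Fraction(5, 6) ** (x + 1 - y)
    return p_x * p_y_given_x


# Support: x = 0..16 and y = 0..x+1.
support = [(x, y) for x in range(17) for y in range(x + 2)]


def expect(g):
    """Exact expectation of g(X, Y) over the joint support."""
    return sum(g(x, y) * pmf(x, y) for x, y in support)


total = expect(lambda x, y: 1)
ex, ey = expect(lambda x, y: x), expect(lambda x, y: y)
var_x = expect(lambda x, y: x * x) - ex ** 2
var_y = expect(lambda x, y: y * y) - ey ** 2
cov_xy = expect(lambda x, y: x * y) - ex * ey

print(total, ex, var_x, ey, var_y, cov_xy)
```

The enumeration agrees with the conditional-expectation route: E(X) = 8, Var(X) = 4, E(Y) = 3/2, Var(Y) = 49/36 and Cov(X, Y) = 2/3.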
Q3
50M · calculate · Probability distributions and estimation theory

(a) Let probability of obtaining Head on a biased coin be 4/5 and X be the number of heads obtained in a sequence of 25 independent tosses of the coin. The same coin is tossed again X number of times independently and we obtain Y heads. Compute Var.(X+25Y). (20 marks) (b)(i) Let {6, –8, 3, 2, 7, 5, 4, 9} be a random sample from a population with probability density function f(x, θ) = ½ exp(–|x–θ|), –∞<x, θ<∞. Obtain maximum likelihood estimate of θ. (5 marks) (b)(ii) Let X₁, X₂, ..., Xₙ be a random sample from Bernoulli distribution b(1, θ), 0<θ<1. Find the lower bound for the variance of an unbiased estimator of θ based on this data. Find uniformly minimum variance unbiased estimator of θ and show that it attains Cramer-Rao lower bound. (10 marks) (c) Let X₁, X₂, ..., Xₙ be a random sample from beta distribution of first kind β₍₁, θ₎, θ>0. Find consistent estimator of θ, and its variance also. (15 marks)

Answer approach & key points

Calculate the required quantities systematically across all three parts. For part (a), identify distributions and apply variance decomposition for compound random variables; for (b)(i), derive the MLE using the Laplace distribution's median property; for (b)(ii), establish the Cramer-Rao bound and verify attainment; for (c), use method of moments or MLE for consistency. Allocate approximately 40% time to part (a) given its 20 marks, 30% to part (c) for 15 marks, and 30% to part (b) combining 5+10 marks. Present derivations stepwise with clear notation before substituting numerical values.

  • Part (a): Correctly identify X ~ Binomial(25, 4/5) and Y|X=x ~ Binomial(x, 4/5), then apply law of total variance to find Var(X+25Y) using E[Var(Y|X)] + Var(E[Y|X]) components
  • Part (b)(i): Recognize f(x,θ) as Laplace distribution with location parameter θ, hence the MLE of θ is the sample median; with n = 8 any value in [4, 5] maximizes the likelihood, and 4.5 is the conventional choice
  • Part (b)(ii): Derive Cramer-Rao lower bound as θ(1-θ)/n, identify sample mean as UMVUE, and prove it attains the bound by showing equality in Cauchy-Schwarz
  • Part (c): For Beta(1,θ), derive method of moments estimator θ̂ = (1-X̄)/X̄ or MLE, prove consistency via weak law of large numbers, and compute asymptotic variance
  • Correct application of variance formulas: Var(X) = np(1-p) for binomial, and careful handling of the 25Y scaling factor in part (a)
  • Proper justification of why sample mean is UMVUE in (b)(ii) using completeness and sufficiency of T = ΣXi, or direct variance calculation
  • Verification that estimator in (c) is consistent by showing plim(θ̂) = θ as n → ∞, with explicit variance expression involving θ and n
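
For part (a), the law of total variance can be checked against a brute-force enumeration. The sketch below sets W = X + 25Y; the names `by_formula` and `by_enumeration` are illustrative only.

```python
from fractions import Fraction
from math import comb

n, p = 25, Fraction(4, 5)
q = 1 - p


def binom_pmf(k, m, prob):
    """P(K = k) for K ~ Binomial(m, prob), in exact arithmetic."""
    return comb(m, k) * prob ** k * (1 - prob) ** (m - k)


# Law of total variance with W = X + 25Y and Y | X ~ Binomial(X, p):
#   E[Var(W | X)] = 625 * p * q * E[X]
#   Var(E[W | X]) = (1 + 25p)^2 * Var(X)
by_formula = 625 * p * q * (n * p) + (1 + 25 * p) ** 2 * (n * p * q)

# Brute-force check over the joint support 0 <= y <= x <= 25.
m1 = m2 = Fraction(0)
for x in range(n + 1):
    px = binom_pmf(x, n, p)
    for y in range(x + 1):
        w = x + 25 * y
        pr = px * binom_pmf(y, x, p)
        m1 += w * pr
        m2 += w * w * pr
by_enumeration = m2 - m1 ** 2

print(by_formula, by_enumeration)
```

Both routes give Var(X + 25Y) = 2000 + 1764 = 3764, which shows how the factor 25 enters as 625 in the conditional variance and as (1 + 25p)² in the variance of the conditional mean.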
Q4
50M · derive · Sequential probability ratio test and non-parametric tests

(a) Let X₁, X₂, ... be a sequence of random variables from Bernoulli distribution with mean θ, 0<θ<1. Derive SPRT for testing H₀ : θ = θ₀ versus H₁ : θ = θ₁ = 1 – θ₀, 0<θ₀<1. Also obtain expressions for OC function and ASN function. (20 marks) (b) A random sample of size n is taken from the exponential distribution with mean θ>0. Given that n₁ observations out of n observations are less than 'a'. Show that minimum Chi-square estimate and maximum likelihood estimate of θ are same. (15 marks) (c) The life of 6 items of brand-A and 6 items of brand-B are given below: A : 40 62 55 35 48 88 B : 50 70 65 30 45 92 Using Kolmogorov-Smirnov test, test whether the distribution of life of both the brands are same or not at 5% level of significance. [Given that D₍₆, ₆, ₀.₀₅₎ = 2/3] (15 marks)

Answer approach & key points

Derive the SPRT for Bernoulli in part (a) with proper likelihood ratio development, spending ~40% time on this 20-mark component; for (b) prove equivalence of MCSE and MLE through differentiation of chi-square and likelihood functions (~30%); for (c) apply K-S test with correct empirical CDF construction and comparison against critical value D₍₆,₆,₀.₀₅₎ = 2/3 (~30%). Structure: state hypotheses → show derivations/computations → conclude with statistical decisions.

  • Part (a): Derive Wald's SPRT using likelihood ratio Λₙ = (θ₁/θ₀)^ΣXᵢ · ((1-θ₁)/(1-θ₀))^(n-ΣXᵢ), which for θ₁ = 1-θ₀ reduces to ((1-θ₀)/θ₀)^(2ΣXᵢ-n), with continuation region B < Λₙ < A where A ≈ (1-β)/α and B ≈ β/(1-α)
  • Part (a): Obtain OC function L(θ) ≈ (A^(h(θ))-1)/(A^(h(θ))-B^(h(θ))) and ASN Eθ(N) ≈ [L(θ)lnB + (1-L(θ))lnA] / Eθ(Z) where Z = ln[f₁(X)/f₀(X)]
  • Part (b): Set up Pearson's chi-square with cells [0,a) and [a,∞), minimize Σ(Oᵢ-Eᵢ)²/Eᵢ to get MCSE, show same equation as MLE score function
  • Part (c): Construct ordered empirical distribution functions F₆(x) and G₆(x) for both brands, compute D₆,₆ = sup|F₆(x)-G₆(x)|
  • Part (c): Compare calculated D statistic with critical value 2/3, conclude whether to reject H₀ of identical distributions at 5% level
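
The two-sample Kolmogorov-Smirnov statistic for part (c) can be computed directly from the empirical distribution functions. This is a minimal sketch; the function names are illustrative, and the rejection rule "reject when D exceeds the tabulated value" is an assumption about how the given critical value is meant to be used.

```python
from fractions import Fraction

brand_a = [40, 62, 55, 35, 48, 88]
brand_b = [50, 70, 65, 30, 45, 92]


def ecdf(sample, x):
    """Empirical distribution function: fraction of observations <= x."""
    return Fraction(sum(v <= x for v in sample), len(sample))


def ks_two_sample(s1, s2):
    """D = max |F(x) - G(x)|, evaluated at the pooled observations (the jump points)."""
    return max(abs(ecdf(s1, x) - ecdf(s2, x)) for x in s1 + s2)


d_stat = ks_two_sample(brand_a, brand_b)
critical = Fraction(2, 3)  # D(6, 6, 0.05) as given in the question
reject_h0 = d_stat > critical  # assumed rejection rule

print(d_stat, reject_h0)
```

With these data the largest gap is 5/6 - 3/6 = 1/3 at x = 62, which is below 2/3, so H₀ of identical distributions is not rejected at the 5% level.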
Q5
50M · Compulsory · derive · Linear regression, multivariate normal, sample statistics, sample size, experimental design

(a) For a two variable linear regression model Yᵢ = a + bXᵢ + eᵢ, where E(eᵢ) = 0, Var(eᵢ) = σ²ₑ, Cov(eᵢ, eⱼ) = 0 for i ≠ j, (i,j) ∈ {1, 2, ..., n}, if â and b̂ are least square estimators of a and b respectively, derive expressions for Var(â), Var(b̂) and Cov(â, b̂). 10 marks (b) Let X = (X₁ X₂ X₃)' ~ N₃(μ, Σ), where μ = (1 2 1)' and Σ = (9 2 2 / 2 3 0 / 2 0 2). Find the joint distribution of Y₁ = X₁ + X₂ + X₃ and Y₂ = X₂ - X₃. 10 marks (c) If X₁, X₂, ..., Xₙ is a random sample from a standard normal population, then using quadratic forms show that the sample mean X̄ = (1/n)∑ⱼ₌₁ⁿ Xⱼ and sample variance S² = [1/(n-1)]∑ⱼ₌₁ⁿ(Xⱼ - X̄)² are stochastically independent. 10 marks (d) Assume that in a population of very large number of items, proportion of defective items is 0·30. What should be the size of the sample, if a simple random sample is to be drawn from this population to estimate the percent defective within 2 percent of the true value with 95·5 percent probability? [Given P(0 ≤ Z ≤ 1·96) = 0·475; and P(0 ≤ Z ≤ 2·005) = 0·4775]. 10 marks (e) How do the size and shape of plots and blocks affect the results of field experiments? 10 marks

Answer approach & key points

This question demands rigorous derivation and calculation across five statistical sub-parts. Allocate approximately 20% time each: for (a) derive Var(â), Var(b̂), Cov(â,b̂) using matrix or summation approach; for (b) apply linear transformation of multivariate normal; for (c) use quadratic forms and Cochran's theorem; for (d) solve the sample size formula for proportions; for (e) discuss experimental design principles with Indian agricultural examples like IARI field trials. Present each part distinctly with clear labeling.

  • Part (a): Derivation of Var(â) = σ²ₑ[∑Xᵢ²/(n∑(Xᵢ-X̄)²)], Var(b̂) = σ²ₑ/∑(Xᵢ-X̄)², and Cov(â,b̂) = -σ²ₑX̄/∑(Xᵢ-X̄)² using normal equations or matrix approach
  • Part (b): Application of linear transformation Y = AX where A = [[1,1,1],[0,1,-1]] to obtain Y ~ N₂(Aμ, AΣA') with computed mean (4,1)' and covariance matrix [[22,1],[1,5]], where Var(Y₁) = 9+3+2+2(2+2+0) = 22
  • Part (c): Decomposition of total sum of squares using idempotent matrices, showing Q₁ = nX̄² and Q₂ = (n-1)S² are independent via rank additivity and Cochran's theorem
  • Part (d): Sample size calculation n = Z²ₐ/₂ p(1-p)/d² with p=0.30, d=0.02, Z=2.005 yielding n = 4.020×0.21/0.0004 ≈ 2110.5, i.e. n = 2111 after rounding up
  • Part (e): Discussion of plot size effects on soil heterogeneity control, shape effects on border bias, and block arrangement for local control with reference to Indian agricultural experiments like varietal trials at IARI
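
The arithmetic in parts (b) and (d) can be verified with plain Python. The helpers `matmul` and `transpose` are illustrative, and the sample-size step assumes "within 2 percent" means an absolute margin d = 0.02, as in the key point above.

```python
from fractions import Fraction
from math import ceil


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


# Part (b): Y = A X with X ~ N3(mu, Sigma).
A = [[1, 1, 1],
     [0, 1, -1]]
mu = [[1], [2], [1]]
Sigma = [[9, 2, 2],
         [2, 3, 0],
         [2, 0, 2]]

mean_y = matmul(A, mu)                          # A mu
cov_y = matmul(matmul(A, Sigma), transpose(A))  # A Sigma A'

# Part (d): n = z^2 p (1 - p) / d^2, rounded up to the next integer.
z, p, d = Fraction("2.005"), Fraction("0.30"), Fraction("0.02")
n_exact = z ** 2 * p * (1 - p) / d ** 2
n_required = ceil(n_exact)

print(mean_y, cov_y, float(n_exact), n_required)
```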
Q6
50M · derive · Bivariate normal, joint distributions, conditional expectation, generalized least squares

(a)(i) If (X, Y) follows bivariate normal BN(μ₁, μ₂, σ₁², σ₂², ρ), then obtain (A) E(e^X) (B) E(e^(X+Y)) (C) Var(e^X) and (D) Correlation between e^X and e^Y. 3+3+3+3=12 marks (a)(ii) If (X, Y) have the joint probability density function g(x,y) = y e^(-y(x+1)), for x ≥ 0, y ≥ 0; 0 elsewhere, then find the regression curve of X on Y and comment on the nature of the curve. 8 marks (b) Let X = (X₁, X₂, X₃)' ~ N₃(μ, Σ), in which μ = (2 1 3)' and Σ = (9 2 -2 / 2 2 -3 / -2 -3 9). Obtain (i) E{X₁ | X₂ = x₂, X₃ = x₃} and (ii) Var{X₁ | X₂ = x₂, X₃ = x₃}. 15 marks (c) Consider the model: Y = X θ + ε, where ε is an n×1 vector of unobservable random variables such that E(ε) = 0 and D(ε) = σ²Ω, σ>0 unknown, Ω is a positive definite matrix of known constants and rank(X) = k<n. Then (i) Derive least square estimator of θ and (ii) Derive an unbiased estimator of σ². 9+6=15 marks

Answer approach & key points

Derive all required quantities systematically across three parts: spend ~35% time on (a) covering MGF technique for lognormal moments and regression curve derivation; ~30% on (b) for conditional multivariate normal using Schur complement; and ~35% on (c) for GLS estimator derivation via Aitken transformation and unbiased variance estimation. Begin each part with appropriate distribution assumptions, show complete derivation steps, and conclude with explicit final expressions.

  • Part (a)(i): Use MGF of bivariate normal to derive E(e^X)=exp(μ₁+σ₁²/2), E(e^(X+Y))=exp(μ₁+μ₂+(σ₁²+σ₂²+2ρσ₁σ₂)/2), Var(e^X), and Corr(e^X,e^Y) using lognormal properties
  • Part (a)(ii): Obtain marginal of Y, conditional density of X|Y, derive E(X|Y=y)=1/y showing hyperbolic regression curve with negative association
  • Part (b): Apply conditional multivariate normal formula with Σ₂₂ partition, compute Σ₁₂Σ₂₂⁻¹ for conditional mean and Σ₁₁-Σ₁₂Σ₂₂⁻¹Σ₂₁ for conditional variance
  • Part (c)(i): Derive GLS estimator θ̂=(X'Ω⁻¹X)⁻¹X'Ω⁻¹Y via Aitken transformation or direct minimization of generalized sum of squares
  • Part (c)(ii): Derive unbiased estimator σ̂²=(Y-Xθ̂)'Ω⁻¹(Y-Xθ̂)/(n-k) using trace properties and idempotent matrix arguments
  • Correct handling of positive definiteness conditions for Ω and invertibility requirements throughout
  • Proper verification that E(θ̂)=θ (unbiasedness) and E(σ̂²)=σ² in part (c)
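
For part (b), the partitioned-matrix formulas can be evaluated exactly. The sketch below inverts the 2×2 block Σ₂₂ by hand; variable names such as `coef` and `cond_var` are illustrative.

```python
from fractions import Fraction as F

mu = [F(2), F(1), F(3)]
Sigma = [[F(9), F(2), F(-2)],
         [F(2), F(2), F(-3)],
         [F(-2), F(-3), F(9)]]

s11 = Sigma[0][0]
s12 = [Sigma[0][1], Sigma[0][2]]             # Cov(X1, (X2, X3))
(a, b), (c, d) = Sigma[1][1:], Sigma[2][1:]  # entries of Sigma_22
det = a * d - b * c
s22_inv = [[d / det, -b / det],
           [-c / det, a / det]]

# Regression coefficients Sigma_12 Sigma_22^{-1}.
coef = [s12[0] * s22_inv[0][j] + s12[1] * s22_inv[1][j] for j in range(2)]

# Conditional variance Sigma_11 - Sigma_12 Sigma_22^{-1} Sigma_21.
cond_var = s11 - (coef[0] * s12[0] + coef[1] * s12[1])


def cond_mean(x2, x3):
    """E(X1 | X2 = x2, X3 = x3) for the given mu and Sigma."""
    return mu[0] + coef[0] * (x2 - mu[1]) + coef[1] * (x3 - mu[2])


print(coef, cond_var, cond_mean(1, 3))
```

This gives E(X₁ | x₂, x₃) = 2 + (4/3)(x₂ - 1) + (2/9)(x₃ - 3) and a conditional variance of 61/9, which does not depend on the conditioning values.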
Q7
50M · analyse · Latin square design and factorial experiments

(a) Analyse and interpret the following data concerning output of wheat per field obtained as a result of experiment conducted to test four varieties of wheat A, B, C and D under a Latin square design at 5% level of significance. [Given F(3, 6) = 4·76; F(4, 7) = 4·12] (20 marks) (b)(i) Explain the need of factorial experiments with an example from pharmaceutical study. (6 marks) (b)(ii) Divide the 16 treatments of 2⁴ factorial experiment into 4 blocks of 4 treatments each, confounding the interaction effect AB and CD completely with blocks. Which other interaction is automatically confounded in this design ? (9 marks) (c) Define Horvitz-Thompson estimator for estimating the population total, and show that it is unbiased for probability proportional to size sampling without replacement. Also find its sampling variance. (15 marks)

Answer approach & key points

Begin with the directive 'analyse' by breaking down the Latin square data in part (a) systematically: set up the ANOVA table, compute the F-statistic, and compare it with the critical value. Allocate approximately 40% of effort to part (a) (20 marks), 30% to part (b) combining theoretical explanation of factorial experiments with pharmaceutical example and confounding construction (15 marks), and 30% to part (c) for rigorous derivation of Horvitz-Thompson estimator properties (15 marks). Structure as: (a) complete ANOVA with hypothesis testing, (b)(i) conceptual explanation with Indian pharmaceutical context like drug efficacy trials, (b)(ii) systematic block construction using confounding pattern, (c) formal definition followed by unbiasedness proof and variance derivation.

  • For (a): Correct ANOVA setup for 4×4 Latin square with rows, columns, treatments; proper calculation of correction factor, total SS, row SS, column SS, treatment SS, error SS; correct F-test for varieties with df (3,6); comparison with given critical value F(3,6)=4.76; clear conclusion on significance
  • For (b)(i): Explanation of factorial experiments need—simultaneous study of multiple factors, detection of interactions, efficiency over single-factor experiments; pharmaceutical example such as 2² factorial on drug dosage and administration timing effects on patient recovery in Indian clinical trials
  • For (b)(ii): Construction of 2⁴ factorial in 4 blocks using AB and CD as confounded effects; identification of generalized interaction AB×CD = ABCD as automatically confounded; systematic block composition using even-odd rule or modulo 2 arithmetic on defining contrasts
  • For (c): Formal definition of Horvitz-Thompson estimator as Σ(yᵢ/πᵢ) where πᵢ is inclusion probability; proof of unbiasedness showing E(Ŷ_HT) = Y using PPSWOR properties with πᵢ = npᵢ; derivation of variance formula involving πᵢ and πᵢⱼ using Yates-Grundy-Sen approach or alternative
  • Cross-cutting: Appropriate use of statistical notation, clear statement of assumptions, and logical flow connecting theoretical derivations to practical experimental contexts
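
The block construction in (b)(ii) follows mechanically from the mod-2 rule. The sketch below is one way to generate the blocks; the helper names `label` and `parity` are hypothetical.

```python
from itertools import product

FACTORS = "abcd"


def label(levels):
    """Standard treatment label, e.g. (1, 1, 0, 0) -> 'ab'; all zeros -> '(1)'."""
    name = "".join(f for f, lv in zip(FACTORS, levels) if lv)
    return name or "(1)"


def parity(levels, effect):
    """Mod-2 sum of the levels of the factors named in `effect`, e.g. 'AB'."""
    return sum(levels[FACTORS.index(ch.lower())] for ch in effect) % 2


# Assign each of the 16 treatments to a block by its (AB, CD) parities.
blocks = {}
for levels in product((0, 1), repeat=4):
    key = (parity(levels, "AB"), parity(levels, "CD"))
    blocks.setdefault(key, []).append(levels)

block_labels = {key: [label(lv) for lv in members] for key, members in blocks.items()}

# ABCD has the same parity for every treatment within a block,
# so it is confounded with blocks along with AB and CD.
abcd_constant = all(
    len({parity(lv, "ABCD") for lv in members}) == 1
    for members in blocks.values()
)

for key, names in sorted(block_labels.items()):
    print(key, names)
```

The block with both parities even is the principal block {(1), ab, cd, abcd}; the other three blocks are obtained from it by multiplying by a treatment outside the block.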
Q8
50M · solve · Principal components and missing value analysis

(a)(i) What are principal components ? Show that the principal components are uncorrelated. (10 marks) (a)(ii) Obtain the principal components and the amount of variation explained by each principal component associated with the following dispersion matrix : Σ = $\begin{pmatrix} 4 & 2 & 1 \\ 2 & 3 & 1 \\ 1 & 1 & 2 \end{pmatrix}$ Comment on the results. (10 marks) (b) For the given data, the yield of the treatment B in the second block is missing and is denoted as 'y'. Estimate the missing value, and analyse the data by assuming the level of significance = 0·05. [Given that F(3, 4) = 6·59; and F(2, 3) = 9·55] (20 marks) (c) Distinguish between Sampling and Non-sampling Errors. What are their sources ? How these errors can be controlled ? (10 marks)

Answer approach & key points

This is a multi-part numerical and theoretical question requiring proof, computation, and analysis. Allocate approximately 40% time to part (a) covering PCA theory and computation, 40% to part (b) for missing value estimation and ANOVA analysis, and 20% to part (c) for conceptual comparison of errors. Begin with definitions and proofs in (a), proceed to systematic eigenvalue computation, then handle missing value estimation using Yates' method followed by complete ANOVA, and conclude with structured comparison for (c).

  • Part (a)(i): Define principal components as linear combinations maximizing variance; prove uncorrelatedness using orthogonal transformation property (Z = Γ'X where Γ is eigenvector matrix)
  • Part (a)(ii): Compute eigenvalues of Σ (characteristic equation: -λ³ + 9λ² - 20λ + 13 = 0, with coefficients trace 9, sum of principal 2×2 minors 20 and determinant 13), obtain eigenvectors, calculate proportion of variance explained by each PC, comment on dimensionality reduction
  • Part (b): Estimate missing value y using Yates' formula for RBD: y = (rB + tT - G)/((r-1)(t-1)), reconstruct ANOVA table with adjusted degrees of freedom, compare calculated F with given critical values
  • Part (c): Distinguish sampling error (random, measurable, decreases with n) vs non-sampling error (systematic, non-measurable); list sources (coverage, non-response, measurement, processing, frame errors); control methods (probability sampling, pre-testing, training, validation, imputation techniques)
  • Correct application of spectral decomposition theorem and verification that trace equals sum of eigenvalues
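
The characteristic equation in (a)(ii) can be checked from the trace, the principal 2×2 minors and the determinant. The sketch below also locates the three eigenvalues by bisection; the bracketing intervals are assumptions chosen by inspecting sign changes for this particular matrix.

```python
Sigma = [[4, 2, 1],
         [2, 3, 1],
         [1, 1, 2]]


def det2(a, b, c, d):
    return a * d - b * c


def char_poly_coeffs(m):
    """Return (c2, c1, c0) with det(l*I - m) = l^3 - c2*l^2 + c1*l - c0."""
    trace = m[0][0] + m[1][1] + m[2][2]
    minors = (det2(m[0][0], m[0][1], m[1][0], m[1][1])
              + det2(m[0][0], m[0][2], m[2][0], m[2][2])
              + det2(m[1][1], m[1][2], m[2][1], m[2][2]))
    det = (m[0][0] * det2(m[1][1], m[1][2], m[2][1], m[2][2])
           - m[0][1] * det2(m[1][0], m[1][2], m[2][0], m[2][2])
           + m[0][2] * det2(m[1][0], m[1][1], m[2][0], m[2][1]))
    return trace, minors, det


c2, c1, c0 = char_poly_coeffs(Sigma)


def poly(lam):
    return lam ** 3 - c2 * lam ** 2 + c1 * lam - c0


def bisect(f, lo, hi, tol=1e-12):
    """Root of f in [lo, hi], assuming f changes sign on the interval."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (f(mid) > 0) == (f_lo > 0):
            lo, f_lo = mid, f(mid)
        else:
            hi = mid
    return (lo + hi) / 2


# Brackets assumed from sign changes of the polynomial for this Sigma.
eigenvalues = sorted((bisect(poly, 1.0, 1.4), bisect(poly, 1.4, 2.0),
                      bisect(poly, 2.0, 9.0)), reverse=True)
explained = [lam / c2 for lam in eigenvalues]  # share of total variance

print((c2, c1, c0), eigenvalues, explained)
```

The eigenvalues sum to the trace (9) and multiply to the determinant (13), which is the verification the last key point asks for.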

Paper II

8 questions · 400 marks
Q1
50M · Compulsory · solve · Statistical Quality Control and Operations Research

(a) State the significance of operating characteristic (OC) curves in control chart analysis. Obtain the general expression for the OC function corresponding to the mean (X̄) chart under the assumption of normal distribution for a quality characteristic. Using the expression, find the probability that a shift will be detected from μ₀ to μ₁ = μ₀ + 2σ, when an X̄ chart is used with 3σ limits, where the subgroup size is n = 6. (Standard normal table is provided.) 10 marks (b) What is meant by rectifying inspection? Explain the measures associated with rectifying inspection and derive the expressions of such measures in the case of a single sampling plan by attributes. 10 marks (c) The lifetime of a semiconductor laser has a log-normal distribution with parameters μ = 10 hours and σ = 1·5 hours. (i) Find the probability that the lifetime exceeds 10000 hours. (ii) What lifetime is exceeded by 99% of lasers? (Standard normal table is provided.) 5+5=10 marks (d) A stockist has to supply 400 units of a product every Monday to his customers. He gets the product at ₹ 50 per unit from the manufacturer. The cost of ordering and transportation from the manufacturer is ₹ 75 per order. The cost of carrying inventory is 7·5% per year of the cost of the product. Find (i) the economic lot size, (ii) the total optimal cost (including the capital cost) and (iii) the total weekly profit, if the item is sold for ₹ 55 per unit. 10 marks (e) On the average, 96 patients per 24-hour day require the service of an emergency clinic. Also, on the average, a patient requires 10 minutes of active attention. Assume that the facility can handle only one emergency at a time. Suppose that it costs the clinic ₹ 1,000 per patient treated to obtain an average serving time of 10 minutes, and that each minute of decrease in this average time would cost the clinic ₹ 100 per patient treated. How much would have to be budgeted by the clinic to decrease the average size of the queue from 1 1/3 patients to 1/2 patient? 10 marks

Answer approach & key points

Solve each sub-part systematically with clear problem identification and step-by-step working. For (a), derive the OC function and compute detection probability; for (b), define rectifying inspection and derive AOQ, AOQL, ATI expressions; for (c), apply log-normal transformation and use standard normal tables; for (d), apply EOQ model with all cost components; for (e), use M/M/1 queuing formulas to find service rate changes and budget implications. Allocate approximately 20% time each to parts (a), (b), (c), (d), and (e) respectively, with extra care on derivations in (a) and (b) where method rigor matters most.

  • (a) Significance of OC curves in assessing Type II error and chart sensitivity; correct derivation of the OC function β = P(|X̄-μ₀|<3σ/√n | μ=μ₁) = Φ(3 - k√n) - Φ(-3 - k√n) for a shift of kσ; with k=2 and n=6, k√n ≈ 4.90, so β ≈ Φ(-1.90) - Φ(-7.90) ≈ 0.029 and the probability of detecting the shift on the first subsequent sample is 1 - β ≈ 0.971
  • (b) Definition of rectifying inspection as 100% inspection of rejected lots; derivation of AOQ = p·Pa·(N-n)/N, AOQL, and ATI = n·Pa + N(1-Pa) for single sampling plan; explanation of process average quality improvement
  • (c)(i) Log-normal transformation: ln(10000)=9.2103, Z=(9.2103-10)/1.5=-0.526, P(T>10000)=1-Φ(-0.526)=Φ(0.526)≈0.70
  • (c)(ii) Find t where P(T>t)=0.99: Φ⁻¹(0.01)=-2.326, ln(t)=10-3.489=6.511, t=e^6.511≈672 hours
  • (d) EOQ calculation: D=400×52=20800, S=75, H=0.075×50=3.75, EOQ=√(2×20800×75/3.75)≈912 units; annual total cost = purchase ₹10,40,000 + ordering ₹1,710 + carrying ₹1,710 ≈ ₹10,43,420, i.e. about ₹20,065.78 per week; weekly profit=400×55-₹20,065.78≈₹1,934.22
  • (e) M/M/1 queue: λ=4/hr, current μ=6/hr (Lq=4²/(6×2)=1.33), target μ=8/hr (Lq=16/32=0.5), so the mean service time falls from 10 to 7.5 minutes; cost rises from ₹1,000 to ₹1,000+2.5×₹100=₹1,250 per patient, giving a total budget of 96×₹1,250=₹1,20,000 per day
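
The numerical parts (a), (c), (d) and (e) can be reproduced with the standard library. This is a sketch under stated assumptions: a 52-week year for part (d), detection on the first sample after the shift for part (a), and illustrative names such as `oc_beta` and `mu_for_lq`.

```python
from math import exp, log, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


# (a) X-bar chart with L-sigma limits, mean shift of k sigma, subgroup size n.
def oc_beta(L, k, n):
    """Probability that the next subgroup mean stays inside the limits."""
    return Z.cdf(L - k * sqrt(n)) - Z.cdf(-L - k * sqrt(n))


beta = oc_beta(3, 2, 6)
detect = 1 - beta

# (c) Log-normal lifetime: ln T ~ N(10, 1.5^2).
p_exceed = 1 - Z.cdf((log(10000) - 10) / 1.5)
t_99 = exp(10 + 1.5 * Z.inv_cdf(0.01))  # lifetime exceeded by 99% of lasers

# (d) EOQ worked per week (assumes 52 weeks per year for the carrying cost).
demand, unit_cost, order_cost, price = 400, 50, 75, 55
hold = 0.075 * unit_cost / 52
eoq = sqrt(2 * demand * order_cost / hold)
weekly_cost = unit_cost * demand + sqrt(2 * demand * order_cost * hold)
weekly_profit = price * demand - weekly_cost

# (e) M/M/1 queue with Lq = lam^2 / (mu * (mu - lam)).
lam = 96 / 24  # arrivals per hour


def mu_for_lq(lq):
    """Service rate giving queue length lq: positive root of mu^2 - lam*mu - lam^2/lq = 0."""
    return (lam + sqrt(lam ** 2 + 4 * lam ** 2 / lq)) / 2


mu_new = mu_for_lq(0.5)
minutes_saved = 10 - 60 / mu_new
cost_per_patient = 1000 + 100 * minutes_saved
daily_budget = 96 * cost_per_patient

print(round(detect, 4), round(p_exceed, 4), round(t_99, 1))
print(round(eoq, 1), round(weekly_profit, 2), mu_new, daily_budget)
```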
Q2
50M · derive · Statistical Quality Control and Reliability Theory

(a) (i) What are control charts by variables and control charts by attributes? 5 marks (ii) Derive the control limits for the construction of control charts for the mean and variability based on sample standard deviation. 15 marks (b) (i) State the assumptions involved under sampling inspection plans by variables and describe the operating procedure of a single sampling plan by variables under the assumption of normal distribution for a quality characteristic. 5 marks (ii) Establish the relationship between the fraction defective and the acceptance probability under a single sampling plan by variables when the quality characteristic follows a normal distribution with mean μ and variance σ², where σ² is unknown, and when an upper specification limit is specified. Using the relationship, obtain the formula for finding the parameters of the sampling plan. 10 marks (c) (i) Given a system consisting of n components, define the state vector and the structure function of the system. What do they indicate? 5 marks (ii) Defining (1) a series system, (2) a parallel system and (3) a k-out-of-n system, obtain the associated expressions for the structure functions and the reliability functions. 10 marks

Answer approach & key points

Begin with clear definitions for (a)(i) distinguishing variables/attributes charts, then rigorously derive control limits for x̄ and s charts using sample standard deviation with proper statistical assumptions. For (b), state assumptions of normality and known/unknown variance, outline the operating procedure, then establish the OC function relationship showing how fraction defective links to acceptance probability via non-central t-distribution when σ² is unknown. For (c), define state vector and structure function mathematically, then derive expressions for series, parallel, and k-out-of-n systems using indicator functions and reliability theory. Allocate approximately 35% time to (a)(ii) derivation, 25% to (b)(ii) relationship establishment, 20% to (c)(ii) system derivations, and remaining 20% to definitional parts.

  • (a)(i) Clear distinction: variables charts for measurable characteristics (x̄, R, s charts) vs attributes charts for countable defects (p, np, c, u charts) with examples from Indian manufacturing
  • (a)(ii) Derivation of x̄ chart limits using s/c₄ as σ estimator: UCL/LCL = x̄̄ ± A₃s̄; s chart limits: UCL = B₄s̄, LCL = B₃s̄ with constants derived from χ² distribution
  • (b)(i) Assumptions: normality, single upper/lower specification limit, known or unknown σ; operating procedure: sample selection, computation of sample mean, comparison with acceptance criterion
  • (b)(ii) Relationship: p = P(X > U) = 1 - Φ((U-μ)/σ) for upper specification; acceptance probability Pa = P(accept|p) via non-central t when σ unknown; derivation of n and k parameters via producer/consumer risk points
  • (c)(i) State vector x = (x₁,...,xₙ) where xᵢ ∈ {0,1} indicates component state; structure function φ(x) ∈ {0,1} indicates system state; φ(x) = 1 iff system functions
  • (c)(ii) Series: φ(x) = Πxᵢ, Rₛ(t) = ΠRᵢ(t); Parallel: φ(x) = 1 - Π(1-xᵢ), Rₚ(t) = 1 - Π(1-Rᵢ(t)); k-out-of-n: φ(x) = 1 if Σxᵢ ≥ k, reliability via binomial/Beta or recursive formula
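
The structure functions in (c)(ii) translate directly into code, and the reliability of each system is the expectation of the structure function over independent component states. The sketch below uses brute-force enumeration of state vectors; function names are illustrative, and independence of components is assumed.

```python
from itertools import product
from math import comb, prod


def phi_series(x):
    """Series system works only if every component works."""
    return int(all(x))


def phi_parallel(x):
    """Parallel system works if at least one component works."""
    return int(any(x))


def phi_k_out_of_n(x, k):
    """k-out-of-n system works if at least k components work."""
    return int(sum(x) >= k)


def reliability(phi, p):
    """E[phi(X)] for independent components with reliabilities p[i]."""
    total = 0.0
    for x in product((0, 1), repeat=len(p)):
        prob = prod(pi if xi else 1 - pi for xi, pi in zip(x, p))
        total += phi(x) * prob
    return total


p = [0.9, 0.8, 0.7]
r_series = reliability(phi_series, p)      # product of p_i
r_parallel = reliability(phi_parallel, p)  # 1 - product of (1 - p_i)

# 2-out-of-3 with identical components matches the binomial tail sum.
r_2_of_3 = reliability(lambda x: phi_k_out_of_n(x, 2), [0.9] * 3)
r_binomial = sum(comb(3, j) * 0.9 ** j * 0.1 ** (3 - j) for j in range(2, 4))

print(r_series, r_parallel, r_2_of_3, r_binomial)
```

Series and parallel systems are the k = n and k = 1 special cases of the k-out-of-n system.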
Q3
50M · solve · Operations Research and Simulation

(a) A company manufactures 30 items per day. The sale of those items depends upon demand which has the following distribution :

| Sale (units) | 27 | 28 | 29 | 30 | 31 | 32 |
|-------------|----|----|----|----|----|----|
| Probability | 0·10 | 0·15 | 0·20 | 0·35 | 0·15 | 0·05 |

The production cost and selling price of each unit are ₹ 400 and ₹ 500 respectively. Any unsold product is to be disposed off at a loss of ₹ 150 per unit. There is a penalty of ₹ 50 per unit if the demand is not met. Use the following random numbers to estimate total profit/loss for the company for the next 10 days : 23, 99, 65, 99, 95, 01, 79, 11, 16, 10 If the company decides to produce 20 items per day, what is the advantage or disadvantage to the company? (15 marks) (b) A company has four plants P₁, P₂, P₃ and P₄ from which it supplies to three markets M₁, M₂ and M₃. Determine the optimal transportation plan from the following data giving the plant to market shifting costs, quantities available at each plant and quantities required at each market :

| Market ↓ | P₁ | P₂ | P₃ | P₄ | Required at market |
|:---|:---:|:---:|:---:|:---:|:---:|
| M₁ | 19 | 14 | 23 | 11 | 11 |
| M₂ | 15 | 16 | 12 | 21 | 13 |
| M₃ | 30 | 25 | 16 | 39 | 19 |
| Available at plant | 6 | 10 | 12 | 15 | 43 |

(15 marks) (c) On January 1 (this year), brands A, B and C of a commodity had 40, 40 and 20 percent of the market share. Basing upon a market research, it is compiled that brand A retains 90 percent of its customers, while gaining 5 percent of B's customers and 10 percent of C's customers. Brand B retains 85 percent of its customers, while gaining 5 percent of A's customers and 7 percent of C's customers. Brand C retains 83 percent of its customers and gains 5 percent of A's customers and 10 percent of B's customers. What will be each brand's share on January 1 (next year) and what will be each brand's share in the market at equilibrium? (20 marks)

Answer approach & key points

Solve all three sub-parts systematically: for (a) set up the Monte Carlo simulation with correct random number intervals and profit/loss calculations; for (b) apply the transportation algorithm (VAM for IBFS then MODI/UV method for optimization); for (c) construct the transition probability matrix and compute next year's shares, then solve for steady-state equilibrium using πP = π. Allocate approximately 30% time to (a), 30% to (b), and 40% to (c) given the 20 marks weightage for part (c). Present all working clearly with tabular formats where appropriate.

  • For (a): Correctly establish random number intervals for demand simulation (00-09→27, 10-24→28, 25-44→29, 45-79→30, 80-94→31, 95-99→32) and calculate profit/loss for each of 10 days using given random numbers
  • For (a): Compute total profit for 10 days with production=30, then recompute for production=20 to compare advantage/disadvantage with clear numerical conclusion
  • For (b): Obtain initial basic feasible solution using Vogel's Approximation Method (VAM) and verify degeneracy condition (m+n-1=6 basic cells)
  • For (b): Apply MODI/UV method to test optimality and iterate if needed to reach optimal transportation schedule with minimum total cost
  • For (c): Construct correct transition probability matrix from customer retention and switching data, then compute January 1 next year shares by matrix multiplication
  • For (c): Set up and solve the steady-state equations πP = π, where π = (π_A, π_B, π_C), together with π_A+π_B+π_C=1 to find equilibrium market shares
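
The Markov chain in part (c) can be worked in exact fractions. The sketch below assumes one transition corresponds to one year; the helpers `step` and `solve` are illustrative, and `solve` assumes the linear system has a unique solution.

```python
from fractions import Fraction as F

# Row i gives where brand i's customers go next year (order: A, B, C).
P = [[F(90, 100), F(5, 100), F(5, 100)],
     [F(5, 100), F(85, 100), F(10, 100)],
     [F(10, 100), F(7, 100), F(83, 100)]]
share_now = [F(40, 100), F(40, 100), F(20, 100)]


def step(share, P):
    """One transition: new share vector = share * P."""
    return [sum(share[i] * P[i][j] for i in range(3)) for j in range(3)]


def solve(a, b):
    """Gauss-Jordan elimination on exact fractions (unique solution assumed)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col:
                m[r] = [vr - m[r][col] * vc for vr, vc in zip(m[r], m[col])]
    return [m[r][n] for r in range(n)]


share_next = step(share_now, P)

# Balance equations for A and B (columns of P minus identity),
# plus the normalising condition that the shares sum to 1.
a = [[P[i][j] - (1 if i == j else 0) for i in range(3)] for j in range(2)]
a.append([F(1)] * 3)
equilibrium = solve(a, [F(0), F(0), F(1)])

print([float(s) for s in share_next], [float(s) for s in equilibrium])
```

Next year's shares come out as 40%, 37.4% and 22.6%, and the equilibrium shares as 37/86, 24/86 and 25/86 (about 43.0%, 27.9% and 29.1%).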
Q4
50M · solve · Game Theory, Linear Programming and Quality Control

(a) Solve the game whose payoff matrix is $$ \begin{bmatrix} -1 & -2 & 8 \\ 7 & 5 & -1 \\ 6 & 0 & 12 \end{bmatrix} $$ (15 marks) (b) Use the penalty (Big M) method to solve the following linear programming problem : Minimize Z = 5x₁ + 3x₂ subject to the constraints 2x₁ + 4x₂ ≤ 12 2x₁ + 2x₂ = 10 5x₁ + 2x₂ ≥ 10 x₁, x₂ ≥ 0 (15 marks) (c) (i) Distinguish between a nonconforming unit and a nonconformity. State the appropriate conditions for constructing a control chart for nonconformities and derive the control limits for a control chart based on the average number of nonconformities per inspection unit. (2+8=10 marks) (ii) Describe the operating procedure of unit-by-unit sequential sampling plan by attributes. What is the unique feature of a sequential sampling plan? (5 marks) (iii) The time to failure for an electronic component used in a flat panel display unit is satisfactorily modelled by a Weibull distribution with the shape parameter β = ½ and the scale parameter θ = 5000 hours. Find the mean time to failure and the fraction of component that is expected to survive beyond 20000 hours. (2+3=5 marks)

Answer approach & key points

The directive 'solve' demands complete working with optimal strategies and values for (a) and (b), while (c) requires theoretical exposition with derivations and calculations. Allocate approximately 35-40% time to part (a) given its 15 marks and computational complexity, 30% to part (b) for the Big M method iterations, and 30% to part (c) distributed as 10 marks for (c)(i), 5 marks for (c)(ii), and 5 marks for (c)(iii). Structure with clear part-wise headings, showing all matrix operations, simplex tableaus, and control limit derivations.
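As a cross-check on the hand working for part (a), the sketch below applies iterated dominance to the payoff matrix and then the standard closed-form solution of a 2×2 game. It assumes the matrix reduces to a 2×2 game without a saddle point, which should be confirmed by hand; the function names are illustrative.

```python
from fractions import Fraction

# Payoff matrix to the row player (the maximiser), copied from part (a).
PAYOFF = [
    [-1, -2, 8],
    [7, 5, -1],
    [6, 0, 12],
]


def reduce_by_dominance(matrix):
    """Return the row and column indices that survive iterated dominance."""
    rows = list(range(len(matrix)))
    cols = list(range(len(matrix[0])))
    changed = True
    while changed:
        changed = False
        # Row player maximises: drop a row that another row matches or beats everywhere.
        for r in rows:
            if any(s != r and all(matrix[s][c] >= matrix[r][c] for c in cols) for s in rows):
                rows.remove(r)
                changed = True
                break
        # Column player minimises: drop a column that another column undercuts everywhere.
        for c in cols:
            if any(d != c and all(matrix[r][d] <= matrix[r][c] for r in rows) for d in cols):
                cols.remove(c)
                changed = True
                break
    return rows, cols


def solve_2x2(matrix, rows, cols):
    """Mixed-strategy solution of a 2x2 game that has no saddle point."""
    a, b = matrix[rows[0]][cols[0]], matrix[rows[0]][cols[1]]
    c, d = matrix[rows[1]][cols[0]], matrix[rows[1]][cols[1]]
    denom = a + d - b - c
    p = Fraction(d - c, denom)          # probability of the first surviving row
    q = Fraction(d - b, denom)          # probability of the first surviving column
    value = Fraction(a * d - b * c, denom)
    return p, q, value


rows, cols = reduce_by_dominance(PAYOFF)
p, q, value = solve_2x2(PAYOFF, rows, cols)

# Expand to full-length strategies; dominated rows and columns get probability 0.
row_strategy = [Fraction(0)] * 3
col_strategy = [Fraction(0)] * 3
row_strategy[rows[0]], row_strategy[rows[1]] = p, 1 - p
col_strategy[cols[0]], col_strategy[cols[1]] = q, 1 - q
print(row_strategy, col_strategy, value)
```

Exact fractions are used so that the strategies and the value of the game can be compared directly with the hand-derived answer.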

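  • For (a): Show that the game has no saddle point (maximin 0 ≠ minimax 5), reduce by dominance (column 1 is dominated by column 2, then row 1 by row 3), solve the remaining 2×2 game, and verify the mixed strategy solution: row player (0, 2/3, 1/3), column player (0, 13/18, 5/18), value of game V = 10/3 ≈ 3.33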
  • For (b): Convert to standard form by adding slack, surplus and artificial variables; use Big M penalty method with correct simplex iterations showing entering and leaving variables
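  • For (c)(i): Define nonconforming unit as item with ≥1 nonconformity vs nonconformity as specific instance of non-fulfilment; state the Poisson assumption for the c-chart and derive UCL = c̄ + 3√c̄, LCL = max(0, c̄ - 3√c̄); for the chart based on the average number of nonconformities per inspection unit (u-chart, n inspection units per sample), derive UCL = ū + 3√(ū/n), LCL = max(0, ū - 3√(ū/n))
  • For (c)(ii): Describe item-by-item sequential sampling with acceptance, rejection and continue-sampling regions; the unique feature is that the sample size is not fixed in advance but determined by the cumulative inspection results, so the ASN (average sample number) is typically smaller than for a fixed-size plan giving the same protection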
  • For (c)(iii): Calculate MTTF = θΓ(1+1/β) = 5000×Γ(3) = 10000 hours; survival probability S(20000) = exp[-(20000/5000)^0.5] = e^(-2) ≈ 0.1353 or 13.53%
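The Weibull calculation in (c)(iii) can be checked numerically with the standard library. This is a minimal sketch using the survival function S(t) = exp[-(t/θ)^β] and the mean θΓ(1+1/β) stated above; the function names are illustrative.

```python
import math


def weibull_mttf(shape: float, scale: float) -> float:
    """Mean time to failure of a Weibull distribution: scale * Gamma(1 + 1/shape)."""
    return scale * math.gamma(1.0 + 1.0 / shape)


def weibull_survival(t: float, shape: float, scale: float) -> float:
    """Probability of surviving beyond time t: exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))


BETA = 0.5       # shape parameter from the question
THETA = 5000.0   # scale parameter in hours from the question

mttf_hours = weibull_mttf(BETA, THETA)
surviving_fraction = weibull_survival(20000.0, BETA, THETA)
print(f"MTTF = {mttf_hours:.0f} hours")
print(f"S(20000) = {surviving_fraction:.4f}")
```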
Q5
50M Compulsory explain Regression diagnostics and demographic statistics

(a) Explain the multicollinearity problem in a regression model. What are its consequences? State the different indicators of multicollinearity and explain. 10 marks (b) Establish the relationship among crude birthrate, general fertility rate and total fertility rate in the context of continuous data. Also, mention the properties of these fertility rates. 10 marks (c) What are the implications of using stable versus quasi-stable population assumption in demographic modelling? 10 marks (d) Discuss the problem of heteroscedasticity. Given that $Y_i = \alpha + \beta X_i + U_i$ with $E(U_i^2) = K^2X_i^2$, prove that OLS estimates of $\alpha$ and $\beta$ possess greater variance than OLS estimates of the transformed version of original model. 10 marks (e) What does it imply by validity of a test? Distinguish between the concepts of validity and reliability. 10 marks

Answer approach & key points

Begin with a brief introduction acknowledging that regression diagnostics and demographic measures are foundational to applied statistical analysis in Indian economic planning and population studies. Allocate approximately 20% time to each sub-part given equal 10-mark weighting: for (a) explain multicollinearity with consequences and indicators like VIF and condition index; for (b) derive the mathematical relationship CBR = GFR × (P_F/P), where P_F is the number of women of reproductive age, and connect to TFR via GFR ≈ TFR × (1/m), where m is the length of the reproductive age span (about 35 years for ages 15-49) and women are assumed to be evenly spread over it; for (c) contrast stable population (constant fertility/mortality, fixed age distribution) versus quasi-stable (gradually changing vital rates) with implications for Indian population projections; for (d) prove the variance inequality using the weighted least squares transformation that divides the model through by X_i; for (e) define validity (measuring what it claims) versus reliability (consistency) with psychometric examples. Conclude by synthesizing how diagnostic rigor ensures robust policy-relevant demographic modeling.

  • For (a): Definition of multicollinearity as near-linear dependence among regressors; consequences including inflated variances, unstable coefficients, t-statistic deflation; indicators—VIF > 10, condition number > 30, high R² but insignificant t-ratios, correlation matrix examination
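  • For (b): Derivation showing CBR = GFR × (proportion of women in reproductive ages, P_F/P) ≈ TFR × (1/m) × (P_F/P), where m is the length of the reproductive age span and the second step assumes women are evenly distributed over it; properties: CBR is crude and age-structure dependent, GFR refines by restricting to women 15-49, TFR is age-standardized and period synthetic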
  • For (c): Stable population implies Lotka's equation with constant rates leading to fixed age distribution and exponential growth; quasi-stable allows slowly changing rates with nearly stable age distribution; implications for Indian Census projections, intercensal estimation, and momentum effects
  • For (d): Heteroscedasticity as non-constant error variance; transformation to Y_i/X_i = α/X_i + β + U_i/X_i with homoscedastic errors; proof that Var(β̂_OLS) > Var(β̂_WLS) using Gauss-Markov theorem or direct variance comparison
  • For (e): Validity as accuracy of measurement (content, criterion, construct validity); reliability as precision/repeatability (test-retest, internal consistency); distinction—validity concerns systematic error, reliability concerns random error; trade-offs in educational testing and NSSO survey instruments
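For part (b), the continuous-data relationship among the three fertility rates can be laid out as below. This is a sketch in assumed notation, none of which appears in the question: B is annual births, P the total mid-year population, W(x) the density of women at age x, f(x) the age-specific fertility rate, and [α, β] the reproductive age span. Rates are per person or per woman; multiply by 1000 for the conventional per-thousand form.

```latex
\begin{align*}
B &= \int_{\alpha}^{\beta} f(x)\,W(x)\,dx, \qquad
W_{\alpha\beta} = \int_{\alpha}^{\beta} W(x)\,dx, \\
\mathrm{CBR} &= \frac{B}{P}, \qquad
\mathrm{GFR} = \frac{B}{W_{\alpha\beta}}, \qquad
\mathrm{TFR} = \int_{\alpha}^{\beta} f(x)\,dx, \\
\mathrm{CBR} &= \mathrm{GFR}\cdot\frac{W_{\alpha\beta}}{P}, \\
\mathrm{GFR} &= \int_{\alpha}^{\beta} f(x)\,c(x)\,dx, \qquad
c(x) = \frac{W(x)}{W_{\alpha\beta}}, \\
c(x) = \frac{1}{\beta-\alpha}
\;&\Longrightarrow\;
\mathrm{GFR} = \frac{\mathrm{TFR}}{\beta-\alpha}, \qquad
\mathrm{CBR} = \frac{\mathrm{TFR}}{\beta-\alpha}\cdot\frac{W_{\alpha\beta}}{P}.
\end{align*}
```

The last line holds only under the uniform-age-distribution assumption; in general GFR is a weighted average of f(x) with weights c(x), which is why it remains sensitive to the age structure within the reproductive span while TFR is not.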
Q6
50M calculate Demographic statistics and econometric identification

(a) On the basis of the figures given below, calculate the age-specific death rates (ASDRs) for all the age groups. Also, calculate the crude death rate (CDR) on the basis of ASDRs:

| Age group (in years) | 0-10 | 10-30 | 30-50 | 50-70 | 70 and above |
|---|---|---|---|---|---|
| Population | 10000 | 18000 | 26000 | 20000 | 5000 |
| Number of deaths | 220 | 40 | 62 | 350 | 2000 |

It was later discovered that two individuals, aged 47 and 54, were incorrectly recorded as being 37 and 45, while compiling the above table. Recalculate the ASDRs and CDR based on the corrected age data. (All calculations are up to 3 decimals only.) 15 marks

(b) Discuss the problem of identification with an example. State the rank and order conditions of identification. Check the identifiability of the following structural model:

y₁ = α₁ + β₁₂y₂ + β₁₃y₃ + γ₁₁x₁ + γ₁₂x₂ + u₁
y₂ = α₂ + β₂₃y₃ + γ₂₁x₁ + γ₂₂x₂ + u₂
y₃ = α₃ + β₃₁y₁ + γ₃₁x₁ + γ₃₂x₂ + u₃
y₄ = β₄₁y₁ + β₄₂y₂ + β₄₃x₃ + u₄

15 marks

(c) Prepare a life table for an age group from age 50 to age 60 of a specific population. Assume that there are 10000 persons living at age 50 and the probability of death within age x to x+1 is given as qₓ = 0·001+0·0002x for x = 50, 51, ..., 60. Prepare the life table with columns x, lₓ, qₓ, dₓ and Lₓ for x = 50, 51, 52, ..., 60. 20 marks

Answer approach & key points

Begin with precise calculations for part (a), allocating approximately 30% of time to compute original and corrected ASDRs/CDR with proper data reallocation. Devote 30% to part (b) discussing identification using a concrete econometric example (e.g., supply-demand model), stating order and rank conditions clearly, then systematically checking each equation's identifiability. Reserve 40% for part (c) constructing the complete life table with all five columns, showing iterative calculations for lₓ, dₓ and Lₓ. Present each part separately with clear headings and maintain 3-decimal precision throughout.
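The arithmetic for part (a) can be sketched as follows. The population and death counts are those in the question. The question does not say whether the two misrecorded individuals belong to the population counts or the death counts, so the sketch shows both readings as explicit assumptions. In either reading only the 54-year-old changes group, because 37 and 47 both fall in 30-50.

```python
AGE_GROUPS = ["0-10", "10-30", "30-50", "50-70", "70 and above"]
POPULATION = [10000, 18000, 26000, 20000, 5000]
DEATHS = [220, 40, 62, 350, 2000]


def asdr_per_thousand(deaths, population):
    """Age-specific death rates per 1000, rounded to 3 decimals."""
    return [round(1000 * d / p, 3) for d, p in zip(deaths, population)]


def cdr_per_thousand(deaths, population):
    """Crude death rate per 1000: the population-weighted mean of the ASDRs."""
    weighted = sum((1000 * d / p) * p for d, p in zip(deaths, population))
    return round(weighted / sum(population), 3)


def move_one(counts, source, target):
    """Return a copy of counts with one individual moved between age groups."""
    moved = list(counts)
    moved[source] -= 1
    moved[target] += 1
    return moved


SRC, DST = AGE_GROUPS.index("30-50"), AGE_GROUPS.index("50-70")

original_asdr = asdr_per_thousand(DEATHS, POPULATION)
original_cdr = cdr_per_thousand(DEATHS, POPULATION)

# Reading 1 (assumption): the misrecorded individual is a living member of the population.
asdr_if_population = asdr_per_thousand(DEATHS, move_one(POPULATION, SRC, DST))

# Reading 2 (assumption): the misrecorded individual is one of the recorded deaths.
asdr_if_death = asdr_per_thousand(move_one(DEATHS, SRC, DST), POPULATION)

print(original_asdr, original_cdr)
print(asdr_if_population)
print(asdr_if_death)
```

Under both readings the totals are unchanged, so the CDR stays the same and only the 30-50 and 50-70 rates move.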

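  • Part (a): Calculate original ASDRs (deaths/population × 1000) for all five age groups and CDR (total deaths/total population × 1000); then recalculate after correcting the age records: the 47-year-old recorded as 37 stays in the 30-50 group, so only the 54-year-old recorded as 45 moves from the 30-50 to the 50-70 age group
  • Part (b): Explain identification problem using simultaneous equations bias example (e.g., price-quantity in agricultural markets); state order condition (K − k ≥ m − 1, where K is the number of predetermined variables in the model, k the number included in the equation and m the number of endogenous variables in the equation) and rank condition (at least one non-zero determinant of order M-1 formed from the coefficients, in the other equations, of the variables excluded from the equation)
  • Part (b): Apply order and rank conditions to check identifiability of all four structural equations, noting M=4 endogenous, K=3 exogenous variables; the y₁ equation fails the order condition (K − k = 1 < m − 1 = 2) and is under-identified, the y₂ and y₃ equations meet the order condition with equality but fail the rank condition (the only non-zero coefficients of their excluded variables outside y₁ lie in the y₄ equation) and are unidentified, and the y₄ equation is exactly identified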
  • Part (c): Construct life table using l₅₀=10000, computing qₓ=0.001+0.0002x for x=50 to 60, then dₓ=lₓ×qₓ, lₓ₊₁=lₓ-dₓ, and Lₓ=(lₓ+lₓ₊₁)/2 for each age
  • Part (c): Present final life table with all five columns (x, lₓ, qₓ, dₓ, Lₓ) showing declining survivorship pattern typical of Indian mortality experience in 50-60 age range
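The life-table recursion for part (c) can be sketched as below, using l₅₀ = 10000 and qₓ = 0.001 + 0.0002x from the question and the formulas dₓ = lₓqₓ, lₓ₊₁ = lₓ − dₓ, Lₓ = (lₓ + lₓ₊₁)/2 listed above. The function name and the dictionary layout are illustrative choices.

```python
def build_life_table(start_age=50, end_age=60, radix=10000.0):
    """Return one row per age x with the columns x, l_x, q_x, d_x and L_x."""
    rows = []
    l_x = radix
    for x in range(start_age, end_age + 1):
        q_x = 0.001 + 0.0002 * x      # probability of dying between age x and x+1
        d_x = l_x * q_x               # deaths between age x and x+1
        l_next = l_x - d_x            # survivors to exact age x+1
        big_l_x = (l_x + l_next) / 2  # person-years lived, assuming deaths spread evenly over the year
        rows.append({"x": x, "l": l_x, "q": q_x, "d": d_x, "L": big_l_x})
        l_x = l_next
    return rows


life_table = build_life_table()
print(f"{'x':>3} {'l_x':>10} {'q_x':>7} {'d_x':>9} {'L_x':>10}")
for row in life_table:
    print(f"{row['x']:>3} {row['l']:>10.3f} {row['q']:>7.4f} {row['d']:>9.3f} {row['L']:>10.3f}")
```

Values are carried at full precision and rounded to 3 decimals only when printed, which avoids accumulating rounding error down the lₓ column.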
Q7
50M calculate Index numbers, logistic growth model, agricultural statistics

(a) Explain the concept of index number. Calculate the Fisher's ideal index number from the following data and verify that whether it satisfies time reversal and factor reversal tests : (10 marks) (b) The population growth of a city is modelled using logistic growth model with a carrying capacity of K = 10000000. The population data (in thousands) is provided at 2-year intervals from 2014 (taken as t = 0) to 2024 (t = 10) : (i) Estimate the two parameters of the logistic growth model. (16 marks) (ii) Using the estimated model, predict the population of the city for the year 2026. (4 marks) (16+4=20 marks) (c) Discuss the agricultural statistics relating to area and yield in our country. Also, point out the need and importance of agricultural statistics. (15 marks)

Answer approach & key points

Begin with a concise definition of index numbers for part (a), then proceed to calculate Fisher's ideal index with proper data tabulation and test verification. For part (b), set up the logistic model linearization, estimate parameters using regression on transformed data, then predict for 2026. For part (c), structure the discussion around India's agricultural statistical system—mentioning Land Use Statistics, Area and Production Statistics, and agencies like DES and NSSO. Allocate approximately 20% time to (a), 45% to (b), and 35% to (c) based on marks distribution.
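For part (a), the index calculation and the two reversal tests can be sketched as follows. The data table for the question is not reproduced on this page, so the prices and quantities below are hypothetical placeholders; only the formulas (Fisher as the geometric mean of Laspeyres and Paasche) are standard.

```python
import math

# Hypothetical data: three commodities, base year (0) and current year (1).
P0 = [4.0, 6.0, 5.0]
Q0 = [10.0, 8.0, 6.0]
P1 = [6.0, 9.0, 5.5]
Q1 = [12.0, 7.0, 8.0]


def aggregate(prices, quantities):
    """Sum of price x quantity over all commodities."""
    return sum(p * q for p, q in zip(prices, quantities))


def fisher_index(p_base, q_base, p_curr, q_curr):
    """Fisher's ideal index as a ratio (multiply by 100 for the usual form)."""
    laspeyres = aggregate(p_curr, q_base) / aggregate(p_base, q_base)
    paasche = aggregate(p_curr, q_curr) / aggregate(p_base, q_curr)
    return math.sqrt(laspeyres * paasche)


p01 = fisher_index(P0, Q0, P1, Q1)   # price index, base 0 -> current 1
p10 = fisher_index(P1, Q1, P0, Q0)   # price index with the periods interchanged
q01 = fisher_index(Q0, P0, Q1, P1)   # quantity index: prices and quantities swap roles
value_ratio = aggregate(P1, Q1) / aggregate(P0, Q0)

print(f"P01 = {100 * p01:.3f}")
print(f"Time reversal:   P01 x P10 = {p01 * p10:.6f}")
print(f"Factor reversal: P01 x Q01 = {p01 * q01:.6f}, value ratio = {value_ratio:.6f}")
```

Substituting the actual figures from the question paper for the four lists gives the required index and both verifications.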

  • Part (a): Correct formula for Fisher's ideal index as geometric mean of Laspeyres and Paasche; proper calculation with given data; verification of time reversal (P01 × P10 = 1) and factor reversal (P01 × Q01 = Value ratio)
  • Part (b)(i): Linearization of logistic model as ln[(K-P)/P] = lnβ - αt; estimation of α and β via least squares on transformed variables; correct handling of K=10000 (in thousands)
  • Part (b)(ii): Substitution of t=12 (for year 2026) into estimated logistic equation; proper back-transformation to obtain population prediction
  • Part (c): Discussion of area statistics—gross sown area, net sown area, cropping intensity; yield statistics—yield per hectare, production estimates; mention of Timely Reporting Scheme and Crop Cutting Experiments
  • Part (c): Need for agricultural statistics—food security planning, MSP fixation, crop insurance (PMFBY), export-import policy; importance for Sustainable Development Goals and Doubling Farmers' Income initiative
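The linearization described for Part (b)(i) can be sketched as below, assuming the model P(t) = K/(1 + βe^(−αt)), which is the form consistent with ln[(K−P)/P] = ln β − αt. The population table for the question is not reproduced on this page, so the series is generated from hypothetical parameters (α = 0.2, β = 4) purely to show that the fit recovers them.

```python
import math

K = 10000.0  # carrying capacity in thousands, as stated in the question


def logistic(t, alpha, beta, capacity=K):
    """Logistic curve P(t) = K / (1 + beta * exp(-alpha * t))."""
    return capacity / (1.0 + beta * math.exp(-alpha * t))


def fit_logistic(times, populations, capacity=K):
    """Least-squares fit of ln((K - P) / P) = ln(beta) - alpha * t."""
    y = [math.log((capacity - p) / p) for p in populations]
    n = len(times)
    t_bar = sum(times) / n
    y_bar = sum(y) / n
    s_tt = sum((t - t_bar) ** 2 for t in times)
    s_ty = sum((t - t_bar) * (v - y_bar) for t, v in zip(times, y))
    slope = s_ty / s_tt
    intercept = y_bar - slope * t_bar
    return -slope, math.exp(intercept)   # (alpha, beta)


TIMES = [0, 2, 4, 6, 8, 10]                       # 2014 to 2024 in 2-year steps
HYPOTHETICAL_POPULATION = [logistic(t, 0.2, 4.0) for t in TIMES]  # not the exam data

alpha_hat, beta_hat = fit_logistic(TIMES, HYPOTHETICAL_POPULATION)
forecast_2026 = logistic(12, alpha_hat, beta_hat)  # t = 12 corresponds to 2026
print(alpha_hat, beta_hat, forecast_2026)
```

With real data the transformed points will not lie exactly on a line, so the estimates depend on the least-squares criterion being applied on the transformed scale rather than to the populations themselves.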
Q8
50M describe Time series, T-score analysis, 2SLS estimation

(a) Define time series. For a moving-average process with weights {a₁, a₂, ..., aₘ} of random components {eᵢ, i = 1, 2, ...}, where eᵢ's are i.i.d. N(0, σ²), obtain the correlogram function. Find its form, when all the weights are equal and their sum is 1. (15 marks)

(b) The marks obtained by student A in Mathematics and Language tests of maximum marks 150 each are 120 and 105 respectively. Find out in which subject, student A is more able as compared to other students based on the measure of T score. The following table gives a sample of marks obtained by 15 students of the same class :

Score in Mathematics | Score in Language
---|---
100 | 67
75 | 63
88 | 73
85 | 77
92 | 60
94 | 53
93 | 50
84 | 48
67 | 38
96 | 73
100 | 36
102 | 45
94 | 47
73 | 39
83 | 56

(15 marks)

(c) Describe the 2-stage least squares (2SLS) method of estimation of parameters in linear regression model. Also, state the assumptions and discuss its properties. (20 marks)

Answer approach & key points

The directive 'describe' demands systematic exposition with technical precision. Structure: (a) 30% time/space—define time series rigorously, derive MA(m) autocorrelation structure, and simplify to uniform weights case showing the triangular decay pattern; (b) 30%—calculate sample means and standard deviations, compute T-scores for both subjects, and interpret relative standing; (c) 40%—detail 2SLS algorithm (first-stage reduced form, second-stage structural), list full assumptions (instrument relevance, exogeneity, rank condition), and prove consistency/asymptotic normality. Conclude with comparative assessment of 2SLS vs OLS in simultaneous equations contexts relevant to Indian economic policy evaluation.
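For part (a), the correlogram of a finite moving average can be checked numerically. The sketch below assumes the process is written as a weighted sum of m consecutive shocks, so that the autocovariance at lag k is σ² times the sum of aᵢaᵢ₊ₖ over i = 1, ..., m−k; σ² cancels in the ratio, and the function name is illustrative.

```python
from fractions import Fraction


def ma_correlogram(weights):
    """Autocorrelations rho(0), ..., rho(m) of a moving average with the given weights."""
    m = len(weights)

    def autocovariance(k):
        # gamma(k) / sigma^2 = sum of a_i * a_(i+k); the sum is empty (zero) when k >= m.
        return sum(weights[i] * weights[i + k] for i in range(m - k))

    gamma_0 = autocovariance(0)
    return [autocovariance(k) / gamma_0 for k in range(m + 1)]


# Equal weights summing to 1, with m = 4 chosen only for illustration.
m = 4
equal_weights = [Fraction(1, m)] * m
rho = ma_correlogram(equal_weights)
print(rho)
```

For equal weights the result falls linearly with the lag and reaches zero at lag m, which is the pattern the derivation should reproduce algebraically.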

  • For (a): Formal definition of time series as ordered sequence of random variables; derivation of autocovariance γ(k) = σ²Σaᵢaᵢ₊ₖ for MA(m) with truncation; correlogram ρ(k) = γ(k)/γ(0); special case aᵢ = 1/m yielding ρ(k) = (m−|k|)/m for |k| < m and zero otherwise
  • For (b): Correct computation of sample mean (x̄_M = 88.4, x̄_L = 55.0) and sample standard deviation with divisor n−1 (s_M ≈ 10.50, s_L ≈ 13.40); T-score formula T = 50 + 10×(X−X̄)/S; calculation yielding T_M ≈ 80.1 and T_L ≈ 87.3; correct interpretation that higher T-score in Language indicates better relative performance despite lower absolute marks
  • For (c): Complete 2SLS procedure—stage 1 regress endogenous regressors on all exogenous/instrumental variables, stage 2 use fitted values in structural equation; explicit assumptions (linearity, instrument exogeneity E(Z'u)=0, relevance rank E(Z'X) full column, no perfect multicollinearity)
  • For (c): Properties derivation—consistency via law of large numbers and continuous mapping theorem, asymptotic normality with variance σ²(X'P_ZX)⁻¹ where P_Z is projection matrix, comparison with OLS inconsistency under simultaneity
  • For (c): Practical illustration such as estimating agricultural supply response where price is endogenous—using rainfall/transport cost as instruments, relevant to Indian agricultural policy analysis
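The T-score comparison in part (b) can be checked with the standard library, using the 15 pairs of marks from the question and T = 50 + 10×(X−X̄)/S. Whether S uses the divisor n−1 or n is a convention the question leaves open, so the sketch exposes it as a parameter; the function name is illustrative.

```python
import statistics

# Marks of the 15 students, copied from the table in part (b).
MATHEMATICS = [100, 75, 88, 85, 92, 94, 93, 84, 67, 96, 100, 102, 94, 73, 83]
LANGUAGE = [67, 63, 73, 77, 60, 53, 50, 48, 38, 73, 36, 45, 47, 39, 56]


def t_score(mark, sample, use_population_sd=False):
    """T = 50 + 10 * (mark - mean) / sd, with sd using divisor n-1 by default."""
    mean = statistics.mean(sample)
    sd = statistics.pstdev(sample) if use_population_sd else statistics.stdev(sample)
    return 50 + 10 * (mark - mean) / sd


t_maths = t_score(120, MATHEMATICS)
t_language = t_score(105, LANGUAGE)

print(f"Mathematics: mean = {statistics.mean(MATHEMATICS):.1f}, T = {t_maths:.1f}")
print(f"Language:    mean = {statistics.mean(LANGUAGE):.1f}, T = {t_language:.1f}")
print("More able in:", "Language" if t_language > t_maths else "Mathematics")
```

Passing `use_population_sd=True` switches to the divisor n; the individual T-scores shift slightly but the ranking of the two subjects does not change.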
