Q1. Probability distributions and statistical inference (50 marks, Compulsory; directive: prove)
(a) Let X and Y be independent random variables with exponential distribution having respective means $\frac{1}{\lambda_1}$ and $\frac{1}{\lambda_2}$, $\lambda_1 > 0, \lambda_2 > 0$. Find E [max (X, Y)]. (10 marks)
(b) Using Central Limit Theorem, show that
$$\lim_{n \to \infty} e^{-n} \sum_{k=0}^{n} \frac{n^k}{k!} = \frac{1}{2}$$ (10 marks)
(c) An unbiased six-sided die is thrown twice. Let X denote the smaller of the scores obtained. Then show that the probability mass function (p.m.f.) of X is given by :
$$p_X(x) = \frac{13-2x}{36}, \quad x = 1, 2, ..., 6$$
$$= 0, \quad \text{otherwise.}$$ (10 marks)
(d) Let T₁ and T₂ be two unbiased estimators of θ with Var(T₁) = Var(T₂), then show that
Corr(T₁, T₂) ≥ 2e – 1,
where e is the efficiency of each estimator. (10 marks)
(e) An urn contains 5 marbles of which θ are white and the others black. In order to test null hypothesis H₀ : θ = 3 versus alternative hypothesis H₁ : θ = 4, two marbles are drawn at random. H₀ is rejected if both the marbles are white, otherwise H₀ is accepted.
Show that probability of type I error in case of without replacement and with replacement schemes, both are less than 0·40, but power of the test under with replacement is higher than that of under without replacement scheme. (10 marks)
Answer approach & key points
Prove each of the five results systematically, allocating approximately 2 minutes per mark (about 20 minutes per 10-mark part). For (a), use the identity E[max(X,Y)] = E[X] + E[Y] - E[min(X,Y)] with min(X,Y) ~ Exp(λ₁+λ₂), or direct integration; for (b), recognize the sum as a Poisson(n) probability and apply the CLT; for (c), enumerate favourable outcomes for the minimum value; for (d), apply Cauchy-Schwarz and the efficiency definition; for (e), compute hypergeometric vs binomial probabilities. Present each proof with a clear statement of assumptions, a step-by-step derivation, and a boxed final result.
- (a) Correct setup using E[max(X,Y)] = ∫∫ max(x,y)f_X(x)f_Y(y)dxdy or equivalent identity with min(X,Y) ~ Exp(λ₁+λ₂)
- (b) Identification that Sₙ = Y₁ + ... + Yₙ with Yᵢ i.i.d. Poisson(1) satisfies Sₙ ~ Poisson(n), so e⁻ⁿ Σₖ₌₀ⁿ nᵏ/k! = P(Sₙ ≤ n); the CLT then gives P(Sₙ ≤ n) = P((Sₙ − n)/√n ≤ 0) → Φ(0) = 1/2
- (c) Enumeration of outcomes where min equals k: (k,k), (k,j) for j>k, (i,k) for i>k, yielding count (13-2k) for each k = 1,...,6
- (d) Use of Var(T₁+T₂) ≥ 0 and efficiency definition e = [CRLB]/Var(T₁) to establish the inequality Corr(T₁,T₂) ≥ 2e-1
- (e) Type I error: P(reject|H₀) = C(3,2)/C(5,2) = 0.3 (without replacement) vs (3/5)² = 0.36 (with replacement); Power: P(reject|H₁) = C(4,2)/C(5,2) = 0.6 vs (4/5)² = 0.64, showing higher power with replacement
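The counting claim in (c) and the size/power comparison in (e) can be verified with a short stdlib-only Python check (a sketch to confirm the arithmetic, not a substitute for the required proofs):

```python
from fractions import Fraction
from itertools import product
from math import comb

# (c) p.m.f. of the smaller score: count ordered pairs whose minimum is x
counts = {x: sum(1 for i, j in product(range(1, 7), repeat=2) if min(i, j) == x)
          for x in range(1, 7)}
assert all(counts[x] == 13 - 2 * x for x in range(1, 7))

# (e) size and power of the test "reject H0 iff both drawn marbles are white"
alpha_wor = Fraction(comb(3, 2), comb(5, 2))   # hypergeometric, theta = 3
alpha_wr = Fraction(3, 5) ** 2                 # binomial, theta = 3
power_wor = Fraction(comb(4, 2), comb(5, 2))   # theta = 4
power_wr = Fraction(4, 5) ** 2
assert max(alpha_wor, alpha_wr) < Fraction(2, 5)   # both sizes below 0.40
assert power_wr > power_wor                        # with replacement is more powerful
```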
Q2. Sequential probability ratio test and order statistics (50 marks; directive: construct)
(a) Let a random variable X have exponential distribution with mean 1/θ, θ > 0. To test H₀ : θ = 3 against H₁ : θ = 2, construct sequential probability ratio test. Show that the probability of terminating the test at the first stage when the null hypothesis is true is 1 − (8/27)·(A³ − B³)/(A³B³), where A and B (B < A) are the stopping bounds. (20 marks)
(b) Each Sunday a fisherman visits one of three possible locations near his home : he goes to the sea with probability 1/2, to a river with probability 1/4, or to a lake with probability 1/4. If he goes to the sea there is an 80% chance that he will catch fish; corresponding figures for the river and the lake are 40% and 60% respectively.
(i) Find the probability that, on a given Sunday, he catches fish.
(ii) If, on a particular Sunday, he comes home without catching anything, determine the most likely place that he has been to. (5+10=15 marks)
(c) Let X₁ < X₂ < X₃ be the order statistics from uniform population having probability density function
f(x; θ) = 1/θ, 0 < x < θ.
Show that 4X₁ is an unbiased estimator of θ. (15 marks)
Answer approach & key points
Construct the sequential probability ratio test for part (a) by deriving the likelihood ratio and identifying stopping bounds, allocating approximately 40% of effort given its 20 marks. For part (b), apply Bayes' theorem to solve the probability and posterior location problem, spending ~30% of time. For part (c), derive the distribution of the first order statistic and verify unbiasedness, using the remaining ~30%. Present derivations step-by-step with clear probabilistic reasoning throughout.
- Part (a): Derive the SPRT likelihood ratio λₙ = ∏ᵢ f(Xᵢ; θ₁)/f(Xᵢ; θ₀) = (2/3)ⁿ exp(∑Xᵢ) for the rates θ₀ = 3, θ₁ = 2; sampling continues while B < λₙ < A, so the test terminates at the first stage unless B < (2/3)e^{X₁} < A
- Part (a): Under H₀, X₁ ~ Exp(3) with P(X₁ > x) = e^{−3x}, so P(continue) = P(ln(3B/2) < X₁ < ln(3A/2)) = (2/(3B))³ − (2/(3A))³ (valid when 3B/2 ≥ 1); hence P(termination at stage 1 | H₀) = 1 − (8/27)·(A³ − B³)/(A³B³)
- Part (b)(i): Apply total probability theorem: P(catch) = (1/2)(0.8) + (1/4)(0.4) + (1/4)(0.6) = 0.65
- Part (b)(ii): Use Bayes' theorem to find P(sea|no catch) = 0.1/0.35, P(river|no catch) = 0.15/0.35, P(lake|no catch) = 0.1/0.35; identify river as most likely location
- Part (c): Derive PDF of X₍₁₎ as f₍₁₎(x) = 3(θ−x)²/θ³ for 0 < x < θ, compute E(X₍₁₎) = θ/4, and conclude E(4X₍₁₎) = θ proving unbiasedness
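The arithmetic in part (b) can be cross-checked with exact fractions; this is a quick verification sketch, not the expected Bayes-theorem write-up:

```python
from fractions import Fraction

prior = {"sea": Fraction(1, 2), "river": Fraction(1, 4), "lake": Fraction(1, 4)}
p_catch = {"sea": Fraction(4, 5), "river": Fraction(2, 5), "lake": Fraction(3, 5)}

# (b)(i) total probability of catching fish on a given Sunday
p_fish = sum(prior[s] * p_catch[s] for s in prior)
assert p_fish == Fraction(13, 20)                      # 0.65

# (b)(ii) posterior over locations given no catch
joint_no = {s: prior[s] * (1 - p_catch[s]) for s in prior}
total_no = sum(joint_no.values())                      # 7/20 = 0.35
posterior = {s: joint_no[s] / total_no for s in joint_no}
assert max(posterior, key=posterior.get) == "river"    # river: 3/7 vs 2/7 each
```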
Q3. Probability theory and statistical inference (50 marks; directive: solve)
(a) (i) How large a sample must be taken in order that the probability will be at least 0·90 that the sample mean will be within a 0·4-neighbourhood of the population mean, provided the population standard deviation is 2 ? (8 marks)
(ii) Examine whether the weak law of large numbers holds for the sequence {Xₖ} of independent random variables defined as follows :
$$P(X_k = -1 - \frac{1}{k}) = \frac{1}{2}\left\{1 - \left(1 - \frac{1}{k^2}\right)^{1/2}\right\},$$
$$P(X_k = 1 + \frac{1}{k}) = \frac{1}{2}\left\{1 + \left(1 - \frac{1}{k^2}\right)^{1/2}\right\}.$$ (7 marks)
(b) Theoretical probabilities in the four cells of a multinomial distribution are $\frac{2+\theta}{4}$, $\frac{1-\theta}{4}$, $\frac{1-\theta}{4}$ and $\frac{\theta}{4}$, whereas the observed frequencies are 108, 27, 30 and 8 respectively, then estimate θ by maximum likelihood method. Also, obtain the standard error of the estimate. (20 marks)
(c) If X is a random variable with characteristic function
$$\varphi(t) = \begin{cases} 1-|t|, & |t| \leq 1 \\ 0, & \text{otherwise}, \end{cases}$$
then obtain the corresponding probability density function. (15 marks)
Answer approach & key points
Solve this multi-part numerical problem by allocating approximately 15 minutes to part (a)(i) on sample size determination using CLT, 15 minutes to part (a)(ii) on verifying WLLN conditions, 25 minutes to part (b) on MLE estimation and standard error computation for multinomial data, and 20 minutes to part (c) on deriving PDF from characteristic function via Fourier inversion. Begin each part with clear statement of the statistical principle being applied, show all computational steps explicitly, and conclude with precise numerical answers or definitive conclusions.
- Part (a)(i): Apply the Central Limit Theorem with the two-sided 90% normal quantile z₀.₀₅ = 1.645 (so that P(|Z| ≤ 1.645) = 0.90) to obtain n ≥ (1.645 × 2/0.4)² = 67.65 → n = 68
- Part (a)(ii): Verify E(Xₖ) = (1 + 1/k)(1 − 1/k²)^{1/2} and Var(Xₖ) = (1 + 1/k)²/k²; since the variances are uniformly bounded, (1/n²)Σₖ Var(Xₖ) → 0 and Chebyshev's inequality establishes that the WLLN holds for {Xₖ}
- Part (b): Formulate the multinomial likelihood L(θ) ∝ (2+θ)¹⁰⁸(1−θ)⁵⁷θ⁸, take the log-likelihood, and solve dℓ/dθ = 108/(2+θ) − 57/(1−θ) + 8/θ = 0, i.e. 173θ² + 14θ − 16 = 0, giving θ̂ = (−7 + √2817)/173 ≈ 0.2664; then SE(θ̂) = [n I₁(θ̂)]^{−1/2} ≈ 0.058, where I₁(θ) = Σᵢ (pᵢ′)²/pᵢ = (1/4)[1/(2+θ) + 2/(1−θ) + 1/θ] and n = 173
- Part (c): Apply Fourier inversion formula f(x) = (1/2π)∫₋₁¹ (1-|t|)e⁻ⁱᵗˣ dt, evaluate to obtain f(x) = (1/πx²)(1 - cos x) = (1/2π)sinc²(x/2) for x ≠ 0, with f(0) = 1/2π
- Demonstrate understanding that characteristic function φ(t) = (1-|t|)₊ corresponds to triangular distribution on [-1,1] in frequency domain, yielding Fejér kernel/sinc² in density domain
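The root of the score equation in part (b) and the resulting standard error can be checked numerically; a stdlib-only sketch assuming the four-cell model stated in the question:

```python
from math import sqrt

f = [108, 27, 30, 8]
n = sum(f)                                   # 173

# score equation 108/(2+t) - 57/(1-t) + 8/t = 0  <=>  173 t^2 + 14 t - 16 = 0
theta = (-14 + sqrt(14 ** 2 + 4 * n * 16)) / (2 * n)
score = 108 / (2 + theta) - 57 / (1 - theta) + 8 / theta
assert abs(score) < 1e-9                     # theta satisfies the score equation

# per-observation Fisher information: sum_i (dp_i/dtheta)^2 / p_i
info = (1 / 4) * (1 / (2 + theta) + 2 / (1 - theta) + 1 / theta)
se = 1 / sqrt(n * info)
print(round(theta, 4), round(se, 4))         # roughly 0.2663 and 0.0578
```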
Q4. Statistical estimation and hypothesis testing (50 marks; directive: discuss)
(a) Consider Poisson distribution
$$P_{\theta}(X = j) = \frac{e^{-\theta} \theta^{j}}{j!} = p_{j}, j = 0, 1, 2, ....$$
Let $f_{j}$ be the frequency for X = j and $E(f_{j}) = m_{j} = np_{j}$. Discuss how you obtain minimum chi-square estimate for $\theta$. Does minimum chi-square method necessarily yield a sufficient statistic even if it exists ? (20 marks)
(b) (i) Let the joint probability density function of X and Y be
$$f(x, y) = C \cdot \exp \{-(4x^{2} + 9y^{2} - xy)\},$$
where C is a constant. Find E(X), V(X), E(Y), V(Y) and the correlation coefficient between X and Y. (10 marks)
(ii) If $X_{1}, X_{2}, ..., X_{6}$ are independent random variables such that
$$P(X_{i} = -1) = P(X_{i} = 1) = \frac{1}{2}, i = 1, 2, ..., 6,$$
then obtain the value of
$$P\left[\sum_{i=1}^{6} X_{i} = 4\right].$$ (5 marks)
(c) The following data present the time (in minutes), that a commuter had to wait to catch a bus to reach his destination :
Use the sign-test at 0·05 level of significance to test the claim of the bus operators that commuters do not have to wait for more than 15 minutes before the bus is made available to them.
[Given Z₍₀.₀₂₅₎ = 1·96, Z₍₀.₀₅₎ = 1·645] (15 marks)
Answer approach & key points
The directive 'discuss' in part (a) requires a balanced analytical treatment with derivation and critical evaluation, while parts (b) and (c) are primarily computational. Allocate approximately 40% of effort to part (a) given its 20 marks and theoretical depth, 30% to part (b) covering both (i) bivariate normal properties and (ii) probability calculation, and 30% to part (c) for the non-parametric test. Structure as: brief theoretical exposition for (a), systematic derivations for (b), and complete hypothesis testing procedure for (c).
- For (a): Derivation of minimum chi-square estimator for Poisson parameter by minimizing Σ(f_j - np_j)²/(np_j) with respect to θ, leading to the estimating equation
- For (a): Critical discussion that minimum chi-square method does NOT necessarily yield sufficient statistics—contrast with MLE which preserves sufficiency via factorization theorem; cite example where MCS estimator differs from sufficient statistic
- For (b)(i): Recognition of the bivariate normal form: the exponent equals −½ (x,y) Σ⁻¹ (x,y)′, so Σ⁻¹ = [8 −1; −1 18] (twice the quadratic-form matrix); inverting gives μ_x = μ_y = 0, V(X) = 18/143, V(Y) = 8/143 and Cov(X,Y) = 1/143, hence ρ = (1/143)/(12/143) = 1/12
- For (b)(ii): Identification that ΣX_i follows distribution of (number of +1's) - (number of -1's), equivalent to 2×Binomial(6,½) - 6, yielding P(ΣX_i=4) = P(5 successes) = 6/64 = 3/32
- For (c): Correct application of sign test with null hypothesis H₀: median ≤ 15 vs H₁: median > 15, counting positive signs (values > 15), using normal approximation with continuity correction, and proper conclusion based on Z = 1.645 critical value
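Two of the computations in part (b) admit exact machine checks; this sketch reads Σ⁻¹ off the exponent in (b)(i) as twice the quadratic-form matrix, and brute-forces (b)(ii) over all sign patterns:

```python
from fractions import Fraction
from itertools import product

# (b)(i): -(4x^2 + 9y^2 - xy) = -(1/2) z' Sinv z  with  Sinv = [[8, -1], [-1, 18]]
a, b, c = 8, -1, 18
det = a * c - b * b                                 # 143
var_x, var_y = Fraction(c, det), Fraction(a, det)   # 18/143 and 8/143
cov = Fraction(-b, det)                             # 1/143
rho_sq = cov ** 2 / (var_x * var_y)
assert rho_sq == Fraction(1, 144)                   # rho = 1/12

# (b)(ii): enumerate all 2^6 equally likely sign patterns of (X_1, ..., X_6)
p4 = Fraction(sum(1 for s in product((-1, 1), repeat=6) if sum(s) == 4), 2 ** 6)
assert p4 == Fraction(3, 32)
```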
Q5. Linear models, multivariate normal, experimental design, sampling (50 marks, Compulsory; directive: solve)
(a) Define general linear model with usual assumptions. If y₁ = β₁ + u₁, y₂ = –β₁ + β₂ + u₂, y₃ = –β₂ + u₃, where u₁, u₂, u₃ are mutually independent random variables with mean zero and variance σ², then find the least square estimators of β₁ and β₂. (10 marks)
(b) Given X ~ N₃(μ, Σ), where μ = (2, 4, 3)′ and Σ = [8 2 3; 2 4 1; 3 1 3],
(i) find the regression function of X₁ on X₂ and X₃, and (ii) compute the conditional variance of X₁ given X₂ and X₃. (10 marks)
(c) What is a uniformity trial ? Explain how it can be used to determine optimum shape and size. (10 marks)
(d) In a 2⁶ – factorial experiment, the key block is given as : (1), ab, cd, ef, ace, abef, abcd, bce, cdef, acf, ade, abcdef, bde, bcf, adf, bdf. Identify the confounded effects. (10 marks)
(e) If the coefficients of variation of x and y are equal and the correlation coefficient between x and y is ρ = 2/3, compute the efficiency of ratio estimator relative to the mean of a simple random sample. (10 marks)
Answer approach & key points
This is a computational-cum-descriptive question requiring precise derivations and calculations across five sub-parts. Allocate approximately 20% time to part (a) for matrix formulation of GLM and LSE derivation, 20% to part (b) for multivariate normal conditional distributions, 15% to part (c) for explaining uniformity trials with agricultural field trial context, 25% to part (d) for systematic identification of confounded effects in 2⁶ factorial, and 20% to part (e) for ratio estimator efficiency computation. Begin each part with clear statement of method, show all computational steps, and conclude with boxed final answers.
- Part (a): Correct matrix formulation of GLM y = Xβ + u with assumptions E(u)=0, Var(u)=σ²I; construction of the design matrix X = [1 0; −1 1; 0 −1], so that X′X = [2 −1; −1 2] and X′y = (y₁−y₂, y₂−y₃)′, and derivation of the LSE β̂ = (X′X)⁻¹X′y yielding β̂₁ = (2y₁ − y₂ − y₃)/3 and β̂₂ = (y₁ + y₂ − 2y₃)/3
- Part (b)(i): Correct partitioning of Σ into Σ₁₁, Σ₁₂, Σ₂₁, Σ₂₂ and computation of regression coefficients β = Σ₁₂Σ₂₂⁻¹ for E(X₁|X₂,X₃) = μ₁ + Σ₁₂Σ₂₂⁻¹(x₂-μ₂, x₃-μ₃)'
- Part (b)(ii): Computation of conditional variance Var(X₁|X₂,X₃) = Σ₁₁ - Σ₁₂Σ₂₂⁻¹Σ₂₁ using Schur complement
- Part (c): Definition of uniformity trial as trial with uniform treatment to assess field variability; explanation of how coefficient of variation and soil heterogeneity index guide selection of plot shape (long narrow for fertility gradient) and size (balancing variance reduction vs cost)
- Part (d): The 16-run key block (out of 64 runs, hence 4 blocks and 3 confounded degrees of freedom) is the subgroup generated by ab, cd, ef and ace; every run in it shares an even number of letters with ABCD, ABEF and their generalized interaction (ABCD)(ABEF) = CDEF, so the confounded effects are ABCD, ABEF and CDEF
- Part (e): Efficiency of the ratio estimator relative to the sample mean is RE = V(ȳ)/V(ŷ_R) = Cᵧ²/(Cᵧ² + Cₓ² − 2ρCₓCᵧ); with Cₓ = Cᵧ this reduces to 1/(2(1−ρ)) = 1/(2 × 1/3) = 3/2, i.e. the ratio estimator is 150% as efficient as the sample mean
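Parts (a), (d) and (e) all lend themselves to quick machine checks. The sketch below uses hypothetical observations y = (5, 2, −1) for part (a) (any values work), and tests the confounding claim of part (d) by the even-letter-overlap criterion:

```python
from fractions import Fraction

# Part (a): y1 = b1 + u1, y2 = -b1 + b2 + u2, y3 = -b2 + u3
y = [Fraction(v) for v in (5, 2, -1)]        # hypothetical sample values
t1, t2 = y[0] - y[1], y[1] - y[2]            # X'y for X = [1 0; -1 1; 0 -1]
b1 = (2 * t1 + t2) / 3                       # (X'X)^{-1} = (1/3) [2 1; 1 2]
b2 = (t1 + 2 * t2) / 3
assert b1 == (2 * y[0] - y[1] - y[2]) / 3
assert b2 == (y[0] + y[1] - 2 * y[2]) / 3

# Part (d): an effect is confounded with blocks iff every run of the key
# block shares an even number of letters with it
block = ["", "ab", "cd", "ef", "ace", "abef", "abcd", "bce", "cdef",
         "acf", "ade", "abcdef", "bde", "bcf", "adf", "bdf"]

def confounded(effect):
    return all(len(set(run) & set(effect.lower())) % 2 == 0 for run in block)

assert all(confounded(e) for e in ("ABCD", "ABEF", "CDEF"))
assert not confounded("ABCDEF")

# Part (e): relative efficiency with Cx = Cy and rho = 2/3
rho = Fraction(2, 3)
eff = 1 / (2 * (1 - rho))
assert eff == Fraction(3, 2)                 # ratio estimator 150% as efficient
```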
Q6. ANOVA, sampling techniques, multivariate analysis (50 marks; directive: derive)
(a) In a set of two-way classified data according to k levels of factor A and r levels of factor B, there is one observation in each cell. Show that the total number of error contrasts is (r – 1) (k – 1). (15 marks)
(b) Describe with examples the technique of two-stage sampling. Obtain the variance of the sample mean under two-stage sampling without replacement. Hence, deduce the variance of the sample mean under : (i) Stratified random sampling, and (ii) Cluster sampling (20 marks)
(c) (i) If X₁ = Y₁ + Y₂, X₂ = Y₂ + Y₃, X₃ = Y₃ + Y₁, where Y₁, Y₂ and Y₃ are uncorrelated random variables and each of which has zero mean and unit standard deviation, find the multiple correlation coefficient between X₃ and X₁, X₂.
(ii) Let X be a 3-dimensional random vector with dispersion matrix Σ = [9 3 3; 3 9 3; 3 3 9]. Determine the first principal component and the proportion of the total variability that it explains. (7+8=15 marks)
Answer approach & key points
Derive the required results systematically across all sub-parts. For (a), establish the linear model and count constraints; for (b), describe two-stage sampling with Indian census/NSSO examples, then derive variance formula and deduce special cases; for (c)(i), compute multiple correlation using matrix algebra; for (c)(ii), find eigenvalues and eigenvectors for PCA. Allocate approximately 30% time to (a), 40% to (b), 15% each to (c)(i) and (c)(ii), ensuring all derivations show complete steps with proper justification.
- For (a): Define the two-way ANOVA model with one observation per cell, identify total contrasts (rk-1), subtract treatment contrasts (k-1 for factor A, r-1 for factor B), and show error contrasts = (r-1)(k-1) using degrees of freedom partition
- For (b): Describe two-stage sampling with an NSSO household survey or agricultural census example (first stage: villages/PSUs; second stage: households within selected PSUs); derive the variance of the sample mean under SRSWOR at both stages; deduce the stratified random sampling variance by letting the first-stage sampling fraction equal 1 (every stratum is included, with subsampling within each)
- For (b) continued: Deduce the cluster sampling variance by letting the second-stage sampling fraction equal 1 (every selected cluster is completely enumerated), showing how the general formula collapses to the known special cases
- For (c)(i): Compute Var(X₁) = Var(X₂) = Var(X₃) = 2 and Cov(X₁,X₂) = Cov(X₃,X₁) = Cov(X₃,X₂) = 1; regressing X₃ on X₁, X₂ gives R²₃.₁₂ = σ′Σ⁻¹σ/Var(X₃) = (2/3)/2 = 1/3, so the multiple correlation coefficient is R₃.₁₂ = 1/√3 ≈ 0.577
- For (c)(ii): Note Σ = 6I + 3J, so the eigenvalues are 15 (with eigenvector (1/√3)(1,1,1)′) and 6, 6; the first principal component is Y₁ = (X₁ + X₂ + X₃)/√3, which explains 15/27 = 5/9 ≈ 55.6% of the total variability
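Both results in (c) reduce to small exact linear-algebra checks; a sketch using the structure Σ = 6I + 3J for the eigenpair and the implied covariances for the multiple correlation:

```python
from fractions import Fraction

# (c)(ii): Sigma = 6I + 3J, so (1,1,1)' is an eigenvector with eigenvalue 15
S = [[9, 3, 3], [3, 9, 3], [3, 3, 9]]
v = [1, 1, 1]
Sv = [sum(S[i][j] * v[j] for j in range(3)) for i in range(3)]
assert Sv == [15 * x for x in v]
trace = S[0][0] + S[1][1] + S[2][2]              # total variance = 27
assert Fraction(15, trace) == Fraction(5, 9)     # proportion explained by PC1

# (c)(i): Var(X_i) = 2, every covariance = 1; R^2 = s' Sinv s / Var(X3)
Sinv = [[Fraction(2, 3), Fraction(-1, 3)],       # inverse of [2 1; 1 2]
        [Fraction(-1, 3), Fraction(2, 3)]]
s = [1, 1]                                       # Cov(X3, (X1, X2))
R2 = sum(s[i] * Sinv[i][j] * s[j] for i in range(2) for j in range(2)) / 2
assert R2 == Fraction(1, 3)                      # R = 1/sqrt(3) ~ 0.577
```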
Q7. Design of experiments and multivariate analysis (50 marks; directive: analyse)
(a) Consider the following data given for a BIBD with v = b = 4, r = k = 3, λ = 2 and N = 12 : Analyse the design. [Given that : F₃,₅ (0·05) = 5·41] (15 marks)
(b) (i) The data matrix of a random sample of size n = 3 from a bivariate normal population BVN (μ₁, μ₂, σ₁², σ₂², ρ) is X = [6 10; 10 6; 8 2]. Test the null hypothesis H₀ : μ = μ₀ against H₁ : μ ≠ μ₀, where μ₀' = (8, 5), at 10% level of significance. [You are given : F₀.₁₀; ₂, ₁ = 49·5, F₀.₁₀; ₁, ₂ = 8·53]
(ii) Suppose n₁ = 11 and n₂ = 12 observations are made on two random vectors X₁ and X₂ which are assumed to have bivariate normal distribution with a common covariance matrix Σ, but possibly different mean vectors μ₁ and μ₂. The sample mean vectors and pooled covariance matrix are X̄₁ = (-1, -1)', X̄₂ = (2, 1)', S_pooled = (7 -1; -1 5). Obtain Mahalanobis sample distance D² and Fisher's linear discriminant function. Assign the observation X₀ = (0, 1)' to either population Π₁ or Π₂. (10+10=20 marks)
(c) A sample of size n is drawn with equal probability and without replacement from a population with size N. Let Ŷ_N = Σᵣ₌₁ⁿ aᵣ yᵣ be any linear estimate of the population mean Ȳ_N, where aᵣ are constants and yᵣ denotes the value of the unit included in the sample at the rᵗʰ draw.
(i) Show that Ŷ_N is an unbiased estimate of Ȳ_N if and only if Σᵣ₌₁ⁿ aᵣ = 1
(ii) Under above condition V(Ŷ_N) = (S²/N)[NΣᵣ₌₁ⁿ aᵣ² - 1]
(iii) If aᵣ = 1/n, for what value of n may this variance of the sample mean in simple random sampling without replacement be exactly half the variance of the mean of a random sample of the same size taken with replacement ? (15 marks)
Answer approach & key points
The directive 'analyse' demands systematic examination with computational rigour across all sub-parts. Allocate approximately 30% time to part (a) BIBD analysis, 40% to part (b) multivariate tests and discriminant analysis, and 30% to part (c) sampling theory proofs. Structure as: brief identification of appropriate statistical methods for each sub-part → step-by-step computational working with formulae stated → interpretation of results in context → final conclusions with statistical significance statements.
- Part (a): Verify BIBD parameters satisfy λ(v-1) = r(k-1), construct ANOVA table with SST, SSB, SStr, SSE, compute F-ratio and compare with critical value 5.41 for treatment significance
- Part (b)(i): Compute sample mean vector, sample covariance matrix S, Hotelling's T² statistic, convert to F-statistic using F = (n-p)/((n-1)p) × T² with p=2, compare with given critical value
- Part (b)(ii): Calculate Mahalanobis D² = (X̄₁-X̄₂)'S_pooled⁻¹(X̄₁-X̄₂), derive Fisher's linear discriminant function Z = a'X where a = S_pooled⁻¹(X̄₁-X̄₂), compute discriminant scores and classify X₀
- Part (c)(i): Prove unbiasedness by showing E(Ŷ_N) = Ȳ_N requires Σaᵣ = 1 using linearity of expectation and equal probability sampling properties
- Part (c)(ii): Derive variance expression using V(yᵣ) = σ² and Cov(yᵣ, yₛ) = -σ²/(N-1) for r≠s, expand V(Σaᵣyᵣ) and simplify
- Part (c)(iii): Set V(SRSWOR) = ½ V(SRSWR): (N−n)S²/(Nn) = ½ · σ²/n with σ² = (N−1)S²/N, so N − n = (N−1)/2, giving n = (N+1)/2
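The numerical claims in part (b) can be reproduced exactly with fractions; a verification sketch (not the full test write-up) that also carries out the classification of X₀:

```python
from fractions import Fraction

def inv2(M):                                  # inverse of a 2x2 matrix
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[Fraction(M[1][1], det), Fraction(-M[0][1], det)],
            [Fraction(-M[1][0], det), Fraction(M[0][0], det)]]

def quad(M, v):                               # v' M v
    return sum(v[i] * M[i][j] * v[j] for i in range(2) for j in range(2))

# (b)(i): Hotelling's T^2 for the data matrix against mu0 = (8, 5)'
X = [[6, 10], [10, 6], [8, 2]]
n, p = 3, 2
xbar = [Fraction(sum(row[j] for row in X), n) for j in range(2)]      # (8, 6)
S = [[sum((row[i] - xbar[i]) * (row[j] - xbar[j]) for row in X) / (n - 1)
      for j in range(2)] for i in range(2)]                           # [4 -4; -4 16]
T2 = n * quad(inv2(S), [xbar[0] - 8, xbar[1] - 5])
F = Fraction(n - p, (n - 1) * p) * T2
assert T2 == Fraction(1, 4)
assert F < Fraction(99, 2)                    # F = 1/16 << 49.5: do not reject H0

# (b)(ii): Mahalanobis D^2, Fisher's discriminant, and classification of x0
d = [-3, -2]                                  # xbar1 - xbar2
Spi = inv2([[7, -1], [-1, 5]])
D2 = quad(Spi, d)
assert D2 == Fraction(5, 2)                   # D^2 = 2.5
a = [Spi[i][0] * d[0] + Spi[i][1] * d[1] for i in range(2)]   # (-1/2, -1/2)
m = (a[0] * 1 + a[1] * 0) / 2                 # midpoint score; xbar1 + xbar2 = (1, 0)
z0 = a[0] * 0 + a[1] * 1                      # score of x0 = (0, 1)'
assert z0 < m                                 # score below midpoint: assign x0 to Pi_2
```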
Q8. Regression, sampling and experimental design (50 marks; directive: derive)
(a) (i) What are orthogonal polynomials ? How do you fit an orthogonal polynomial of degree 'p' ?
(ii) For the model Y_(n×1) = X_(n×k) β_(k×1) + u_(n×1), E(uu') = σ² I_n, where X_(n×k) is a matrix of rank k (k < n), find out the value of E[Y'(I_n - X(X'X)⁻¹X')Y]. (10+10=20 marks)
(b) Consider an artificial population of three farms. Their selection probabilities and the wheat production (in '000 tons) are as follows : Farm unit (i) : 1 2 3; Selection probability (pᵢ) : 0·3 0·2 0·5; Wheat production (yᵢ) : 11 6 25. Draw all possible samples of size 2 with replacement (order is to be considered). Show that the Horvitz-Thompson estimator of total wheat production is unbiased. (15 marks)
(c) What is a missing plot technique ? Derive the missing value formula for a Latin Square Design. How would you proceed to analyse such a design ? (15 marks)
Answer approach & key points
Begin with (a)(i) defining orthogonal polynomials with the orthogonality condition Σφᵢ(x)φⱼ(x)=0 for i≠j, then describe the recurrence relation method for fitting degree p. For (a)(ii), recognize the residual sum of squares form and apply E[u'Mu]=σ²tr(M) to obtain (n-k)σ². In (b), enumerate all 9 ordered samples with replacement, compute πᵢ=1-(1-pᵢ)² for inclusion probabilities, verify E[Ŷ_HT]=Y. For (c), derive the Latin Square missing value formula ŷ = [t(R+C+T) − 2G]/[(t−1)(t−2)] and outline the adjusted ANOVA procedure. Allocate ~40% time to (a), ~30% each to (b) and (c).
- (a)(i) Definition: orthogonal polynomials satisfy Σφᵢ(x)φⱼ(x)=0 for i≠j over the point set; fitting uses recurrence φᵣ₊₁(x)=(x-aᵣ)φᵣ(x)-bᵣφᵣ₋₁(x) with specific coefficient formulas
- (a)(ii) Recognition that Iₙ-X(X'X)⁻¹X' is the residual maker matrix M; E[Y'MY]=E[u'Mu]=σ²tr(M)=σ²(n-k) using tr(Iₙ)=n and tr(X(X'X)⁻¹X')=k
- (b) Enumeration of 9 ordered samples: (1,1),(1,2),(1,3),(2,1),(2,2),(2,3),(3,1),(3,2),(3,3) with probabilities pᵢpⱼ; calculation of first-order inclusion probabilities πᵢ = 1 − (1 − pᵢ)²; verification that E[Ŷ_HT] = Σᵢ (yᵢ/πᵢ)·πᵢ = Σᵢ yᵢ = 42
- (c) Missing plot technique: Yates' method for estimating missing observations by minimizing error sum of squares; derivation using ∂SSE/∂y=0 for Latin Square layout
- (c) Latin Square missing value formula: ŷ = (tRᵢ + tCⱼ + tTₖ - 2G) / [(t-1)(t-2)] where R,C,T are respective totals and G is grand total; analysis proceeds with reduced degrees of freedom and bias correction in treatment SS
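The unbiasedness claim in (b) can be verified by brute-force enumeration of the 9 ordered samples; a sketch of the check, with the HT estimator summing over the distinct units in each sample:

```python
from fractions import Fraction
from itertools import product

p = [Fraction(3, 10), Fraction(2, 10), Fraction(5, 10)]   # selection probabilities
y = [11, 6, 25]                                           # wheat production ('000 tons)
pi = [1 - (1 - pk) ** 2 for pk in p]                      # first-order inclusion probs

# expectation of the HT estimator over all 9 ordered with-replacement samples
EY = sum(p[i] * p[j] * sum(y[k] / pi[k] for k in {i, j})
         for i, j in product(range(3), repeat=2))
assert EY == sum(y)                                       # 42: unbiased for the total
```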