Q5 (50 marks, compulsory; directive: solve): Linear models, multivariate normal, experimental design, sampling
(a) Define general linear model with usual assumptions. If y₁ = β₁ + u₁, y₂ = –β₁ + β₂ + u₂, y₃ = –β₂ + u₃, where u₁, u₂, u₃ are mutually independent random variables with mean zero and variance σ², then find the least square estimators of β₁ and β₂. (10 marks)
(b) Given X ~ N₃(μ, Σ), where μ = (2, 4, 3)' and Σ = [8 2 3; 2 4 1; 3 1 3],
(i) find the regression function of X₁ on X₂ and X₃, and (ii) compute the conditional variance of X₁ given X₂ and X₃. (10 marks)
(c) What is a uniformity trial ? Explain how it can be used to determine optimum shape and size. (10 marks)
(d) In a 2⁶ – factorial experiment, the key block is given as : (1), ab, cd, ef, ace, abef, abcd, bce, cdef, acf, ade, abcdef, bde, bcf, adf, bdf. Identify the confounded effects. (10 marks)
(e) If the coefficients of variation of x and y are equal and the correlation coefficient between x and y is ρ = 2/3, compute the efficiency of ratio estimator relative to the mean of a simple random sample. (10 marks)
Answer approach & key points
This is a computational-cum-descriptive question requiring precise derivations and calculations across five sub-parts. Allocate approximately 20% time to part (a) for matrix formulation of GLM and LSE derivation, 20% to part (b) for multivariate normal conditional distributions, 15% to part (c) for explaining uniformity trials with agricultural field trial context, 25% to part (d) for systematic identification of confounded effects in 2⁶ factorial, and 20% to part (e) for ratio estimator efficiency computation. Begin each part with clear statement of method, show all computational steps, and conclude with boxed final answers.
- Part (a): Correct matrix formulation of GLM y = Xβ + u with assumptions E(u)=0, Var(u)=σ²I; proper construction of design matrix X = [1 0; -1 1; 0 -1] and derivation of LSE β̂ = (X'X)⁻¹X'y, where X'X = [2 -1; -1 2] and X'y = (y₁ - y₂, y₂ - y₃)', yielding β̂₁ = (2y₁ - y₂ - y₃)/3 and β̂₂ = (y₁ + y₂ - 2y₃)/3
- Part (b)(i): Correct partitioning of Σ into Σ₁₁ = 8, Σ₁₂ = (2, 3), Σ₂₁ = Σ₁₂', Σ₂₂ = [4 1; 1 3] and computation of regression coefficients Σ₁₂Σ₂₂⁻¹ = (3/11, 10/11), so that E(X₁|X₂,X₃) = μ₁ + Σ₁₂Σ₂₂⁻¹(x₂-μ₂, x₃-μ₃)' = -20/11 + (3/11)x₂ + (10/11)x₃
- Part (b)(ii): Computation of conditional variance Var(X₁|X₂,X₃) = Σ₁₁ - Σ₁₂Σ₂₂⁻¹Σ₂₁ = 8 - 36/11 = 52/11 using Schur complement
- Part (c): Definition of uniformity trial as trial with uniform treatment to assess field variability; explanation of how coefficient of variation and soil heterogeneity index guide selection of plot shape (long narrow for fertility gradient) and size (balancing variance reduction vs cost)
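- Part (d): Systematic identification of confounded effects: 16 treatment combinations per block means 2⁶/16 = 4 blocks, so two independent interactions and their generalized interaction are confounded. Every key-block element must share an even number of letters with each confounded effect; the generators ab, cd, ef, ace lead to ABCD, ABEF and their generalized interaction CDEF (ABCDEF is not confounded, since ace shares three letters with it)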
- Part (e): Application of ratio estimator efficiency formula RE = V(ȳ)/V(ŷ_R) = 1/(1 + Cₓ²/Cᵧ² - 2ρCₓ/Cᵧ) (first-order approximation); with Cₓ = Cᵧ this simplifies to 1/(2(1-ρ)), giving RE = 3/2, i.e. 150%, for ρ = 2/3
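The numerical answers for parts (a), (b), (d) and (e) can be cross-checked with exact rational arithmetic. The following is a minimal sketch and not part of the original answer key; the helper names `solve2` and `lse` are illustrative assumptions, and part (d) is checked by brute force over all 63 factorial effects rather than by the generator argument.

```python
from fractions import Fraction
from itertools import combinations


def solve2(a, b):
    """Solve the 2x2 linear system a x = b exactly (Cramer's rule)."""
    det = Fraction(a[0][0] * a[1][1] - a[0][1] * a[1][0])
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]


# Part (a): rows of X follow y1 = b1 + u1, y2 = -b1 + b2 + u2, y3 = -b2 + u3.
X = [[1, 0], [-1, 1], [0, -1]]
XtX = [[sum(r[i] * r[j] for r in X) for j in range(2)] for i in range(2)]


def lse(y):
    """Least squares estimate from the normal equations X'X b = X'y."""
    Xty = [sum(X[k][i] * y[k] for k in range(3)) for i in range(2)]
    return solve2(XtX, Xty)


# Unit vectors for y reveal the weight each y_j receives in each estimator.
w = [lse([int(k == j) for k in range(3)]) for j in range(3)]
beta1_weights = [w[j][0] for j in range(3)]   # 2/3, -1/3, -1/3
beta2_weights = [w[j][1] for j in range(3)]   # 1/3, 1/3, -2/3

# Part (b): partition Sigma with X1 as the first block.
mu = [2, 4, 3]
s11, s12 = 8, [2, 3]
S22 = [[4, 1], [1, 3]]
coef = solve2(S22, s12)                       # Sigma22^{-1} Sigma21 (S22 symmetric)
intercept = mu[0] - coef[0] * mu[1] - coef[1] * mu[2]
cond_var = s11 - sum(c * s for c, s in zip(coef, s12))

# Part (d): an effect is confounded when it shares an even number of
# letters with every treatment combination in the key block.
key_block = ["", "ab", "cd", "ef", "ace", "abef", "abcd", "bce", "cdef",
             "acf", "ade", "abcdef", "bde", "bcf", "adf", "bdf"]
confounded = [
    "".join(effect).upper()
    for size in range(1, 7)
    for effect in combinations("abcdef", size)
    if all(len(set(effect) & set(t)) % 2 == 0 for t in key_block)
]

# Part (e): equal coefficients of variation, first-order variance formulas.
rho = Fraction(2, 3)
efficiency = 1 / (2 * (1 - rho))

print(beta1_weights, beta2_weights)
print(coef, intercept, cond_var)
print(confounded, efficiency)
```

Exact fractions are used so that results such as 52/11 appear in the same form as in a hand calculation.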
Q6 (50 marks; directive: derive): ANOVA, sampling techniques, multivariate analysis
(a) In a set of two-way classified data according to k levels of factor A and r levels of factor B, there is one observation in each cell. Show that the total number of error contrasts is (r – 1) (k – 1). (15 marks)
(b) Describe with examples the technique of two-stage sampling. Obtain the variance of the sample mean under two-stage sampling without replacement. Hence, deduce the variance of the sample mean under : (i) Stratified random sampling, and (ii) Cluster sampling (20 marks)
(c) (i) If X₁ = Y₁ + Y₂, X₂ = Y₂ + Y₃, X₃ = Y₃ + Y₁, where Y₁, Y₂ and Y₃ are uncorrelated random variables and each of which has zero mean and unit standard deviation, find the multiple correlation coefficient between X₃ and X₁, X₂.
(ii) Let X be a 3-dimensional random vector with dispersion matrix Σ = [9 3 3; 3 9 3; 3 3 9]. Determine the first principal component and the proportion of the total variability that it explains. (7+8=15 marks)
Answer approach & key points
Derive the required results systematically across all sub-parts. For (a), establish the linear model and count constraints; for (b), describe two-stage sampling with Indian census/NSSO examples, then derive variance formula and deduce special cases; for (c)(i), compute multiple correlation using matrix algebra; for (c)(ii), find eigenvalues and eigenvectors for PCA. Allocate approximately 30% time to (a), 40% to (b), 15% each to (c)(i) and (c)(ii), ensuring all derivations show complete steps with proper justification.
- For (a): Define the two-way ANOVA model with one observation per cell, identify total contrasts (rk-1), subtract treatment contrasts (k-1 for factor A, r-1 for factor B), and show error contrasts = (r-1)(k-1) using degrees of freedom partition
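- For (b): Describe two-stage sampling with NSSO household survey or agricultural census example; derive variance of sample mean under SRSWOR at both stages (n of N first-stage units, m of M second-stage units in each), V(ȳ) = (1/n - 1/N)S_b² + (1/m - 1/M)S_w²/n; deduce stratified random sampling variance by setting n = N (every first-stage unit is sampled and acts as a stratum, so the between-unit term vanishes)
- For (b) continued: Deduce cluster sampling variance by setting m = M (every selected first-stage unit is completely enumerated, so the within-unit term vanishes), showing how the general formula collapses to known special cases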
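- For (c)(i): Compute Var(X₁), Var(X₂), Cov(X₁,X₂), Cov(X₃,X₁), Cov(X₃,X₂); set up multiple regression of X₃ on X₁,X₂; calculate R² and multiple correlation coefficient R₃.₁₂; here every variance equals 2 and every covariance equals 1, giving R² = 1/3 and R₃.₁₂ = 1/√3 ≈ 0.577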
- For (c)(ii): Find eigenvalues of Σ = 6I + 3J, where J is the 3×3 matrix of ones (15, 6, 6), identify first principal component as (1/√3)(X₁ + X₂ + X₃), with coefficient vector (1/√3)(1,1,1)′ corresponding to λ=15, and compute proportion of variability as 15/27 = 5/9 ≈ 55.6%
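The special cases in (b) and the numerical answers in (c) can be checked with a short script. This is a minimal sketch under stated assumptions: the two-stage formula is the textbook form for N first-stage units of equal size M with SRSWOR at both stages, the values passed to `two_stage_var` are hypothetical, and the function and variable names are illustrative rather than taken from the question.

```python
from fractions import Fraction


def solve2(a, b):
    """Solve the 2x2 linear system a x = b exactly (Cramer's rule)."""
    det = Fraction(a[0][0] * a[1][1] - a[0][1] * a[1][0])
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]


def two_stage_var(N, n, M, m, Sb2, Sw2):
    """V(ybar) = (1/n - 1/N) Sb2 + (1/m - 1/M) Sw2 / n (equal-size units)."""
    return ((Fraction(1, n) - Fraction(1, N)) * Sb2
            + (Fraction(1, m) - Fraction(1, M)) * Sw2 / n)


# Hypothetical population constants, used only to exercise the formula.
N, M, Sb2, Sw2 = 10, 8, Fraction(5), Fraction(12)
stratified = two_stage_var(N, N, M, 4, Sb2, Sw2)   # n = N: between term drops
cluster = two_stage_var(N, 3, M, M, Sb2, Sw2)      # m = M: within term drops

# (c)(i): X = A Y with uncorrelated unit-variance Y, so Cov(X) = A A'.
A = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
cov = [[sum(p * q for p, q in zip(A[i], A[j])) for j in range(3)]
       for i in range(3)]
S12 = [[cov[0][0], cov[0][1]], [cov[1][0], cov[1][1]]]   # Cov of (X1, X2)
s3 = [cov[2][0], cov[2][1]]                              # Cov of X3 with (X1, X2)
b = solve2(S12, s3)
R2 = sum(x * y for x, y in zip(b, s3)) / cov[2][2]       # squared multiple corr.
R = float(R2) ** 0.5

# (c)(ii): check candidate eigenvectors of the dispersion matrix directly.
Sigma = [[9, 3, 3], [3, 9, 3], [3, 3, 9]]


def eigenvalue(m, v):
    """Return lam if m v = lam v for the non-zero vector v, else None."""
    mv = [sum(p * q for p, q in zip(row, v)) for row in m]
    i = next(k for k, x in enumerate(v) if x != 0)
    lam = Fraction(mv[i], v[i])
    return lam if all(y == lam * x for x, y in zip(v, mv)) else None


# Three mutually orthogonal vectors, so these are all the eigenvalues.
lams = [eigenvalue(Sigma, v) for v in ([1, 1, 1], [1, -1, 0], [1, 1, -2])]
proportion = max(lams) / sum(lams)

print(stratified, cluster)
print(R2, round(R, 4))
print(lams, proportion)
```

Checking orthogonal candidate vectors avoids a general eigen-solver, which the equicorrelated structure of Σ does not need.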
Q7 (50 marks; directive: analyse): Design of experiments and multivariate analysis
(a) Consider the following data given for a BIBD with v = b = 4, r = k = 3, λ = 2 and N = 12 : Analyse the design. [Given that : F₃,₅ (0·05) = 5·41] (15 marks)
(b) (i) The data matrix of a random sample of size n = 3 from a bivariate normal population BVN (μ₁, μ₂, σ₁², σ₂², ρ) is X = [6 10; 10 6; 8 2]. Test the null hypothesis H₀ : μ = μ₀ against H₁ : μ ≠ μ₀, where μ₀' = (8, 5), at 10% level of significance. [You are given : F₀.₁₀; ₂, ₁ = 49·5, F₀.₁₀; ₁, ₂ = 8·53]
(ii) Suppose n₁ = 11 and n₂ = 12 observations are made on two random vectors X₁ and X₂, which are assumed to have bivariate normal distributions with a common covariance matrix Σ, but possibly different mean vectors μ₁ and μ₂. The sample mean vectors and pooled covariance matrix are X̄₁ = (-1, -1)', X̄₂ = (2, 1)', S_pooled = (7 -1; -1 5). Obtain Mahalanobis sample distance D² and Fisher's linear discriminant function. Assign the observation X₀ = (0, 1)' to either population Π₁ or Π₂. (10+10=20 marks)
(c) A sample of size n is drawn with equal probability and without replacement from a population with size N. Let Ŷ_N = Σᵣ₌₁ⁿ aᵣ yᵣ be any linear estimate of the population mean Ȳ_N, where aᵣ are constants and yᵣ denotes the value of the unit included in the sample at the rᵗʰ draw.
(i) Show that Ŷ_N is an unbiased estimate of Ȳ_N if and only if Σᵣ₌₁ⁿ aᵣ = 1
(ii) Under above condition V(Ŷ_N) = (S²/N)[NΣᵣ₌₁ⁿ aᵣ² - 1]
(iii) If aᵣ = 1/n, for what value of n may this variance of the sample mean in simple random sampling without replacement be exactly half the variance of the mean of a random sample of the same size taken with replacement ? (15 marks)
Answer approach & key points
The directive 'analyse' demands systematic examination with computational rigour across all sub-parts. Allocate approximately 30% time to part (a) BIBD analysis, 40% to part (b) multivariate tests and discriminant analysis, and 30% to part (c) sampling theory proofs. Structure as: brief identification of appropriate statistical methods for each sub-part → step-by-step computational working with formulae stated → interpretation of results in context → final conclusions with statistical significance statements.
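- Part (a): Verify BIBD parameters satisfy bk = vr and λ(v-1) = r(k-1), construct ANOVA table with SST, SSB (unadjusted blocks), SStr (treatments adjusted for blocks) and SSE on 11 - 3 - 3 = 5 degrees of freedom, compute F-ratio and compare with critical value F₃,₅(0.05) = 5.41 for treatment significance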
- Part (b)(i): Compute sample mean vector, sample covariance matrix S, Hotelling's T² statistic, convert to F-statistic using F = (n-p)/((n-1)p) × T² with p=2 and n=3, which has (p, n-p) = (2, 1) degrees of freedom under H₀; compare with given critical value F₀.₁₀;₂,₁ = 49.5
- Part (b)(ii): Calculate Mahalanobis D² = (X̄₁-X̄₂)'S_pooled⁻¹(X̄₁-X̄₂), derive Fisher's linear discriminant function Z = a'X where a = S_pooled⁻¹(X̄₁-X̄₂), compute discriminant scores and classify X₀
- Part (c)(i): Prove unbiasedness by showing E(Ŷ_N) = Ȳ_N requires Σaᵣ = 1 using linearity of expectation and equal probability sampling properties
- Part (c)(ii): Derive variance expression using V(yᵣ) = σ² and Cov(yᵣ, yₛ) = -σ²/(N-1) for r≠s, where σ² = (N-1)S²/N; expand V(Σaᵣyᵣ) = σ²Σaᵣ² - (σ²/(N-1))[(Σaᵣ)² - Σaᵣ²] and simplify with Σaᵣ = 1
- Part (c)(iii): Set V(SRSWOR) = ½ V(SRSWR), i.e., (N-n)/(Nn) × S² = ½ × σ²/n with σ² = (N-1)S²/N, so that (N-n)/(N-1) = ½; solve to get n = (N+1)/2 (which requires N to be odd)
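Parts (b) and (c) can be worked through numerically as a check on the hand calculation. Part (a) is omitted because the BIBD data table is not available in the question text. The following is a minimal sketch: the four-unit population and the weights used to test the variance formula in (c)(ii) are hypothetical, and the helper names are illustrative.

```python
from fractions import Fraction
from itertools import permutations


def solve2(a, b):
    """Solve the 2x2 linear system a x = b exactly (Cramer's rule)."""
    det = Fraction(a[0][0] * a[1][1] - a[0][1] * a[1][0])
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


# (b)(i) Hotelling's T^2 for H0: mu = mu0.
X = [[6, 10], [10, 6], [8, 2]]
mu0 = [8, 5]
n, p = len(X), 2
xbar = [Fraction(sum(col), n) for col in zip(*X)]
S = [[sum((r[i] - xbar[i]) * (r[j] - xbar[j]) for r in X) / (n - 1)
      for j in range(p)] for i in range(p)]
d = [xbar[i] - mu0[i] for i in range(p)]
T2 = n * dot(d, solve2(S, d))                       # n d' S^{-1} d
F_stat = Fraction(n - p, (n - 1) * p) * T2          # F with (p, n - p) d.f.
reject_H0 = F_stat > Fraction("49.5")               # given F(0.10; 2, 1)

# (b)(ii) Mahalanobis D^2 and Fisher's linear discriminant a'x.
xbar1, xbar2 = [-1, -1], [2, 1]
Sp = [[7, -1], [-1, 5]]
diff = [u - v for u, v in zip(xbar1, xbar2)]
a = solve2(Sp, diff)                                # S_pooled^{-1} (xbar1 - xbar2)
D2 = dot(diff, a)
midpoint = dot(a, [Fraction(u + v, 2) for u, v in zip(xbar1, xbar2)])
x0 = [0, 1]
population = 1 if dot(a, x0) >= midpoint else 2     # allocate to Pi_1 or Pi_2

# (c)(ii) Brute-force check of V = (S^2/N)(N sum a_r^2 - 1) on a
# hypothetical population with hypothetical weights summing to 1.
pop = [1, 2, 4, 7]
wts = [Fraction(1, 3), Fraction(2, 3)]
N_pop = len(pop)
est = [dot(wts, s) for s in permutations(pop, len(wts))]   # ordered SRSWOR draws
mean_est = sum(est) / len(est)
var_est = sum((e - mean_est) ** 2 for e in est) / len(est)
Ybar = Fraction(sum(pop), N_pop)
S2 = sum((y - Ybar) ** 2 for y in pop) / (N_pop - 1)
var_formula = S2 / N_pop * (N_pop * sum(w * w for w in wts) - 1)


# (c)(iii) V(SRSWOR) / V(SRSWR) for the sample mean is (N - n)/(N - 1).
def wor_to_wr_ratio(N, n):
    return Fraction(N - n, N - 1)


print(T2, F_stat, reject_H0)
print(a, D2, midpoint, population)
print(var_est, var_formula, wor_to_wr_ratio(9, 5))
```

The classification rule used here allocates x₀ to Π₁ when a'x₀ is at least the midpoint a'(x̄₁ + x̄₂)/2, which assumes equal priors and equal misclassification costs.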
Q8 (50 marks; directive: derive): Regression, sampling and experimental design
(a) (i) What are orthogonal polynomials ? How do you fit an orthogonal polynomial of degree 'p' ?
(ii) For the model Y_(n×1) = X_(n×k) β_(k×1) + u_(n×1), E(uu') = σ² I_n, where X_(n×k) is a matrix of rank k (k < n), find out the value of E[Y'(I_n - X(X'X)⁻¹X')Y]. (10+10=20 marks)
(b) Consider an artificial population of three farms. Their selection probabilities and the wheat production (in '000 tons) are as follows : Farm unit (i) : 1 2 3; Selection probability (pᵢ) : 0·3 0·2 0·5; Wheat production (yᵢ) : 11 6 25. Draw all possible samples of size 2 with replacement (order is to be considered). Show that Horvitz-Thompson estimator of total wheat production is unbiased. (15 marks)
(c) What is a missing plot technique ? Derive the missing value formula for a Latin Square Design. How would you proceed to analyse such a design ? (15 marks)
Answer approach & key points
Begin with (a)(i) defining orthogonal polynomials with the orthogonality condition Σφᵢ(x)φⱼ(x)=0 for i≠j, then describe the recurrence relation method for fitting degree p. For (a)(ii), recognize the residual sum of squares form and apply E[u'Mu]=σ²tr(M) to obtain (n-k)σ². In (b), enumerate all 9 ordered samples with replacement, compute πᵢ=1-(1-pᵢ)² for inclusion probabilities, verify E[Ŷ_HT]=Y. For (c), derive the Latin Square missing value formula ŷ=(t(R+C+T)-2G)/((t-1)(t-2)) and outline the adjusted ANOVA procedure. Allocate ~40% time to (a), ~30% each to (b) and (c).
- (a)(i) Definition: orthogonal polynomials satisfy Σφᵢ(x)φⱼ(x)=0 for i≠j over the point set; fitting uses recurrence φᵣ₊₁(x)=(x-aᵣ)φᵣ(x)-bᵣφᵣ₋₁(x) with specific coefficient formulas
- (a)(ii) Recognition that Iₙ-X(X'X)⁻¹X' is the residual maker matrix M; E[Y'MY]=E[u'Mu]=σ²tr(M)=σ²(n-k) using tr(Iₙ)=n and tr(X(X'X)⁻¹X')=k
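- (b) Enumeration of 9 ordered samples: (1,1),(1,2),(1,3),(2,1),(2,2),(2,3),(3,1),(3,2),(3,3) with their probabilities pᵢpⱼ; calculation of first-order inclusion probabilities πᵢ=1-(1-pᵢ)², giving π₁ = 0.51, π₂ = 0.36, π₃ = 0.75; verification that E[Ŷ_HT] = Σ(yᵢ/πᵢ)·πᵢ = Σyᵢ = Y = 42, where Ŷ_HT sums yᵢ/πᵢ over the distinct units in the sample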
- (c) Missing plot technique: Yates' method for estimating missing observations by minimizing error sum of squares; derivation using ∂SSE/∂y=0 for Latin Square layout
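- (c) Latin Square missing value formula: ŷ = (tRᵢ + tCⱼ + tTₖ - 2G) / [(t-1)(t-2)] where R,C,T are the totals of the known observations in the row, column and treatment containing the missing plot and G is the grand total of the known observations; analysis proceeds with the estimate inserted, error and total degrees of freedom each reduced by one, and bias correction in treatment SS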
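The three results above lend themselves to a direct numerical check. The following is a minimal sketch, not part of the original answer key: the 4×2 design matrix used for the trace identity and the 3×3 Latin square with exactly additive effects are hypothetical examples, and all function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def solve2(a, b):
    """Solve the 2x2 linear system a x = b exactly (Cramer's rule)."""
    det = Fraction(a[0][0] * a[1][1] - a[0][1] * a[1][0])
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]


# (a)(ii) tr(I - X(X'X)^{-1}X') = n - k on a hypothetical full-rank X.
Xd = [[1, 0], [1, 1], [1, 2], [1, 3]]
XtX = [[sum(r[i] * r[j] for r in Xd) for j in range(2)] for i in range(2)]
trace_H = sum(sum(x * z for x, z in zip(row, solve2(XtX, row))) for row in Xd)
trace_M = len(Xd) - trace_H          # multiplies sigma^2 in E[Y'MY]

# (b) Horvitz-Thompson estimator over all 9 ordered samples of size 2.
p = [Fraction(3, 10), Fraction(2, 10), Fraction(5, 10)]
y = [11, 6, 25]
incl = [1 - (1 - q) ** 2 for q in p]             # first-order inclusion probs
expected_ht = Fraction(0)
for i, j in product(range(3), repeat=2):
    ht = sum(y[u] / incl[u] for u in {i, j})     # distinct units only
    expected_ht += p[i] * p[j] * ht              # sample probability p_i p_j

# (c) Missing value formula on a Latin square with zero error, where the
# estimate should reproduce the deleted observation exactly.
t = 3
row_eff, col_eff, trt_eff = [0, 1, 2], [0, 10, 20], [0, 100, 200]


def trt(i, j):
    """Treatment index in a cyclic Latin square."""
    return (i + j) % t


square = {(i, j): row_eff[i] + col_eff[j] + trt_eff[trt(i, j)]
          for i in range(t) for j in range(t)}


def missing_value(cells, miss):
    """Estimate (t(R + C + T) - 2G) / ((t - 1)(t - 2)) from known cells."""
    i0, j0 = miss
    known = {c: v for c, v in cells.items() if c != miss}
    R = sum(v for (i, j), v in known.items() if i == i0)
    C = sum(v for (i, j), v in known.items() if j == j0)
    T = sum(v for (i, j), v in known.items() if trt(i, j) == trt(i0, j0))
    G = sum(known.values())
    return Fraction(t * (R + C + T) - 2 * G, (t - 1) * (t - 2))


estimate = missing_value(square, (1, 2))

print(trace_H, trace_M)
print(incl, expected_ht, sum(y))
print(estimate, square[(1, 2)])
```

Using an error-free additive square is a convenient design choice for the check: minimizing the error sum of squares must then return the deleted value itself, so any mistake in the formula shows up immediately.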