Q5 50M Compulsory solve Multivariate normal distribution and linear models
(a) (i) If **X** = (X₁ X₂ X₃)' is distributed as N₃ (μ, Σ), find the distribution of [(X₁ – X₂) (X₂ – X₃)]'. (5 marks)
(ii) Suppose that **X** = (X₁ X₂ X₃)' ~ N₃ (**0**, Σ), where Σ = $\begin{pmatrix} 1 & \rho & 0 \\ \rho & 1 & \rho \\ 0 & \rho & 1 \end{pmatrix}$. Is there a value of ρ for which (X₁ + X₂ + X₃) and (X₁ – X₂ – X₃) are independent ? (5 marks)
(b) Show that **X** = (X₁, X₂, ..., Xₚ)' has p-variate normal distribution if and only if every linear combination (l₁X₁ + l₂X₂ + ... + lₚXₚ) of **X** follows a univariate normal distribution. (10 marks)
(c) Let x₁, x₂, ..., xₙ be n given observations, and suppose that Yᵢ = β₀ + β₁xᵢ + eᵢ; i = 1, 2, ..., n, where β₀, β₁ are unknown parameters and eᵢ are mutually independent normal random variables with E(eᵢ) = 0 and V(eᵢ) = σ², i = 1, 2, ..., n. Also, σ² is assumed to be unknown. Test the null hypothesis H₀ : β₀ = β₁ = 0. (10 marks)
(d) Complete the following analysis of variance table of a design and examine whether there is a significant difference between the treatments at 5% level of significance:
| Source of Variation | Degrees of Freedom | Sum of Squares | Mean Sum of Squares | Variance Ratio |
|---------------------|-------------------|----------------|---------------------|----------------|
| Blocks | — | 21 | 4·2 | — |
| Treatments | — | — | 5·0 | — |
| Error | 15 | 12 | — | |
| Total | — | — | | |
Given that F_{·05}(3, 15) = 8·70, F_{·05}(5, 15) = 4·62 (10 marks)
(e) Define regression estimator used for the estimation of population mean. Obtain its bias and Mean Square Error (MSE) to the first order of approximation. (10 marks)
Answer approach & key points
Solve this multi-part numerical problem by allocating time proportionally to marks: spend ~20% on (a)(i)-(ii) combined, ~20% on (b), ~20% on (c), ~20% on (d), and ~20% on (e). Begin each sub-part by stating the relevant theorem or formula, show complete derivation/calculation steps, and conclude with precise final answers. For (d), complete the ANOVA table systematically before hypothesis testing. For (e), clearly define the estimator before deriving bias and MSE.
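As a quick arithmetic check for (d), the sketch below fills the missing cells of the table. It assumes the design is a randomised block design, so that error df = (b - 1)(t - 1). The function name `complete_rbd_anova` is hypothetical, and exact fractions are used only to avoid floating-point noise.

```python
from fractions import Fraction


def complete_rbd_anova(ss_blocks, ms_blocks, ms_treat, df_error, ss_error):
    """Fill the missing cells of a randomised block design ANOVA table.

    Assumes an RBD, so the error df equals (b - 1)(t - 1).
    """
    df_blocks = ss_blocks / ms_blocks        # b - 1 = SS / MS
    df_treat = df_error / df_blocks          # t - 1 from (b - 1)(t - 1) = error df
    ss_treat = ms_treat * df_treat           # SS = MS * df
    ms_error = ss_error / df_error
    return {
        "df": {"blocks": df_blocks, "treatments": df_treat,
               "error": df_error, "total": df_blocks + df_treat + df_error},
        "ss": {"blocks": ss_blocks, "treatments": ss_treat,
               "error": ss_error, "total": ss_blocks + ss_treat + ss_error},
        "ms_error": ms_error,
        "F_blocks": ms_blocks / ms_error,
        "F_treatments": ms_treat / ms_error,
    }


table = complete_rbd_anova(ss_blocks=Fraction(21), ms_blocks=Fraction(42, 10),
                           ms_treat=Fraction(5), df_error=Fraction(15),
                           ss_error=Fraction(12))
for key in ("df", "ss"):
    print(key, {k: float(v) for k, v in table[key].items()})
print("MS error:", float(table["ms_error"]))
print("F blocks:", float(table["F_blocks"]), "F treatments:", float(table["F_treatments"]))
```

The treatment variance ratio is then compared with the critical value for (3, 15) degrees of freedom.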
- (a)(i) Apply linear transformation theorem: if Y = AX, then Y ~ N₂(Aμ, AΣA') with correct matrix A = [[1,-1,0],[0,1,-1]]
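- (a)(ii) Under joint normality, zero covariance is equivalent to independence: Cov(X₁+X₂+X₃, X₁-X₂-X₃) = a'Σb with a = (1, 1, 1)', b = (1, -1, -1)', which equals -(1 + 2ρ); setting it to 0 gives ρ = -1/2. This value is admissible because Σ has eigenvalues 1 and 1 ± √2ρ, so it is positive definite for |ρ| < 1/√2
- (b) Prove both directions: (⇒) if **X** ~ Nₚ(μ, Σ), then l'**X** ~ N₁(l'μ, l'Σl) by the linear transformation theorem; (⇐) if every l'**X** is univariate normal (then μ and Σ exist, taking l as unit vectors), evaluating its characteristic function at t = 1 gives E[exp(i l'**X**)] = exp(i l'μ - ½ l'Σl) for every l, which is the Nₚ(μ, Σ) characteristic function (Cramér-Wold theorem)
- (c) Set up the F-test for H₀: β₀ = β₁ = 0 (both parameters, so the regression SS is not corrected for the mean): SSR = β̂₀ΣYᵢ + β̂₁ΣxᵢYᵢ on 2 df, SSE = ΣYᵢ² - SSR on n - 2 df, and F = (SSR/2)/[SSE/(n-2)] ~ F(2, n-2) under H₀; reject H₀ if F exceeds F_α(2, n-2)
- (d) Complete the ANOVA table (RBD, so error df = (b-1)(t-1)): Blocks df = 21/4.2 = 5, Treatments df = 15/5 = 3, Treatments SS = 3 × 5.0 = 15, Error MS = 12/15 = 0.8, Total df = 23, Total SS = 21 + 15 + 12 = 48; variance ratios F_Blocks = 4.2/0.8 = 5.25 and F_Treatments = 5.0/0.8 = 6.25. Compare 6.25 with the quoted F₀.₀₅(3, 15) = 8.70 (not significant on the figures given); note that 8.70 and 4.62 are the tabulated F₀.₀₅(15, 3) and F₀.₀₅(15, 5), and against the standard F₀.₀₅(3, 15) = 3.29 the treatment differences would be significant
- (e) Define the regression estimator ȳ_reg = ȳ + b(X̄ - x̄), where X̄ is the known population mean of the auxiliary variable x and b = s_xy/s_x²; its bias is -Cov(b, x̄), which is O(1/n) and hence ≈ 0 to the first order of approximation (exactly 0 when b is a preassigned constant), and MSE(ȳ_reg) ≈ (1-f)S²_y(1-ρ²)/n with f = n/N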
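The two calculations in (a)(ii) and (c) can be checked numerically with NumPy. This is a sketch: `cov_of_sums` and `f_test_both_zero` are hypothetical helper names, and the x, y values in the usage lines are made-up illustration data, not taken from the question.

```python
import numpy as np


def cov_of_sums(rho):
    """Return a' Sigma b for a = (1, 1, 1), b = (1, -1, -1) and the Sigma of (a)(ii)."""
    sigma = np.array([[1.0, rho, 0.0],
                      [rho, 1.0, rho],
                      [0.0, rho, 1.0]])
    a = np.array([1.0, 1.0, 1.0])
    b = np.array([1.0, -1.0, -1.0])
    return a @ sigma @ b, sigma


def f_test_both_zero(x, y):
    """F statistic for H0: beta0 = beta1 = 0 in y_i = beta0 + beta1 x_i + e_i.

    Numerator: uncorrected regression SS, beta_hat' X'y, on 2 df.
    Denominator: residual SS on n - 2 df.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    ssr = beta_hat @ X.T @ y          # uncorrected regression SS
    sse = y @ y - ssr                 # residual SS
    n = len(y)
    return (ssr / 2) / (sse / (n - 2)), beta_hat


# (a)(ii): the covariance vanishes at rho = -1/2 and Sigma stays positive definite.
cov, sigma = cov_of_sums(-0.5)
print(cov, np.linalg.eigvalsh(sigma))

# (c): made-up data, for illustration only.
x_demo = [1, 2, 3, 4, 5, 6]
y_demo = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
F_demo, beta_demo = f_test_both_zero(x_demo, y_demo)
print(F_demo, beta_demo)
```

Under H₀ the statistic follows F(2, n - 2), so the observed value is compared with F_α(2, n - 2).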
Q6 50M solve Multivariate analysis and principal components
(a) Let **X** = (X₁ X₂ X₃)' be distributed as N₃ (μ, Σ), where μ = (2 −1 3)' and Σ = $\begin{pmatrix} 4 & 1 & 0 \\ 1 & 2 & 1 \\ 0 & 1 & 3 \end{pmatrix}$. Find (i) the conditional distribution of (X₁ X₂)' given X₃ = 2. (ii) partial correlation coefficient ρ₁₂.₃ and multiple correlation coefficient R₁.₂₃ (8+7 marks)
(b) (i) Describe the complete analysis of two-way classified data with multiple (but equal) observations per cell, clearly stating the assumptions used. Also state two examples where such type of analysis is used. (ii) Let three mutually independent variables Y₁, Y₂ and Y₃ having common variance σ² and E(Y₁) = β₁ + β₂, E(Y₂) = β₁ + β₃, E(Y₃) = β₁ + β₂ be given. Show that the linear parametric function p₁β₁ + p₂β₂ + p₃β₃ is estimable if and only if p₁ = p₂ + p₃, clearly stating the assumptions used, if any. (5 marks)
(c) (i) State briefly three reasons why an analyst may wish to perform a principal component analysis. (6 marks) (ii) Define canonical correlations and give two examples of their application. Describe the procedure of working out canonical correlations and canonical variates. (9 marks)
Answer approach & key points
Solve this multi-part numerical and theoretical question by allocating time in proportion to marks: about 30% each to part (a) (15 marks, computationally heavy) and part (c) on PCA and canonical correlations (15 marks), and about 40% to part (b), which carries the remaining 20 marks across ANOVA and estimability. Begin with clear problem identification for each sub-part, show all computational steps with matrix operations for (a), present a structured ANOVA decomposition for (b)(i) and a rigorous linear algebra proof for (b)(ii), and provide conceptual clarity with real-world Indian examples for (c). Conclude each part with precise final answers and interpretations.
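The matrix operations for part (a) can be sketched in NumPy as a minimal check, assuming the μ and Σ stated in the question; the variable names are illustrative. The partial correlation is read off the conditional covariance matrix, which for a multivariate normal agrees with the formula in the key points.

```python
import numpy as np

mu = np.array([2.0, -1.0, 3.0])
Sigma = np.array([[4.0, 1.0, 0.0],
                  [1.0, 2.0, 1.0],
                  [0.0, 1.0, 3.0]])

# (i) Partition: block 1 = (X1, X2), block 2 = X3; condition on X3 = 2.
i1, i2 = [0, 1], [2]
S11 = Sigma[np.ix_(i1, i1)]
S12 = Sigma[np.ix_(i1, i2)]
S22 = Sigma[np.ix_(i2, i2)]
x3 = np.array([2.0])
cond_mean = mu[i1] + S12 @ np.linalg.solve(S22, x3 - mu[i2])
cond_cov = S11 - S12 @ np.linalg.solve(S22, S12.T)

# (ii) Partial correlation rho_12.3: the correlation in the conditional covariance.
rho_12_3 = cond_cov[0, 1] / np.sqrt(cond_cov[0, 0] * cond_cov[1, 1])

# Multiple correlation R_1.23 of X1 on (X2, X3).
sigma_1 = Sigma[0, 1:]              # (sigma12, sigma13)
S_23 = Sigma[1:, 1:]                # covariance matrix of (X2, X3)
R_1_23 = np.sqrt(sigma_1 @ np.linalg.solve(S_23, sigma_1) / Sigma[0, 0])

print("conditional mean:", cond_mean)
print("conditional covariance:\n", cond_cov)
print("rho_12.3 =", rho_12_3, " R_1.23 =", R_1_23)
```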
- Part (a)(i): Correctly partition Σ into Σ₁₁, Σ₁₂, Σ₂₁, Σ₂₂ and apply conditional distribution formula N₂(μ₁ + Σ₁₂Σ₂₂⁻¹(x₃-μ₃), Σ₁₁ - Σ₁₂Σ₂₂⁻¹Σ₂₁) with x₃=2
- Part (a)(ii): Compute the partial correlation ρ₁₂.₃ = (σ₁₂ - σ₁₃σ₂₃/σ₃₃)/√[(σ₁₁-σ₁₃²/σ₃₃)(σ₂₂-σ₂₃²/σ₃₃)] and the multiple correlation R₁.₂₃ = √[σ₁'Σ₂₂⁻¹σ₁/σ₁₁], where σ₁' = (σ₁₂, σ₁₃) and Σ₂₂ here denotes the covariance matrix of (X₂, X₃) (not the Σ₂₂ = σ₃₃ of part (a)(i))
- Part (b)(i): Describe two-way ANOVA with replication: model yᵢⱼₖ = μ + αᵢ + βⱼ + (αβ)ᵢⱼ + εᵢⱼₖ, assumptions (normality, homoscedasticity, independence), ANOVA table with SS_T, SS_A, SS_B, SS_AB, SS_E, and examples like agricultural field trials (ICRISAT crop studies) or industrial quality control
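- Part (b)(ii): Assume the Gauss-Markov model E(**Y**) = Xβ, V(**Y**) = σ²I. The design matrix X has rows (1, 1, 0), (1, 0, 1), (1, 1, 0); rows 1 and 3 coincide, so rank(X) = 2 < 3. The function p'β with p' = (p₁, p₂, p₃) is estimable iff p' = l'X for some l, i.e., p' lies in the row space spanned by (1, 1, 0) and (1, 0, 1); writing p' = u(1, 1, 0) + w(1, 0, 1) gives p₂ = u, p₃ = w and p₁ = u + w = p₂ + p₃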
- Part (c)(i): Three reasons for PCA: dimensionality reduction (e.g., reducing NSSO household survey variables), multicollinearity remediation in regression, and data visualization/pattern detection in large datasets
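- Part (c)(ii): Define canonical correlations as the maximal correlations between linear combinations u = a'**X** and v = b'**Y**, each successive pair uncorrelated with the earlier pairs; examples: relationship between economic indicators and social development indices, or agricultural inputs vs outputs; the squared canonical correlations are the eigenvalues of Σ₁₁⁻¹Σ₁₂Σ₂₂⁻¹Σ₂₁ (in decreasing order), the corresponding eigenvectors give a, and b ∝ Σ₂₂⁻¹Σ₂₁a, each scaled so that Var(u) = Var(v) = 1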
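A small sketch of the estimability check in (b)(ii) and of the eigenvalue route to canonical correlations in (c)(ii). Both helpers (`is_estimable`, `canonical_correlations`) are hypothetical names. As an assumed illustration, the canonical step reuses the Σ of part (a) with X₁ as the first set and (X₂, X₃) as the second; with a single variable in one set, the only canonical correlation reduces to the multiple correlation R₁.₂₃.

```python
import numpy as np

# Design matrix of (b)(ii): rows give E(Y1), E(Y2), E(Y3) in terms of (b1, b2, b3).
X = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])


def is_estimable(p, X):
    """p'beta is estimable iff p' lies in the row space of X,
    i.e. appending p' as an extra row does not increase the rank."""
    return bool(np.linalg.matrix_rank(np.vstack([X, p])) == np.linalg.matrix_rank(X))


def canonical_correlations(Sigma, p):
    """Canonical correlations between the first p variables and the rest.

    The squared correlations are the eigenvalues of S11^-1 S12 S22^-1 S21;
    the eigenvectors give the coefficients a, and b is taken proportional to
    S22^-1 S21 a (columns are not rescaled to unit variance here).
    """
    S11, S12 = Sigma[:p, :p], Sigma[:p, p:]
    S21, S22 = Sigma[p:, :p], Sigma[p:, p:]
    M = np.linalg.solve(S11, S12) @ np.linalg.solve(S22, S21)
    eigvals, A = np.linalg.eig(M)
    order = np.argsort(eigvals.real)[::-1]          # largest first
    rho = np.sqrt(np.clip(eigvals.real[order], 0.0, None))
    A = A.real[:, order]
    B = np.linalg.solve(S22, S21 @ A)
    return rho, A, B


print(np.linalg.matrix_rank(X))            # rank 2 < 3 parameters
print(is_estimable([2, 1, 1], X))          # p1 = p2 + p3
print(is_estimable([1, 1, 1], X))          # p1 != p2 + p3

Sigma = np.array([[4.0, 1.0, 0.0],
                  [1.0, 2.0, 1.0],
                  [0.0, 1.0, 3.0]])
rho, A, B = canonical_correlations(Sigma, p=1)
print(rho)
```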
Q7 50M discuss Sampling methods and stratified random sampling
(a) Discuss the difference between sampling for variables and sampling for attributes with examples. For a qualitative characteristic, find an unbiased estimator of population proportion along with its variance when sample is drawn by simple random sampling without replacement. Also obtain an unbiased estimator of this variance. 20
(b) The table below gives the population and sample sizes, stratum means and stratum variances for a stratified random sample of size 50. Symbols have their usual meanings.
| Stratum Number | Nᵢ | nᵢ | ȳᵢ | sᵢ² |
|---|---|---|---|---|
| 1 | 30 | 5 | 35 | 36 |
| 2 | 50 | 10 | 40 | 49 |
| 3 | 60 | 15 | 40 | 81 |
| 4 | 60 | 20 | 55 | 144 |
Verify that the existing allocation is optimum for the given 4 strata. Also calculate the estimate of population variance under this allocation. 15
(c) Differentiate between Simple Random Sampling and Probability Proportional to Size (PPS) Sampling. How will you draw a PPS sample of size n from a population of size N (n < N) by (i) the Cumulative Total Method and (ii) Lahiri's Method? Explain. 15
Answer approach & key points
Begin part (a) with a clear conceptual distinction between sampling for variables (quantitative) and sampling for attributes (qualitative), using Indian examples such as agricultural yield vs literacy status. Derive the unbiased estimator p̂ = n'/n of the population proportion P, where n' is the number of sample units possessing the attribute, its variance V(p̂) = (N-n)/(N-1) · P(1-P)/n, and then the unbiased estimator v(p̂). For part (b), verify Neyman optimum allocation by checking whether nᵢ ∝ NᵢSᵢ/√cᵢ (with equal costs cᵢ this is nᵢ ∝ NᵢSᵢ, using sᵢ for Sᵢ), then compute the estimated variance v(ȳ_st). For part (c), contrast SRS with PPS on the basis of selection probabilities, then detail both the Cumulative Total and Lahiri's methods with a numerical illustration. Allocate approximately 40% of the time to part (a) and 30% each to (b) and (c), in line with the marks.
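The part (b) arithmetic can be checked with a short standard-library script. It is a sketch that assumes equal per-unit costs (so Neyman allocation is nᵢ ∝ NᵢSᵢ) and uses the sample sᵢ in place of the unknown Sᵢ; it reproduces the allocation check, ȳ_st and the estimated variance of ȳ_st.

```python
import math

# Data of Q7(b).
N = [30, 50, 60, 60]          # stratum sizes N_i
n = [5, 10, 15, 20]           # existing sample sizes n_i
ybar = [35, 40, 40, 55]       # stratum sample means
s2 = [36, 49, 81, 144]        # stratum sample variances s_i^2

N_tot, n_tot = sum(N), sum(n)

# Neyman allocation with equal costs: n_i = n * N_i S_i / sum(N_j S_j).
NS = [Ni * math.sqrt(v) for Ni, v in zip(N, s2)]
neyman = [n_tot * x / sum(NS) for x in NS]

# Stratified mean and its estimated variance under the existing allocation.
W = [Ni / N_tot for Ni in N]
ybar_st = sum(w * y for w, y in zip(W, ybar))
v_st = sum(w ** 2 * (1 / ni - 1 / Ni) * v for w, ni, Ni, v in zip(W, n, N, s2))

print([round(x, 2) for x in neyman])
print(ybar_st, round(v_st, 4))
```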
- Part (a): Clear distinction between sampling for variables (measurable quantities like income, yield) vs attributes (dichotomous characteristics like employment status, disease presence) with appropriate Indian examples
- Part (a): Derivation of the unbiased estimator p̂ = n'/n of the population proportion P, its variance V(p̂) = (N-n)/(N-1) · P(1-P)/n under SRSWOR, and the unbiased estimator of this variance v(p̂) = (N-n)/N · p̂(1-p̂)/(n-1)
- Part (b): Verification of the Neyman optimum allocation condition nᵢ/n = NᵢSᵢ/ΣNⱼSⱼ with sᵢ in place of Sᵢ: NᵢSᵢ = 180, 350, 540, 720 (total 1790) give nᵢ ≈ 5.03, 9.78, 15.08, 20.11, which round to the existing 5, 10, 15, 20
- Part (b): Computation of the stratified mean ȳ_st = ΣWᵢȳᵢ with Wᵢ = Nᵢ/N (here 43.75), and of the estimated variance of ȳ_st, v(ȳ_st) = ΣWᵢ²(Nᵢ-nᵢ)/(Nᵢnᵢ) · sᵢ² (here ≈ 1.18)
- Part (c): Systematic comparison of SRS (equal probability) vs PPS (probability ∝ size) on grounds of efficiency, especially for skewed populations like industrial output or agricultural holdings
- Part (c): Step-wise description of the Cumulative Total Method: form cumulative totals Tᵢ = X₁ + ... + Xᵢ, draw a random number R between 1 and T_N = ΣXᵢ, select the unit i for which Tᵢ₋₁ < R ≤ Tᵢ, and repeat n times
- Part (c): Step-wise description of Lahiri's Method: choose M ≥ max Xᵢ, draw a random number i from 1 to N and a random number j from 1 to M; select unit i if j ≤ Xᵢ, otherwise reject the pair and redraw; repeat until n units are selected
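Both selection procedures can be sketched with Python's `random` module. This assumes integer size measures Xᵢ and draws with replacement (PPSWR), each draw being independent; the function names and the size values in the usage lines are hypothetical.

```python
import bisect
import random
from itertools import accumulate


def cumulative_total_draw(sizes, rng):
    """One PPS draw by the cumulative total method; returns a 1-based unit index."""
    cum = list(accumulate(sizes))             # T_1, T_2, ..., T_N
    r = rng.randint(1, cum[-1])               # random integer in 1..sum(X_i)
    return bisect.bisect_left(cum, r) + 1     # unit i with T_(i-1) < r <= T_i


def lahiri_draw(sizes, rng):
    """One PPS draw by Lahiri's method; returns a 1-based unit index."""
    M = max(sizes)                            # any M >= max(X_i) works
    while True:
        i = rng.randint(1, len(sizes))        # random unit number in 1..N
        j = rng.randint(1, M)                 # random size in 1..M
        if j <= sizes[i - 1]:                 # accept (i, j) only if j <= X_i
            return i


def pps_sample(sizes, n, method, seed=None):
    """Draw n units with replacement using the chosen method."""
    rng = random.Random(seed)
    return [method(sizes, rng) for _ in range(n)]


# Hypothetical size measures for a population of N = 5 units.
sizes = [3, 1, 6, 2, 8]
print(pps_sample(sizes, 4, cumulative_total_draw, seed=1))
print(pps_sample(sizes, 4, lahiri_draw, seed=1))
```

Lahiri's method avoids forming cumulative totals, at the cost of occasionally rejecting a pair and redrawing.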
Q8 50M differentiate Experimental design and statistical models
(a) Differentiate between randomised block design and balanced incomplete block design. In usual notations, for a balanced incomplete block design, prove that (i) bk = vr (ii) λ(v – 1) = r(k – 1) and (iii) b ≥ v. 20
(b) Explain the concept of confounding in the design of experiments. In an experiment with three factors A, B and C, each at two levels, each of the three replicates is divided into two blocks of four units. How will you confound ABC in the first, AC in the second and BC in the third replicate? 15
(c) Differentiate among fixed, random and mixed effect models with examples. How are the three basic principles of design fulfilled in randomised block design ? Explain. 15
Answer approach & key points
Begin with a structured comparison of RBD vs BIBD in part (a), then rigorously prove all three BIBD parameter relations using standard notation with clear algebraic steps. For part (b), first define confounding in the factorial design context, then explicitly construct the three replication schemes showing which treatment combinations go to which block. For part (c), use a tabular comparison of the model types with agricultural/industrial examples, then explain how RBD satisfies randomization, replication, and local control. Allocate approximately 40% of the time to part (a) given its 20 marks and proof demands, and 30% each to parts (b) and (c).
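The three relations in (a) can be sanity-checked on a concrete design before proving them in general. The sketch below assumes the Fano plane, a standard BIBD with v = b = 7, r = k = 3, λ = 1, builds its incidence matrix N and verifies bk = vr, λ(v - 1) = r(k - 1), b ≥ v and the identity NN' = (r - λ)I + λJ used in proving Fisher's inequality. It checks one example and is not a proof.

```python
import numpy as np

# Blocks of the Fano plane (treatments numbered 1..7).
blocks = [(1, 2, 4), (2, 3, 5), (3, 4, 6), (4, 5, 7),
          (5, 6, 1), (6, 7, 2), (7, 1, 3)]
v, b = 7, len(blocks)

# Incidence matrix N (v x b): N[i, j] = 1 if treatment i+1 is in block j.
N = np.zeros((v, b), dtype=int)
for j, blk in enumerate(blocks):
    for t in blk:
        N[t - 1, j] = 1

r_all = N.sum(axis=1)          # replications of each treatment
k_all = N.sum(axis=0)          # size of each block
assert (r_all == r_all[0]).all() and (k_all == k_all[0]).all()
r, k = int(r_all[0]), int(k_all[0])

NNt = N @ N.T                  # diagonal: r; off-diagonal: pairwise concurrences
lam = int(NNt[0, 1])
expected = (r - lam) * np.eye(v, dtype=int) + lam * np.ones((v, v), dtype=int)

print("bk = vr:", b * k == v * r)
print("lambda(v - 1) = r(k - 1):", lam * (v - 1) == r * (k - 1))
print("NN' = (r - lambda)I + lambda J:", bool((NNt == expected).all()))
print("b >= v:", b >= v, "| det(NN') =", np.linalg.det(NNt))
```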
- Part (a): Clear distinction between RBD (complete blocks, all treatments per block) and BIBD (incomplete blocks, not all treatments appear in each block) with structural conditions
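- Part (a): Correct proofs of bk = vr (count the plots two ways), λ(v–1) = r(k–1) (count the pairs containing a given treatment two ways), and b ≥ v (Fisher's inequality: |NN'| = (r + λ(v–1))(r – λ)^(v–1) > 0 since r > λ, so v = rank(NN') ≤ rank(N) ≤ b), using the incidence matrix N with λ defined as pairwise concurrence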
- Part (b): Accurate definition of confounding as sacrificing higher-order interaction information to achieve block homogeneity, with distinction between complete and partial confounding
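- Part (b): Correct construction of the three replications in Yates notation, splitting treatment combinations by the parity of letters shared with the confounded effect: Rep I confounds ABC with blocks {a, b, c, abc} and {(1), ab, ac, bc}; Rep II confounds AC with blocks {(1), b, ac, abc} and {a, c, ab, bc}; Rep III confounds BC with blocks {(1), a, bc, abc} and {b, c, ab, ac}. Since each of ABC, AC and BC is confounded in only one replicate, this is partial confounding and each is estimated from the other two replicates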
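- Part (c): Precise differentiation of fixed (levels specifically chosen, inference only to those levels), random (levels a random sample from a population, variance component estimation), and mixed models, with appropriate examples such as specifically chosen fertilizer doses (fixed), sites or batches drawn at random (random), and chosen crop varieties tested at randomly selected locations (mixed)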
- Part (c): Explanation of how RBD achieves randomization (random allocation within blocks), replication (multiple blocks), and local control (homogeneous blocks reducing experimental error)
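The block contents for part (b) follow from a parity rule: a treatment combination goes to one block or the other according to whether it shares an even or odd number of letters with the confounded interaction. The sketch below applies that rule in Yates notation; `confound` and `label` are hypothetical helper names.

```python
from itertools import product

FACTORS = "abc"


def label(levels):
    """Yates label of a combination, e.g. (1, 0, 1) -> 'ac', (0, 0, 0) -> '(1)'."""
    name = "".join(f for f, lv in zip(FACTORS, levels) if lv)
    return name or "(1)"


def confound(effect):
    """Split the 2^3 combinations into two blocks of four by the parity of
    the number of letters shared with the confounded effect (e.g. 'AC')."""
    idx = [FACTORS.index(ch) for ch in effect.lower()]
    even, odd = [], []
    for levels in product([0, 1], repeat=3):
        shared = sum(levels[i] for i in idx)
        (even if shared % 2 == 0 else odd).append(label(levels))
    return even, odd


for rep, effect in enumerate(["ABC", "AC", "BC"], start=1):
    block1, block2 = confound(effect)
    print(f"Rep {rep} ({effect} confounded): {block1} | {block2}")
```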