Q5
(a) Explain the multicollinearity problem in a regression model. What are its consequences? State the different indicators of multicollinearity and explain. (10 marks)
(b) Establish the relationship among crude birthrate, general fertility rate and total fertility rate in the context of continuous data. Also, mention the properties of these fertility rates. (10 marks)
(c) What are the implications of using stable versus quasi-stable population assumption in demographic modelling? (10 marks)
(d) Discuss the problem of heteroscedasticity. Given that $Y_i = \alpha + \beta X_i + U_i$ with $E(U_i^2) = K^2X_i^2$, prove that OLS estimates of $\alpha$ and $\beta$ possess greater variance than OLS estimates of the transformed version of original model. (10 marks)
(e) What does it imply by validity of a test? Distinguish between the concepts of validity and reliability. (10 marks)
Directive word: Explain
This question asks you to explain. The directive word signals the depth of analysis expected, the structure of your answer, and the weight of evidence you must bring.
How this answer will be evaluated
Approach
Begin with a brief introduction acknowledging that regression diagnostics and demographic measures are foundational to applied statistical analysis in Indian economic planning and population studies. Allocate roughly 20% of your time to each sub-part, since all five carry 10 marks. For (a), explain multicollinearity with its consequences and indicators such as VIF and the condition index. For (b), derive CBR = GFR × (W/P), where W is the number of women of reproductive age and P is the total population, then connect GFR to TFR: GFR is a weighted average of the age-specific fertility rates, and if women are spread evenly across the reproductive ages it reduces to GFR = TFR/n, where n is the length of the reproductive span (about 35 years for ages 15-49). For (c), contrast the stable population (constant fertility and mortality, fixed age distribution) with the quasi-stable population (gradually changing vital rates), and draw out the implications for Indian population projections. For (d), prove the variance inequality by dividing the model through by X_i, which is equivalent to weighted least squares with weights 1/X_i². For (e), define validity (the test measures what it claims to measure) versus reliability (consistency of measurement), with psychometric examples. Conclude by synthesizing how diagnostic rigor ensures robust, policy-relevant demographic modeling.
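As a sketch of the derivation asked for in (b), under notation that is assumed here rather than given in the question: let $P$ be the total population, $W(x)$ the density of women at exact age $x$, $f(x)$ the age-specific fertility rate at age $x$, and $[a, b]$ the reproductive span (commonly taken as ages 15 to 49). Annual births and the three rates can then be written as

$$
B = \int_a^b W(x)\,f(x)\,dx, \qquad W = \int_a^b W(x)\,dx, \qquad \text{CBR} = \frac{B}{P}, \qquad \text{GFR} = \frac{B}{W}, \qquad \text{TFR} = \int_a^b f(x)\,dx
$$

Dividing and multiplying by $W$ links the first two rates, and writing $c(x) = W(x)/W$ for the age distribution of women within the reproductive span shows that GFR is a weighted average of $f(x)$:

$$
\text{CBR} = \text{GFR}\cdot\frac{W}{P}, \qquad \text{GFR} = \int_a^b c(x)\,f(x)\,dx
$$

Only under the additional assumption that women are uniformly distributed over the span does the link to TFR become exact:

$$
c(x) = \frac{1}{b-a} \;\Rightarrow\; \text{GFR} = \frac{\text{TFR}}{b-a}, \qquad \text{CBR} = \frac{\text{TFR}}{b-a}\cdot\frac{W}{P}
$$

Here the rates are per person (or per woman) per year; CBR and GFR are conventionally reported per 1,000.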
Key points expected
- For (a): Definition of multicollinearity as near-linear dependence among regressors; consequences including inflated variances, unstable coefficients, t-statistic deflation; indicators—VIF > 10, condition number > 30, high R² but insignificant t-ratios, correlation matrix examination
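- For (b): Derivation showing CBR = GFR × (W/P), where W/P is the proportion of women of reproductive age in the total population, and GFR = TFR/n when women are uniformly distributed over a reproductive span of n years (about 35 for ages 15-49), so that CBR = (TFR/n) × (W/P); properties: CBR is crude and age-structure dependent, GFR refines by restricting to women 15-49, TFR is age-standardized and period synthetic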
- For (c): Stable population implies Lotka's equation with constant rates leading to fixed age distribution and exponential growth; quasi-stable allows slowly changing rates with nearly stable age distribution; implications for Indian Census projections, intercensal estimation, and momentum effects
- For (d): Heteroscedasticity as non-constant error variance; transformation to Y_i/X_i = α/X_i + β + U_i/X_i with homoscedastic errors; proof that Var(β̂_OLS) > Var(β̂_WLS) using Gauss-Markov theorem or direct variance comparison
- For (e): Validity as accuracy of measurement (content, criterion, construct validity); reliability as precision/repeatability (test-retest, internal consistency); distinction—validity concerns systematic error, reliability concerns random error; trade-offs in educational testing and NSSO survey instruments
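The variance comparison in (d) can be checked numerically before attempting the algebra. The sketch below is illustrative only: the function name, the sample values of X, and the choice K = 2 are all assumptions, not part of the question. It computes the exact covariance of the OLS estimator on the original model under E(U_i²) = K²X_i², which is (X'X)⁻¹X'ΩX(X'X)⁻¹, and compares it with the covariance of OLS on the transformed model, which coincides with GLS, (X'Ω⁻¹X)⁻¹.

```python
import numpy as np

def ols_and_transformed_covariances(x, k=1.0):
    """Exact covariance matrices of (alpha_hat, beta_hat) when Var(U_i) = k^2 * x_i^2.

    Returns (cov_ols, cov_gls): OLS on the original model, and OLS on the
    model divided through by x_i (equivalent to GLS on the original model).
    All x_i must be nonzero.
    """
    x = np.asarray(x, dtype=float)
    X = np.column_stack([np.ones_like(x), x])   # columns: intercept, slope
    omega = (k ** 2) * np.diag(x ** 2)          # heteroscedastic error covariance
    xtx_inv = np.linalg.inv(X.T @ X)
    cov_ols = xtx_inv @ X.T @ omega @ X @ xtx_inv            # sandwich form
    cov_gls = np.linalg.inv(X.T @ np.linalg.inv(omega) @ X)  # transformed model
    return cov_ols, cov_gls

# Hypothetical design points and K, chosen only for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
cov_ols, cov_gls = ols_and_transformed_covariances(x, k=2.0)

print("Var(alpha_hat): OLS", cov_ols[0, 0], "transformed", cov_gls[0, 0])
print("Var(beta_hat):  OLS", cov_ols[1, 1], "transformed", cov_gls[1, 1])
```

A numerical check like this does not replace the proof, but it confirms which direction the inequality should run and gives concrete values against which a derived variance expression can be compared.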
Evaluation rubric
| Dimension | Weight | Max marks | Excellent | Average | Poor |
|---|---|---|---|---|---|
| Setup correctness | 20% | 10 | Correctly defines all core concepts across sub-parts: exact specification of multicollinearity, precise demographic rate definitions, accurate stable/quasi-stable distinction, proper heteroscedasticity setup with E(U_i²)=K²X_i², and validity-reliability conceptual clarity | Most definitions are correct but some imprecision exists; may conflate stable with stationary population, or miss the continuous data context in (b), or provide generic rather than mathematically precise heteroscedasticity specification | Fundamental definitional errors; treats multicollinearity as simple correlation, confuses CBR with GFR, misunderstands stable population as any equilibrium, or fails to specify the variance structure in (d) |
| Method choice | 20% | 10 | Appropriate methodological tools selected: VIF/tolerance for multicollinearity detection, continuous-time demographic integration for rate relationships, Lotka's stable population theory for (c), weighted least squares derivation for (d), and psychometric classification for (e) | Generally appropriate methods but missing sophistication; may list indicators without explaining computation, state discrete approximations instead of continuous derivations, or describe GLS without showing the specific transformation | Inappropriate or missing methods; suggests removing multicollinear variables without diagnosis, uses discrete summation for continuous rates, applies stable theory to rapidly changing populations, or attempts OLS without transformation for (d) |
| Computation accuracy | 20% | 10 | Mathematically rigorous derivations: explicit VIF formula VIF_j = 1/(1-R_j²), correct integral forms for fertility rates, proper variance derivation showing Var(β̂_WLS) = K²/Σw_i(X_i-X̄_w)² with weights w_i = 1/X_i² and weighted mean X̄_w = Σw_iX_i/Σw_i, versus the larger OLS variance | Correct general approach with minor computational slips; algebraic steps mostly sound but may omit constants, or present variance inequality without full derivation, or give discrete approximations where continuous forms are required | Serious computational errors; incorrect variance formulas, wrong transformation weights, algebraic mistakes in deriving rate relationships, or purely verbal treatment where proof demanded |
| Interpretation | 20% | 10 | Insightful interpretation linking diagnostics to policy: multicollinearity's impact on NSSO consumption regressions, fertility rate selection for NFHS analysis, stable theory for Indian population momentum, heteroscedasticity in cross-sectional income data, validity-reliability trade-offs in census questionnaires | Standard textbook interpretations without application; explains consequences generically without Indian context, or states properties without explaining why they matter for demographic analysis | Missing or incorrect interpretation; fails to explain why multicollinearity matters, lists formulas without meaning, or provides definitions without distinguishing practical implications |
| Final answer & units | 20% | 10 | Clear, structured presentation with all five sub-parts distinctly addressed; precise final statements—multicollinearity indicators ranked by utility, fertility rate hierarchy established, stable/quasi-stable criteria specified, variance inequality proved with final expression, validity-reliability distinction tabulated | All parts attempted but some conclusions vague or incomplete; may prove inequality without final variance comparison, or describe indicators without prioritizing, or mix up final properties of different rates | Incomplete coverage with missing sub-parts; no clear conclusions, proof abandoned midway, or fundamental confusion in final statements about which rate measures what |
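The VIF formula named in the rubric, VIF_j = 1/(1-R_j²), can be illustrated with a short computation. This is a minimal sketch using only NumPy; the function name and the simulated regressors are assumptions made for the example. Each regressor is regressed on the others (with an intercept), and the resulting R_j² is converted to a VIF.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return VIF_j = 1 / (1 - R_j^2) for each column of X.

    R_j^2 comes from regressing column j on the remaining columns plus an
    intercept. Exactly collinear columns give R_j^2 = 1 and an infinite VIF.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs

# Simulated data (hypothetical): x2 is nearly a copy of x1, x3 is unrelated.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.05 * rng.normal(size=200)
x3 = rng.normal(size=200)
X_demo = np.column_stack([x1, x2, x3])

vifs = variance_inflation_factors(X_demo)
print(vifs)
```

In this setup the first two VIFs should come out far above the common rule-of-thumb threshold of 10, while the third stays close to 1, which is the pattern an answer to (a) would describe in words.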
More from Statistics 2025 Paper II
- Q1 (a) State the significance of operating characteristic (OC) curves in control chart analysis. Obtain the general expression for the OC func…
- Q2 (a) (i) What are control charts by variables and control charts by attributes? 5 marks (ii) Derive the control limits for the construction…
- Q3 (a) A company manufactures 30 items per day. The sale of those items depends upon demand which has the following distribution : | Sale (uni…
- Q4 (a) Solve the game whose payoff matrix is $$ \begin{bmatrix} -1 & -2 & 8 \\ 7 & 5 & -1 \\ 6 & 0 & 12 \end{bmatrix} $$ (15 marks) (b) Use th…
- Q5 (a) Explain the multicollinearity problem in a regression model. What are its consequences? State the different indicators of multicollinea…
- Q6 (a) On the basis of the figures given below, calculate the age-specific death rates (ASDRs) for all the age groups. Also, calculate the cru…
- Q7 (a) Explain the concept of index number. Calculate the Fisher's ideal index number from the following data and verify that whether it satis…
- Q8 (a) Define time series. For a moving-average process with weights {a₁, a₂, ..., aₘ} of random components {eᵢ, i = 1, 2, ...}, where eᵢ's ar…