Q5 50M Compulsory explain Psychometrics, population growth, epidemiological rates
(c) What do you mean by reliability and validity of tests ? What is the difference between reliability and validity of a test ? If the reliability of a test is raised from 0·80 to 0·90 by lengthening the test, a validity coefficient of 0·60 for this test would be expected to increase to what value ? 10 marks
(d) The rate of increase of a population at time t is r(t) = 0·01 + 0·0001 t². If the population totals 1,000,000 at time t = 0, what is the population at t = 30 ? 10 marks
(e) Suggest which of the two measures : Morbidity Incidence rate (MIR) and Morbidity Prevalence rate (MPR) should be used to decide on the amount of medicine to be sent to a Malaria affected area. Cite an example where the other rate can be useful. 10 marks
Answer approach & key points
This question requires explaining three distinct statistical concepts across psychometrics, demography, and epidemiology. Allocate approximately 35% time to part (c) covering reliability, validity definitions, their distinction, and the Spearman-Brown prophecy formula application; 35% to part (d) setting up and solving the differential equation for population growth; and 30% to part (e) comparing MIR and MPR with practical Indian public health examples. Begin with clear conceptual definitions, proceed to mathematical derivations where required, and conclude with contextual interpretations.
- Part (c): Define reliability (consistency/stability of test scores) and validity (extent to which a test measures what it claims to measure); distinguish them by noting that reliability is necessary but not sufficient for validity; use the Spearman-Brown prophecy formula to find the lengthening factor, n = 2.25, that raises reliability from 0.80 to 0.90
- Part (c): Recognise that the validity coefficient scales with the square root of the reliability ratio: r_new = r_old × √(0.90/0.80) = 0.60 × 1.0607 ≈ 0.636, i.e. about 0.64
- Part (d): Treat r(t) as the per-capita (relative) growth rate, so that dP/dt = P×r(t) = P(0.01 + 0.0001t²); separate variables and integrate to get ln(P) = ∫(0.01 + 0.0001t²)dt = 0.01t + 0.0001t³/3 + C
- Part (d): Apply initial condition P(0) = 1,000,000 to find C = ln(10⁶); compute P(30) = 10⁶ × exp[0.01(30) + 0.0001(27000)/3] = 10⁶ × e^1.2 ≈ 3,320,117
- Part (e): Recommend MIR (incidence rate) for medicine allocation as it measures new cases over time, directly indicating current disease burden and transmission dynamics requiring immediate intervention
- Part (e): Cite MPR usefulness for chronic disease planning like diabetes or hypertension prevalence studies in India where total existing cases matter for long-term healthcare infrastructure and resource allocation
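The arithmetic in parts (c) and (d) can be checked with a short script. This is a minimal sketch, not a model answer: the function names are illustrative, part (c) assumes validity scales with the square root of the reliability ratio, and part (d) assumes r(t) is a per-capita growth rate, as in the key points above.

```python
import math


def lengthened_validity(r_xy, rel_old, rel_new):
    """Validity after lengthening, assuming it scales with sqrt(reliability ratio)."""
    return r_xy * math.sqrt(rel_new / rel_old)


def population(t, p0=1_000_000):
    """P(t) = P0 * exp(integral of r from 0 to t), with r(s) = 0.01 + 0.0001 s^2.

    Assumption: r is a per-capita (relative) growth rate, so dP/dt = P * r(t).
    """
    integral = 0.01 * t + 0.0001 * t**3 / 3
    return p0 * math.exp(integral)


validity = lengthened_validity(0.60, 0.80, 0.90)
pop_30 = population(30)

print(round(validity, 4))  # about 0.6364
print(round(pop_30))       # about 3.32 million
```

If r(t) were instead read as an absolute rate (dP/dt = r(t)), the population would change by only 1.2 persons over 30 years, which is why the per-capita reading is the natural one here.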
Q6 50M explain Time series analysis, least squares, heteroscedasticity
(a) Explain the principle of least squares. How it is used in fitting trend in time series analysis ? Explain the fitting of trend for the curve $y=ab^tc^{t^2}$. 15 marks
(b) Define stationary time series. How would you test the stationarity of the given time series ? Write the importance of stationary time series. Check the following time series for stationarity.
(i) $Y_t = Y_{t-1} + U_t$
(ii) $Y_t = \delta + Y_{t-1} + U_t$
(iii) $Y_t = \delta Y_{t-1} + U_t$ ; $-1 \leq \delta \leq 1$ 15 marks
(c) State the different methods of detecting the presence of heteroscedasticity. Explain in brief the Goldfeld-Quandt Test for detecting the presence of heteroscedasticity. Also write the assumption required to apply this test.
For a data on consumption expenditure in relation to income for a cross section of 30 families, after dropping the middle 4 observations, the OLS regression based on the first 13 and the last 13 observations and their associated residual sum of squares are as follows :
Regression based on the first 13 observations :
$\hat{Y}_i = 3.4094 + 0.6968 X_i$
$(r^2 = 0.8887, RSS_1 = 377.17, df = 11)$
Regression based on the last 13 observations :
$\hat{Y}_i = -28.0272 + 0.7941 X_i$
$(r^2 = 0.7681, RSS_2 = 1536.8, df = 11)$
Check the presence of heteroscedasticity for the above given results and write your conclusion.
$(F_{(11, 11, 5\%)} = 2.82, F_{(11, 11, 1\%)} = 4.46, F_{(13, 13, 5\%)} = 2.53, F_{(13, 13, 1\%)} = 3.82)$ 20 marks
Answer approach & key points
Explain the theoretical foundations first, then demonstrate computational application. Allocate ~30% time to part (a) on least squares and trend fitting, ~30% to part (b) on stationarity concepts and testing the three given models, and ~40% to part (c) on heteroscedasticity detection with complete Goldfeld-Quandt test execution. Structure as: theoretical exposition → mathematical derivation → numerical computation → statistical inference.
- Part (a): Principle of least squares (minimizing sum of squared residuals), its application in linear and non-linear trend fitting, and complete working for y=ab^tc^{t^2} using logarithmic transformation to linear form
- Part (b): Formal definition of weak/strong stationarity (constant mean, constant variance, autocovariance depending only on the lag), Dickey-Fuller or graphical methods for testing, importance for valid inference, and classification of (i) random walk (non-stationary), (ii) random walk with drift (non-stationary), (iii) AR(1) process (stationary when |δ|<1, non-stationary at the boundary δ = ±1)
- Part (c): Listing detection methods (graphical, Park test, Glejser test, White test, Goldfeld-Quandt test), complete Goldfeld-Quandt procedure with assumptions (normality, homoscedasticity under null, increasing/decreasing variance pattern)
- Correct computation of F-statistic = RSS2/RSS1 = 1536.8/377.17 = 4.075 with proper degrees of freedom (11,11)
- Proper hypothesis testing conclusion: F_calculated (4.075) > F_critical at 5% (2.82), so reject the null of homoscedasticity at the 5% level and conclude heteroscedasticity is present; since 4.075 < 4.46, the result is not significant at the 1% level
- Recognition that RSS2 > RSS1 indicates increasing variance with income, confirming heteroscedasticity in consumption expenditure data
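Two of the computations above lend themselves to a quick numerical check: the Goldfeld-Quandt F ratio from part (c) and the log-linearised least-squares fit from part (a). The sketch below uses only the standard library; the function names and the sample series are hypothetical, and the trend data are generated from made-up parameters purely to show that the fit recovers them.

```python
import math


def goldfeld_quandt_f(rss_small, rss_large, df_small, df_large):
    """F = (RSS_large / df_large) / (RSS_small / df_small)."""
    return (rss_large / df_large) / (rss_small / df_small)


def solve_linear(a, b):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_trend(ts, ys):
    """Fit y = a * b**t * c**(t*t) via least squares on log y = A + B t + C t^2."""
    logs = [math.log(y) for y in ys]
    power_sums = [sum(t**k for t in ts) for k in range(5)]
    rhs = [sum(ly * t**k for t, ly in zip(ts, logs)) for k in range(3)]
    normal = [[power_sums[i + j] for j in range(3)] for i in range(3)]
    big_a, big_b, big_c = solve_linear(normal, rhs)
    return math.exp(big_a), math.exp(big_b), math.exp(big_c)


# Part (c): figures from the question, critical values as given there.
f_stat = goldfeld_quandt_f(377.17, 1536.8, 11, 11)
reject_5pct = f_stat > 2.82
reject_1pct = f_stat > 4.46

# Part (a): made-up series generated from a=2, b=1.5, c=0.9 (illustration only).
ts = [0, 1, 2, 3, 4, 5]
ys = [2.0 * 1.5**t * 0.9 ** (t * t) for t in ts]
a, b, c = fit_trend(ts, ys)
```

Note that the fit minimises squared errors in log y, not in y itself, which is the usual consequence of linearising by logarithms.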
Q7 50M derive Life table functions and standard scores
7.(a) Derive, by starting from a suitable functional form for $l_x$, the formula
(i) $L_x = \dfrac{l_x + l_{x+1}}{2}$ and (ii) $L_x = \dfrac{l_x - l_{x+1}}{(\log l_x - \log l_{x+1})} = -\dfrac{d_x}{\log p_x}$
(iii) $e_x^0 = \dfrac{1}{2} + \sum\limits_{i=1}^{\infty} \dfrac{i d_{x+i}}{l_x}$
where
$l_x$ = members of the cohort alive at age $x$
$L_x$ = number of years lived, in the aggregate, by the cohort of $l_0$ persons between age $x$ and $(x+1)$
$d_x$ = number of persons dying between age $x$ and $(x+1)$
$= l_x - l_{x+1}$
$p_x$ = probability that a person of age $x$ will survive till age $(x+1)$
$e_x^0$ = expectation of life at age $x$
7.(b) (i) 400 students are given a test. The average is 60 and the standard deviation is 12. Obtain the Z-score and the standard scores equivalent to raw scores. The raw scores are given by
| Raw scores | 84 | 78 | 72 | 66 | 60 | 54 | 48 | 42 | 36 |
(ii) Convert the ten scores 1, 2, ..., 10 into standard scores with mean 50 and standard deviation 10.
7.(c) On the life table with $l_x = \dfrac{100-x}{190}$, $5 \leq x \leq 100$,
Find
(i) the chance that a child who has reached age 5 will live to age 60.
(ii) the chance that a man of age 30 will live until age 80.
(iii) the probability of dying within 5 years for a man aged 40.
(iv) the expectation of life at age 40.
(v) the chance that of the three men aged 30 at least one survives till age 80.
Answer approach & key points
Begin with clear statement of assumptions for each derivation in 7(a), showing step-by-step integration for L_x formulas and summation manipulation for e_x^0. For 7(b), apply Z-score formula z = (X-μ)/σ systematically, then demonstrate linear transformation for standard scores with mean 50, SD 10. For 7(c), substitute given l_x = (100-x)/190 into survival probabilities, death probabilities, and life expectancy formulas, computing each numerical value with proper fraction handling. Allocate approximately 35% time to derivations in (a), 25% to standard score calculations in (b), and 40% to life table computations in (c) given its five sub-parts.
- 7(a)(i): Assume l_x linear in [x, x+1], integrate l(t)dt from x to x+1 to obtain (l_x + l_{x+1})/2
- 7(a)(ii): Assume l_x exponential on [x, x+1] (constant force of mortality μ), use l_{x+t} = l_x·e^{-μt} for 0 ≤ t ≤ 1 and integrate to obtain L_x = (l_x - l_{x+1})/μ with μ = -log(p_x), giving the logarithmic-mean form -d_x/log(p_x)
- 7(a)(iii): Express T_x = Σ L_{x+i} using linear assumption, substitute L_x = (l_x + l_{x+1})/2, rearrange to obtain e_x^0 = 1/2 + Σ i·d_{x+i}/l_x
- 7(b): Calculate Z-scores as (X-60)/12 for each raw score, then apply linear transformation 50 + 10·z for standard scores; for 1-10 scores, first find mean=5.5, SD=√8.25, then transform
- 7(c): Compute survival probabilities as l_60/l_5, l_80/l_30; death probability as 1-l_45/l_40; e_40^0 = T_40/l_40 with T_40 = Σ L_{40+i}; binomial probability for at least one survivor among three men
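The numerical parts 7(b) and 7(c) can be verified with exact fractions. This is a sketch under stated assumptions: the standard deviation for the scores 1 to 10 is the population SD (divisor n, i.e. √8.25), the three men in (v) are taken to survive independently, and all variable names are illustrative.

```python
from fractions import Fraction
from statistics import mean, pstdev

# 7(b)(i): z = (X - 60) / 12, standard score = 50 + 10 z
raw = [84, 78, 72, 66, 60, 54, 48, 42, 36]
z_scores = [(x - 60) / 12 for x in raw]
std_scores = [50 + 10 * z for z in z_scores]

# 7(b)(ii): scores 1..10 rescaled to mean 50, SD 10 (population SD assumed)
scores = list(range(1, 11))
mu, sigma = mean(scores), pstdev(scores)
rescaled = [round(50 + 10 * (x - mu) / sigma, 2) for x in scores]


# 7(c): life table l_x = (100 - x) / 190 for 5 <= x <= 100
def l(x):
    return Fraction(100 - x, 190)


p_5_to_60 = l(60) / l(5)                  # survive from age 5 to 60
p_30_to_80 = l(80) / l(30)                # survive from age 30 to 80
q_40_within_5 = 1 - l(45) / l(40)         # die within 5 years at age 40
# L_x = (l_x + l_{x+1}) / 2 is exact here because l_x is linear in x
t_40 = sum((l(x) + l(x + 1)) / 2 for x in range(40, 100))
e_40 = t_40 / l(40)                       # complete expectation of life at 40
at_least_one = 1 - (1 - p_30_to_80) ** 3  # at least one of three survives

print(z_scores, std_scores)
print(p_5_to_60, p_30_to_80, q_40_within_5, e_40, at_least_one)
```

Using `Fraction` keeps the answers in the reduced fractional form an examiner would expect (for example 8/19 rather than 0.4211).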
Q8 50M explain Agricultural statistics, fertility rates, and econometric model identification
8.(a) Explain the method of collection of agriculture data. Describe the
(i) official publications for data collection and
(ii) statistics collected by the various official agencies pertaining to agriculture production.
8.(b) Distinguish between GFR and TFR. What is meant by TFR = 3.29 ? Discuss the merits and demerits of TFR. Construct the relationship between GRR and TFR. Interpret GRR when GRR >1, <1 or =1.
8.(c) State the order and rank conditions to check the identifiability of the given system of simultaneous equations.
Consider the following extended Keynesian model of income determination :
Consumption function : $C_t = \beta_1 + \beta_2 Y_t - \beta_3 T_t + U_{1t}$
Investment function : $I_t = \alpha_0 + \alpha_1 Y_{t-1} + U_{2t}$
Taxation function : $T_t = \gamma_0 + \gamma_1 Y_t + U_{3t}$
Income Identity : $Y_t = C_t + I_t + G_t$
Where
C = Consumption expenditure
Y = Income
I = Investment
T = Taxes
G = Government expenditure and
U's = the disturbance terms.
In the model the endogenous variables are C, I, T and Y and the predetermined variables are G and $Y_{t-1}$.
By applying the order condition, check the identifiability of each of the equations in the system and of the system as a whole. Write your conclusion.
Answer approach & key points
The question demands explanation across three distinct domains: agricultural data systems, fertility measures, and econometric identification. Allocate approximately 35% (15-18 marks) to part (a) covering data collection methods and official publications; 35% (15-18 marks) to part (b) distinguishing GFR/TFR with mathematical relationships and demographic interpretation; and 30% (12-15 marks) to part (c) applying order/rank conditions to the Keynesian model. Structure with clear sectional headings, begin each part with definitions, proceed to methodological details, and conclude with synthesis—ensuring the econometric section explicitly shows matrix calculations for identifiability.
- Part (a): Enumeration of agricultural data collection methods (census, sample surveys, administrative records) with specific Indian examples—Land Records, Agricultural Census, NSSO rounds; identification of official publications (Agricultural Statistics at a Glance, State Statistical Abstracts, FAO reports, DES publications)
- Part (a)(ii): Classification of statistics by agency—MoA&FW (crop area, yield, production), CSO (national income from agriculture), RBI (agricultural credit), NABARD (rural credit, WDRA data), State Directorates of Economics and Statistics
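- Part (b): Precise distinction between GFR (live births per 1000 women of reproductive age, usually 15–49, in a year) and TFR (average number of children a woman would bear over her reproductive span at current age-specific fertility rates); interpretation of TFR = 3.29 as about 3.29 children per woman (3,290 per 1000 women), assuming no mortality before the end of the reproductive period, which is above the replacement level of roughly 2.1; merits/demerits covering data requirements, period sensitivity, and cross-population comparability
- Part (b): Computation of TFR as the sum of single-year ASFRs, or TFR = 5 × Σ ASFR for five-year age groups (divided by 1000 when rates are per 1000 women); GRR-TFR relationship via the sex ratio at birth, GRR ≈ TFR × (female births / total births); GRR interpretation with population implications, ignoring mortality: GRR > 1 (growing), GRR < 1 (declining), GRR = 1 (stationary)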
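- Part (c): Correct statement of the order condition (K − k ≥ m − 1, where K = predetermined variables in the model, k = predetermined variables in the equation, m = endogenous variables in the equation) and the rank condition; systematic application to the four-equation Keynesian model with endogenous (C, I, T, Y) and predetermined (G, Y_{t-1}) variables; construction of the coefficient matrix and an explicit verdict for each equation: consumption function exactly identified, investment and taxation functions over-identified, income identity requiring no identification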
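The order condition for part (c) can be applied mechanically by counting variables. The sketch below is one way to do so, assuming the usual form of the condition (predetermined variables excluded from an equation ≥ endogenous variables included minus one) and ignoring intercepts; the dictionary layout and the name `Y_lag` for $Y_{t-1}$ are illustrative choices.

```python
ENDOGENOUS = {"C", "I", "T", "Y"}
PREDETERMINED = {"G", "Y_lag"}  # Y_lag stands for Y_{t-1}

# Variables appearing in each behavioural equation (intercepts ignored).
# The income identity has no unknown coefficients, so it is not tested.
EQUATIONS = {
    "consumption": {"C", "Y", "T"},
    "investment": {"I", "Y_lag"},
    "taxation": {"T", "Y"},
}


def order_condition(variables):
    """Classify an equation by comparing K - k with m - 1."""
    m = len(variables & ENDOGENOUS)          # endogenous variables included
    k = len(variables & PREDETERMINED)       # predetermined variables included
    excluded = len(PREDETERMINED) - k        # K - k
    if excluded < m - 1:
        return "under-identified"
    if excluded == m - 1:
        return "exactly identified"
    return "over-identified"


verdicts = {name: order_condition(vs) for name, vs in EQUATIONS.items()}
for name, verdict in verdicts.items():
    print(f"{name}: {verdict}")
```

The order condition is only necessary, not sufficient, so a full answer would still confirm each verdict with the rank condition.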