3 Gertner Institute for Epidemiology and Health Policy Research, Tel Hashomer, 52161 Israel; 4 Center for Nutrition Policy and Promotion, USDA, Alexandria, VA 22302; 5 National Cancer Institute, Bethesda, MD 20892-7344; and 6 National Agricultural Statistics Service, USDA, Fairfax, VA 22030
* To whom correspondence should be addressed. E-mail: lsf{at}actcom.co.il.
| Introduction |
|---|
Ideally, the HEI-2005 should be calculated on the basis of the usual dietary intake of each individual, i.e. their mean intake over a specified period (often 1 y). This is consistent with the Institute of Medicine's emphasis on assessing usual diets. Both the Institute of Medicine and the Dietary Guidelines for Americans 2005 point out that recommendations should be met over the long term (3,4). In practice, the usual intake of an individual cannot be observed. Often, only 1 d of food intake, collected via a 24-h recall (24HR), has been available. In such circumstances, the HEI-2005 component and total scores of each individual's 1-d intake can be calculated, but this will lead to a biased measure of the individual's HEI-2005 score of usual intake when the individual's 1-d food/nutrient:energy ratio is correlated with his/her energy intake. More critically, even in the absence of such a correlation, the HEI-2005 score on a single day can be a biased measure of the mean HEI-2005 score across days, because the scoring system is truncated at 0 on one end and at 5, 10, or 20 at the other. As a result, the long-term mean of HEI-2005 scores on single days of intake differs from the score of the long-term mean intake over those days, which is what we want to measure.
USDA's most important use of the HEI is to monitor the dietary intake of the population over time. For this purpose, the natural measure of the quality of the population's diet is the population's mean HEI component and total scores, based on the usual intake of each component. We will call these the population's mean usual HEI component scores. In this report, we examine 3 ways of estimating the population's mean usual HEI-2005 component scores from data on a series of individuals, each of whom supplied a single 24HR. With such limited data, no unbiased estimate is available. Our main concern was to identify which of the 3 methods had the least bias.
| Methods |
|---|
1) For each individual, calculate each HEI-2005 component score on the basis of his/her 24HR. Then, for each component score and for the total score, take the (arithmetic) mean over individuals. We call this the mean score. The HEI-2005 total score is calculated as the sum of these scores over the 12 components.
2) For each individual and each component, calculate the ratio of the reported intake of food group or nutrient (relevant to the HEI component considered) to the reported energy intake. Then take the mean of these ratios over individuals. Finally, calculate the HEI-2005 component score based on this mean ratio. We call this the score of the mean ratio. The HEI-2005 total score is calculated as the sum of these scores over the 12 components.
3) Calculate the population's total intake of food group or nutrient (relevant to the HEI component considered) and the population's total energy intake and take the ratio of these. Then calculate the HEI-2005 component score based on this ratio of the totals. We call this the score of the population ratio. The HEI-2005 total score is calculated as the sum of these scores over the 12 components.
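The 3 estimators can be sketched in a few lines of Python. This is only an illustration, not the authors' program: the function names and the example intakes are hypothetical, and `adequacy_score` assumes a component scored linearly from 0 up to a maximum, as for whole fruits (maximum 5 at 0.4 cup equivalents per 1000 kcal). Components for which lower intake earns a higher score would need a different scoring function.

```python
def adequacy_score(ratio, standard, max_score):
    """Linear score in the ratio, truncated at 0 and at max_score."""
    return max(0.0, min(max_score, max_score * ratio / standard))


def three_estimates(food, energy, standard, max_score):
    """food: 1-d intakes per person; energy: 1-d energy in units of 1000 kcal."""
    n = len(food)
    ratios = [f / e for f, e in zip(food, energy)]
    # Method 1: score each person's day, then average the scores.
    mean_score = sum(adequacy_score(r, standard, max_score) for r in ratios) / n
    # Method 2: average the individual ratios, then score that mean ratio.
    score_of_mean_ratio = adequacy_score(sum(ratios) / n, standard, max_score)
    # Method 3: ratio of population totals, then score that ratio.
    score_of_population_ratio = adequacy_score(
        sum(food) / sum(energy), standard, max_score
    )
    return mean_score, score_of_mean_ratio, score_of_population_ratio


# Hypothetical whole-fruit intakes (cup equivalents) and energy (1000 kcal).
food = [0.0, 2.0, 0.2, 0.0]
energy = [2.0, 2.5, 1.6, 1.9]
print(three_estimates(food, energy, standard=0.4, max_score=5.0))
```

Even in this toy sample the 3 methods give different answers, because truncation affects method 1 and because the mean of the ratios differs from the ratio of the totals.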
The names given to methods 2 and 3 follow those of Krebs-Smith et al. (5). It is not immediately clear which method would be least biased, and one can construct numerical examples in which each of the 3 is the superior method. The methods must therefore be tested with data that 1) are realistic and conform to typically reported values and 2) come from a population with known population mean usual HEI-2005 component scores. Unfortunately, available real datasets do not satisfy condition 2, so we instead employed computer simulations of data generated from a statistical model that is based on real data.
The dataset we used as a basis for our statistical model is drawn from the Eating at America's Table Study (EATS) (6). The study was approved by the National Cancer Institute Special Studies Institutional Review Board. The 738 women we studied were part of a nationally representative sample. Participants were asked to complete 4 24HRs via telephone over a period of 1 y (1997–98), with 1 recall per season. Six hundred and fifty (88%) of these women completed all 4 recalls. Foods reported on the 24HR were coded using the Food Intake Analysis System, version 2, which calculated total daily intakes for energy, saturated fat, and sodium. The food codes, in turn, were linked to the MyPyramid Equivalents Database, version 1.0, to calculate total daily intakes of the food groups of interest.
We computed summary statistics on the first day's reported intake of the 12 HEI-2005 components (and energy) (Table 1). Note that the mean ratio is different from the population ratio (final 2 columns of Table 1). In most cases, the mean ratio has the larger value, but for 3 components (oils; saturated fat; and solid fats, alcoholic beverages, and added sugars [SoFAAS]), it has the smaller value.
Some food groups are not consumed every day by all individuals. We refer to days on which a given food group is consumed by a given individual as that individual's "consumption days," the remaining days being the individual's "nonconsumption days."
First, we made an assumption about the intake distributions. Distributions of intake on consumption days, both between individuals and within individuals, were assumed to be normal after a suitable power transformation. The power transformation for each food/nutrient was individually chosen after inspection of the deciles of the distribution (see column 2 of online Supplemental Table 1).
For food groups (but not for nutrients), there is a probability of nonconsumption on a single day. We examined 3 assumptions regarding this probability, each of increasing complexity.
Assumption 1: The probability of consumption is the same for all individuals.
Unfortunately, this assumption was not supported by the EATS data, in which too many individuals reported consuming a particular food group either on no days or on all 4 d. Therefore, we postulated the following.
Assumption 2: There are 5 subclasses of individuals consuming the food on 0, 25, 50, 75, or 100% of days. In addition, the distribution of intakes on consumption days is independent of the probability of consumption.
The second part of assumption 2, namely, that the distribution of intakes on consumption days is independent of the individual's probability of consumption, can be readily checked against data. Unfortunately, this independence assumption was also not supported by the EATS data. In fact, reported intakes on consumption days have previously been reported to correlate positively with the probability of consumption (7). This led us to the final assumption.
Assumption 3: There are the same 5 probability of consumption subclasses as in assumption 2, but each has its own mean intake on consumption days.
Once the statistical model had been formulated, simulation programs were written in S-Plus (S-Plus 2000, Professional Edition for Windows, Release 1, Seattle, 1999) that generated data from the food/nutrient and energy intake distributions under the 3 different assumptions regarding the probability of consumption and computed the 3 estimates of population mean HEI-2005 component scores (mean score, score of the mean ratio, and score of the population ratio). Each simulation generated a population of 10,000 persons, a single day of intake (both the food/nutrient for the component of interest and energy) for each individual to be used in computing the 3 estimates, and a true usual intake (both for the food/nutrient and for energy) for each individual. The true usual intakes were used to compute the true population mean HEI-2005 component scores with which the estimates of this population mean based on a 1-d report could be compared. Further details of the simulations may be found in online Appendix B.
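The structure of such a simulation under assumption 3 can be sketched as follows. This is a simplified Python stand-in, not the S-Plus programs described above. Every numerical parameter (subclass shares, subclass means, standard deviations, energy range) is hypothetical, the power transformation is taken to be a square root, energy is simulated independently of the food, and the usual ratio is taken to be usual food intake divided by usual energy intake. Scoring follows the whole fruits rule (maximum 5 at 0.4 cup equivalents per 1000 kcal).

```python
import random

PROBS = [0.0, 0.25, 0.5, 0.75, 1.0]      # 5 consumption-probability subclasses
SHARES = [0.1, 0.2, 0.3, 0.2, 0.2]       # hypothetical subclass proportions
MEAN_SQRT = [0.0, 0.5, 0.6, 0.7, 0.8]    # hypothetical subclass means of sqrt(amount)
SD_BETWEEN, SD_WITHIN = 0.15, 0.25       # hypothetical SDs on the sqrt scale
ENERGY_SD_LOG = 0.3                      # hypothetical day-to-day energy variation


def score(ratio, standard=0.4, max_score=5.0):
    """Linear score truncated at 0 and at max_score."""
    return max(0.0, min(max_score, max_score * ratio / standard))


def simulate(n=10_000, seed=1):
    rng = random.Random(seed)
    day_food, day_energy, usual_scores = [], [], []
    for _ in range(n):
        k = rng.choices(range(5), weights=SHARES)[0]
        p = PROBS[k]
        mu = rng.gauss(MEAN_SQRT[k], SD_BETWEEN)   # person's mean on sqrt scale
        usual_energy = rng.uniform(1.4, 2.6)       # thousands of kcal
        # E[(mu + e)^2] = mu^2 + SD_WITHIN^2 for e ~ N(0, SD_WITHIN^2).
        usual_food = p * (mu ** 2 + SD_WITHIN ** 2)
        usual_scores.append(score(usual_food / usual_energy))
        # One observed day: consumption with probability p, else zero intake.
        consumed = rng.random() < p
        amount = (mu + rng.gauss(0.0, SD_WITHIN)) ** 2 if consumed else 0.0
        day_food.append(amount)
        # Lognormal multiplier with mean 1, so usual_energy is the long-run mean.
        noise = rng.lognormvariate(-ENERGY_SD_LOG ** 2 / 2, ENERGY_SD_LOG)
        day_energy.append(usual_energy * noise)
    ratios = [f / e for f, e in zip(day_food, day_energy)]
    return {
        "truth": sum(usual_scores) / n,
        "mean_score": sum(score(r) for r in ratios) / n,
        "score_of_mean_ratio": score(sum(ratios) / n),
        "score_of_population_ratio": score(sum(day_food) / sum(day_energy)),
    }


print(simulate())
```

The bias of each estimator is its value minus `truth`. Because the parameters here are invented, the output says nothing about which estimator performs best for real intake distributions.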
| Results |
|---|
To obtain a summary view of the accuracy of each method, we averaged the absolute bias (i.e. the absolute difference between the estimate and the true value) over the 12 components (final row of Tables 2–4). The maximum and minimum absolute biases taken over the 12 components are also shown. The mean absolute bias was substantially lower when using the score of the population ratio than when using either of the other 2 estimators.
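The summary measure can be written out as a short Python sketch. The function name and the example values are hypothetical and are not taken from Tables 2–4.

```python
def absolute_bias_summary(estimates, truths):
    """Return (mean, minimum, maximum) absolute bias over components."""
    biases = [abs(e - t) for e, t in zip(estimates, truths)]
    return sum(biases) / len(biases), min(biases), max(biases)


# Hypothetical estimated and true component scores for 3 components.
estimates = [3.1, 4.0, 2.2]
truths = [3.5, 3.8, 2.9]
print(absolute_bias_summary(estimates, truths))
```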
| Discussion |
|---|
Such complications are compounded by the HEI-2005 component scores themselves, which are nonlinear functions of a ratio, due to the truncation imposed at the minimum and maximum scores. This nonlinearity can lead to bias even when the ratio itself is estimated without bias. For example, consider the whole fruits component and imagine an individual who consumes exactly 2000 kcal (8368 kJ) consistently each day. Suppose this individual consumes a 1-cup equivalent (240 mL) of whole fruit on one-half of the days, but none on the other one-half. Then the mean or "usual" ratio for the individual is 0.25-cup equivalents (60 mL) per 1000 kcal (4184 kJ), leading to a score of 5 × 0.25/0.4 = 3.125, where 0.4 is the truncation point for the maximum achievable score of 5. If we determine the mean of the ratios over several days, we obtain over the long term the correct 0.25-cup equivalents (60 mL) per 1000 kcal (4184 kJ) (because energy intake is constant). If we determine means of the scores on individual days, however, then over the long term, we obtain a minimum score of 0 on one-half of the days and a maximum score of 5 on the other one-half, giving a mean of 2.5 and not the true value of 3.125.
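The whole fruits example can be checked numerically. The sketch below uses a hypothetical 100-d record in which fruit is eaten on every second day; the function name is ours.

```python
def whole_fruit_score(cups_per_1000_kcal):
    """Score of 5 at 0.4 cup equivalents per 1000 kcal, truncated at 0 and 5."""
    return max(0.0, min(5.0, 5.0 * cups_per_1000_kcal / 0.4))


energy = 2.0                     # thousands of kcal, constant each day
daily_cups = [1.0, 0.0] * 50     # 1 cup equivalent on one-half of 100 days
daily_ratios = [c / energy for c in daily_cups]

# Score of the long-term mean ratio (the quantity we want).
score_of_mean_ratio = whole_fruit_score(sum(daily_ratios) / len(daily_ratios))
# Long-term mean of the single-day scores (biased by truncation).
mean_of_daily_scores = sum(whole_fruit_score(r) for r in daily_ratios) / len(
    daily_ratios
)
print(score_of_mean_ratio, mean_of_daily_scores)  # about 3.125 and 2.5
```

On consumption days the ratio is 0.5 cup equivalents per 1000 kcal, which exceeds the 0.4 standard and is therefore capped at a score of 5; that cap is what pulls the mean of the daily scores below 3.125.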
These complications make it impossible to predict analytically which of the 3 proposed estimates is likely to be the least biased. This suggests that the surest way to investigate the matter is through computer simulation. Based on the results in Tables 2–4, the least biased of the 3 methods to estimate a population's mean usual HEI-2005 component scores is the score of the population ratio.
Our conclusion is that one should estimate the population's mean usual HEI-2005 component scores by calculating the score of the population ratio, i.e. by taking the score of the ratio of the total food/nutrient intake:energy intake. Nevertheless, this conclusion has some caveats. The conclusion is empirically driven and depends on the U.S. distributions of reported intakes of the components included in the HEI-2005, as well as on the standards by which the HEI-2005 component scores are determined.
We have found in a sensitivity analysis that our conclusion is robust to the sampling errors involved when estimating the parameters from the sample of 738 women participating in EATS. The results are reported in online Appendix C. We have also examined distributions of intake reported by men in EATS and by women in the Continuing Survey of Food Intakes by Individuals, 1994–96 (8). Although we have not modeled these data in the same depth as the data on the women in EATS, we obtained a strong impression that the distributional characteristics were very similar in the 3 groups (allowing for different levels of absolute intake) and would lead to the same conclusions presented here.
Nevertheless, we are aware that substantial changes in intake distributions or in the scoring standards could change the conclusions. For example, while developing the details of this work, we noticed that changes in the chosen standards for the scores could change the performance of the 3 methods that we examined.
It is important to check that the data used for calculating the population's mean usual HEI scores are representative of the usual intake of the population, even if usual intake cannot be assessed in the individual participants. To make inferences about the U.S. population, this requires that the data come from a nationally representative sample and that the dietary reports be collected for all 7 d of the week, with proportional representation of weekend days, weekdays, and seasons of the year. If probability samples rather than simple random samples are used, then the appropriate weights must be employed when the population ratios of the total food/nutrient intake:total energy intake are estimated. It is also advisable that the sample be quite large, on the order of 1000 individuals or more, to ensure that the standard errors of the estimates are relatively small.
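With a probability sample, the population ratio becomes a ratio of weighted totals. A minimal Python sketch, with a hypothetical function name and invented weights and intakes:

```python
def weighted_population_ratio(food, energy, weights):
    """Ratio of weighted total food/nutrient intake to weighted total energy."""
    total_food = sum(w * f for w, f in zip(weights, food))
    total_energy = sum(w * e for w, e in zip(weights, energy))
    return total_food / total_energy


# Hypothetical 1-d intakes (cup equivalents; energy in 1000 kcal) and
# survey weights (number of people each respondent represents).
food = [0.0, 2.0, 0.2]
energy = [2.0, 2.5, 1.6]
weights = [1000.0, 3000.0, 2000.0]
print(weighted_population_ratio(food, energy, weights))
```

The component score is then obtained by scoring this weighted ratio. With equal weights the function reduces to the unweighted ratio of totals.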
As mentioned above, we are confident that our conclusion holds true for the currently available U.S. population data. However, we are not so sanguine with regard to minority subpopulations of the U.S. or with regard to populations in other countries. We recommend that researchers interested in HEI-2005 component scores in these populations carry out a similar exercise to that reported here, simulating data that follow intake distributions reported in the population of interest. Until evidence emerges for the superiority of another estimate, the score of the population ratio would seem to be the best choice in such cases. We also recommend that periodic checks be carried out to confirm that this measure remains optimal for the U.S. population, because intake distributions may change.
With the caveats mentioned, we recommend estimating the population's mean usual HEI-2005 component scores by the score of the population ratio. Constructing a 2-sided 95% CI for this measure is recommended over estimating a standard error, because the sampling distribution may be asymmetric. A 95% CI for a component score can be constructed using standard survey packages in the following manner. First, determine the CI for the associated population ratio with the package and then score the end points of the interval. A precision measure for the total HEI-2005 score, the sum of the 12 component scores, is more difficult to develop. An algorithm is given in online Appendix D.
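The end-point scoring step for a single component can be sketched as follows. The CI for the ratio is assumed to come from a survey package and is simply typed in here as hypothetical numbers; the function names are ours, and the scoring rule shown is the linear, truncated one used for whole fruits.

```python
def score(ratio, standard, max_score):
    """Linear score truncated at 0 and at max_score."""
    return max(0.0, min(max_score, max_score * ratio / standard))


def score_ci(ratio_lower, ratio_upper, standard, max_score):
    """Score both end points of the ratio CI and return them in order."""
    ends = (
        score(ratio_lower, standard, max_score),
        score(ratio_upper, standard, max_score),
    )
    return min(ends), max(ends)


# Hypothetical 95% CI for the whole fruits population ratio
# (cup equivalents per 1000 kcal).
print(score_ci(0.22, 0.30, standard=0.4, max_score=5.0))  # about (2.75, 3.75)
```

Because the score is a monotone function of the ratio, the scored end points bound the same 95% of the sampling distribution. The interval need not be symmetric about the point estimate when truncation applies.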
Our main comparison of the 3 estimators was based on their biases and not on their standard errors. We considered the standard error of the estimators to be of secondary importance to the bias, because in the relatively large samples that we envisage, the bias will dominate the error of the estimate, especially in this case where the biases are often large. To check this further, we computed from our simulation (under the assumption of a varying probability of consumption that is correlated with the amount of intake on consumption days) the standard error of the 3 estimates that would be expected from a sample of 1000 individuals. The means of the standard errors taken over the 12 components were 0.09 for the mean score, 0.18 for the score of the mean ratio, and 0.14 for the score of the population ratio, compared with mean absolute biases of 0.73, 0.66, and 0.37, respectively. More details may be found in online Appendix E.
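A rough way to see that bias dominates is to combine the reported averages into an approximate root mean squared error, sqrt(bias² + SE²). This Python sketch is indicative only: it combines means taken over the 12 components rather than computing the error component by component, and the variable names are ours.

```python
import math

# Means over the 12 components, as reported in the text (sample of 1000).
mean_se = {
    "mean score": 0.09,
    "score of the mean ratio": 0.18,
    "score of the population ratio": 0.14,
}
mean_abs_bias = {
    "mean score": 0.73,
    "score of the mean ratio": 0.66,
    "score of the population ratio": 0.37,
}

bias_share = {}
for name in mean_se:
    rough_rmse = math.hypot(mean_abs_bias[name], mean_se[name])
    # Fraction of the rough squared error that is attributable to bias.
    bias_share[name] = mean_abs_bias[name] ** 2 / rough_rmse ** 2
    print(f"{name}: rough RMSE {rough_rmse:.2f}, bias share {bias_share[name]:.2f}")
```

On this crude measure the squared bias accounts for most of the squared error for all 3 estimators.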
Nutritional survey data sometimes include repeated dietary assessments on all or a subset of participants. Such repeat assessments allow statistical modeling of within-person variation and offer the possibility of reducing the bias in estimating the population distribution of usual intakes using statistical modeling (8,9). A future research aim will be to extend such methods to estimate the U.S. population distribution of the usual HEI-2005 component scores. It is clearly advantageous to be able to estimate the full distribution rather than just the mean. Furthermore, if this can be implemented successfully, it would be a short further step to estimate the population mean directly from these distributions. In principle, estimates of the population mean derived in this manner should have minimal bias and could, therefore, be an improvement over the best method when one 24HR is available, namely the score of the population ratio. Currently, the score of the population ratio should be regarded as the principal method for estimating the population mean usual HEI-2005 component and total scores.
| FOOTNOTES |
|---|
2 Author disclosures: L. S. Freedman, P. M. Guenther, S. M. Krebs-Smith, and P. S. Kott, no conflicts of interest.
7 Abbreviations used: EATS, Eating at America's Table Study; HEI, Healthy Eating Index; SoFAAS, Solid Fats, Alcoholic beverages, and Added Sugars; 24HR, 24-h recall.
Manuscript received 23 December 2007. Initial review completed 18 February 2008. Revision accepted 5 June 2008.
| LITERATURE CITED |
|---|
1. Guenther PM, Reedy J, Krebs-Smith SM. Development of the Healthy Eating Index-2005. J Am Diet Assoc. In press 2008.
2. Guenther PM, Reedy J, Krebs-Smith SM, Reeve BB. Evaluation of the Healthy Eating Index-2005. J Am Diet Assoc. In press 2008.
3. Institute of Medicine. Dietary reference intakes: applications in dietary assessment. Washington DC: National Academy Press; 2000.
4. US Department of Health and Human Services and USDA. Dietary guidelines for Americans 2005. Washington (DC): US Government Printing Office, Stock Number: 001–000–04719–1; 2005. Available from: http://www.healthierus.gov/dietaryguidelines
5. Krebs-Smith SM, Kott PS, Guenther PM. Mean proportion and population proportion: two answers to the same question? J Am Diet Assoc. 1989;89:671–6.
6. Subar AF, Thompson FE, Kipnis V, Midthune D, Hurwitz P, McNutt S, McIntosh A, Rosenfeld S. Comparative validation of the Block, Willett, and National Cancer Institute food frequency questionnaires: the Eating at America's Table Study. Am J Epidemiol. 2001;154:1089–99.
7. Tooze JA, Midthune D, Dodd KW, Freedman LS, Krebs-Smith SM, Subar AF, Guenther PM, Carroll RJ, Kipnis V. A new statistical method for estimating the usual intake of episodically consumed foods with application to their distribution. J Am Diet Assoc. 2006;106:1575–87.
8. Tippett KS, Cypel YS, eds. Design and operation: The Continuing Survey of Food Intakes by Individuals and the Diet and Health Knowledge Survey, 1994–96. Beltsville (MD): USDA, Agricultural Research Service, Nationwide Food Surveys Report No. 96–1; 1997.
9. Dodd KW, Guenther PM, Freedman LS, Subar AF, Kipnis V, Midthune D, Tooze JA, Krebs-Smith SM. Statistical methods for estimating usual intake of nutrients and foods: a review of the theory. J Am Diet Assoc. 2006;106:1640–50.