(Journal of Nutrition. 1999;129:1915-1919.)
© 1999 The American Society for Nutritional Sciences


Article

Application of the Bootstrap Procedure Provides an Alternative to Standard Statistical Procedures in the Estimation of the Vitamin B-6 Requirement¹

Christine M. Hansen*, Marc A. Evans† and Terry D. Shultz*²

* Department of Food Science and Human Nutrition and the † Program in Statistics, Washington State University, Pullman, WA 99164-6376

² To whom correspondence should be addressed.


    ABSTRACT
 
The bootstrap procedure is a versatile statistical tool for the estimation of standard errors and confidence intervals. It is useful when standard statistical methods are not available or are poorly behaved, e.g., for nonlinear functions or when assumptions of a statistical model have been violated. Inverse regression estimation is an example of a statistical tool with a wide application in human nutrition. In a recent study, inverse regression was used to estimate the vitamin B-6 requirement of young women. In the present statistical application, both standard statistical methods and the bootstrap technique were used to estimate the mean vitamin B-6 requirement, standard errors and 95% confidence intervals for the mean. The bootstrap procedure produced standard error estimates and confidence intervals that were similar to those calculated by using standard statistical estimators. In a Monte Carlo simulation exploring the behavior of the inverse regression estimators, bootstrap standard errors were found to be nearly unbiased, even when the basic assumptions of the regression model were violated. On the other hand, the standard asymptotic estimator was found to behave well when the assumptions of the regression model were met, but behaved poorly when the assumptions were violated. In human metabolic studies, which are often restricted to small sample sizes, or when statistical methods are not available or are poorly behaved, bootstrap estimates for calculating standard errors and confidence intervals may be preferred. Investigators in human nutrition may find that the bootstrap procedure is superior to standard statistical procedures in cases similar to the examples presented in this paper.


KEY WORDS: • Vitamin B-6 requirement • statistics • inverse regression • bootstrap • human metabolic studies


    INTRODUCTION
 
Statistical methods have become the foundation for objective decision making in much of the scientific world. ANOVA, regression analysis, chi-square tests for independence, t-tests and so on are the champions of statistical inference. Although it is our belief that these utilitarian approaches will continue to dominate the paths to sound statistical inference, several new methodologies are currently available but are relatively unknown to the scientific community. One of the methods to which we are referring is the bootstrap (Efron 1979). This procedure is of recent origin, having arisen, for the most part, because of the advent of powerful computers.

The bootstrap is a statistically elegant procedure. It can be applied in situations where standard statistical tools do not exist or in situations where the usual statistical methods are inappropriate because the underlying assumptions are violated (e.g., ANOVA and regression analysis have assumptions of normality and constant variance). This violation may cause the inferences based on the usual methods of analysis to lead to spurious conclusions. On the other hand, the bootstrap procedure can produce valid inferences for ANOVA and regression analysis when the typical assumptions are violated. Although the bootstrap would appear useful in those aforementioned situations, this is not the principal area of application. The bootstrap procedure is also applicable in those situations where analytic statistical methods are not readily available.

This paper introduces the bootstrap procedure to compute standard errors and confidence intervals for the estimation of the vitamin B-6 requirement by using inverse regression. Bootstrap estimates of the mean, standard error of the mean and confidence intervals will be compared to estimates based on formulas commonly found in statistics textbooks. Furthermore, Monte Carlo simulation of inverse regression is used to explore the characteristics of the bootstrap estimators relative to the more commonly used estimators.


    MATERIALS AND METHODS
 
In a recent controlled-diet study (Huang et al. 1998), after a 9-d adjustment period (1.60 mg vitamin B-6/d), eight young women consumed a low vitamin B-6 diet (0.45 mg/d) for 27 d, followed by repletion with three levels of vitamin B-6 (1.26, 1.66 and 2.06 mg/d) for successive periods of 21, 21 and 14 d, respectively. Six measures of vitamin B-6 status were assessed at the end of each experimental period. One method of determining the vitamin B-6 requirement is to determine the intake at which status indicators are restored to their baseline values after depletion. To provide estimates of the vitamin B-6 requirement, values for each of the six status indicators were regressed on vitamin B-6 intake by using linear regression, and the estimated regression lines were then solved for vitamin B-6 intake by using the mean from the end of the adjustment period for each measure of B-6 status [vitamin B-6 intake = (mean adjustment period value - intercept)/slope]. This type of regression analysis is known as inverse prediction or calibration (Kutner et al. 1996). An example of the regression analysis is illustrated in Fig. 1. The standard error of the predicted vitamin B-6 requirement was calculated by using an asymptotic estimator (Kutner et al. 1996). The weighted mean of the six estimates (by using the standard error as the weight factor) and 95% confidence limits for the mean were then calculated (Table 1). Equations used for the calculation of both classical and bootstrap statistics are listed in Appendix A.
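The inverse prediction step described above can be illustrated with a minimal Python sketch. The status values and the baseline mean below are synthetic placeholders, not study data, and the helper names `fit_line` and `inverse_predict` are hypothetical; only the four intake levels are taken from the study design.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def inverse_predict(xs, ys, y_new):
    """Solve the fitted line for x at a chosen response: (y_new - b0) / b1."""
    b0, b1 = fit_line(xs, ys)
    return (y_new - b0) / b1


# Vitamin B-6 intakes (mg/d) used in the depletion and repletion periods.
intakes = [0.45, 1.26, 1.66, 2.06]
# Synthetic status values (NOT study data) lying exactly on y = 1 + 2x.
status = [1.0 + 2.0 * x for x in intakes]
# Hypothetical adjustment-period (baseline) mean of the status indicator.
baseline_mean = 4.5

requirement = inverse_predict(intakes, status, baseline_mean)
print(f"Estimated requirement: {requirement:.2f} mg/d")
```

Because the synthetic points fall exactly on a known line, the result can be checked by hand: (4.5 - 1.0)/2.0 = 1.75 mg/d.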



 
Figure 1. Linear regression analysis of urinary 4-pyridoxic acid (4-PA) excretion versus vitamin B-6 intake in eight premenopausal women who consumed a depletion diet providing 0.45 mg vitamin B-6 for 27 d and then were repleted with vitamin B-6 intakes of 1.26, 1.66 and 2.06 mg/d for three successive 14- or 21-d periods. The solid horizontal line represents the mean baseline urinary 4-PA excretion at the end of the adjustment period that preceded depletion. The vertical line is drawn from the intersection of the regression line and the adjustment period mean line, perpendicular to the x-axis, to predict the vitamin B-6 intake required to achieve baseline urinary 4-PA excretion after depletion, an estimation of the vitamin B-6 intake requirement. Similar regression analysis and inverse predictions were performed for the five additional measures of vitamin B-6 status listed in Table 1.

 

 
Table 1. Vitamin B-6 requirement calculated from six measures of vitamin B-6 status by inverse regression by using standard statistical procedures and the bootstrap procedure¹

 
The multivariate bootstrap was also applied to the prediction of the vitamin B-6 requirement for each of the six status indicators by using inverse regression as follows:
  1. Because there were measurements at four vitamin B-6 intakes for each of the eight subjects, our observed data consisted of 32 (x, y) pairs for each status indicator. For each vitamin B-6 status indicator, 2,000 independent samples of n = 32 pairs were selected, with replacement, from the observed data. These represent 2,000 bootstrap samples.
  2. The simple linear regression equation (Yi = β0 + β1Xi + εi) was calculated for each of the 2,000 bootstrap samples and solved for vitamin B-6 intake (Xnew) at the adjustment mean for that status indicator (Ynew): Xnew = (Ynew - β0)/β1. (The simple linear regression model was appropriate because repeated observations on the same subject had been found to be uncorrelated.) The average of the 2,000 bootstrap estimates of the vitamin B-6 requirement is shown in Table 1 for each vitamin B-6 status indicator.
  3. The bootstrap standard error and t-confidence interval were computed as shown in Appendix A.
  4. The bootstrap percentile confidence interval was constructed as follows:
    (a) The bootstrap estimates were ordered from smallest to largest.
    (b) The 50th and the 1,950th values represent the lower and upper limits of the 95% confidence interval (0.025 × 2,000 and 0.975 × 2,000, respectively).
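
The four steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' SAS IML program: the 32 (x, y) pairs are synthetic, the function names are hypothetical, the t critical value 2.042 assumes n - 2 = 30 degrees of freedom, the t-interval is centered on the estimate from the observed data, and a resample that contains only one distinct intake (slope undefined) is simply redrawn.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def bootstrap_inverse(pairs, y_new, n_boot=2000, t_crit=2.042, seed=0):
    """Pairs bootstrap of the inverse regression estimate of x at y_new."""
    rng = random.Random(seed)
    n = len(pairs)
    b0, b1 = fit_line([p[0] for p in pairs], [p[1] for p in pairs])
    estimate = (y_new - b0) / b1  # estimate from the observed data
    boot = []
    while len(boot) < n_boot:
        # Step 1: draw n (x, y) pairs with replacement.
        sample = [pairs[rng.randrange(n)] for _ in range(n)]
        xs = [p[0] for p in sample]
        if len(set(xs)) < 2:
            continue  # assumption: redraw when the slope is undefined
        # Step 2: refit the line and solve it for x at y_new.
        b0_s, b1_s = fit_line(xs, [p[1] for p in sample])
        if b1_s == 0.0:
            continue  # assumption: redraw when the line is flat
        boot.append((y_new - b0_s) / b1_s)
    # Step 3: bootstrap standard error and t-confidence interval.
    se = statistics.stdev(boot)
    t_ci = (estimate - t_crit * se, estimate + t_crit * se)
    # Step 4: percentile interval from the ordered bootstrap estimates
    # (the 50th and 1,950th values when n_boot = 2,000).
    boot.sort()
    lower = boot[max(round(0.025 * n_boot), 1) - 1]
    upper = boot[round(0.975 * n_boot) - 1]
    return {
        "estimate": estimate,
        "boot_mean": statistics.fmean(boot),
        "se": se,
        "t_ci": t_ci,
        "pct_ci": (lower, upper),
    }


# Synthetic data (NOT study data): 8 subjects x 4 intakes = 32 pairs.
data_rng = random.Random(1)
intakes = [0.45, 1.26, 1.66, 2.06]
pairs = [(x, 1.0 + 2.0 * x + data_rng.gauss(0.0, 0.3))
         for _ in range(8) for x in intakes]

result = bootstrap_inverse(pairs, y_new=4.5)
print(result)
```

The percentile limits are read directly from the sorted bootstrap estimates, so they need not be symmetric about the point estimate, whereas the t-interval is symmetric by construction.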

A program using the SAS® IML software (SAS Institute, Cary, NC) for bootstrap estimation of inverse regression standard errors and confidence intervals is provided in Appendix B³. This program was used to compute the bootstrap estimation results displayed in Table 1 for the inverse regression example. Included with the estimates of the vitamin B-6 requirement are the estimated standard errors and the lower and upper limits for 95% confidence intervals based on both the asymptotic estimator and the bootstrap estimator.

As shown in Table 1, the standard and bootstrap methods produced inverse regression estimates and standard errors that were nearly identical. However, the bootstrap percentile confidence intervals were asymmetric, unlike the confidence intervals produced by the standard method. This asymmetry is likely the result of a skewness in the data and possibly in the distribution of the underlying population. Thus, the bootstrap confidence intervals may better reflect reality. Unfortunately, one cannot draw such a conclusion from a single experiment. Therefore, a Monte Carlo simulation was undertaken to assess the performance of the bootstrap estimator as applied to inverse regression. Although extensive statistical theory was developed for the bootstrap procedure (Efron and Tibshirani 1993), and the bootstrap procedure was shown to have outstanding performance characteristics, analytical assessment of the quality of bootstrap estimators is not available for all statistical estimators (e.g., inverse regression). Therefore, alternative means of assessing the quality of specific bootstrap applications are required. Monte Carlo simulation provides this alternative means of assessment.

Monte Carlo simulation (Manly 1991) is simple in nature: generate data under a specific set of conditions, compute the statistic(s) of interest and store the results. This process is repeated several thousand times to accumulate information concerning the characteristics of the statistic(s) in question. Because the conditions (i.e., mean, standard deviation, probability distribution) under which the data are generated are known, the properties of the statistic(s) can be compared to the known conditions.

To assess the inverse regression bootstrap estimators, response (Y) and predictor (X) data for a population of size 1,000 were generated under the simple linear regression model: Yi = β0 + β1Xi + εi with β0 = 0; β1 = 5; 0.0 ≤ Xi ≤ 50.0 in increments of 0.05. Data were generated under two different conditions for sample sizes of n = 10, 20, 40 and 80: (a) regression model with valid assumptions; εi normally distributed random variable with mean 0.0 and standard deviation 5.0; (b) regression model with nonconstant variance; εi normally distributed random variable with mean 0.0 and standard deviation 0.2 × Xi. Specific values of the response (Ynew) were selected to calculate the inverse regression estimate of the predictor (Xnew): Ynew = 31.25 ⇒ Xnew = 6.25; Ynew = 62.50 ⇒ Xnew = 12.50; Ynew = 125.00 ⇒ Xnew = 25.00.
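
A minimal Python sketch of this data-generation step is given below. Several details are assumptions rather than statements from the text: the function name `make_population` is hypothetical, the 1,000 predictor values are taken as 0.05, 0.10, ..., 50.00 (the stated range with 0.0 included would give 1,001 values), and samples are drawn from the population without replacement.

```python
import random


def make_population(condition, rng, beta0=0.0, beta1=5.0, size=1000, step=0.05):
    """Generate (x, y) pairs under Y = beta0 + beta1 * X + error.

    condition "a": error sd is constant at 5.0 (valid assumptions).
    condition "b": error sd is 0.2 * x (nonconstant variance).
    """
    population = []
    for i in range(1, size + 1):
        x = step * i  # assumed grid: 0.05, 0.10, ..., 50.00
        sd = 5.0 if condition == "a" else 0.2 * x
        y = beta0 + beta1 * x + rng.gauss(0.0, sd)
        population.append((x, y))
    return population


rng = random.Random(42)
population_a = make_population("a", rng)
population_b = make_population("b", rng)

# One sample of n = 10 from the population (assumed: without replacement).
sample = rng.sample(population_a, 10)

# True Xnew for each selected Ynew, from Xnew = (Ynew - beta0) / beta1.
true_x = {y_new: (y_new - 0.0) / 5.0 for y_new in (31.25, 62.50, 125.00)}
print(true_x)
```

Under condition (b) the error standard deviation grows with X, from 0.01 at X = 0.05 to 10.0 at X = 50, which is the nonconstant variance the simulation is designed to probe.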

The two conditions described above are possible conditions under which linear regression data might occur in nature, i.e., from data that satisfy the usual regression model assumptions of normality and constant variance to data that would seriously violate one of these assumptions. Furthermore, the sample sizes considered range from small to relatively large, and the selected values of Ynew range across the first half of the distribution of Y.

Figure 2 is a graphical representation of the population under the usual regression assumptions. This graph also depicts the true values of Xnew given each value of Ynew. For each of the 24 combinations of sample size, Ynew and distribution conditions, 1,000 samples were generated. The regression coefficient, Xnew and the standard error of X̂new were estimated for each of the 1,000 samples. For each sample, the 95% classical t-confidence interval for Xnew was constructed and checked for coverage of the true value of Xnew. Also, for each sample, 1,000 bootstrap samples were drawn, and the bootstrap standard error for X̂new was computed along with the 95% bootstrap t-confidence intervals and 95% bootstrap percentile confidence intervals. Again, the coverage of the confidence interval was checked against the true value of Xnew. Lastly, the mean of the X̂new values for the 1,000 samples was computed, as was the standard deviation of these 1,000 values. This last value is an empirically derived value for the true standard error of X̂new.
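
The simulation loop can be sketched in Python as below. This is a reduced illustration, not the authors' program: it uses 100 samples and 200 bootstrap resamples per sample instead of 1,000 each, it tracks only the bootstrap percentile interval, all function names are hypothetical, and the population grid and sampling without replacement are assumptions.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def inverse_estimate(pairs, y_new):
    """Inverse regression estimate of x at y_new."""
    b0, b1 = fit_line([p[0] for p in pairs], [p[1] for p in pairs])
    return (y_new - b0) / b1


def bootstrap_estimates(pairs, y_new, n_boot, rng):
    """Sorted inverse regression estimates from n_boot pairs resamples."""
    n = len(pairs)
    out = []
    while len(out) < n_boot:
        resample = [pairs[rng.randrange(n)] for _ in range(n)]
        if len({p[0] for p in resample}) < 2:
            continue  # assumption: redraw when the slope is undefined
        out.append(inverse_estimate(resample, y_new))
    return sorted(out)


def monte_carlo(population, y_new, x_true, n, n_sims, n_boot, rng):
    """Compare the bootstrap SE with the empirical SE and check coverage."""
    estimates, boot_ses = [], []
    miss_low = miss_high = 0
    for _ in range(n_sims):
        sample = rng.sample(population, n)  # assumed: without replacement
        estimates.append(inverse_estimate(sample, y_new))
        boot = bootstrap_estimates(sample, y_new, n_boot, rng)
        boot_ses.append(statistics.stdev(boot))
        lower = boot[max(round(0.025 * n_boot), 1) - 1]
        upper = boot[round(0.975 * n_boot) - 1]
        miss_low += x_true < lower   # true value falls below the interval
        miss_high += x_true > upper  # true value falls above the interval
    return {
        "mean_estimate": statistics.fmean(estimates),
        "empirical_se": statistics.stdev(estimates),
        "mean_bootstrap_se": statistics.fmean(boot_ses),
        "pct_miss_low": 100.0 * miss_low / n_sims,
        "pct_miss_high": 100.0 * miss_high / n_sims,
    }


rng = random.Random(7)
# Population under condition (a): constant error sd of 5.0 (assumed x grid).
population = [(0.05 * i, 5.0 * (0.05 * i) + rng.gauss(0.0, 5.0))
              for i in range(1, 1001)]
summary = monte_carlo(population, y_new=62.50, x_true=12.50,
                      n=20, n_sims=100, n_boot=200, rng=rng)
print(summary)
```

The standard deviation of the per-sample estimates plays the role of the empirically derived true standard error, and the two miss percentages correspond to noncoverage below and above the interval.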



 
Figure 2. An example of 1,000 paired (X,Y) values generated under linear regression model Yi = β0 + β1Xi + εi with β0 = 0; β1 = 5; 0.0 ≤ Xi ≤ 50.0 in increments of 0.05; εi normally distributed random variable with mean 0.0 and standard deviation 5.0; as a population from which samples of n = 10, 20, 40 and 80 were repeatedly drawn for the Monte Carlo simulation.

 

    RESULTS
 
Table 1 shows the inverse regression estimates, standard error estimates and 95% confidence intervals for the vitamin B-6 requirement based on mean adjustment values for the six vitamin B-6 status indicators, by using standard statistical estimators and bootstrap estimators. The standard and bootstrap methods produced inverse regression estimates and standard errors that were nearly identical. However, the bootstrap percentile confidence intervals were asymmetric, unlike the confidence intervals produced by the standard method, which is likely the result of a skewness in the data and possibly in the distribution of the underlying population. Thus, the bootstrap confidence intervals may better reflect reality. The Monte Carlo simulation was undertaken to assess the precision of bootstrap estimates for inverse regression.

The results of the Monte Carlo simulation are presented in Tables 2, 3 and 4. Table 2 presents the simulated mean values for X̂new. These results clearly show that the inverse regression estimator, X̂new, has little or no bias, even when the regression assumptions are violated. Table 3 shows the empirically derived true standard error, the average estimated asymptotic standard error and the average bootstrap standard error for Ynew = 62.50. Except for a couple of instances, the bootstrap estimated standard error more closely estimated the true standard error than the asymptotic standard error. In fact, the bootstrap standard error estimate was essentially unbiased for sample sizes of 20 or more, whereas the asymptotic standard error was biased when the assumption of constant variance was violated. Table 4 presents the percent noncoverage of the 95% confidence intervals. These confidence intervals are based on the classical t-confidence interval, using the asymptotic standard error; the bootstrap t-confidence interval, using the bootstrap standard error; and the bootstrap percentile confidence interval. In general, when the regression assumptions are satisfied, the classical t-confidence interval, the bootstrap t-confidence interval and the bootstrap percentile confidence interval cover the true value of Xnew at the nominal 95% rate (2.5% noncoverage above and below Xnew). However, when the assumptions are violated (e.g., nonconstant variance), the classical t-confidence interval produces nearly 100% coverage. This would indicate that the confidence intervals constructed in this manner were excessively broad. Both the bootstrap t-confidence interval and the percentile interval came closer to 95% coverage (5% noncoverage) for most conditions of sample size and distribution. Similar results were obtained for Ynew = 31.25 and 125.00 (data not reported here).


 
Table 2. Monte Carlo simulated mean inverse regression estimates for true Xnew = 6.25, 12.50 and 25.00, based on Ynew = 31.25, 62.50 and 125.00, respectively¹

 

 
Table 3. Empirically derived true standard error, the average estimated asymptotic standard error and the average bootstrap standard error based on results of the Monte Carlo simulation for Ynew = 62.50 and Xnew = 12.50¹

 

 
Table 4. Percent upper and lower limit noncoverage by 95% confidence intervals based on the classical t-confidence interval, the bootstrap t-confidence interval and the bootstrap percentile confidence interval from the Monte Carlo simulation for Ynew = 62.50 and Xnew = 12.50¹

 

    DISCUSSION
 
The bootstrap procedure is a computationally intensive statistical tool used in situations where standard techniques for estimating standard errors and computing confidence intervals are not available. The results shown here for inverse regression analysis clearly show the superiority of the bootstrap procedure for the estimation of the standard error of X̂new and the computation of confidence intervals when the assumptions are violated relative to the standard approach that uses the asymptotic estimator of the standard error of X̂new. When bootstrap analysis is used to estimate nutrient requirements, the benefits of this method are not immediately obvious. When using the asymptotic estimator, the estimation of the vitamin B-6 requirement based on the weighted mean was between 1.72 and 1.92 mg/d (95% confidence interval). When the bootstrap estimator was used, the estimation of the vitamin B-6 requirement was between 1.75 and 2.04 mg/d. Thus, both methods produce very similar confidence intervals (Table 1).

Evidence of the superiority of one estimator over another can only be obtained through analytic, mathematical methods that are generally not available or through Monte Carlo computer simulation. The simulation presented in this paper for inverse regression clearly shows the superiority of the bootstrap procedure compared to the asymptotic methods presented in statistics textbooks (Kutner et al. 1996), when the standard assumptions are violated. The mean of the estimated standard error based on the bootstrap procedure is, in nearly all simulations, either of the same magnitude or smaller and closer to the true standard error than the mean of the estimated standard error based on the asymptotic procedure. Furthermore, the bootstrap procedure tends to produce confidence intervals that are either of the same nominal 95% coverage rate as the standard methods or closer to the nominal 95% coverage rate. The lack of confidence interval coverage for a 95% confidence interval should be 2.5% in the region below the confidence interval and 2.5% in the region above the confidence interval. For the inverse regression problem, the classical t-confidence interval based on the asymptotic standard error is conservative when the assumption of constant variance has been violated.

In summary, the simple beauty of the bootstrap procedure is that it can be applied in situations where point estimators, but not the methods for computing standard errors or confidence intervals, have been developed. The bootstrap is a procedure of relatively recent origin; its development was largely based on the availability of high-speed computers. Because of this recent derivation, the bootstrap has not found its way into all scientific disciplines. This conclusion appears to be true in the area of human nutrition, and this paper is an attempt to introduce the bootstrap within this discipline.


    APPENDIX A
 
Statistical equations

Let $X_i$ (i = 1, 2, ... , n) be the observed values for a random sample of size n taken from a population with mean $\mu$ and standard deviation $\sigma$. Let $\hat{\theta}(x)$ represent an estimator of a parametric function, $\theta(x)$, and $se(\hat{\theta}(x))$ the standard error of $\hat{\theta}(x)$. A simple example of this is the sample mean $\bar{x} = \hat{\theta}(x)$, which is an estimator of $\mu = \theta(x)$, the population mean. Then a $(1-\alpha) \cdot 100\%$ confidence interval for $\theta(x)$ is:

• Classical t-confidence interval:

$$\hat{\theta}(x) \pm t_{\alpha/2,\,df} \cdot se(\hat{\theta}(x))$$

where df denotes the degrees of freedom.
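
As a worked illustration of the classical interval for the simplest case, the sample mean, consider the short Python sketch below. The ten data values are synthetic, the function name `t_confidence_interval` is hypothetical, and the critical value 2.262 is the 0.975 quantile of the t distribution with 9 degrees of freedom, entered by hand because the Python standard library does not provide t quantiles.

```python
import math
import statistics


def t_confidence_interval(estimate, se, t_crit):
    """Return (lower, upper) limits: estimate -/+ t_crit * se."""
    half_width = t_crit * se
    return estimate - half_width, estimate + half_width


# Synthetic observations (NOT study data), n = 10.
values = [1.6, 1.9, 1.7, 2.1, 1.8, 2.0, 1.5, 1.9, 1.7, 1.8]
n = len(values)

mean = statistics.fmean(values)               # estimator of the population mean
se = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
t_crit = 2.262                                # t quantile, alpha = 0.05, df = n - 1 = 9

lower, upper = t_confidence_interval(mean, se, t_crit)
print(f"mean = {mean:.3f}, se = {se:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

The same function applies to the asymptotic and bootstrap t-intervals described in this appendix once the appropriate estimate, standard error and degrees of freedom are supplied.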

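• Asymptotic estimator of the standard error for inverse regression estimator $\hat{X}_{new}$:

Let $\hat{X}_{new}$ be the estimated value of the predictor X at a specified value of Y that is denoted $Y_{new}$. Let $\hat{\beta}_0$ be the estimated intercept and $\hat{\beta}_1$ the estimated slope of the regression line, so that

$$\hat{X}_{new} = \frac{Y_{new} - \hat{\beta}_0}{\hat{\beta}_1}$$

The asymptotic estimate of the standard error, $\widehat{se}(\hat{X}_{new})$, is the estimator given by Kutner et al. (1996).

$\hat{X}_{new}$ and $\widehat{se}(\hat{X}_{new})$ are then used in the classical t-confidence interval equation.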

• Bootstrap t-confidence interval:

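Select B independent samples of size n, with replacement, from the observed data. These samples may be denoted $x^{*1}, x^{*2}, \ldots, x^{*B}$, with each $x^{*b}$ (b = 1, 2, ... , B) consisting of n data values. The bootstrap estimated standard error for $\hat{\theta}(x)$ will be:

$$\widehat{se}(\hat{\theta}(x)) = \sqrt{\frac{\sum_{b=1}^{B}\left[\hat{\theta}(x^{*b}) - \hat{\theta}(\cdot)\right]^{2}}{B-1}}, \qquad \hat{\theta}(\cdot) = \frac{1}{B}\sum_{b=1}^{B}\hat{\theta}(x^{*b})$$

$\hat{\theta}(x)$ and $\widehat{se}(\hat{\theta}(x))$ are then used in the classical t-confidence interval equation.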

• Bootstrap percentile confidence interval:

Order the B bootstrap estimates from smallest to largest. Identify the $(\alpha/2) \cdot B$th and the $(1-\alpha/2) \cdot B$th values of the ordered values. These values represent the lower and upper limits for the $(1-\alpha) \cdot 100\%$ confidence interval.


    FOOTNOTES
 
1 Presented in part at the 1998 meeting of the Federation of American Societies for Experimental Biology in San Francisco, CA [Hansen CM, Evans MA & Shultz TD. (1998) Application of the bootstrap to estimation of vitamin B-6 requirement. FASEB J. 12:A2479 (abs.)].

3 Deposited in the National Auxiliary Publication Service.

Manuscript received April 15, 1999. Initial review completed May 11, 1999. Revision accepted June 23, 1999.


    REFERENCES
 

1. Efron B. Bootstrap methods: another look at the jackknife. Ann. Stat. 1979;7:1-26.

2. Efron B., Tibshirani R. J. An Introduction to the Bootstrap. 1st ed. Chapman and Hall, London, UK; 1993.

3. Huang Y.-C., Chen W., Evans M. A., Mitchell M. E., Shultz T. D. Vitamin B-6 requirement and status assessment of young women fed a high protein diet with various levels of vitamin B-6. Am. J. Clin. Nutr. 1998;67:208-220.

4. Kutner M. H., Nachtsheim C. J., Wasserman W., Neter J. Applied Linear Statistical Models. 4th ed. Richard D. Irwin, Inc., Homewood, IL; 1996.

5. Manly B. F. J. Randomization and Monte Carlo Methods in Biology. 1st ed. Chapman and Hall, London, UK; 1991.



