Department of Food Science and Human Nutrition and the
Program in Statistics, Washington State University, Pullman, WA 99164-6376
2 To whom correspondence should be addressed.
| ABSTRACT |
|---|
KEY WORDS: vitamin B-6 requirement; statistics; inverse regression; bootstrap; human metabolic studies
| INTRODUCTION |
|---|
The bootstrap is a statistically elegant procedure. It can be applied in situations where standard statistical tools do not exist or where the usual statistical methods are inappropriate because their underlying assumptions are violated (e.g., ANOVA and regression analysis assume normality and constant variance). Such violations may cause inferences based on the usual methods of analysis to lead to spurious conclusions, whereas the bootstrap procedure can produce valid inferences for ANOVA and regression analysis when the typical assumptions are violated. Although the bootstrap is useful in the situations just described, this is not its principal area of application. The bootstrap procedure is also applicable in situations where analytic statistical methods are not readily available.
This paper introduces the bootstrap procedure for computing standard errors and confidence intervals when the vitamin B-6 requirement is estimated by inverse regression. Bootstrap estimates of the mean, the standard error of the mean and confidence intervals are compared with estimates based on formulas commonly found in statistics textbooks. Furthermore, Monte Carlo simulation of inverse regression is used to explore the characteristics of the bootstrap estimators relative to those of the more typically used estimators.
| MATERIALS AND METHODS |
|---|
$\varepsilon_i$) was calculated for each of the 2,000 bootstrap samples and solved for vitamin B-6 intake ($X_{\text{new}}$) at the adjustment mean for that status indicator ($Y_{\text{new}}$): $X_{\text{new}} = (Y_{\text{new}} - \beta_0)/\beta_1$. (The simple linear regression model was appropriate because repeated observations on the same subject had been found to be uncorrelated.) The average of the 2,000 bootstrap estimates of the vitamin B-6 requirement is shown in Table 1.
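The resampling step described above can be sketched in Python as follows. This is a minimal illustration and not the SAS IML program referred to in the text: it assumes that whole (X, Y) observations are resampled with replacement, the function names are hypothetical, and the data in the usage lines are made-up values rather than study data. Skipping resamples in which the slope cannot be computed is also an assumption.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def bootstrap_inverse_estimates(xs, ys, y_new, n_boot=2000, seed=0):
    """Return n_boot bootstrap estimates of X_new = (y_new - b0) / b1."""
    rng = random.Random(seed)
    n = len(xs)
    estimates = []
    while len(estimates) < n_boot:
        # Resample whole observations (pairs) with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # assumption: skip resamples with no spread in X
        b0, b1 = fit_line(bx, by)
        if b1 == 0:
            continue  # assumption: skip resamples with a flat fitted line
        estimates.append((y_new - b0) / b1)
    return estimates

# Hypothetical intake (mg/d) and status-indicator values, for illustration only.
intake = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
status = [30.1, 36.8, 44.9, 50.2, 58.7, 63.5, 72.0, 77.4]
boot = bootstrap_inverse_estimates(intake, status, y_new=50.0)
print("mean of bootstrap estimates:", sum(boot) / len(boot))
```

The average of the returned list plays the role of the bootstrap estimate of the requirement; the same list can then be used for the standard error and confidence intervals.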
A program using the SAS® IML software (SAS Institute, Cary, NC) for bootstrap estimation of inverse regression standard errors and confidence intervals is provided in Appendix B (see footnote 3). This program was used to compute the bootstrap estimation results displayed in Table 1 for the inverse regression example. Included with the estimates of the vitamin B-6 requirement are the estimated standard errors and the lower and upper limits for 95% confidence intervals based on both the asymptotic estimator and the bootstrap estimator.
As shown in Table 1, the standard and bootstrap methods produced inverse regression estimates and standard errors that were nearly identical. However, the bootstrap percentile confidence intervals were asymmetric, unlike the confidence intervals produced by the standard method. This asymmetry is likely the result of skewness in the data and possibly in the distribution of the underlying population. Thus, the bootstrap confidence intervals may better reflect reality.
Unfortunately, one cannot draw such a conclusion from a single experiment. Therefore, a Monte Carlo simulation was undertaken to assess the performance of the bootstrap estimator as applied to inverse regression. Although extensive statistical theory has been developed for the bootstrap procedure (Efron and Tibshirani 1993), and the procedure has been shown to have outstanding performance characteristics, an analytical assessment of the quality of bootstrap estimators is not available for all statistical estimators (e.g., inverse regression). Therefore, alternative means of assessing the quality of specific bootstrap applications are required. Monte Carlo simulation provides this alternative means of assessment.
Monte Carlo simulation (Manly 1991) is simple in nature: generate data under a specific set of conditions, compute the statistic(s) of interest and store the results. This process is repeated several thousand times to accumulate information concerning the characteristics of the statistic(s) in question. Because the conditions (i.e., mean, standard deviation, probability distribution) under which the data are generated are known, the properties of the statistic(s) can be compared to the known conditions.
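The generate, compute, store and repeat cycle can be illustrated with a short Python sketch. It is a generic example under assumed settings (normally distributed data, the sample mean as the statistic, and the hypothetical function name `monte_carlo`), not the simulation carried out in this paper.

```python
import random
import statistics

def monte_carlo(statistic, n, mean, sd, n_sims=2000, seed=1):
    """Generate n_sims normal samples of size n and store the statistic of each."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_sims):
        sample = [rng.gauss(mean, sd) for _ in range(n)]
        results.append(statistic(sample))
    return results

# Known conditions: mean 10.0, standard deviation 5.0, sample size 20.
results = monte_carlo(statistics.fmean, n=20, mean=10.0, sd=5.0)

# Properties of the statistic, to be compared with the known conditions.
empirical_mean = statistics.fmean(results)   # compare with the true mean, 10.0
empirical_se = statistics.stdev(results)     # compare with sd / sqrt(n)
theoretical_se = 5.0 / 20 ** 0.5
print(empirical_mean, empirical_se, theoretical_se)
```

Because the generating mean and standard deviation are known, any systematic gap between the empirical and theoretical values points to a property of the statistic rather than of the data.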
To assess the inverse regression bootstrap estimators, response ($Y$) and predictor ($X$) data for a population of size 1,000 were generated under the simple linear regression model $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, with $\beta_0 = 0$, $\beta_1 = 5$ and $0.0 \le X_i \le 50.0$ in increments of 0.05. Data were generated under two different conditions for sample sizes of n = 10, 20, 40 and 80: (a) regression model with valid assumptions, $\varepsilon_i$ a normally distributed random variable with mean 0.0 and standard deviation 5.0; (b) regression model with nonconstant variance, $\varepsilon_i$ a normally distributed random variable with mean 0.0 and standard deviation $0.2 \times X_i$. Specific values of the response ($Y_{\text{new}}$) were selected to calculate the inverse regression estimate of the predictor ($X_{\text{new}}$): $Y_{\text{new}} = 31.25 \Rightarrow X_{\text{new}} = 6.25$; $Y_{\text{new}} = 62.50 \Rightarrow X_{\text{new}} = 12.50$; $Y_{\text{new}} = 125.00 \Rightarrow X_{\text{new}} = 25.00$.
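The data-generating conditions can be written out as a short Python sketch. Several details are assumptions made for illustration: the function names are hypothetical, the grid from 0.0 to 50.0 in steps of 0.05 is used in full (which gives 1,001 points), and samples are drawn from the population without replacement.

```python
import random

BETA0, BETA1 = 0.0, 5.0

def make_population(condition, seed=0):
    """Generate (X, Y) pairs under condition 'a' (constant SD 5.0)
    or condition 'b' (SD equal to 0.2 * X)."""
    rng = random.Random(seed)
    population = []
    for k in range(1001):  # X = 0.00, 0.05, ..., 50.00
        x = k * 0.05
        sd = 5.0 if condition == "a" else 0.2 * x
        y = BETA0 + BETA1 * x + rng.gauss(0.0, sd)
        population.append((x, y))
    return population

def true_x_new(y_new):
    """True predictor value implied by the model: (Y_new - beta0) / beta1."""
    return (y_new - BETA0) / BETA1

def draw_sample(population, n, rng):
    """Assumption: a simple random sample of n pairs without replacement."""
    return rng.sample(population, n)

for y_new in (31.25, 62.50, 125.00):
    print(y_new, true_x_new(y_new))
```

With $\beta_0 = 0$ and $\beta_1 = 5$, each true $X_{\text{new}}$ is simply $Y_{\text{new}}/5$, which is how the three target values in the text arise.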
The two conditions described above are possible conditions under which linear regression data might occur in nature, i.e., from data that satisfy the usual regression model assumptions of normality and constant variance to data that would seriously violate one of these assumptions. Furthermore, the sample sizes considered range from small to relatively large, and the selected values of Ynew range across the first half of the distribution of Y.
Figure 2 is a graphical representation of the population under the usual regression assumptions. This graph also depicts the true values of $X_{\text{new}}$ given each value of $Y_{\text{new}}$. For each of the 24 combinations of sample size, $Y_{\text{new}}$ and distribution conditions, 1,000 samples were generated. The regression coefficients, $\hat{X}_{\text{new}}$ and the standard error of $\hat{X}_{\text{new}}$ were estimated for each of the 1,000 samples. For each sample, the 95% classical t-confidence interval for $X_{\text{new}}$ was constructed and checked for coverage of the true value of $X_{\text{new}}$. Also, for each sample, 1,000 bootstrap samples were drawn, and the bootstrap standard error for $\hat{X}_{\text{new}}$ was computed along with the 95% bootstrap t-confidence intervals and 95% bootstrap percentile confidence intervals. Again, the coverage of the confidence interval was checked against the true value of $X_{\text{new}}$. Lastly, the mean of the $\hat{X}_{\text{new}}$ for the 1,000 samples was computed, as was the standard deviation of these 1,000 values. This last value is an empirically derived value for the true standard error of $\hat{X}_{\text{new}}$.
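The bookkeeping for the coverage check and the empirical standard error can be sketched as below. The function names and the example intervals are hypothetical; "below" is taken to mean that the true value falls under the lower limit, and "above" that it exceeds the upper limit.

```python
import statistics

def noncoverage(intervals, true_value):
    """Percent of intervals that miss the true value on each side.

    Returns (percent_below, percent_above), where 'below' counts cases in
    which the true value lies under the lower limit of the interval.
    """
    n = len(intervals)
    below = sum(1 for lower, upper in intervals if true_value < lower)
    above = sum(1 for lower, upper in intervals if true_value > upper)
    return 100.0 * below / n, 100.0 * above / n

def empirical_standard_error(estimates):
    """Standard deviation of the simulated estimates."""
    return statistics.stdev(estimates)

# Four made-up intervals checked against a true value of 12.5.
demo = [(11.0, 13.0), (12.0, 14.0), (13.0, 15.0), (10.0, 12.0)]
print(noncoverage(demo, 12.5))
```

For a well-calibrated 95% interval, both percentages should be close to 2.5 when tallied over all simulated samples.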
| RESULTS |
|---|
The results of the Monte Carlo simulation are presented in Tables 2, 3 and 4. Table 2 presents the simulated mean values for $\hat{X}_{\text{new}}$. These results clearly show that the inverse regression estimator $\hat{X}_{\text{new}}$ has little or no bias, even when the regression assumptions are violated. Table 3 shows the empirically derived true standard error, the average estimated asymptotic standard error and the average bootstrap standard error for $Y_{\text{new}} = 62.50$. Except in a couple of instances, the bootstrap estimated standard error was closer to the true standard error than was the asymptotic standard error. In fact, the bootstrap standard error estimate was essentially unbiased for sample sizes of 20 or more, whereas the asymptotic standard error was biased when the assumption of constant variance was violated. Table 4 presents the percent noncoverage of the 95% confidence intervals. These confidence intervals are based on the classical t-confidence interval, using the asymptotic standard error; the bootstrap t-confidence interval, using the bootstrap standard error; and the bootstrap percentile confidence interval. In general, when the regression assumptions are satisfied, the classical t-confidence interval, the bootstrap t-confidence interval and the bootstrap percentile confidence interval cover the true value of $X_{\text{new}}$ at the nominal 95% rate (2.5% noncoverage above and below $X_{\text{new}}$). However, when the assumptions are violated (e.g., nonconstant variance), the classical t-confidence interval produces nearly 100% coverage. This would indicate that the confidence intervals constructed in this manner were excessively broad. Both the bootstrap t-confidence interval and the percentile interval came closer to 95% coverage (5% noncoverage) for most conditions of sample size and distribution. Similar results were obtained for $Y_{\text{new}} = 31.25$ and 125.00 (data not reported here).
| DISCUSSION |
|---|
$\hat{X}_{\text{new}}$ and the computation of confidence intervals when the assumptions are violated relative to the standard approach that uses the asymptotic estimator of the standard error of $\hat{X}_{\text{new}}$.
When bootstrap analysis is used to estimate nutrient requirements, the benefits of this method are not immediately obvious. With the asymptotic estimator, the estimated vitamin B-6 requirement based on the weighted mean was between 1.72 and 1.92 mg/d (95% confidence interval). With the bootstrap estimator, the estimated vitamin B-6 requirement was between 1.75 and 2.04 mg/d. Thus, both methods produce very similar confidence intervals (Table 1).
Evidence of the superiority of one estimator over another can be obtained only through analytic, mathematical methods, which are generally not available, or through Monte Carlo computer simulation. The simulation presented in this paper for inverse regression clearly shows the superiority of the bootstrap procedure over the asymptotic methods presented in statistics textbooks (Kutner et al. 1996) when the standard assumptions are violated. The mean of the estimated standard error based on the bootstrap procedure is, in nearly all simulations, either of the same magnitude as, or smaller and closer to the true standard error than, the mean of the estimated standard error based on the asymptotic procedure. Furthermore, the bootstrap procedure tends to produce confidence intervals whose coverage either matches that of the standard methods or is closer to the nominal 95% rate. For a 95% confidence interval, noncoverage should be 2.5% in the region below the confidence interval and 2.5% in the region above it. For the inverse regression problem, the classical t-confidence interval based on the asymptotic standard error is conservative when the assumption of constant variance has been violated.
In summary, the simple beauty of the bootstrap procedure is that it can be applied in situations where point estimators, but not the methods for computing standard errors or confidence intervals, have been developed. The bootstrap is a procedure of relatively recent origin; its development was largely based on the availability of high-speed computers. Because of this recent derivation, the bootstrap has not found its way into all scientific disciplines. This conclusion appears to be true in the area of human nutrition, and this paper is an attempt to introduce the bootstrap within this discipline.
| APPENDIX A |
|---|
Let $X_i$ ($i = 1, 2, \ldots, n$) be the observed values for a random sample of size $n$ taken from a population with mean $\mu$ and standard deviation $\sigma$. Let $\hat{\theta}(x)$ represent an estimator of a parametric function, $\theta(x)$, and $se(\hat{\theta}(x))$ the standard error of $\hat{\theta}(x)$. A simple example of this is the sample mean $\bar{x} = \hat{\theta}(x)$, which is an estimator of $\mu = \theta(x)$, the population mean. Then a $(1-\alpha) \cdot 100\%$ confidence interval for $\theta(x)$ is:
Classical t-confidence interval:
$$\hat{\theta}(x) \pm t_{(1-\alpha/2;\, df)} \cdot se(\hat{\theta}(x))$$
where df denotes the degrees of freedom.
Asymptotic estimator of the standard error for inverse regression estimator $\hat{X}_{\text{new}}$:
Let $\hat{X}_{\text{new}}$ be the estimated value of the predictor $X$ at a specified value of $Y$ that is denoted $Y_{\text{new}}$. Let $\hat{\beta}_0$ be the estimated intercept and $\hat{\beta}_1$ the estimated slope of the regression line. The asymptotic estimate of the standard error is:
![]() |
$\hat{X}_{\text{new}}$ and $\widehat{se}(\hat{X}_{\text{new}})$ are then used in the classical t-confidence interval equation.
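For orientation, one textbook form of the asymptotic standard error for predicting $X$ from a single new observation $Y_{\text{new}}$ is given below. It is an assumption that this form matches the expression intended here; $MSE$ (the residual mean square of the fitted regression) and $\bar{X}$ (the mean of the observed predictor values) are symbols introduced for this illustration.

$$\hat{X}_{\text{new}} = \frac{Y_{\text{new}} - \hat{\beta}_0}{\hat{\beta}_1}, \qquad \widehat{se}(\hat{X}_{\text{new}}) = \sqrt{\frac{MSE}{\hat{\beta}_1^{2}} \left[ 1 + \frac{1}{n} + \frac{(\hat{X}_{\text{new}} - \bar{X})^{2}}{\sum_{i=1}^{n} (X_i - \bar{X})^{2}} \right]}$$

The expression shows why the resulting interval is symmetric about $\hat{X}_{\text{new}}$ and why it depends on a single pooled variance estimate, which is where a constant-variance assumption enters.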
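Bootstrap t-confidence interval:
Select $B$ independent samples of size $n$, with replacement, from the observed data. These samples may be denoted $x^{*1}, x^{*2}, \ldots, x^{*B}$, with each $x^{*b}$ ($b = 1, 2, \ldots, B$) consisting of $n$ data values. The bootstrap estimated standard error for $\hat{\theta}(x)$ will be:
$$\widehat{se}(\hat{\theta}(x)) = \sqrt{\frac{\sum_{b=1}^{B} \left[ \hat{\theta}(x^{*b}) - \hat{\theta}(\cdot) \right]^{2}}{B - 1}}, \qquad \hat{\theta}(\cdot) = \frac{1}{B} \sum_{b=1}^{B} \hat{\theta}(x^{*b})$$
$\hat{\theta}(x)$ and $\widehat{se}(\hat{\theta}(x))$ are then used in the classical t-confidence interval equation.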
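A minimal Python sketch of these two steps follows. The function names are hypothetical, the standard error uses a divisor of B - 1 (an assumption about the intended formula), and the critical t value is supplied by the caller, for example from a t table, rather than computed.

```python
import statistics

def bootstrap_se(replicates):
    """Standard deviation of the B bootstrap estimates (divisor B - 1)."""
    return statistics.stdev(replicates)

def bootstrap_t_interval(estimate, se, t_crit):
    """Classical t-interval form with the bootstrap standard error plugged in.

    t_crit is the t quantile for the chosen confidence level and degrees of
    freedom, looked up by the caller.
    """
    half_width = t_crit * se
    return estimate - half_width, estimate + half_width

# Illustration with made-up bootstrap estimates; 2.262 is the tabulated
# two-sided 95% t value for 9 degrees of freedom.
reps = [1.8, 1.9, 2.1, 1.7, 2.0, 1.9, 2.2, 1.8]
se = bootstrap_se(reps)
print(bootstrap_t_interval(1.9, se, t_crit=2.262))
```

The interval is symmetric about the point estimate by construction; only its width changes when the bootstrap standard error replaces the asymptotic one.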
Bootstrap percentile confidence interval:
Order the $B$ bootstrap estimates from smallest to largest. Identify the $B \cdot (\alpha/2)$th and the $B \cdot (1-\alpha/2)$th values of the ordered values. These values represent the lower and upper limits for the $(1-\alpha) \cdot 100\%$ confidence interval.
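The percentile rule can be sketched in a few lines of Python. The function name is hypothetical, and rounding the ranks to the nearest whole number when they are not integers is an assumption made for this illustration.

```python
def percentile_interval(replicates, alpha=0.05):
    """Lower and upper limits taken from the ordered bootstrap estimates."""
    ordered = sorted(replicates)
    b = len(ordered)
    # Ranks are 1-based: the B*(alpha/2)th and B*(1 - alpha/2)th ordered values.
    lower_rank = max(1, round(b * alpha / 2))
    upper_rank = min(b, round(b * (1 - alpha / 2)))
    return ordered[lower_rank - 1], ordered[upper_rank - 1]
```

With B = 2,000 and a 95% interval, the limits are the 50th and 1,950th ordered estimates. Because the limits come directly from the ordered estimates, the interval need not be symmetric about the point estimate.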
| FOOTNOTES |
|---|
3 Deposited in the National Auxiliary Publication Service.
Manuscript received April 15, 1999. Initial review completed May 11, 1999. Revision accepted June 23, 1999.
| REFERENCES |
|---|
1. Efron B. Bootstrap methods: another look at the jackknife. Ann. Stat. 1979;7:1-26.
2. Efron B., Tibshirani R. J. An Introduction to the Bootstrap. 1st ed. 1993; Chapman and Hall, London, UK.
3. Huang Y.-C., Chen W., Evans M. A., Mitchell M. E., Shultz T. D. Vitamin B-6 requirement and status assessment of young women fed a high protein diet with various levels of vitamin B-6. Am. J. Clin. Nutr. 1998;67:208-220.
4. Kutner M. H., Nachtsheim C. J., Wasserman W., Neter J. Applied Linear Statistical Models. 4th ed. 1996; Richard D. Irwin, Inc., Homewood, IL.
5. Manly B. F. J. Randomization and Monte Carlo Methods in Biology. 1st ed. 1991; Chapman and Hall, London, UK.