Introduction to Estimation
One area of concern in inferential statistics is the estimation of the population parameter from the sample statistic. It is important to realize the order here. The sample statistic is calculated from the sample data and the population parameter is inferred (or estimated) from this sample statistic. Let me say that again: Statistics are calculated, parameters are estimated.
We talked about problems of obtaining the value of the parameter earlier in the course when we talked about sampling techniques.
Another area of inferential statistics is sample size determination. That is, how large a sample should be taken to make an accurate estimate? In these cases, the statistics can't be used since the sample hasn't been taken yet.
Point Estimates
There are two types of estimates we will find: Point Estimates and Interval Estimates. The point estimate is the single best value.
A good estimator must satisfy three conditions:
- Unbiased: The expected value of the estimator must be equal to the parameter being estimated
- Consistent: The value of the estimator approaches the value of the parameter as the sample size increases
- Relatively Efficient: The estimator has the smallest variance of all estimators which could be used
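The "unbiased" condition can be checked by simulation. The following is a minimal sketch, not part of the course material: it assumes a standard normal population (true variance 1) and compares two variance estimators, one dividing by n and one dividing by n - 1. The function names, sample size, trial count, and seed are all illustrative choices.

```python
import random

def variance(sample, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    mean = sum(sample) / len(sample)
    return sum((x - mean) ** 2 for x in sample) / divisor

def average_estimates(n=5, trials=20000, seed=1):
    """Average both variance estimators over many samples from N(0, 1)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total_n = 0.0
    total_n_minus_1 = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total_n += variance(sample, n)
        total_n_minus_1 += variance(sample, n - 1)
    return total_n / trials, total_n_minus_1 / trials

biased, unbiased = average_estimates()
print(f"divide by n:     {biased:.3f}")
print(f"divide by n - 1: {unbiased:.3f}")
```

With samples of size 5, the divide-by-n average should settle near 0.8 and the divide-by-(n - 1) average near 1.0, the true variance. Only the second estimator has an expected value equal to the parameter.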
Confidence Intervals
The point estimate is going to be different from the population parameter because of sampling error, and there is no way to know how close it is to the actual parameter. For this reason, statisticians like to give an interval estimate, which is a range of values used to estimate the parameter.
A confidence interval is an interval estimate with a specific level of confidence. A level of confidence is the probability that the interval estimate will contain the parameter. The level of confidence is 1 - alpha; an area of 1 - alpha lies within the confidence interval.
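One way to see what a level of confidence means is to build many intervals from repeated samples and count how often they contain the parameter. The sketch below makes several assumptions that are not in these notes: a normal population with known mean 50 and known standard deviation 10, samples of size 25, and the known-sigma interval for a mean, whose half-width is z * sigma / sqrt(n). The function name and all numbers are hypothetical.

```python
import random
from statistics import NormalDist

def coverage(mu=50.0, sigma=10.0, n=25, level=0.95, trials=5000, seed=2):
    """Fraction of simulated intervals that contain the true mean `mu`."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    alpha = 1 - level
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z-score with alpha/2 in the right tail
    half_width = z * sigma / n ** 0.5        # assumed known-sigma formula
    hits = 0
    for _ in range(trials):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / trials

rate = coverage()
print(f"share of intervals containing mu: {rate:.3f}")
```

The printed share should be close to 0.95: about 95% of the intervals built this way capture the parameter, and about 5% miss it.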
Maximum Error of the Estimate
The maximum error of the estimate is denoted by E and is one-half the width of the confidence interval. The basic confidence interval for a symmetric distribution is set up as: point estimate - E < true population parameter < point estimate + E. This formula will work for means and proportions because they use the Z or T distributions, which are symmetric. Later, we will talk about variances, which don't use a symmetric distribution, and the formula will be different.
Area in Tails
Since the level of confidence is 1 - alpha, the total area in the two tails is alpha. The notation Z(area) means the z-score which has the specified area in the right tail.
Examples:
- Z(0.05) = 1.645 (the Z-score which has 0.05 to the right, and 0.4500 between 0 and it)
- Z(0.10) = 1.282 (the Z-score which has 0.10 to the right, and 0.4000 between 0 and it).
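These values can be reproduced without a table. The sketch below uses `NormalDist.inv_cdf` from Python's standard `statistics` module (Python 3.8 or later); the helper name `z_right_tail` is made up for illustration.

```python
from statistics import NormalDist

def z_right_tail(area):
    """Z-score with `area` to its right under the standard normal curve."""
    # inv_cdf takes the area to the LEFT, so convert from the right-tail area.
    return NormalDist().inv_cdf(1 - area)

print(round(z_right_tail(0.05), 3))  # 1.645
print(round(z_right_tail(0.10), 3))  # 1.282
```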
Here are some common values:
Confidence Level | Area between 0 and z-score | Area in one tail (alpha/2) | z-score |
50% | 0.2500 | 0.2500 | 0.674 (0.6745) |
80% | 0.4000 | 0.1000 | 1.282 (1.2816) |
90% | 0.4500 | 0.0500 | 1.645 (1.6449) |
95% | 0.4750 | 0.0250 | 1.960 (1.9600) |
98% | 0.4900 | 0.0100 | 2.326 (2.3263) |
99% | 0.4950 | 0.0050 | 2.576 (2.5758) |
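To tie the table back to the maximum error of the estimate, here is a small sketch that turns a confidence level into a z-score and then into an interval of the form point estimate - E < parameter < point estimate + E. The formula E = z * sigma / sqrt(n) for a mean with known sigma is a standard one that has not been introduced in these notes, and the sample numbers are invented for illustration.

```python
from statistics import NormalDist

def z_for_level(level):
    """Z-score that leaves alpha/2 in each tail for the given confidence level."""
    alpha = 1 - level
    return NormalDist().inv_cdf(1 - alpha / 2)

def interval(point_estimate, max_error):
    """Symmetric interval: point estimate minus and plus the maximum error E."""
    return point_estimate - max_error, point_estimate + max_error

# Hypothetical sample: mean 72, known population sigma 8, sample size 64.
xbar, sigma, n = 72.0, 8.0, 64
E = z_for_level(0.95) * sigma / n ** 0.5  # assumed formula for a mean
low, high = interval(xbar, E)
print(f"E = {E:.2f}; {low:.2f} < mu < {high:.2f}")
# E = 1.96; 70.04 < mu < 73.96
```

Note that E is half the width of the interval: here the interval is 3.92 wide and E is 1.96.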
Also notice: if you look at the Student's t table, the top row is a level of confidence, and the bottom row is the z-score. In fact, this is where I got the extra digit of accuracy from. The value in parentheses comes from another table I have with four decimal places; your textbook only has three. Note that there are several mistakes in the Triola text.
Triola's text has the following values:
- 2.575 instead of 2.576
- 2.327 instead of 2.326
- 0.675 instead of 0.674 (even though 0.6745 would round to be 0.675, the actual value is closer to 0.6744897495, which would round to be 0.674).
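The more precise values can be checked directly. This sketch compares the three textbook values listed above against values computed with `NormalDist.inv_cdf` from Python's standard library; the helper name `exact_z` and the dictionary are illustrative.

```python
from statistics import NormalDist

def exact_z(level):
    """Z-score for a two-tailed confidence level, computed rather than looked up."""
    alpha = 1 - level
    return NormalDist().inv_cdf(1 - alpha / 2)

# Confidence level -> value quoted from the textbook in the list above.
textbook = {0.99: 2.575, 0.98: 2.327, 0.50: 0.675}

for level, quoted in textbook.items():
    z = exact_z(level)
    print(f"{level:.0%}: computed {z:.6f}, to three places {z:.3f}, textbook {quoted}")
```

Rounded to three places, the computed values are 2.576, 2.326, and 0.674.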
Now, in Triola's defense, the values 2.327 and 0.675 are rounded up, so they are less likely to cause rejection of the null hypothesis in error, and that is the technique used when assigning critical values when the exact value is unknown. The value 2.575 is rounded down, so this defense does not apply to it.