PROJECT TOPIC ON STATISTICAL POWER OF HYPOTHESIS TESTING USING PARAMETRIC AND NONPARAMETRIC METHODS
Parametric and nonparametric techniques are two broad classes of statistical methods for significance testing on continuous random variables. In this thesis, parametric and nonparametric techniques were used to compare the power of the tests. The data were simulated to represent real-life data, generated from normal and exponential distributions. Two nonparametric tests were carried out, the Wilcoxon Rank-Sum test and the Kruskal-Wallis test, together with their parametric counterparts, the independent t-test and one-way ANOVA respectively. The comparison is based on violations of the assumptions of normality and homogeneity of variance.
The tests were subjected to three cases depending on the sample size (n = 10, 30 and 45, covering both n ≤ 30 and n ≥ 30) at the α = 0.05, 0.01 and 0.1 significance levels. The analysis of the independent t-test and the Wilcoxon Rank-Sum test under the normal distribution showed that, at n = 10 and n = 45, the power of the two tests was the same, that is, the two tests performed equally at all levels of significance. At n = 30, the two tests performed equally at α = 0.05, and at α = 0.01 and 0.1 the nonparametric test was as powerful as the parametric test.
Under the exponential distribution, the parametric test was more powerful at α = 0.05 and 0.1 for n = 45 and 30, but the nonparametric test was more powerful for n = 10; at α = 0.01 the three sample sizes gave different results. For more than two independent samples under the normal distribution, the parametric test was more powerful for all three sample sizes at α = 0.05 and 0.1, and also at α = 0.01 for n = 45 and 10, but for n = 30 at α = 0.01 the nonparametric test was as powerful as the parametric test.
Under the exponential distribution, the parametric test was more powerful at all three levels for n = 45 and 30, but for n = 10 the nonparametric test was more powerful at all three levels. The power is also presented in bar charts. Therefore, the chance of committing a Type I or Type II error is lower when the sample size is large, and the parametric test is more powerful.
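Power comparisons of this kind can be approximated by Monte Carlo simulation. The Python sketch below is not the SPSS procedure used in the thesis; it assumes NumPy and SciPy, a location shift of 0.5 between the two groups, an exponential distribution with mean 1, equal group sizes and 1,000 replications, none of which are stated above. Power at each α is estimated as the proportion of simulated data sets whose p-value falls below α.

```python
import numpy as np
from scipy import stats


def normal(rng, size):
    """N(0, 1) draws."""
    return rng.normal(0.0, 1.0, size)


def exponential(rng, size):
    """Exponential draws with mean 1 (the rate is an assumption)."""
    return rng.exponential(1.0, size)


def simulated_power(sampler, n, shift, alphas=(0.01, 0.05, 0.1), reps=1000, seed=1):
    """Monte Carlo power of the independent t-test and the Wilcoxon rank-sum
    test for two groups of size n, the second group shifted by `shift`.

    With shift == 0, H0 is true and the rejection rate estimates the test size.
    Returns {alpha: (t-test power, Wilcoxon power)}.
    """
    rng = np.random.default_rng(seed)
    p_t = np.empty(reps)
    p_w = np.empty(reps)
    for i in range(reps):
        x = sampler(rng, n)
        y = sampler(rng, n) + shift
        # Student's t-test with pooled variance (equal_var=True is the default).
        p_t[i] = stats.ttest_ind(x, y).pvalue
        # Wilcoxon rank-sum test, computed in its Mann-Whitney U form.
        p_w[i] = stats.mannwhitneyu(x, y, alternative="two-sided").pvalue
    # Power at each alpha = proportion of replications with p-value < alpha.
    return {a: (float(np.mean(p_t < a)), float(np.mean(p_w < a))) for a in alphas}


for name, sampler in [("normal", normal), ("exponential", exponential)]:
    for n in (10, 30, 45):
        for alpha, (pow_t, pow_w) in simulated_power(sampler, n, shift=0.5).items():
            print(f"{name:12s} n={n:2d} alpha={alpha:<4} "
                  f"t-test={pow_t:.3f} Wilcoxon={pow_w:.3f}")
```

Storing the p-values once and thresholding them at each α keeps the three significance levels comparable, because all of them are evaluated on the same simulated data sets.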
1.1 BACKGROUND OF THE STUDY
Nonparametric approaches are often used when the conditions for parametric approaches are not satisfied and, in most cases, when the scale of measurement is ordinal or nominal. Statistical procedures in which inferences are made about population parameters are referred to as parametric statistics Cyprain (1990). The parametric approach rests on certain assumptions: the samples are randomly drawn from a normally distributed population and they:
- Consist of independent observations, except for paired values,
- Consist of values on an interval or ratio measurement scale,
- Have respective populations of approximately equal variances,
- Are adequately large, and
- Approximately resemble a normal distribution.
If any of the samples breaks one of these rules, the assumptions of a parametric test are violated. The design of the study might then be changed to adhere to the rules: if an ordinal or nominal measurement scale is being used, the study might be redesigned to use an interval or ratio scale, and additional participants might be sought to enlarge the sample sizes. Unfortunately, there are times when neither of these changes is appropriate or even possible. There are three major parametric assumptions that are, and will continue to be, violated by researchers in the health sciences: level of measurement, sample size, and normal distribution of the dependent variable Pett (1992).
If samples do not resemble a normal distribution, you might have learned to modify them so that you can use the tests you know. There are several legitimate ways to modify your data so that parametric tests can be used. First, if the reasons can be justified, extreme values called outliers might be removed from the samples. Second, you can apply a mathematical adjustment, called a transformation, to each value in your samples; for example, you might square every value in a sample. Transformations do not always work, however. Third, there are more complicated and considerably more advanced methods. Fortunately, there is a family of statistical tests that does not demand all of the rules listed above. They are called nonparametric tests.
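Whether a transformation helps can be checked by comparing the skewness of a sample before and after it is applied. The sketch below uses a logarithm, a common choice for positive, right-skewed data, rather than the squaring mentioned in the text (squaring would make right skew worse); the log-normal sample and the use of NumPy and SciPy are assumptions for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Positive, right-skewed sample (log-normal), assumed for illustration.
raw = rng.lognormal(mean=0.0, sigma=1.0, size=1000)
# Transformation: apply the same mathematical adjustment to every value.
transformed = np.log(raw)

# Skewness near 0 is consistent with a symmetric (e.g. normal) distribution.
print(f"skewness before transformation: {stats.skew(raw):.2f}")
print(f"skewness after transformation:  {stats.skew(transformed):.2f}")
```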
Roughly speaking, a nonparametric procedure is a statistical procedure that has certain desirable properties that hold under relatively mild assumptions regarding the underlying population from which the data are obtained. Although nonparametric procedures do not require the stringent assumptions associated with their parametric counterparts, this does not imply that they are assumption-free Pett (1992). The rapid and continuous development of nonparametric statistical procedures over the past six decades is due to their advantages. This motivates comparing the results of parametric and nonparametric statistical procedures.
In experimental design, therefore, it is important to combine both parametric and nonparametric approaches in hypothesis testing, because experimental results can produce both continuous and categorical (ordinal and nominal) variables.
Statistical inference procedures enable researchers to determine, in terms of probability, whether the observed differences between sample data could easily have occurred by chance. If the sample size is very small, there may be no alternative to using a nonparametric statistical test Siegel and Castellan (1988). It is therefore important to discuss the difference between parametric and nonparametric tests, because it underlies a major decision for the researcher: choosing the appropriate test.
In the development of modern statistics, the first methods developed made many assumptions about the characteristics of the population from which the samples were drawn. That is, they made assumptions about the population parameters, and such tests are referred to as parametric tests. The most obvious assumption is that the data were randomly drawn from a normally distributed population.
Another assumption is that the data are randomly drawn from populations having the same variance. These assumptions amount to the general overriding assumption that the probability distribution of the population (from which the sample was drawn) is known in advance. The most common distribution assumed is the normal distribution. It has generally been argued that parametric statistics should not be applied to data with non-normal distributions; empirical research has demonstrated that the Wilcoxon Rank-Sum test generally has greater power than the t-test unless the data are sampled from a normal distribution Siegel and Castellan (1988). More recently, distribution-free or nonparametric tests have been developed and have subsequently come into common use. These tests make fewer assumptions and, in particular, do not carry the overriding assumption of a normally distributed population.
Sometimes nonparametric procedures are simpler than their parametric counterparts. Moreover, a primary criticism of parametric methods in statistical analysis is that they oversimplify the population or process being observed. Indeed, parametric tests are popular not because they are perfectly appropriate but because they are convenient.
However, even when the parametric assumptions hold perfectly, we will see that nonparametric methods are only slightly less powerful than the more presumptuous statistical methods. Furthermore, if the parametric assumptions about the data fail to hold, only the nonparametric method is valid. A t-test comparing the means of two populations assumed to be normal can be dangerously misleading if the underlying data are not actually normally distributed.
The other phase of statistical inference is hypothesis testing, which some consider the more important aspect of statistical inference. Although the modern trend is to view the testing of statistical hypotheses from the standpoint of decision theory, in this study we view it from the classical point of view, with the choice of accepting or rejecting a given hypothesis. We also confine ourselves to two-decision (two-action) problems: the "status quo" hypothesis, called the null hypothesis and denoted by H0, and the hypothesis denoting change, called the alternative hypothesis and denoted by H1.
A hypothesis test comprises two mutually exclusive statements, the alternative and the null hypotheses. The null hypothesis states the negative case, that "it is not true" or "there is no difference", and the alternative hypothesis states that "it is true" or "there is a difference". The procedure is a scientific one, founded on simple logic, so that it is both open and repeatable (it can be replicated by others). The following steps outline the hypothesis testing procedure:
1. State the null (H0) and alternative (H1) hypotheses.
2. Decide whether a parametric or nonparametric test is appropriate.
3. Choose a statistical test to test H0.
4. Specify a significance level (α), the probability level for rejection of H0.
5. Determine the sample size (n).
6. Assume (or find) the sampling distribution of the test statistic chosen in step 3.
7. On the basis of steps 3, 4, 5 and 6, define the region of rejection of H0.
8. Compute the value of the test statistic using the sample data.
9. If the resulting value of the test statistic is in the rejection region, reject H0.
10. If the resulting value of the test statistic is outside the rejection region, do not reject H0 at the chosen significance level.
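The procedure can be traced in a short Python sketch for Student's independent t-test, building the rejection region from the t distribution instead of relying on a p-value. It assumes NumPy and SciPy and uses simulated illustrative data; the group means, n = 10 and α = 0.05 are assumptions.

```python
import numpy as np
from scipy import stats

# State the hypotheses: H0: mu_x = mu_y versus H1: mu_x != mu_y (two-sided).
# Parametric test chosen: Student's independent t-test with pooled variance.
alpha = 0.05                                # significance level
n = 10                                      # sample size per group (assumed)
rng = np.random.default_rng(7)
x = rng.normal(0.0, 1.0, n)                 # illustrative simulated data
y = rng.normal(1.0, 1.0, n)

# Sampling distribution under H0: t with 2n - 2 degrees of freedom.
df = 2 * n - 2
# Two-sided rejection region: |t| > t_crit.
t_crit = stats.t.ppf(1 - alpha / 2, df)

# Test statistic computed from the sample data.
pooled_var = ((n - 1) * x.var(ddof=1) + (n - 1) * y.var(ddof=1)) / df
t_stat = (x.mean() - y.mean()) / np.sqrt(pooled_var * (2 / n))

# Decision: reject H0 only if the statistic falls in the rejection region.
decision = "reject H0" if abs(t_stat) > t_crit else "do not reject H0"
print(f"t = {t_stat:.3f}, rejection region |t| > {t_crit:.3f}: {decision}")
```

The hand-computed statistic should match `scipy.stats.ttest_ind(x, y).statistic`, which is a useful check that the pooled-variance formula is right.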
Numerous advantages have been identified with nonparametric procedures: tests are available for dealing with samples made up of observations from several different populations; they usually depend on minimal assumptions and are thus less subject to improper use; and they are the only alternative for small sample sizes unless the population distribution is known. In addition, computations are usually fast and easy to understand. However, a major disadvantage is that nonparametric procedures are considered wasteful of information if all the assumptions for a parametric test hold.
1.2 Power Efficiency
Power efficiency measures the increase in sample size that is necessary to make a test B as powerful as a test A.
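One common way to formalise this (an assumed notation, not given in the original text) takes n_A as the sample size test A needs to reach a given power at a given α, and n_B as the sample size test B needs to reach the same power. The numbers in the second line are hypothetical. The third line is the standard large-sample (asymptotic) relative efficiency of the Wilcoxon rank-sum test relative to the t-test when both populations are normal.

```latex
\[
  \text{power efficiency of test } B = \frac{100\, n_A}{n_B}\ \text{percent}
\]
\[
  n_A = 20,\; n_B = 25 \;\Rightarrow\; \frac{100 \times 20}{25} = 80\ \text{percent}
\]
\[
  \text{ARE}(\text{Wilcoxon rank-sum relative to } t) = \frac{3}{\pi} \approx 0.955
\]
```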
The table below illustrates the relationship between Type I error, Type II error and power.
Table 1.1: Relationship between Type I Error and Type II Error

| Decision | H0 true | H0 false (H1 true) |
|---|---|---|
| Reject H0 | Type I error, α = P(Type I error) | Correct decision |
| Accept / do not reject H0 | Correct decision | Type II error, β = P(Type II error) |
Rather than referring to the Type II error, statisticians usually work with 1 − β, which is called the power of a statistical test: the probability of rejecting H0 when H1 is true. When the alternative hypothesis is composite, β takes different values for different parameter values under H1, and the functional relationship between these parameter values and the power is known as the power function.
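The power function has a closed form in the simplest case, a two-sided one-sample z-test with known σ: power(δ) = Φ(δ√n/σ − z₁₋α/₂) + Φ(−δ√n/σ − z₁₋α/₂), where δ is the distance of the true mean from the value stated in H0 and Φ is the standard normal CDF. The Python sketch below evaluates it with the standard library's `statistics.NormalDist`; the z-test is chosen only for illustration, and the values of δ, n and α are assumptions.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def z_test_power(delta, n, sigma=1.0, alpha=0.05):
    """Power of the two-sided one-sample z-test of H0: mu = mu0 (sigma known)
    when the true mean is mu0 + delta."""
    z_crit = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)   # upper critical value
    shift = delta * n ** 0.5 / sigma                  # standardized effect
    # Probability that the test statistic falls in either rejection tail.
    return STANDARD_NORMAL.cdf(shift - z_crit) + STANDARD_NORMAL.cdf(-shift - z_crit)


# Tabulate the power function for n = 30 and alpha = 0.05.
for delta in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"delta={delta:4.2f}  power={z_test_power(delta, n=30):.3f}")
```

At δ = 0 the function returns exactly α, and for any fixed δ ≠ 0 it grows as n or α increases.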
Ways of increasing power include:
- Increasing α, which increases power but also increases the chance of a Type I error.
- Increasing the sample size.
- Using ratio or interval data rather than nominal or ordinal data. Tests involving ratio/interval data are usually parametric tests, although nonparametric tests can also be applied to data measured on a ratio/interval scale.
- Using repeated-measures tests such as the repeated-measures t-test or ANOVA. By using the same subjects repeatedly, variability is reduced.
- If variances are equal, using pooled estimates of variance (e.g. the independent-groups t-test).
- Increasing measurement precision, which increases the probability of finding a significant difference.
- Sampling from the extremes of the distribution. This reduces variability, but it also reduces the generalizability of the experiment.
Notable differences between parametric and nonparametric procedures are:
- In the parametric approach, the conditions about the population from which the sample is taken are specified, while the conditions for the nonparametric approach are fewer and weaker.
- Parametric procedures use information based on the measurements themselves, while in nonparametric procedures the measurements are reduced to ranks and signs.
- Parametric procedures require at least an interval scale of measurement, while nonparametric procedures can use an ordinal scale.
- Parametric procedures are usually based on the mean, while nonparametric procedures are usually based on the median.
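To illustrate how measurements are reduced to ranks, the sketch below computes the Wilcoxon rank-sum statistic W by ranking the pooled sample (tied values receive the average of their ranks) and summing the ranks of the first sample; the equivalent Mann-Whitney statistic is U = W − n₁(n₁ + 1)/2. It is plain Python, and the data are invented for illustration.

```python
def average_ranks(values):
    """Rank the values 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum(x, y):
    """Wilcoxon rank-sum W for sample x and the equivalent Mann-Whitney U."""
    ranks = average_ranks(list(x) + list(y))   # rank the pooled sample
    w = sum(ranks[: len(x)])                   # sum of the ranks of sample x
    u = w - len(x) * (len(x) + 1) / 2          # U = W - n1(n1 + 1)/2
    return w, u


x = [1.8, 3.2, 0.7, 2.5]   # illustrative data (assumed)
y = [4.1, 2.9, 5.6, 3.8, 6.0]
print(rank_sum(x, y))
```

For these data the first sample holds ranks 1, 2, 3 and 5 of the nine pooled values, so W = 11 and U = 1; only the ordering of the measurements enters the statistic.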
The classical parametric methods in analysis of variance, from one-way to multi-way tables, often suffer from sensitivity to the effects of non-normal data. The nonparametric methods are much more robust. In most cases, they mimic their parametric counterparts but focus on analyzing ranks instead of response measurements in the experimental outcome.
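The Kruskal-Wallis test illustrates this: it is built like a one-way analysis of variance, but on the ranks of the pooled data, H = 12/(N(N + 1)) Σ R_j²/n_j − 3(N + 1), where R_j is the rank sum and n_j the size of group j. The sketch below assumes NumPy and SciPy and omits the correction for ties; with continuous simulated data there are no ties, so the result should agree with `scipy.stats.kruskal`. The group means are assumptions.

```python
import numpy as np
from scipy import stats


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H from the ranks of the pooled data (no tie correction)."""
    pooled = np.concatenate(groups)
    ranks = stats.rankdata(pooled)          # average ranks for any ties
    n_total = len(pooled)
    h, start = 0.0, 0
    for g in groups:
        r = ranks[start:start + len(g)]     # ranks belonging to this group
        h += r.sum() ** 2 / len(g)          # R_j^2 / n_j
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * h - 3 * (n_total + 1)


rng = np.random.default_rng(3)
groups = [rng.normal(mu, 1.0, 10) for mu in (0.0, 0.5, 1.0)]  # assumed data
print(kruskal_wallis_h(*groups), stats.kruskal(*groups).statistic)
```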
In practice, for a random variable X, the parametric approach can be used if the following conditions are satisfied:
- The underlying distribution of the population is known.
- The underlying distribution satisfies the normality assumptions, that is:
  - The error term is a random vector.
  - The error term has mean zero.
  - The error variance is constant (homoscedasticity).
  - The error terms are uncorrelated (no serial correlation or autocorrelation).
  - The error terms are normally, independently and identically distributed, NIID(0, σ²).
- The scale of measurement is at least interval.
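Some of these conditions can be screened before a test is chosen. The sketch below assumes SciPy and uses the Shapiro-Wilk test for normality and Levene's test for equal variances; the helper name `check_assumptions` and the simulated samples are hypothetical. A non-significant result only fails to reject an assumption; it does not prove that the assumption holds.

```python
import numpy as np
from scipy import stats


def check_assumptions(groups, alpha=0.05):
    """Screen groups for normality (Shapiro-Wilk) and equal variances (Levene).

    A p-value below alpha is evidence that the assumption is violated.
    """
    shapiro_p = [float(stats.shapiro(g).pvalue) for g in groups]
    levene_p = float(stats.levene(*groups).pvalue)
    looks_ok = all(p >= alpha for p in shapiro_p) and levene_p >= alpha
    return shapiro_p, levene_p, looks_ok


rng = np.random.default_rng(11)
normal_sample = rng.normal(0.0, 1.0, 45)     # illustrative data (assumed)
expo_sample = rng.exponential(1.0, 45)
shapiro_p, levene_p, looks_ok = check_assumptions([normal_sample, expo_sample])
print("Shapiro-Wilk p-values:", [round(p, 4) for p in shapiro_p])
print("Levene p-value:", round(levene_p, 4))
print("parametric assumptions plausible:", looks_ok)
```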
When these assumptions are not satisfied, the validity and optimality of the parametric method can be criticized, and the nonparametric approach is used instead. Nonparametric methods based on ranks are valid for a broad family of underlying distributions. It is often argued, however, that except for simple designs such as the matched-pair or completely randomized design, their power tends to be low and their ability to detect alternative hypotheses is limited. For example, in a randomized complete block design, Friedman's test can be used to test for differences between the treatments. Since it is based on within-block ranking, its sensitivity is low, particularly when the number of observations in each block is small.
Nonparametric tests are distribution-free tests: they do not depend on the distribution of the population from which the sample is taken. They can be used when:
- The conditions for parametric test are not satisfied.
- Ordinal scale of measurement is used.
TABLE 1.2: Comparison of Statistical Tests

| Problem | Parametric test | Nonparametric test |
|---|---|---|
| Single sample | Z-test, t-test | Sign test, K-S (Kolmogorov-Smirnov) goodness-of-fit test |
| Two independent samples | Z-test, t-test | Wilcoxon Rank-Sum test (Mann-Whitney U) |
| Two dependent samples | Paired t-test | Paired sign test |
| Two factors | Two-way ANOVA | Friedman test |
| Comparison of more than two independent samples | One-way ANOVA | Kruskal-Wallis test |
In many modern statistical methods, a good number of assumptions are made about the nature of the population from which samples are drawn and data are collected. These statistical techniques are known as parametric tests.
On the other hand, there is great interest in the large number of inference techniques that do not make stringent assumptions about the population from which samples are drawn. These techniques are known as nonparametric statistical techniques.
Nonparametric methods drop many of the assumptions made by parametric statistical tests, which makes them easy to apply. On the other hand, parametric methods are said to be more efficient in most cases. The motivation of this study is to compare the results of the two techniques, using the power of a test, to examine their relative simplicity and efficiency.
1.4 Aim and Objectives of the Study
This study aims to investigate the statistical power of hypothesis testing using parametric and nonparametric methods, with the following objectives, namely to compare:
- The decisions of the parametric and nonparametric tests when the normal or exponential distribution is used for simulation.
- The power of the parametric and nonparametric approaches in hypothesis testing with small and large sample sizes.
- The consistency of the two approaches in hypothesis testing.
1.5 Significance of the Study
This research is based on hypothesis tests in which comparisons between the classes of test were subjected to two cases, sample sizes n ≤ 30 and n ≥ 30, in order to determine the relative consistency and power of the statistical techniques. The findings should be useful to statisticians, engineers, agricultural scientists and other researchers, and are relevant to studies that compare statistical approaches across different cases.
1.6 Scope of the Study
This thesis focuses on comparing parametric with nonparametric approaches by analyzing two-sample problems and the completely randomized design (CRD), using the Wilcoxon Rank-Sum and Kruskal-Wallis tests respectively. The parametric counterparts are Student's t-test and one-way analysis of variance.
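The Monte Carlo idea carries over to the completely randomized design: simulate k groups, apply one-way ANOVA and the Kruskal-Wallis test, and record how often each rejects H0. This Python sketch assumes NumPy and SciPy; the group shifts (0, 0.3, 0.6), the exponential mean of 1 and the 1,000 replications are illustrative assumptions rather than the settings of the thesis.

```python
import numpy as np
from scipy import stats


def simulated_power_k(sampler, n, shifts, alphas=(0.01, 0.05, 0.1), reps=1000, seed=2):
    """Monte Carlo power of one-way ANOVA and the Kruskal-Wallis test in a
    completely randomized design with len(shifts) groups of n observations.

    Group j is drawn from `sampler` and shifted by shifts[j]; if all shifts are
    equal, H0 is true and the rejection rate estimates the test size.
    Returns {alpha: (ANOVA power, Kruskal-Wallis power)}.
    """
    rng = np.random.default_rng(seed)
    p_f = np.empty(reps)
    p_h = np.empty(reps)
    for i in range(reps):
        groups = [sampler(rng, n) + s for s in shifts]
        p_f[i] = stats.f_oneway(*groups).pvalue   # parametric F-test
        p_h[i] = stats.kruskal(*groups).pvalue    # rank-based H test
    return {a: (float(np.mean(p_f < a)), float(np.mean(p_h < a))) for a in alphas}


def normal(rng, size):
    return rng.normal(0.0, 1.0, size)             # N(0, 1)


def exponential(rng, size):
    return rng.exponential(1.0, size)             # mean 1 (assumed)


for name, sampler in [("normal", normal), ("exponential", exponential)]:
    for n in (10, 30, 45):
        result = simulated_power_k(sampler, n, shifts=(0.0, 0.3, 0.6))
        for alpha, (pow_f, pow_h) in result.items():
            print(f"{name:12s} n={n:2d} alpha={alpha:<4} "
                  f"ANOVA={pow_f:.3f} Kruskal-Wallis={pow_h:.3f}")
```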
1.7 Source of Data
The data used for this research were simulated to represent real-life data, generated from a normal distribution N(0, 1) and an exponential distribution. The tests were performed using SPSS 20.