The Continuity Corrected Chi-Square Test Always Has 1 Degree of Freedom

Am J Hum Genet. 2010 May 14; 86(5): 813–818.

The Number of Markers in the HapMap Project: Some Notes on Chi-Square and Exact Tests for Hardy-Weinberg Equilibrium

Jan Graffelman

Universitat Politècnica de Catalunya, Departament d'Estadística i Investigació Operativa, Avinguda Diagonal 647, 6th floor, 08028 Barcelona, Spain

To the Editor: Pearson's chi-square test was, until recently, the most widely used procedure for assessing Hardy-Weinberg equilibrium (HWE) in random samples of unrelated individuals.1–3 Over the last few years, however, Haldane's exact test for HWE has gained popularity. Procedures for testing HWE have been extensively investigated.4–9 Bayesian and other alternatives for the classical tests have also been proposed.10–13

A recent study14 compared type 1 error rates for the chi-square test with those of Haldane's exact test; it reported above-nominal type 1 error rates for the chi-square test and therefore recommended the exact test in all situations. However, in that comparison,14 Yates' continuity correction15,16 had apparently not been applied. In statistics, the continuity correction is widely accepted as a device for improving the accuracy of the results when working with discrete variables.17

The p value in an exact test is usually computed as the sum of the probabilities of all samples that are as extreme or more extreme than the current one.14 An alternative approach is to define the p value as twice the p value of a one-sided test. Because of the nonsymmetrical nature of the Levene-Haldane distribution of the number of heterozygotes given the allele frequency, the two definitions give different results. Yates16 advocated the use of a doubled one-tail probability as the p value for Fisher's exact test.

In the light of these remarks, a new comparison of the type 1 error rates for chi-square and exact procedures is needed, in which we consider the continuity correction and both definitions of the p value in an exact test. We briefly summarize both tests and compare their type 1 error rates below. The practical implications of using the various procedures are illustrated with HapMap data.18

The Pearson chi-square statistic for a test for HWE is given by

$$X^2 = \sum_{i \in \{AA,\,AB,\,BB\}} \frac{\left(|n_i - e_i| - c\right)^2}{e_i},$$

in which ni represents one of the three genotypic counts (nAA, nAB, and nBB) and ei the respective expected value under HWE. This is a test for the goodness of fit of a multinomial distribution. Parameter c represents the continuity correction. Setting c = 0 gives the ordinary chi-square statistic, and setting c = 1/2 gives the corrected chi-square statistic. The p value of the test is obtained by comparing the chi-square statistic with a chi-square distribution with one degree of freedom.
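As a minimal sketch of the statistic just described, the following Python function computes the chi-square statistic with an optional continuity correction and its p value on one degree of freedom. The function name `hwe_chisq` is hypothetical, and the expected counts are assumed to be the usual n·p², 2n·p·q, and n·q² with p estimated from the sample; a monomorphic sample (an expected count of zero) is not handled.

```python
import math

def hwe_chisq(n_aa, n_ab, n_bb, c=0.5):
    """Return (X2, p_value) for the HWE chi-square test.

    c = 0 gives the ordinary statistic, c = 0.5 the corrected one.
    Assumes a polymorphic marker, so all expected counts are positive.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # sample frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    # Assumed expected counts under HWE: n*p^2, 2*n*p*q, n*q^2
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    x2 = sum((abs(o - e) - c) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square variate with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(x2 / 2.0))
    return x2, p_value

if __name__ == "__main__":
    for c in (0.0, 0.5):
        x2, p_value = hwe_chisq(16, 61, 23, c=c)
        print(f"c = {c}: X2 = {x2:.4f}, p = {p_value:.4f}")
```

The p value uses the identity that a chi-square variate with one degree of freedom is the square of a standard normal variate, so no statistics library is needed.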

The exact test for HWE19–21 uses the conditional distribution of the number of heterozygotes, NAB, given the allele count NA, and is given by

$$P(N_{AB} = n_{AB} \mid N_A = n_A) = \frac{n!\, n_A!\, n_B!\, 2^{n_{AB}}}{(2n)!\, n_{AB}!\, \left(\tfrac{1}{2}(n_A - n_{AB})\right)!\, \left(\tfrac{1}{2}(n_B - n_{AB})\right)!},$$

in which nA and nB refer to the sample counts of the A and B alleles. We will refer to this distribution as the Levene-Haldane distribution. Geneticists usually wish to perform a two-sided test, because there is no a priori reason to suppose that a SNP deviating from HWE will show a lack or an excess of heterozygotes. If there are reasons to expect a lack (e.g., inbreeding) or an excess (e.g., overdominance), then a one-sided test is needed. The p value of the exact test is usually calculated as the sum of the probabilities of all possible samples as extreme or more extreme than the observed sample, given the allele count of the observed sample. We refer to this p value as the selome p value (sum equally likely or more extreme). An alternative is to define the p value as twice the one-sided tail area, and we will call this p value the dost p value (double one-sided tail). If the observed number of heterozygotes is below that expected under HWE, the dost p value is twice the sum of the probabilities of observing the number of heterozygotes in the sample or fewer. If it is above that expected, then the p value is twice the sum of the probabilities of observing the number of heterozygotes in the sample or more. We argue that dost p values are the most sensible p values, and we motivate this with the following example.
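The Levene-Haldane probabilities can be evaluated directly from the formula above. The sketch below is one possible way to do so in Python; the function name `levene_haldane` and the choice of exact integer factorials are assumptions made for illustration, and the values n = 100, nA = 93 are used only as an example input.

```python
from fractions import Fraction
from math import factorial

def levene_haldane(n, n_a):
    """Return {n_ab: P(N_AB = n_ab | N_A = n_a)} for a sample of n individuals."""
    n_b = 2 * n - n_a
    dist = {}
    # n_ab has the same parity as n_a and cannot exceed the rarer allele count
    for n_ab in range(n_a % 2, min(n_a, n_b) + 1, 2):
        n_aa = (n_a - n_ab) // 2
        n_bb = (n_b - n_ab) // 2
        numerator = factorial(n) * factorial(n_a) * factorial(n_b) * 2 ** n_ab
        denominator = (factorial(2 * n) * factorial(n_ab)
                       * factorial(n_aa) * factorial(n_bb))
        # Exact rational value, converted to float only at the end
        dist[n_ab] = float(Fraction(numerator, denominator))
    return dist

if __name__ == "__main__":
    dist = levene_haldane(100, 93)
    total = sum(dist.values())
    mean = sum(k * pr for k, pr in dist.items())
    print(f"sum = {total:.6f}, mean = {mean:.3f}")
```

Exact integer arithmetic is slow for very large samples; a log-gamma formulation would be a common alternative there, at the cost of floating-point rounding.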

If we have a sample of n = 100 individuals, and if there are 93 copies of the minor allele, then the corresponding Levene-Haldane distribution of NAB |NA , given in the left panel of Figure 1, is a virtually symmetric distribution with expectation 50.005. Very low and very high heterozygote frequencies both constitute evidence against HWE. Because of the near-symmetric nature of the distribution in this case, it should be evident that observing 61 heterozygotes in a sample constitutes virtually as much evidence against HWE as observing 39 heterozygotes. Observing 61 heterozygotes or more has a probability of 0.021670, and observing 39 or fewer heterozygotes has nearly the same probability, 0.021674 (Table 1). Suppose we observed 61 heterozygotes. If we wish to perform a two-sided test, the obvious dost p value is 2 × 0.021670 = 0.04334. When we use the selome rule, the probability of observing a sample as extreme as 61 heterozygotes or more extreme is 0.04334, but the probability of observing a sample as extreme as 39 heterozygotes is different: 0.029243. For a symmetrical distribution, this difference seems absurd. The reason for this difference is that the probability of P(NAB = 61|NA = 93) = 0.014101 is omitted in the calculation of the latter p value, because it is slightly larger than P(NAB = 39|NA = 93) = 0.014076 (Table 1). On common-sense grounds, the practice of summing probabilities "as extreme or more extreme as those observed" seems mistaken. This is further exemplified by approximating the Levene-Haldane distribution with a normal distribution, as is done in the right panel of Figure 1. Under the approximating normal curve, the evidence against a null value of 50.005 is evidently twice the probability of exceeding 61, and this equals twice the probability of observing 39 or less. In practice, the discrete Levene-Haldane distribution is more asymmetric than in the example above, but it can often be well approximated by a normal curve. 
It is markedly asymmetric for extreme allele frequencies, but it can then be approximated by a normal curve after proper transformation. In short, doubling the one-sided tail area seems a more adequate way to compute the p value in Haldane's exact test and is much more in line with statistical procedures for continuous variables, as well as with the classical chi-square test, the latter also being essentially a two-sided test when considered as the square of an N(0, 1) variate. Table 1 also shows that selome p values are generally smaller than dost p values in both tails and therefore more easily lead to rejection of HWE.
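The contrast between the two definitions in the example of 39 versus 61 heterozygotes can be checked numerically. The following Python sketch implements both p values as described in the text; the function names are hypothetical, and exact rational arithmetic is used so that the "equally likely or less likely" comparison is not affected by floating-point rounding.

```python
from fractions import Fraction
from math import factorial

def levene_haldane(n, n_a):
    """Exact Levene-Haldane probabilities {n_ab: Fraction}."""
    n_b = 2 * n - n_a
    dist = {}
    for n_ab in range(n_a % 2, min(n_a, n_b) + 1, 2):
        n_aa, n_bb = (n_a - n_ab) // 2, (n_b - n_ab) // 2
        num = factorial(n) * factorial(n_a) * factorial(n_b) * 2 ** n_ab
        den = (factorial(2 * n) * factorial(n_ab)
               * factorial(n_aa) * factorial(n_bb))
        dist[n_ab] = Fraction(num, den)
    return dist

def selome_p(dist, n_ab):
    """Sum of the probabilities of all samples no more likely than the observed one."""
    p_obs = dist[n_ab]
    return float(sum(pr for pr in dist.values() if pr <= p_obs))

def dost_p(dist, n_ab):
    """Twice the one-sided tail on the side of the expectation where n_ab lies."""
    mean = sum(k * pr for k, pr in dist.items())
    if n_ab < mean:
        tail = sum(pr for k, pr in dist.items() if k <= n_ab)
    else:
        tail = sum(pr for k, pr in dist.items() if k >= n_ab)
    return float(min(Fraction(1), 2 * tail))

if __name__ == "__main__":
    dist = levene_haldane(100, 93)
    for n_ab in (39, 61):
        print(n_ab, f"selome = {selome_p(dist, n_ab):.6f}",
              f"dost = {dost_p(dist, n_ab):.6f}")
```

For n = 100 and nA = 93 this should reproduce the corresponding rows of Table 1, with the dost values for 39 and 61 heterozygotes nearly equal and the selome values clearly different.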

Figure 1

Levene-Haldane Distribution of the Number of Heterozygotes for a Given Allele Count without and with Normal Approximation

Left panel: Levene-Haldane distribution for n = 100, nA = 93. Right panel: Levene-Haldane distribution with normal approximation.

Table 1

Sample Probabilities and p Values

nAA nAB nBB P(NAB=nAB|NA) P(NAB≥nAB) P(NAB≤nAB) p(selome) p(dost)
35 23 42 0.000000 1.000000 0.000000 0.000000 0.000000
34 25 41 0.000000 1.000000 0.000000 0.000001 0.000001
33 27 40 0.000003 1.000000 0.000003 0.000006 0.000007
32 29 39 0.000019 0.999997 0.000022 0.000044 0.000045
31 31 38 0.000102 0.999978 0.000124 0.000245 0.000249
30 33 37 0.000455 0.999876 0.000579 0.001148 0.001158
29 35 36 0.001697 0.999421 0.002277 0.004532 0.004553
28 37 35 0.005322 0.997723 0.007598 0.015168 0.015196
27 39 34 0.014076 0.992402 0.021674 0.029243 0.043348
26 41 33 0.031516 0.978326 0.053190 0.074861 0.106380
25 43 32 0.059891 0.946810 0.113081 0.166375 0.226163
24 45 31 0.096794 0.886919 0.209875 0.323289 0.419751
23 47 30 0.133237 0.790125 0.343113 0.553643 0.686225
22 49 29 0.156350 0.656887 0.499462 0.843528 0.998925
21 51 28 0.156472 0.500538 0.655935 1.000000 1.000000
20 53 27 0.133535 0.344065 0.789470 0.687178 0.688131
19 55 26 0.097117 0.210530 0.886586 0.420405 0.421060
18 57 25 0.060120 0.113414 0.946706 0.226495 0.226827
17 59 24 0.031623 0.053294 0.978330 0.106484 0.106588
16 61 23 0.014101 0.021670 0.992431 0.043344 0.043341
15 63 22 0.005314 0.007569 0.997745 0.009846 0.015139
14 65 21 0.001686 0.002255 0.999431 0.002835 0.004511
13 67 20 0.000448 0.000569 0.999879 0.000693 0.001138
12 69 19 0.000099 0.000121 0.999979 0.000143 0.000242
11 71 18 0.000018 0.000021 0.999997 0.000025 0.000043
10 73 17 0.000003 0.000003 1.000000 0.000004 0.000006
9 75 16 0.000000 0.000000 1.000000 0.000000 0.000001
8 77 15 0.000000 0.000000 1.000000 0.000000 0.000000

We compared the type 1 error rates of chi-square tests and exact tests with both types of p values. Type 1 error rates can be computed exactly by summing the probabilities of all genotypic compositions that pertain to the rejection region. We computed rejection rates for the same combinations of parameters used previously,14 with 100 or 1000 individuals and three significance levels (0.05, 0.01, and 0.001). Figure 2 shows the error rates for the exact test with dost p values, the chi-square test, and the chi-square test with continuity correction. dost p values in fact form the natural choice, because the tests being compared are now both actually two-tailed.
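The exact computation of a type 1 error rate can be sketched as follows for the exact test with dost p values, conditional on a single allele count. This is an illustration under stated assumptions rather than the procedure used for Figure 2: the function names are hypothetical, HWE is rejected when p ≤ α, and the rate is conditional on nA.

```python
from fractions import Fraction
from math import factorial

def levene_haldane(n, n_a):
    """Exact Levene-Haldane probabilities {n_ab: Fraction}."""
    n_b = 2 * n - n_a
    dist = {}
    for n_ab in range(n_a % 2, min(n_a, n_b) + 1, 2):
        n_aa, n_bb = (n_a - n_ab) // 2, (n_b - n_ab) // 2
        num = factorial(n) * factorial(n_a) * factorial(n_b) * 2 ** n_ab
        den = (factorial(2 * n) * factorial(n_ab)
               * factorial(n_aa) * factorial(n_bb))
        dist[n_ab] = Fraction(num, den)
    return dist

def dost_p(dist, n_ab):
    """Twice the one-sided tail area, capped at 1."""
    mean = sum(k * pr for k, pr in dist.items())
    if n_ab < mean:
        tail = sum(pr for k, pr in dist.items() if k <= n_ab)
    else:
        tail = sum(pr for k, pr in dist.items() if k >= n_ab)
    return min(Fraction(1), 2 * tail)

def type1_error_rate(n, n_a, alpha, pvalue=dost_p):
    """Sum the null probabilities of all samples in the rejection region.

    Assumption: HWE is rejected when the p value is <= alpha.
    """
    dist = levene_haldane(n, n_a)
    rejected = [pr for k, pr in dist.items() if pvalue(dist, k) <= alpha]
    return float(sum(rejected))

if __name__ == "__main__":
    for alpha in (0.05, 0.01, 0.001):
        print(alpha, f"{type1_error_rate(100, 93, alpha):.6f}")
```

Any other test can be plugged in through the `pvalue` argument, as long as it takes the distribution and a heterozygote count and returns a p value.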

Figure 2

Type 1 Error Rates as a Function of Sample Size and α for Different Statistical Tests

Type 1 error rates for different sample sizes (100, 1000) and significance levels (0.05, 0.01, and 0.001) for the exact test (red), the chi-square test (blue), and the chi-square test with continuity correction (green).

Figure 2 shows the inflated type 1 error rates for the ordinary chi-square test (blue) in comparison with the exact test reported previously.14 However, the graph also shows that the continuity correction (green) effectively reduces this inflation, bringing the chi-square test into very close agreement with the exact test. The better agreement of the corrected chi-square test with the exact test has been noted before with numerical examples.22

The chi-square test with correction has highly inflated rates (100%) for very small minor allele counts. This is due to an edge effect of the continuity correction.23 This edge effect is easily avoided by using a cutoff for the continuity correction for low minor allele frequencies, as was done in Figure 2. The test with correction has a rejection rate that is mostly below the nominal level for α = 0.05 or 0.01. Often the test with correction is the closest to the nominal level. Results of HWE tests are often poorly reported in association studies.2 We add that it is typically not reported whether a continuity correction has been applied or not.

Figure 3 compares the error rates of the exact test for both definitions of the p value. Both tests have a rejection rate that is always below the nominal level. The selome rates are closer to the nominal level and are larger than or equal to the dost rates. When the distribution of the number of heterozygotes is asymmetric, the exact test that uses the selome p values is essentially a one-sided test, because all probabilities that contribute to the p value are in one tail of the distribution only. Evidently a one-tailed test has, as Figure 3 shows, better power, but this gain in power is irrelevant if one really needs a two-sided test. We therefore recommend the use of dost p values in the exact test for HWE.

Figure 3

DOST and SELOME Type 1 Error Rates as a Function of Sample Size and α

Type 1 error rates for different sample sizes (100, 1000) and significance levels (0.05, 0.01, and 0.001) for exact tests with selome p values (purple) and dost p values (red).

We use a HapMap database18 from chromosome 1 to illustrate the effects on marker admission of the choices made in chi-square and exact tests. The HapMap project currently uses the exact test for HWE with criterion p > 0.001 as a filter for the inclusion of a SNP in the database. This is based on the idea that strong deviations from HWE may be the result of genotyping error. Violation of HWE may, however, be due to many alternative explanations, such as selection, nonrandom mating, population substructure, and, not least, disease association.1,24 Several scholars25–27 have therefore argued that HWE tests should be performed but not used as a criterion for excluding markers prior to an association study. We used the Han Chinese sample from Beijing (CHB), consisting of 45 unrelated individuals (phase II, NCBI build 35). This database contains 529,081 redundant, unfiltered markers. The database has three additional duplicate individuals, and many submitted SNPs are repeated. Of each repeated SNP, we selected the one with the fewest missing values. Next, SNPs were filtered according to HapMap criteria,18,28,29 by eliminating SNPs that had more than one inconsistency over the three duplicates and by eliminating SNPs with more than 20% missing values. After filtering, the database consisted of 45 individuals typed for 337,746 SNPs. Of these, 42% were monomorphic, and 16.8% of the polymorphic SNPs had a minor allele frequency below 0.05. We analyzed this filtered database by using the four different tests for HWE described above. We used the R package30 HardyWeinberg (version 1.4) for the computation of all test results. Rejection rates for the different tests are given in Table 2.

Table 2

Rejection Rates for HWE Tests

Rejected (%)
HWE Test α = 0.05 α = 0.01 α = 0.001
χ² 4.73 2.77 1.87
χ² with continuity correction (with cutoff) 3.19 2.01 1.40
Exact (dost) 2.86 1.70 1.20
Exact (selome) 3.59 1.90 1.30

Table 2 shows that dost p values have the lowest rejection rate and form the most conservative approach to testing HWE. The ordinary chi-square test has the highest rejection rates, followed by exact selome and corrected chi-square. When the criterion for inclusion of a SNP is changed from selome to dost p values, an additional 0.73% of the SNPs would be admitted at the 5% level, or 0.1% at the 0.1% level. These percentages look small, but genome-wide they correspond to a large number of markers. With 3.1 million admitted SNPs genome-wide18 this corresponds to at least roughly 22,630 additional SNPs admitted at the 5% level or at least 3100 additional SNPs at the 0.1% level. In practice, the number of additionally admitted SNPs will be larger, because the number of unfiltered SNPs in the project is well over 3.1 million. We note that the HapMap database is an empirical database and that the rejection rates in Table 2 are therefore not expected to coincide with the theoretical levels of 5%, 1%, or 0.1%, the true number of markers out of HWE being unknown.

We investigated the "newly admitted" markers in some detail. Figure 4 shows a ternary plot of the newly admitted markers without missing data (sample size 45). The plot shows the acceptance regions of the chi-square test with and without continuity correction and the acceptance regions of the exact test with the selome and the dost criterion, with α = 0.001. The zigzag lines for the exact tests connect samples for which the exact test is just significant. The newly admitted SNPs cover the whole range of allele frequencies and are typically around the boundary of the acceptance region of a corrected chi-square test for HWE. The exact test using the dost criterion has the largest acceptance region. Note that for some intermediate allele frequencies, equilibrium is rejected according to a selome exact test but accepted by a corrected chi-square test. The ordinary chi-square test has the smallest acceptance region.
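The genome-wide counts quoted above follow from the differences between the selome and dost rows of Table 2. The short Python sketch below retraces that arithmetic; the variable and function names are hypothetical, and rates are stored as integers in hundredths of a percent to avoid floating-point rounding.

```python
# Rejection rates from Table 2, in hundredths of a percent (3.59% -> 359)
SELOME_REJECTED = {"0.05": 359, "0.01": 190, "0.001": 130}
DOST_REJECTED = {"0.05": 286, "0.01": 170, "0.001": 120}

# Number of admitted SNPs genome-wide quoted in the text
GENOME_WIDE_SNPS = 3_100_000

def extra_admitted(alpha):
    """SNPs additionally admitted when switching from selome to dost p values."""
    difference = SELOME_REJECTED[alpha] - DOST_REJECTED[alpha]
    # difference is in units of 0.01%, i.e. 1/10,000 of the SNPs
    return difference * GENOME_WIDE_SNPS // 10_000

if __name__ == "__main__":
    for alpha in ("0.05", "0.001"):
        print(f"alpha = {alpha}: {extra_admitted(alpha)} additional SNPs")
```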

Figure 4

Ternary Plot of Extra Admitted Markers

Ternary plot of newly admitted markers without missing data. The black curve represents HWE. Acceptance regions of the chi-square test with and without correction (green and blue, respectively) and the exact tests with selome p values (purple) and dost p values (red) are shown. Green and red dots indicate nonsignificant and significant SNPs, respectively, for the corrected chi-square test with α = 0.001.

Constructing reliable SNP assays in the laboratory is expensive and time consuming. We have no sound statistical reasons to reject HWE for SNPs that have a significant selome p value but a nonsignificant dost p value. The logical consequence is to admit these markers to the HapMap project. This will increase the genomic coverage of the project, and, after all, these markers may be associated with disease.

Acknowledgments

This study was supported by grants SEC2003-04476 and CODA-RSS MTM2009-13272 of the Spanish Ministry of Education and Science. The author thanks the referees for their comments on the paper.

Web Resources

The URLs for data presented herein are as follows:

References

1. Wittke-Thompson J.K., Pluzhnikov A., Cox N.J. Rational inferences about departures from Hardy-Weinberg equilibrium. Am. J. Hum. Genet. 2005;76:967–986.

2. Salanti G., Amountza G., Ntzani E.E., Ioannidis J.P.A. Hardy-Weinberg equilibrium in genetic association studies: an empirical evaluation of reporting, deviations, and power. Eur. J. Hum. Genet. 2005;13:840–848.

3. Yu C., Zhang S., Zhou C., Sile S. A likelihood ratio test of population Hardy-Weinberg equilibrium for case-control studies. Genet. Epidemiol. 2009;33:275–280.

4. Elston R.C., Forthofer R. Testing for Hardy-Weinberg equilibrium in small samples. Biometrics. 1977;33:536–542.

5. Emigh T.H. A comparison of tests for Hardy-Weinberg equilibrium. Biometrics. 1980;36:627–642.

6. Hernández J.L., Weir B.S. A disequilibrium coefficient approach to Hardy-Weinberg testing. Biometrics. 1989;45:53–70.

7. Guo S.W., Thompson E.A. Performing the exact test of Hardy-Weinberg proportion for multiple alleles. Biometrics. 1992;48:361–372.

8. Gomes I., Collins A., Lonjou C., Thomas N.S., Wilkinson J., Watson M., Morton N. Hardy-Weinberg quality control. Ann. Hum. Genet. 1999;63:535–538.

9. Cox D.G., Kraft P. Quantification of the power of Hardy-Weinberg equilibrium testing to detect genotyping error. Hum. Hered. 2006;61:10–14.

10. Lindley D.V. Statistical inference concerning Hardy-Weinberg equilibrium. In: Bernardo J.M., DeGroot M.H., Lindley D.V., Smith A.F.M., editors. Bayesian Statistics, 3. Oxford University Press; Oxford, UK: 1988. pp. 307–326.

11. Montoya-Delgado L.E., Irony T.Z., de B Pereira C.A., Whittle M.R. An unconditional exact test for the Hardy-Weinberg equilibrium law: sample-space ordering using the Bayes factor. Genetics. 2001;158:875–883.

12. Wellek S. Tests for establishing compatibility of an observed genotype distribution with Hardy-Weinberg equilibrium in the case of a biallelic locus. Biometrics. 2004;60:694–703.

13. Pereira C.A.B., Nakano F., Stern J.M., Whittle M.R. Genuine Bayesian multiallelic significance test for the Hardy-Weinberg equilibrium law. Genet. Mol. Res. 2006;5:619–631.

14. Wigginton J.E., Cutler D.J., Abecasis G.R. A note on exact tests of Hardy-Weinberg equilibrium. Am. J. Hum. Genet. 2005;76:887–893.

15. Yates F. Contingency tables involving small numbers and the χ2 test. Journal of the Royal Statistical Society (Supplement) 1934;1:217–235.

16. Yates F. Tests of significance for 2 × 2 contingency tables. J. R. Stat. Soc. [Ser A] 1984;147:426–463.

17. Fleiss J.L. Statistical methods for rates and proportions. Second Edition. John Wiley & Sons; New York: 1981.

18. Frazer K.A., Ballinger D.G., Cox D.R., Hinds D.A., Stuve L.L., Gibbs R.A., Belmont J.W., Boudreau A., Hardenbol P., Leal S.M., International HapMap Consortium. A second generation human haplotype map of over 3.1 million SNPs. Nature. 2007;449:851–861.

19. Haldane J.B.S. An exact test for randomness of mating. J. Genet. 1954;52:631–635.

20. Levene H. On a matching problem arising in genetics. Ann. Math. Stat. 1949;20:91–94.

21. Weir B.S. Genetic Data Analysis II. Sinauer Associates; Massachusetts: 1996.

22. Rohlfs R.V., Weir B.S. Distributions of Hardy-Weinberg equilibrium test statistics. Genetics. 2008;180:1609–1616.

23. Graffelman J., Camarena J.M. Graphical tests for Hardy-Weinberg equilibrium based on the ternary plot. Hum. Hered. 2008;65:77–84.

24. Li M., Li C. Assessing departure from Hardy-Weinberg equilibrium in the presence of disease association. Genet. Epidemiol. 2008;32:589–599.

25. Fardo D.W., Becker K.D., Bertram L., Tanzi R.E., Lange C. Recovering unused information in genome-wide association studies: the benefit of analyzing SNPs out of Hardy-Weinberg equilibrium. Eur. J. Hum. Genet. 2009;17:1676–1682.

26. Minelli C., Thompson J.R., Abrams K.R., Thakkinstian A., Attia J. How should we use information about HWE in the meta-analyses of genetic association studies? Int. J. Epidemiol. 2008;37:136–146.

27. Zou G.Y., Donner A. The merits of testing Hardy-Weinberg equilibrium in the analysis of unmatched case-control data: a cautionary note. Ann. Hum. Genet. 2006;70:923–933.

28. International HapMap Consortium. The international hapmap project. Nature. 2003;426:789–796.

29. International HapMap Consortium. A haplotype map of the human genome. Nature. 2005;437:1299–1320.

30. R Development Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, 2004. URL http://www.R-project.org. ISBN 3-900051-00-3.


Articles from American Journal of Human Genetics are provided here courtesy of American Society of Human Genetics



Source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2869018/
