Hypothesis Testing - Chi Squared Test

Lisa Sullivan, PhD

Professor of Biostatistics

Boston University School of Public Health

Introduction

This module will continue the discussion of hypothesis testing, where a specific statement or hypothesis is generated about a population parameter, and sample statistics are used to assess the likelihood that the hypothesis is true. The hypothesis is based on available information and the investigator's belief about the population parameters. The specific tests considered here are called chi-square tests and are appropriate when the outcome is discrete (dichotomous, ordinal or categorical). For example, in some clinical trials the outcome is a classification such as hypertensive, pre-hypertensive or normotensive. We could use the same classification in an observational study such as the Framingham Heart Study to compare men and women in terms of their blood pressure status - again using the classification of hypertensive, pre-hypertensive or normotensive status.  

The technique to analyze a discrete outcome uses what is called a chi-square test. Specifically, the test statistic follows a chi-square probability distribution. We will consider chi-square tests here with one, two and more than two independent comparison groups.

Learning Objectives

After completing this module, the student will be able to:

  • Perform chi-square tests by hand
  • Appropriately interpret results of chi-square tests
  • Identify the appropriate hypothesis testing procedure based on type of outcome variable and number of samples

Tests with One Sample, Discrete Outcome

Here we consider hypothesis testing with a discrete outcome variable in a single population. Discrete variables are variables that take on more than two distinct responses or categories and the responses can be ordered or unordered (i.e., the outcome can be ordinal or categorical). The procedure we describe here can be used for dichotomous (exactly 2 response options), ordinal or categorical discrete outcomes and the objective is to compare the distribution of responses, or the proportions of participants in each response category, to a known distribution. The known distribution is derived from another study or report and it is again important in setting up the hypotheses that the comparator distribution specified in the null hypothesis is a fair comparison. The comparator is sometimes called an external or a historical control.   

In one sample tests for a discrete outcome, we set up our hypotheses against an appropriate comparator. We select a sample and compute descriptive statistics on the sample data. Specifically, we compute the sample size (n) and the proportions of participants in each response category.

Test Statistic for Testing \(H_0: p_1 = p_{10}, \; p_2 = p_{20}, \; \ldots, \; p_k = p_{k0}\)

\(\chi^2 = \sum \dfrac{(O-E)^2}{E}\)

We find the critical value in a table of probabilities for the chi-square distribution with degrees of freedom (df) = k-1. In the test statistic, O = observed frequency and E = expected frequency in each of the response categories. The observed frequencies are those observed in the sample and the expected frequencies are computed as described below. χ² (chi-square) is another probability distribution and ranges from 0 to ∞. The test statistic above is appropriate for large samples, defined as expected frequencies of at least 5 in each of the response categories.

When we conduct a χ² test, we compare the observed frequencies in each response category to the frequencies we would expect if the null hypothesis were true. These expected frequencies are determined by allocating the sample to the response categories according to the distribution specified in \(H_0\). This is done by multiplying the observed sample size (n) by the proportions specified in the null hypothesis (\(p_{10}, p_{20}, \ldots, p_{k0}\)). To ensure that the sample size is appropriate for the use of the test statistic above, we need to ensure the following: \(\min(np_{10}, np_{20}, \ldots, np_{k0}) \geq 5\).

The test of hypothesis with a discrete outcome measured in a single sample, where the goal is to assess whether the distribution of responses follows a known distribution, is called the χ² goodness-of-fit test. As the name indicates, the idea is to assess whether the pattern or distribution of responses in the sample "fits" a specified population (external or historical) distribution. In the next example we illustrate the test. As we work through the example, we provide additional details related to the use of this new test statistic.

A University conducted a survey of its recent graduates to collect demographic and health information for future planning purposes as well as to assess students' satisfaction with their undergraduate experiences. The survey revealed that a substantial proportion of students were not engaging in regular exercise, many felt their nutrition was poor and a substantial number were smoking. In response to a question on regular exercise, 60% of all graduates reported getting no regular exercise, 25% reported exercising sporadically and 15% reported exercising regularly as undergraduates. The next year the University launched a health promotion campaign on campus in an attempt to increase health behaviors among undergraduates. The program included modules on exercise, nutrition and smoking cessation. To evaluate the impact of the program, the University again surveyed graduates and asked the same questions. The survey was completed by 470 graduates and the following data were collected on the exercise question:

 

                     No Regular Exercise   Sporadic Exercise   Regular Exercise   Total
Number of Students   255                   125                 90                 470

Based on the data, is there evidence of a shift in the distribution of responses to the exercise question following the implementation of the health promotion campaign on campus? Run the test at a 5% level of significance.

In this example, we have one sample and a discrete (ordinal) outcome variable (with three response options). We specifically want to compare the distribution of responses in the sample to the distribution reported the previous year (i.e., 60%, 25%, 15% reporting no, sporadic and regular exercise, respectively). We now run the test using the five-step approach.  

  • Step 1. Set up hypotheses and determine level of significance.

The null hypothesis again represents the "no change" or "no difference" situation. If the health promotion campaign has no impact then we expect the distribution of responses to the exercise question to be the same as that measured prior to the implementation of the program.

\(H_0: p_1 = 0.60, \; p_2 = 0.25, \; p_3 = 0.15\), or equivalently \(H_0\): Distribution of responses is 0.60, 0.25, 0.15

\(H_1\): \(H_0\) is false.          α = 0.05

Notice that the research hypothesis is written in words rather than in symbols. The research hypothesis as stated captures any difference in the distribution of responses from that specified in the null hypothesis. We do not specify a specific alternative distribution; instead, we are testing whether the sample data "fit" the distribution in \(H_0\) or not. With the χ² goodness-of-fit test there is no upper or lower tailed version of the test.

  • Step 2. Select the appropriate test statistic.  

The test statistic is:

\(\chi^2 = \sum \dfrac{(O-E)^2}{E}\)

We must first assess whether the sample size is adequate. Specifically, we need to check \(\min(np_{10}, np_{20}, \ldots, np_{k0}) \geq 5\). The sample size here is n = 470 and the proportions specified in the null hypothesis are 0.60, 0.25 and 0.15. Thus, min(470(0.60), 470(0.25), 470(0.15)) = min(282, 117.5, 70.5) = 70.5. The sample size is more than adequate, so the formula can be used.

  • Step 3. Set up decision rule.  

The decision rule for the χ² test depends on the level of significance and the degrees of freedom, defined as degrees of freedom (df) = k-1 (where k is the number of response categories). If the null hypothesis is true, the observed and expected frequencies will be close in value and the χ² statistic will be close to zero. If the null hypothesis is false, then the χ² statistic will be large. Critical values can be found in a table of probabilities for the χ² distribution. Here we have df = k-1 = 3-1 = 2 and a 5% level of significance. The appropriate critical value is 5.99, and the decision rule is as follows: Reject \(H_0\) if \(\chi^2 > 5.99\).

  • Step 4. Compute the test statistic.  

We now compute the expected frequencies using the sample size and the proportions specified in the null hypothesis. We then substitute the sample data (observed frequencies) and the expected frequencies into the formula for the test statistic identified in Step 2. The computations can be organized as follows.

   

                         No Regular Exercise   Sporadic Exercise   Regular Exercise   Total
Observed Frequency (O)   255                   125                 90                 470
Expected Frequency (E)   470(0.60) = 282       470(0.25) = 117.5   470(0.15) = 70.5   470

Notice that the expected frequencies are taken to one decimal place and that the sum of the observed frequencies is equal to the sum of the expected frequencies. The test statistic is computed as follows:

\(\chi^2 = \dfrac{(255-282)^2}{282} + \dfrac{(125-117.5)^2}{117.5} + \dfrac{(90-70.5)^2}{70.5} = 2.59 + 0.48 + 5.39 = 8.46\)

  • Step 5. Conclusion.  

We reject \(H_0\) because 8.46 > 5.99. We have statistically significant evidence at α = 0.05 to show that \(H_0\) is false, or that the distribution of responses is not 0.60, 0.25, 0.15. The p-value is p ≈ 0.015.

In the χ² goodness-of-fit test, we conclude that either the distribution specified in \(H_0\) is false (when we reject \(H_0\)) or that we do not have sufficient evidence to show that the distribution specified in \(H_0\) is false (when we fail to reject \(H_0\)). Here, we reject \(H_0\) and conclude that the distribution of responses to the exercise question following the implementation of the health promotion campaign was not the same as the distribution prior to the campaign. The test itself does not provide details of how the distribution has shifted. A comparison of the observed and expected frequencies will provide some insight into the shift (when the null hypothesis is rejected). Does it appear that the health promotion campaign was effective?

Consider the following: 

 

                         No Regular Exercise   Sporadic Exercise   Regular Exercise   Total
Observed Frequency (O)   255                   125                 90                 470
Expected Frequency (E)   282                   117.5               70.5               470

If the null hypothesis were true (i.e., no change from the prior year) we would have expected more students to fall in the "No Regular Exercise" category and fewer in the "Regular Exercise" category. In the sample, 255/470 = 54% reported no regular exercise and 90/470 = 19% reported regular exercise. Thus, there is a shift toward more regular exercise following the implementation of the health promotion campaign. There is evidence of a statistical difference; is this a meaningful difference? Is there room for improvement?
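For readers who want to verify the calculation by software, R's built-in chisq.test function reproduces this goodness-of-fit test. A minimal sketch (variable names are illustrative):

    # Observed counts: no regular, sporadic, and regular exercise
    observed <- c(255, 125, 90)

    # Proportions specified in the null hypothesis (the prior year's distribution)
    p0 <- c(0.60, 0.25, 0.15)

    # Goodness-of-fit test: X-squared = 8.46, df = 2, p-value of about 0.015
    chisq.test(x = observed, p = p0)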

The National Center for Health Statistics (NCHS) provided data on the distribution of weight (in categories) among Americans in 2002. The distribution was based on specific values of body mass index (BMI) computed as weight in kilograms over height in meters squared. Underweight was defined as BMI< 18.5, Normal weight as BMI between 18.5 and 24.9, overweight as BMI between 25 and 29.9 and obese as BMI of 30 or greater. Americans in 2002 were distributed as follows: 2% Underweight, 39% Normal Weight, 36% Overweight, and 23% Obese. Suppose we want to assess whether the distribution of BMI is different in the Framingham Offspring sample. Using data from the n=3,326 participants who attended the seventh examination of the Offspring in the Framingham Heart Study we created the BMI categories as defined and observed the following:

 

                    Underweight (BMI < 18.5)   Normal Weight (BMI 18.5-24.9)   Overweight (BMI 25.0-29.9)   Obese (BMI ≥ 30)   Total
# of Participants   20                         932                             1374                         1000               3326

  • Step 1.  Set up hypotheses and determine level of significance.

\(H_0: p_1 = 0.02, \; p_2 = 0.39, \; p_3 = 0.36, \; p_4 = 0.23\), or equivalently

\(H_0\): Distribution of responses is 0.02, 0.39, 0.36, 0.23

\(H_1\): \(H_0\) is false.        α = 0.05

The formula for the test statistic is:

\(\chi^2 = \sum \dfrac{(O-E)^2}{E}\)

We must assess whether the sample size is adequate. Specifically, we need to check \(\min(np_{10}, np_{20}, \ldots, np_{k0}) \geq 5\). The sample size here is n = 3,326 and the proportions specified in the null hypothesis are 0.02, 0.39, 0.36 and 0.23. Thus, min(3326(0.02), 3326(0.39), 3326(0.36), 3326(0.23)) = min(66.5, 1297.1, 1197.4, 765.0) = 66.5. The sample size is more than adequate, so the formula can be used.

Here we have df = k-1 = 4-1 = 3 and a 5% level of significance. The appropriate critical value is 7.81 and the decision rule is as follows: Reject \(H_0\) if \(\chi^2 > 7.81\).

We now compute the expected frequencies using the sample size and the proportions specified in the null hypothesis. We then substitute the sample data (observed frequencies) into the formula for the test statistic identified in Step 2. We organize the computations in the following table.

 

                         Underweight   Normal Weight   Overweight   Obese   Total
Observed Frequency (O)   20            932             1374         1000    3326
Expected Frequency (E)   66.5          1297.1          1197.4       765.0   3326

The test statistic is computed as follows:

\(\chi^2 = \dfrac{(20-66.5)^2}{66.5} + \dfrac{(932-1297.1)^2}{1297.1} + \dfrac{(1374-1197.4)^2}{1197.4} + \dfrac{(1000-765.0)^2}{765.0} = 32.52 + 102.77 + 26.05 + 72.19 = 233.53\)

We reject \(H_0\) because 233.53 > 7.81. We have statistically significant evidence at α = 0.05 to show that \(H_0\) is false or that the distribution of BMI in Framingham is different from the national data reported in 2002, p < 0.005.

Again, the χ² goodness-of-fit test allows us to assess whether the distribution of responses "fits" a specified distribution. Here we show that the distribution of BMI in the Framingham Offspring Study is different from the national distribution. To understand the nature of the difference we can compare observed and expected frequencies or observed and expected proportions (or percentages). The frequencies are large because of the large sample size; the observed percentages of patients in the Framingham sample are as follows: 0.6% underweight, 28% normal weight, 41% overweight and 30% obese. In the Framingham Offspring sample there are higher percentages of overweight and obese persons (41% and 30% in Framingham as compared to 36% and 23% in the national data), and lower proportions of underweight and normal weight persons (0.6% and 28% in Framingham as compared to 2% and 39% in the national data). Are these meaningful differences?
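The same one-line call in R reproduces this result; a sketch with illustrative names:

    # Observed BMI counts in the Framingham Offspring sample
    observed <- c(20, 932, 1374, 1000)   # underweight, normal, overweight, obese

    # 2002 national distribution specified in the null hypothesis
    p0 <- c(0.02, 0.39, 0.36, 0.23)

    # X-squared = 233.5, df = 3, p-value far below 0.005
    chisq.test(x = observed, p = p0)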

In the module on hypothesis testing for means and proportions, we discussed hypothesis testing applications with a dichotomous outcome variable in a single population. We presented a test using a test statistic Z to test whether an observed (sample) proportion differed significantly from a historical or external comparator. The chi-square goodness-of-fit test can also be used with a dichotomous outcome and the results are mathematically equivalent.  

In the prior module, we considered the following example. Here we show the equivalence to the chi-square goodness-of-fit test.

The NCHS report indicated that in 2002, 75% of children aged 2 to 17 saw a dentist in the past year. An investigator wants to assess whether use of dental services is similar in children living in the city of Boston. A sample of 125 children aged 2 to 17 living in Boston are surveyed and 64 reported seeing a dentist over the past 12 months. Is there a significant difference in use of dental services between children living in Boston and the national data?

We presented the following approach to the test using a Z statistic. 

  • Step 1. Set up hypotheses and determine level of significance

\(H_0: p = 0.75\)

\(H_1: p \neq 0.75\)                               α = 0.05

We must first check that the sample size is adequate. Specifically, we need to check \(\min(np_0, n(1-p_0)) = \min(125(0.75), 125(0.25)) = \min(93.75, 31.25) = 31.25\). The sample size is more than adequate so the following formula can be used:

\(Z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}}\)

This is a two-tailed test, using a Z statistic and a 5% level of significance. Reject H 0 if Z < -1.960 or if Z > 1.960.

We now substitute the sample data into the formula for the test statistic identified in Step 2. The sample proportion is:

\(\hat{p} = \dfrac{64}{125} = 0.512\)

The test statistic is:

\(Z = \dfrac{0.512 - 0.75}{\sqrt{\dfrac{0.75(1-0.75)}{125}}} = \dfrac{-0.238}{0.0387} = -6.15\)

We reject \(H_0\) because -6.15 < -1.960. We have statistically significant evidence at α = 0.05 to show that there is a statistically significant difference in the use of dental services by children living in Boston as compared to the national data. (p < 0.0001).

We now conduct the same test using the chi-square goodness-of-fit test. First, we summarize our sample data as follows:

 

                    Saw a Dentist in Past 12 Months   Did Not See a Dentist in Past 12 Months   Total
# of Participants   64                                61                                        125

\(H_0: p_1 = 0.75, \; p_2 = 0.25\), or equivalently \(H_0\): Distribution of responses is 0.75, 0.25

We must assess whether the sample size is adequate. Specifically, we need to check \(\min(np_{10}, np_{20}, \ldots, np_{k0}) \geq 5\). The sample size here is n = 125 and the proportions specified in the null hypothesis are 0.75 and 0.25. Thus, min(125(0.75), 125(0.25)) = min(93.75, 31.25) = 31.25. The sample size is more than adequate so the formula can be used.

Here we have df = k-1 = 2-1 = 1 and a 5% level of significance. The appropriate critical value is 3.84, and the decision rule is as follows: Reject \(H_0\) if \(\chi^2 > 3.84\). (Note that 1.96² = 3.84, where 1.96 was the critical value used in the Z test for proportions shown above.)

 

                         Saw a Dentist   Did Not See a Dentist   Total
Observed Frequency (O)   64              61                      125
Expected Frequency (E)   93.75           31.25                   125

The test statistic is computed as follows:

\(\chi^2 = \dfrac{(64-93.75)^2}{93.75} + \dfrac{(61-31.25)^2}{31.25} = 9.44 + 28.32 = 37.8\)

(Note that (-6.15)² ≈ 37.8, where -6.15 was the value of the Z statistic in the test for proportions shown above.)

We reject \(H_0\) because 37.8 > 3.84. We have statistically significant evidence at α = 0.05 to show that there is a statistically significant difference in the use of dental services by children living in Boston as compared to the national data. (p < 0.0001). This is the same conclusion we reached when we conducted the test using the Z test above. With a dichotomous outcome, Z² = χ²! In statistics, there are often several approaches that can be used to test hypotheses.
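This equivalence is easy to confirm in R, where prop.test (with the continuity correction turned off) and chisq.test report the same statistic; a sketch:

    # Z test for one proportion, reported as X-squared = Z^2 = 37.8 with df = 1
    prop.test(x = 64, n = 125, p = 0.75, correct = FALSE)

    # Chi-square goodness-of-fit on the same data: also X-squared = 37.8, df = 1
    chisq.test(c(64, 61), p = c(0.75, 0.25))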

Tests for Two or More Independent Samples, Discrete Outcome

Here we extend the application of the chi-square test to the case with two or more independent comparison groups. Specifically, the outcome of interest is discrete with two or more responses and the responses can be ordered or unordered (i.e., the outcome can be dichotomous, ordinal or categorical). We now consider the situation where there are two or more independent comparison groups and the goal of the analysis is to compare the distribution of responses to the discrete outcome variable among several independent comparison groups.

The test is called the χ² test of independence and the null hypothesis is that there is no difference in the distribution of responses to the outcome across comparison groups. This is often stated as follows: The outcome variable and the grouping variable (e.g., the comparison treatments or comparison groups) are independent (hence the name of the test). Independence here implies homogeneity in the distribution of the outcome among comparison groups.

The null hypothesis in the χ² test of independence is often stated in words as: \(H_0\): The distribution of the outcome is independent of the groups. The alternative or research hypothesis is that there is a difference in the distribution of responses to the outcome variable among the comparison groups (i.e., that the distribution of responses "depends" on the group). In order to test the hypothesis, we measure the discrete outcome variable in each participant in each comparison group. The data of interest are the observed frequencies (or number of participants in each response category in each group). The formula for the test statistic for the χ² test of independence is given below.

Test Statistic for Testing \(H_0\): Distribution of outcome is independent of groups

\(\chi^2 = \sum \dfrac{(O-E)^2}{E}\)

and we find the critical value in a table of probabilities for the chi-square distribution with df = (r-1)(c-1).

Here O = observed frequency, E = expected frequency in each of the response categories in each group, r = the number of rows in the two-way table and c = the number of columns in the two-way table. r and c correspond to the number of comparison groups and the number of response options in the outcome (see below for more details). The observed frequencies are the sample data and the expected frequencies are computed as described below. The test statistic is appropriate for large samples, defined as expected frequencies of at least 5 in each of the response categories in each group.

The data for the χ² test of independence are organized in a two-way table. The outcome and grouping variable are shown in the rows and columns of the table. The sample table below illustrates the data layout. The table entries (blank below) are the numbers of participants in each group responding to each response category of the outcome variable.

Table - Possible outcomes are listed in the columns; the groups being compared are listed in the rows.

               Response 1   Response 2   ...   Response c   Row Totals
Group 1
Group 2
...
Group r
Column Totals                                               N

In the table above, the grouping variable is shown in the rows of the table; r denotes the number of independent groups. The outcome variable is shown in the columns of the table; c denotes the number of response options in the outcome variable. Each combination of a row (group) and column (response) is called a cell of the table. The table has r*c cells and is sometimes called an r x c ("r by c") table. For example, if there are 4 groups and 5 categories in the outcome variable, the data are organized in a 4 X 5 table. The row and column totals are shown along the right-hand margin and the bottom of the table, respectively. The total sample size, N, can be computed by summing the row totals or the column totals. Similar to ANOVA, N does not refer to a population size here but rather to the total sample size in the analysis. The sample data can be organized into a table like the above. The numbers of participants within each group who select each response option are shown in the cells of the table and these are the observed frequencies used in the test statistic.

The test statistic for the χ² test of independence involves comparing observed (sample data) and expected frequencies in each cell of the table. The expected frequencies are computed assuming that the null hypothesis is true. The null hypothesis states that the two variables (the grouping variable and the outcome) are independent. The definition of independence is as follows:

 Two events, A and B, are independent if P(A|B) = P(A), or equivalently, if P(A and B) = P(A) P(B).

The second statement indicates that if two events, A and B, are independent then the probability of their intersection can be computed by multiplying the probability of each individual event. To conduct the χ² test of independence, we need to compute expected frequencies in each cell of the table. Expected frequencies are computed by assuming that the grouping variable and outcome are independent (i.e., under the null hypothesis). Thus, if the null hypothesis is true, using the definition of independence:

P(Group 1 and Response Option 1) = P(Group 1) P(Response Option 1).

The above states that the probability that an individual is in Group 1 and their outcome is Response Option 1 is computed by multiplying the probability that a person is in Group 1 by the probability that a person is in Response Option 1. To conduct the χ² test of independence, we need expected frequencies and not expected probabilities. To convert the above probability to a frequency, we multiply by N. Consider the following small example.

 

          Response 1   Response 2   Response 3   Total
Group 1   10           8            7            25
Group 2   22           15           13           50
Group 3   30           28           17           75
Total     62           51           37           150

The data shown above are measured in a sample of size N=150. The frequencies in the cells of the table are the observed frequencies. If Group and Response are independent, then we can compute the probability that a person in the sample is in Group 1 and Response category 1 using:

P(Group 1 and Response 1) = P(Group 1) P(Response 1),

P(Group 1 and Response 1) = (25/150) (62/150) = 0.069.

Thus if Group and Response are independent we would expect 6.9% of the sample to be in the top left cell of the table (Group 1 and Response 1). The expected frequency is 150(0.069) = 10.4.   We could do the same for Group 2 and Response 1:

P(Group 2 and Response 1) = P(Group 2) P(Response 1),

P(Group 2 and Response 1) = (50/150) (62/150) = 0.138.

The expected frequency in Group 2 and Response 1 is 150(0.138) = 20.7.

Thus, the formula for determining the expected cell frequencies in the χ² test of independence is as follows:

Expected Cell Frequency = (Row Total * Column Total)/N.

The above computes the expected frequency in one step rather than computing the expected probability first and then converting to a frequency.  
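In R, the expected frequencies for every cell can be computed at once from the row and column totals; a sketch using the table above (variable names are illustrative):

    # Observed frequencies from the 3 x 3 example above
    obs <- matrix(c(10,  8,  7,
                    22, 15, 13,
                    30, 28, 17),
                  nrow = 3, byrow = TRUE)

    # Expected frequency = (row total * column total) / N for every cell
    expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)
    round(expected, 1)   # top-left cell: 25 * 62 / 150 = 10.3 (10.4 above reflects rounding)

    # chisq.test builds the same table internally
    chisq.test(obs)$expected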

In a prior example we evaluated data from a survey of university graduates which assessed, among other things, how frequently they exercised. The survey was completed by 470 graduates. In the prior example we used the χ 2 goodness-of-fit test to assess whether there was a shift in the distribution of responses to the exercise question following the implementation of a health promotion campaign on campus. We specifically considered one sample (all students) and compared the observed distribution to the distribution of responses the prior year (a historical control). Suppose we now wish to assess whether there is a relationship between exercise on campus and students' living arrangements. As part of the same survey, graduates were asked where they lived their senior year. The response options were dormitory, on-campus apartment, off-campus apartment, and at home (i.e., commuted to and from the university). The data are shown below.

 

                       No Regular Exercise   Sporadic Exercise   Regular Exercise   Total
Dormitory              32                    30                  28                 90
On-Campus Apartment    74                    64                  42                 180
Off-Campus Apartment   110                   25                  15                 150
At Home                39                    6                   5                  50
Total                  255                   125                 90                 470

Based on the data, is there a relationship between exercise and students' living arrangement? Do you think where a person lives affects their exercise status? Here we have four independent comparison groups (living arrangement) and a discrete (ordinal) outcome variable with three response options. We specifically want to test whether living arrangement and exercise are independent. We will run the test using the five-step approach.

  • Step 1. Set up hypotheses and determine level of significance.

\(H_0\): Living arrangement and exercise are independent

\(H_1\): \(H_0\) is false.                α = 0.05

The null and research hypotheses are written in words rather than in symbols. The research hypothesis is that the grouping variable (living arrangement) and the outcome variable (exercise) are dependent or related.   

  • Step 2. Select the appropriate test statistic.

The test statistic is:

\(\chi^2 = \sum \dfrac{(O-E)^2}{E}\)

The condition for appropriate use of the above test statistic is that each expected frequency is at least 5. In Step 4 we will compute the expected frequencies and we will ensure that the condition is met.

  • Step 3. Set up decision rule.

The decision rule depends on the level of significance and the degrees of freedom, defined as df = (r-1)(c-1), where r and c are the numbers of rows and columns in the two-way data table. The row variable is the living arrangement and there are 4 arrangements considered, thus r = 4. The column variable is exercise and 3 responses are considered, thus c = 3. For this test, df = (4-1)(3-1) = 3(2) = 6. Again, with χ² tests there are no upper, lower or two-tailed tests. If the null hypothesis is true, the observed and expected frequencies will be close in value and the χ² statistic will be close to zero. If the null hypothesis is false, then the χ² statistic will be large. The rejection region for the χ² test of independence is always in the upper (right-hand) tail of the distribution. For df = 6 and a 5% level of significance, the appropriate critical value is 12.59 and the decision rule is as follows: Reject \(H_0\) if \(\chi^2 > 12.59\).

  • Step 4. Compute the test statistic.

We now compute the expected frequencies using the formula,

Expected Frequency = (Row Total * Column Total)/N.

The computations can be organized in a two-way table. In each cell of the table, the observed frequency is shown first, and the expected frequency is shown in parentheses.

 

                       No Regular Exercise   Sporadic Exercise   Regular Exercise   Total
Dormitory              32 (48.8)             30 (23.9)           28 (17.2)          90
On-Campus Apartment    74 (97.7)             64 (47.9)           42 (34.5)          180
Off-Campus Apartment   110 (81.4)            25 (39.9)           15 (28.7)          150
At Home                39 (27.1)             6 (13.3)            5 (9.6)            50
Total                  255                   125                 90                 470

Notice that the expected frequencies are taken to one decimal place and that the sums of the observed frequencies are equal to the sums of the expected frequencies in each row and column of the table.  

Recall in Step 2 a condition for the appropriate use of the test statistic was that each expected frequency is at least 5. This is true for this sample (the smallest expected frequency is 9.6) and therefore it is appropriate to use the test statistic.

The test statistic is computed as follows:

\(\chi^2 = \dfrac{(32-48.8)^2}{48.8} + \dfrac{(30-23.9)^2}{23.9} + \dfrac{(28-17.2)^2}{17.2} + \dfrac{(74-97.7)^2}{97.7} + \dfrac{(64-47.9)^2}{47.9} + \dfrac{(42-34.5)^2}{34.5} + \dfrac{(110-81.4)^2}{81.4} + \dfrac{(25-39.9)^2}{39.9} + \dfrac{(15-28.7)^2}{28.7} + \dfrac{(39-27.1)^2}{27.1} + \dfrac{(6-13.3)^2}{13.3} + \dfrac{(5-9.6)^2}{9.6} = 5.78 + 1.56 + 6.78 + 5.75 + 5.41 + 1.63 + 10.05 + 5.56 + 6.54 + 5.23 + 4.01 + 2.20 = 60.5\)

  • Step 5. Conclusion.

We reject \(H_0\) because 60.5 > 12.59. We have statistically significant evidence at α = 0.05 to show that \(H_0\) is false or that living arrangement and exercise are not independent (i.e., they are dependent or related), p < 0.005.

Again, the χ² test of independence is used to test whether the distribution of the outcome variable is similar across the comparison groups. Here we rejected \(H_0\) and concluded that the distribution of exercise is not independent of living arrangement, or that there is a relationship between living arrangement and exercise. The test provides an overall assessment of statistical significance. When the null hypothesis is rejected, it is important to review the sample data to understand the nature of the relationship. Consider again the sample data.

Because there are different numbers of students in each living situation, it makes the comparisons of exercise patterns difficult on the basis of the frequencies alone. The following table displays the percentages of students in each exercise category by living arrangement. The percentages sum to 100% in each row of the table. For comparison purposes, percentages are also shown for the total sample along the bottom row of the table.

                       No Regular Exercise   Sporadic Exercise   Regular Exercise
Dormitory              36%                   33%                 31%
On-Campus Apartment    41%                   36%                 23%
Off-Campus Apartment   73%                   17%                 10%
At Home                78%                   12%                 10%
All                    54%                   27%                 19%

From the above, it is clear that higher percentages of students living in dormitories and in on-campus apartments reported regular exercise (31% and 23%) as compared to students living in off-campus apartments and at home (10% each).  
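The full test can be reproduced in R in a few lines; a sketch with illustrative labels:

    # Observed frequencies: living arrangement (rows) by exercise status (columns)
    obs <- matrix(c( 32, 30, 28,
                     74, 64, 42,
                    110, 25, 15,
                     39,  6,  5),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(c("Dormitory", "On-Campus Apt", "Off-Campus Apt", "At Home"),
                                  c("No Regular", "Sporadic", "Regular")))

    # Test of independence: X-squared = 60.5, df = 6, p < 0.001
    chisq.test(obs)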

Test Yourself

 Pancreaticoduodenectomy (PD) is a procedure that is associated with considerable morbidity. A study was recently conducted on 553 patients who had a successful PD between January 2000 and December 2010 to determine whether their Surgical Apgar Score (SAS) is related to 30-day perioperative morbidity and mortality. The table below gives the number of patients experiencing no, minor, or major morbidity by SAS category.  

Surgical Apgar Score   No Morbidity   Minor Morbidity   Major Morbidity or Mortality
0-4                    21             20                16
5-6                    135            71                35
7-10                   158            62                35

Question: What would be an appropriate statistical test to examine whether there is an association between Surgical Apgar Score and patient outcome? Using 14.13 as the value of the test statistic for these data, carry out the appropriate test at a 5% level of significance. Show all parts of your test.

In the module on hypothesis testing for means and proportions, we discussed hypothesis testing applications with a dichotomous outcome variable and two independent comparison groups. We presented a test using a test statistic Z to test for equality of independent proportions. The chi-square test of independence can also be used with a dichotomous outcome and the results are mathematically equivalent.  

In the prior module, we considered the following example. Here we show the equivalence to the chi-square test of independence.

A randomized trial is designed to evaluate the effectiveness of a newly developed pain reliever designed to reduce pain in patients following joint replacement surgery. The trial compares the new pain reliever to the pain reliever currently in use (called the standard of care). A total of 100 patients undergoing joint replacement surgery agreed to participate in the trial. Patients were randomly assigned to receive either the new pain reliever or the standard pain reliever following surgery and were blind to the treatment assignment. Before receiving the assigned treatment, patients were asked to rate their pain on a scale of 0-10 with higher scores indicative of more pain. Each patient was then given the assigned treatment and after 30 minutes was again asked to rate their pain on the same scale. The primary outcome was a reduction in pain of 3 or more scale points (defined by clinicians as a clinically meaningful reduction). The following data were observed in the trial.

Treatment Group          n    # with Reduction of 3+ Points   Proportion with Reduction of 3+ Points
New Pain Reliever        50   23                              0.46
Standard Pain Reliever   50   11                              0.22

We tested whether there was a significant difference in the proportions of patients reporting a meaningful reduction (i.e., a reduction of 3 or more scale points) using a Z statistic, as follows. 

\(H_0: p_1 = p_2\)

\(H_1: p_1 \neq p_2\)                             α = 0.05

Here the new or experimental pain reliever is group 1 and the standard pain reliever is group 2.

We must first check that the sample size is adequate. Specifically, we need to ensure that we have at least 5 successes and 5 failures in each comparison group, or that:

\(\min(n_1 \hat{p}_1, \; n_1(1-\hat{p}_1), \; n_2 \hat{p}_2, \; n_2(1-\hat{p}_2)) \geq 5\)

In this example, we have

\(\min(50(0.46), \; 50(0.54), \; 50(0.22), \; 50(0.78)) = \min(23, 27, 11, 39) = 11\)

Therefore, the sample size is adequate, so the following formula can be used:

\(Z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p}) \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}}\)

Reject \(H_0\) if Z < -1.960 or if Z > 1.960.

We now substitute the sample data into the formula for the test statistic identified in Step 2. We first compute the overall proportion of successes:

\(\hat{p} = \dfrac{23 + 11}{50 + 50} = \dfrac{34}{100} = 0.34\)

We now substitute to compute the test statistic.

\(Z = \dfrac{0.46 - 0.22}{\sqrt{0.34(0.66)\left(\dfrac{1}{50} + \dfrac{1}{50}\right)}} = \dfrac{0.24}{0.0947} = 2.53\)

  • Step 5. Conclusion.

We reject \(H_0\) because 2.53 > 1.960. We have statistically significant evidence at α = 0.05 to show that there is a difference in the proportions of patients reporting a meaningful reduction in pain between the new and the standard pain reliever.

We now conduct the same test using the chi-square test of independence.  

\(H_0\): Treatment and outcome (meaningful reduction in pain) are independent

\(H_1\): \(H_0\) is false.         α = 0.05

The formula for the test statistic is:

\(\chi^2 = \sum \dfrac{(O-E)^2}{E}\)

For this test, df = (2-1)(2-1) = 1. At a 5% level of significance, the appropriate critical value is 3.84 and the decision rule is as follows: Reject \(H_0\) if \(\chi^2 > 3.84\). (Note that 1.96² = 3.84, where 1.96 was the critical value used in the Z test for proportions shown above.)

We now compute the expected frequencies using:

Expected Frequency = (Row Total * Column Total)/N.

The computations can be organized in a two-way table. In each cell of the table, the observed frequency is shown first, and the expected frequency is shown in parentheses.

                         Reduction of 3+ Points   No Reduction of 3+ Points   Total
New Pain Reliever        23 (17.0)                27 (33.0)                   50
Standard Pain Reliever   11 (17.0)                39 (33.0)                   50
Total                    34                       66                          100

A condition for the appropriate use of the test statistic was that each expected frequency is at least 5. This is true for this sample (the smallest expected frequency is 17.0), and therefore it is appropriate to use the test statistic. The test statistic is computed as follows:

\(\chi^2 = \dfrac{(23-17)^2}{17} + \dfrac{(27-33)^2}{33} + \dfrac{(11-17)^2}{17} + \dfrac{(39-33)^2}{33} = 2.12 + 1.09 + 2.12 + 1.09 = 6.4\)

We reject \(H_0\) because 6.4 > 3.84, the same conclusion we reached with the Z test. (Note that (2.53)² = 6.4, where 2.53 was the value of the Z statistic in the test for proportions shown above.)
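In R, the equivalence can be confirmed directly; a sketch:

    # Two-sample test of proportions without continuity correction:
    # X-squared = 6.4 (which is 2.53 squared), df = 1
    prop.test(x = c(23, 11), n = c(50, 50), correct = FALSE)

    # The chi-square test of independence on the 2 x 2 table gives the same result
    trial <- matrix(c(23, 27,
                      11, 39), nrow = 2, byrow = TRUE)
    chisq.test(trial, correct = FALSE)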

Chi-Squared Tests in R

The video below by Mike Marin demonstrates how to perform chi-squared tests in the R programming language.
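As a written companion to the video, the two forms of the chisq.test call used in this module look like this (a sketch reusing numbers from the examples above):

    # Goodness of fit: observed counts against hypothesized proportions
    chisq.test(x = c(255, 125, 90), p = c(0.60, 0.25, 0.15))

    # Independence: a matrix (two-way table) of observed counts
    chisq.test(matrix(c(23, 27,
                        11, 39), nrow = 2, byrow = TRUE), correct = FALSE)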

Answer to Problem on Pancreaticoduodenectomy and Surgical Apgar Scores

We have 3 independent comparison groups (Surgical Apgar Score) and a categorical outcome variable (morbidity/mortality). We can run a Chi-Squared test of independence.

\(H_0\): Apgar scores and patient outcome are independent of one another.

\(H_A\): Apgar scores and patient outcome are not independent.

Chi-squared = 14.13

With df = (3-1)(3-1) = 4, the critical value at a 5% level of significance is 9.49. Since 14.13 is greater than 9.49, we reject \(H_0\).

There is an association between Apgar scores and patient outcome. The lowest Apgar score group (0 to 4) experienced the highest percentage of major morbidity or mortality (16 out of 57=28%) compared to the other Apgar score groups.
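As a check, the test can be run in R; a sketch with illustrative labels:

    # Observed counts: SAS category (rows) by morbidity outcome (columns)
    sas <- matrix(c( 21, 20, 16,
                    135, 71, 35,
                    158, 62, 35),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("0-4", "5-6", "7-10"),
                                  c("None", "Minor", "Major")))

    # X-squared = 14.13, df = 4, p-value of about 0.007: reject at the 5% level
    chisq.test(sas)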

11.3 - Chi-Square Test of Independence

The chi-square (\(\chi^2\)) test of independence is used to test for a relationship between two categorical variables. Recall that if two categorical variables are independent, then \(P(A) = P(A \mid B)\). The chi-square test of independence uses this fact to compute expected values for the cells in a two-way contingency table under the assumption that the two variables are independent (i.e., the null hypothesis is true).

Even if two variables are independent in the population, samples will vary due to random sampling variation. The chi-square test is used to determine if there is convincing evidence that the two variables are not independent in the population using the same hypothesis testing logic that we used with one mean, one proportion, etc.

Again, we will be using the five-step hypothesis testing procedure:

The assumptions are that the sample is randomly drawn from the population and that all expected values are at least 5 (we will see what expected values are later).

Our hypotheses are:

     \(H_0:\) There is not a relationship between the two variables in the population (they are independent)

     \(H_a:\) There is a relationship between the two variables in the population (they are dependent)

Note: When you're writing the hypotheses for a given scenario, use the names of the variables, not the generic "two variables."

Chi-Square Test Statistic

\(\chi^2=\sum \dfrac{(Observed-Expected)^2}{Expected}\)

Expected Cell Value

\(E=\dfrac{row\;total \; \times \; column\;total}{n}\)

The p-value can be found using Minitab. Look up the area to the right of your chi-square test statistic on a chi-square distribution with the correct degrees of freedom. Chi-square tests are always right-tailed tests. 

Degrees of Freedom: Chi-Square Test of Independence

\(df=(number\;of\;rows-1)(number\;of\;columns-1)\)

If \(p \leq \alpha\) reject the null hypothesis.

If \(p>\alpha\) fail to reject the null hypothesis.

Write a conclusion in terms of the original research question.
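If Minitab is not at hand, the same right-tail area can be computed in R (the language demonstrated earlier); a quick sketch:

    # Chi-square tests are right-tailed: the p-value is the area above the statistic.
    # For example, a statistic of 0.743 on 1 degree of freedom (see the next example):
    pchisq(0.743, df = 1, lower.tail = FALSE)   # about 0.389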

11.3.1 - Example: Gender and Online Learning

Gender and online learning.

A sample of 314 Penn State students was asked if they have ever taken an online course. Their genders were also recorded. The contingency table below was constructed. Use a chi-square test of independence to determine if there is a relationship between gender and whether or not someone has taken an online course.

  Have you taken an online course?
  Yes No
Men 43 63
Women 95 113

\(H_0:\) There is not a relationship between gender and whether or not someone has taken an online course (they are independent)

\(H_a:\) There is a relationship between gender and whether or not someone has taken an online course (they are dependent)

Looking ahead to our calculations of the expected values, we can see that all expected values are at least 5. This means that the sampling distribution can be approximated using the \(\chi^2\) distribution. 

In order to compute the chi-square test statistic we must know the observed and expected values for each cell. We are given the observed values in the table above. We must compute the expected values. The table below includes the row and column totals.

  Have you taken an online course?  
  Yes No  
Men 43 63 106
Women 95 113 208
  138 176 314
\(E=\dfrac{row\;total \times column\;total}{n}\)
\(E_{Men,\;Yes}=\dfrac{106\times138}{314}=46.586\)
\(E_{Men,\;No}=\dfrac{106\times176}{314}=59.414\)
\(E_{Women,\;Yes}=\dfrac{208\times138}{314}=91.414\)
\(E_{Women,\;No}=\dfrac{208 \times 176}{314}=116.586\)

Note that all expected values are at least 5, thus this assumption of the \(\chi^2\) test of independence has been met. 

Observed and expected counts are often presented together in a contingency table. In the table below, expected values are presented in parentheses.

  Have you taken an online course?  
  Yes No  
Men 43 (46.586) 63 (59.414) 106
Women 95 (91.414) 113 (116.586) 208
  138 176 314

\(\chi^2=\sum \dfrac{(O-E)^2}{E} \)

\(\chi^2=\dfrac{(43-46.586)^2}{46.586}+\dfrac{(63-59.414)^2}{59.414}+\dfrac{(95-91.414)^2}{91.414}+\dfrac{(113-116.586)^2}{116.586}=0.276+0.216+0.141+0.110=0.743\)

The chi-square test statistic is 0.743

\(df=(number\;of\;rows-1)(number\;of\;columns-1)=(2-1)(2-1)=1\)

We can determine the p-value by constructing a chi-square distribution plot with 1 degree of freedom and finding the area to the right of 0.743.

Distribution Plot - Chi-Square, DF=1

\(p = 0.388702\)

\(p>\alpha\), therefore we fail to reject the null hypothesis.

There is not enough evidence to conclude that gender and whether or not an individual has completed an online course are related.

Note that we cannot say for sure that these two categorical variables are independent, we can only say that we do not have enough evidence that they are dependent.
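To reproduce this result by software, note that R's chisq.test applies a Yates continuity correction to 2 x 2 tables by default; turning it off matches the hand calculation above. A sketch:

    # Observed counts: rows = men, women; columns = yes, no
    online <- matrix(c(43, 63,
                       95, 113), nrow = 2, byrow = TRUE)

    # correct = FALSE disables the Yates correction so the result matches
    # the hand computation: X-squared = 0.743, df = 1, p-value of about 0.389
    chisq.test(online, correct = FALSE)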

11.3.2 - Minitab: Test of Independence

Raw vs summarized data.

If you have a data file with the responses for individual cases then you have "raw data" and can follow the directions below. If you have a table filled with data, then you have "summarized data." There is an example of conducting a chi-square test of independence using summarized data on a later page. After data entry the procedure is the same for both data entry methods.

Minitab® – Chi-square Test Using Raw Data

Research question : Is there a relationship between where a student sits in class and whether they have ever cheated?

  • Null hypothesis : Seat location and cheating are not related in the population. 
  • Alternative hypothesis : Seat location and cheating are related in the population.

To perform a chi-square test of independence in Minitab using raw data:

  • Open Minitab file: class_survey.mpx
  • Select Stat > Tables > Chi-Square Test for Association
  • Select Raw data (categorical variables) from the dropdown.
  • Choose the variable  Seating  to insert it into the  Rows  box
  • Choose the variable  Ever_Cheat  to insert it into the  Columns  box
  • Click the Statistics button and check the boxes  Chi-square test for association  and  Expected cell counts
  • Click  OK and OK

This should result in the following output:

Rows: Seating Columns: Ever_Cheat

  No Yes All
Back 24 8 32
  24.21 7.79  
Front 38 8 46
  34.81 11.19  
Middle 109 39 148
  111.98 36.02  
All 171 55 226

Chi-Square Test

  Chi-Square DF P-Value
Pearson 1.539 2 0.463
Likelihood Ratio 1.626 2 0.443

All expected values are at least 5 so we can use the Pearson chi-square test statistic. Our results are \(\chi^2 (2) = 1.539\). \(p = 0.463\). Because our \(p\) value is greater than the standard alpha level of 0.05, we fail to reject the null hypothesis. There is not enough evidence of a relationship in the population between seat location and whether a student has cheated.

11.3.2.1 - Example: Raw Data

Example: dog & cat ownership.

Is there a relationship between dog and cat ownership in the population of all World Campus STAT 200 students? Let's conduct a hypothesis test using the dataset: fall2016stdata.mpx

\(H_0:\) There is not a relationship between dog ownership and cat ownership in the population of all World Campus STAT 200 students

\(H_a:\) There is a relationship between dog ownership and cat ownership in the population of all World Campus STAT 200 students

Assumption: All expected counts are at least 5. The expected counts here are 176.02, 75.98, 189.98, and 82.02, so this assumption has been met.

Let's use Minitab to calculate the test statistic and p-value.

  • After entering the data, select Stat > Tables > Cross Tabulation and Chi-Square
  • Enter Dog in the Rows box
  • Enter Cat in the Columns box
  • Select the Chi-Square button and in the new window check the box for the Chi-square test and Expected cell counts

Rows: Dog Columns: Cat

  No Yes All
No 183 69 252
  176.02 75.98  
Yes 183 89 272
  189.98 82.02  
Missing 1 0  
All 366 158 524
  Chi-Square DF P-Value
Pearson 1.771 1 0.183
Likelihood Ratio 1.775 1 0.183

Since the assumption was met in step 1, we can use the Pearson chi-square test statistic.

\(Pearson\;\chi^2 = 1.771\)

\(p = 0.183\)

Our p value is greater than the standard 0.05 alpha level, so we fail to reject the null hypothesis.

There is not enough evidence of a relationship between dog ownership and cat ownership in the population of all World Campus STAT 200 students.
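The same analysis can be sketched in R from the summarized counts (excluding the case with a missing response):

    # Rows = dog (No, Yes); columns = cat (No, Yes)
    pets <- matrix(c(183, 69,
                     183, 89), nrow = 2, byrow = TRUE)

    # Pearson chi-square without continuity correction: X-squared = 1.77, p = 0.183
    chisq.test(pets, correct = FALSE)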

11.3.2.2 - Example: Summarized Data

Example: coffee and tea preference.

Is there a relationship between liking tea and liking coffee?

The following table shows data collected from a random sample of 100 adults. Each were asked if they liked coffee (yes or no) and if they liked tea (yes or no).

  

                 Likes Coffee
                 Yes   No
Likes Tea  Yes   30    25
           No    10    35

Let's use the 5 step hypothesis testing procedure to address this research question.

\(H_0:\) Liking coffee and liking tea are not related (i.e., independent) in the population

\(H_a:\) Liking coffee and liking tea are related (i.e., dependent) in the population

Assumption: All expected counts are at least 5.

Enter the table into a Minitab worksheet as shown below:

 

     C1          C2                 C3
     Likes Tea   Likes Coffee-Yes   Likes Coffee-No
1    Yes         30                 25
2    No          10                 35

  • Select Stat > Tables > Cross Tabulation and Chi-Square
  • Select Summarized data in a two-way table from the dropdown
  • Enter the columns Likes Coffee-Yes and Likes Coffee-No in the Columns containing the table box
  • For the row labels enter Likes Tea (leave the column labels blank)
  • Select the Chi-Square button and check the boxes for Chi-square test and Expected cell counts .

Rows: Likes Tea  Columns: Worksheet columns

      Likes Coffee-Yes   Likes Coffee-No   All
Yes   30                 25                55
      22                 33
No    10                 35                45
      18                 27
All   40                 60                100

 

                   Chi-Square   DF   P-Value
Pearson            10.774       1    0.001
Likelihood Ratio   11.138       1    0.001

\(Pearson\;\chi^2 = 10.774\)

\(p = 0.001\)

Our p value is less than the standard 0.05 alpha level, so we reject the null hypothesis.

There is convincing evidence of a relationship between liking coffee and liking tea in the population.
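For readers working outside Minitab, a quick R sketch of the same test:

    # Rows = likes tea (Yes, No); columns = likes coffee (Yes, No)
    drinks <- matrix(c(30, 25,
                       10, 35), nrow = 2, byrow = TRUE)

    # Pearson chi-square without continuity correction: X-squared = 10.774, p = 0.001
    chisq.test(drinks, correct = FALSE)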

11.3.3 - Relative Risk

A chi-square test of independence will give you information concerning whether or not a relationship between two categorical variables in the population is likely. As was the case with the single sample and two sample hypothesis tests that you learned earlier this semester, with a large sample size statistical power is high and the probability of rejecting the null hypothesis is high, even if the relationship is relatively weak. In addition to examining statistical significance by looking at the p value, we can also examine practical significance by computing the  relative risk .

In Lesson 2 you learned that risk is often used to describe the probability of an event occurring. Risk can also be used to compare the probabilities in two different groups. First, we'll review risk, then you'll be introduced to the concept of relative risk.

The  risk  of an outcome can be expressed as a fraction or as the percent of a group that experiences the outcome.

Examples of Risk

60 out of 1000 teens have asthma. The risk is \(\frac{60}{1000}=.06\). This means that 6% of all teens experience asthma.

45 out of 100 children get the flu each year. The risk is \(\frac{45}{100}=.45\) or 45%

Relative risk compares the risks in two groups:

\(Relative\;Risk=\dfrac{risk\;in\;group\;1}{risk\;in\;group\;2}\)

Thus, relative risk gives the risk for group 1 as a multiple of the risk for group 2.

Example of Relative Risk

Suppose that the risk of a child getting the flu this year is .45 and the risk of an adult getting the flu this year is .10. What is the relative risk of children compared to adults?

  • \(Relative\;risk=\dfrac{.45}{.10}=4.5\)

Children are 4.5 times more likely than adults to get the flu this year.
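The computation is a single division; a tiny R sketch of the flu example:

    # Relative risk = risk in group 1 / risk in group 2
    risk_children <- 45 / 100    # risk of flu among children
    risk_adults   <- 0.10        # risk of flu among adults
    risk_children / risk_adults  # 4.5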

Watch out for relative risk statistics where no baseline information is given about the actual risk. For instance, it doesn't mean much to say that beer drinkers have twice the risk of stomach cancer as non-drinkers unless we know the actual risks. The risk of stomach cancer might actually be very low, even for beer drinkers. For example, 2 in a million is twice the size of 1 in a million, but it would still be a very low risk. The actual risk against which other risks are compared is known as the baseline.


Understanding the Null Hypothesis in Chi-Square

The null hypothesis in chi square testing suggests no significant difference between a study’s observed and expected frequencies. It assumes any observed difference is due to chance and not because of a meaningful statistical relationship.

Introduction

The chi-square test is a valuable tool in statistical analysis. It's a non-parametric test applied when the data are qualitative or categorical. This test helps to establish whether there is a significant association between two categorical variables in a sample population.

Central to any chi-square test is the concept of the null hypothesis. In the context of chi-square, the null hypothesis assumes no significant difference exists between the categories’ observed and expected frequencies. Any difference seen is likely due to chance or random error rather than a meaningful statistical difference.

  • The chi-square null hypothesis assumes no significant difference between observed and expected frequencies.
  • Failing to reject the null hypothesis doesn’t prove it true, only that data lacks strong evidence against it.
  • A p-value < the significance level indicates a significant association between variables.


Understanding the Concept of Null Hypothesis in Chi Square

The null hypothesis in chi-square tests is essentially a statement of no effect or no relationship. When it comes to categorical data, it indicates that the distribution of categories for one variable is not affected by the distribution of categories of the other variable.

For example, if we compare the preference for different types of fruit among men and women, the null hypothesis would state that the preference is independent of gender. The alternative hypothesis, on the other hand, would suggest a dependency between the two.

Steps to Formulate the Null Hypothesis in Chi-Square Tests

Formulating the null hypothesis is a critical step in any chi-square test. First, identify the variables being tested. Then, once the variables are determined, the null hypothesis can be formulated to state no association between them.

Next, collect your data. This data must be frequencies or counts of categories, not percentages or averages. Once the data is collected, you can calculate the expected frequency for each category under the null hypothesis.

Finally, use the chi-square formula to calculate the chi-square statistic. This will help determine whether to reject or fail to reject the null hypothesis.

Step                                Description
1. Identify Variables               Determine the variables being tested in your study.
2. State the Null Hypothesis        Formulate the null hypothesis to state that there is no association between the variables.
3. Collect Data                     Gather your data. Remember, this must be frequencies or counts of categories, not percentages or averages.
4. Calculate Expected Frequencies   Under the null hypothesis, calculate the expected frequency for each category.
5. Compute Chi-Square Statistic     Use the chi-square formula to calculate the chi-square statistic. This will help determine whether to reject or fail to reject the null hypothesis.

Practical Example and Case Study

Consider a study evaluating whether smoking status is independent of a lung cancer diagnosis. The null hypothesis would state that smoking status (smoker or non-smoker) is independent of cancer diagnosis (yes or no).

If we find a p-value less than our significance level (typically 0.05) after conducting the chi-square test, we would reject the null hypothesis and conclude that smoking status is not independent of lung cancer diagnosis, suggesting a significant association between the two.

Observed Table

Smoking Status Cancer Diagnosis No Cancer Diagnosis
Smoker 70 30
Non-Smoker 20 80

Expected Table

Smoking Status Cancer Diagnosis No Cancer Diagnosis
Smoker 45 55
Non-Smoker 45 55
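A quick R sketch confirms the expected counts under independence:

    # Observed counts: rows = smoker, non-smoker; columns = cancer, no cancer
    smoking <- matrix(c(70, 30,
                        20, 80), nrow = 2, byrow = TRUE)

    # Expected counts under independence: (row total * column total) / N
    chisq.test(smoking)$expected   # 45 and 55 in each row

    # The test itself (chisq.test applies a continuity correction to 2 x 2 tables)
    chisq.test(smoking)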

Common Misunderstandings and Pitfalls

One common misunderstanding is the interpretation of failing to reject the null hypothesis. It’s important to remember that failing to reject the null does not prove it true. Instead, it merely suggests that our data do not provide strong enough evidence against it.

Another pitfall is applying the chi-square test to inappropriate data. The chi-square test requires categorical or nominal data. Applying it to ordinal or continuous data without proper binning or categorization can lead to incorrect results.

The null hypothesis in chi-square testing is a powerful tool in statistical analysis. It provides a means to differentiate between observed variations due to random chance versus those that may signify a significant effect or relationship. As we continue to generate more data in various fields, the importance of understanding and correctly applying chi-square tests and the concept of the null hypothesis grows.


Frequently Asked Questions (FAQs)

What is a chi-square test? It's a statistical test used to determine if there's a significant association between two categorical variables.

What are the null and alternative hypotheses? The null hypothesis suggests no significant difference between observed and expected frequencies exists. The alternative hypothesis suggests a significant difference.

Can we accept the null hypothesis? No, we never "accept" the null hypothesis. We only fail to reject it if the data doesn't provide strong evidence against it.

What does rejecting the null hypothesis mean? Rejecting the null hypothesis implies a significant difference between observed and expected frequencies, suggesting an association between variables.

What kind of data does the test require? Chi-square tests are appropriate for categorical or nominal data.

What is the significance level? The significance level, often 0.05, is the probability threshold below which the null hypothesis can be rejected.

What does the p-value indicate? A p-value < the significance level indicates a significant association between variables, leading to rejecting the null hypothesis.

What is a common pitfall? Using the chi-square test for improper data, like ordinal or continuous data, without proper categorization can lead to incorrect results.

How is the null hypothesis formulated and tested? Identify the variables, state their independence, collect data, calculate expected frequencies, and apply the chi-square formula.

Why does this matter? Understanding the null hypothesis is essential for correctly interpreting and applying chi-square tests, helping to make informed decisions based on data.


Chi-Square (Χ²) Test & How To Calculate Formula Equation

Benjamin Frimodig


Chi-square (χ²) is used to test hypotheses about the distribution of observations into categories with no inherent ranking.

What Is a Chi-Square Statistic?

The Chi-square test (pronounced Kai) looks at the pattern of observations and will tell us if certain combinations of the categories occur more frequently than we would expect by chance, given the total number of times each category occurred.

It looks for an association between the variables. We cannot use a correlation coefficient to look for the patterns in this data because the categories often do not form a continuum.

There are three main types of Chi-square tests: the test of goodness of fit, the test of independence, and the test for homogeneity. All three rely on the same formula to compute the test statistic.

These tests function by deciphering relationships between observed sets of data and theoretical or “expected” sets of data that align with the null hypothesis.

What is a Contingency Table?

Contingency tables (also known as two-way tables) are grids in which Chi-square data is organized and displayed. They provide a basic picture of the interrelation between two variables and can help find interactions between them.

In contingency tables, one variable and each of its categories are listed vertically, and the other variable and each of its categories are listed horizontally.

Additionally, including column and row totals, also known as “marginal frequencies,” will help facilitate the Chi-square testing process.

In order for the Chi-square test to be considered trustworthy, each cell of your expected contingency table must have a value of at least five.

Each Chi-square test will have one contingency table representing observed counts (see Fig. 1) and one contingency table representing expected counts (see Fig. 2).

Figure 1. Observed table (which contains the observed counts).

To obtain the expected frequencies for any cell in any cross-tabulation in which the two variables are assumed independent, multiply the row and column totals for that cell and divide the product by the total number of cases in the table.
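To make that rule concrete, here is a minimal sketch in Python (the observed counts are invented for illustration and are not the counts from Figure 1):

```python
# Sketch: expected counts for a cross-tabulation of two variables
# assumed independent. The 2x3 observed table is hypothetical.
observed = [
    [10, 20, 30],  # row 1
    [20, 25, 15],  # row 2
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for a cell = (row total x column total) / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

for row in expected:
    print([round(cell, 2) for cell in row])
```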

Figure 2. Expected table (what we expect the two-way table to look like if the two categorical variables are independent).

To decide if our calculated value for χ2 is significant, we also need to work out the degrees of freedom for our contingency table using the following formula: df= (rows – 1) x (columns – 1).

Formula Calculation

χ² = Σ (O – E)² / E, where O is an observed frequency and E is the corresponding expected frequency.

Calculate the chi-square statistic (χ2) by completing the following steps:

  • Calculate the expected frequencies and the observed frequencies.
  • For each observed number in the table, subtract the corresponding expected number (O – E).
  • Square the difference (O – E)².
  • Divide the square obtained for each cell in the table by the expected number for that cell: (O – E)² / E.
  • Sum all the values for (O – E)² / E. This is the chi-square statistic.
  • Calculate the degrees of freedom for the contingency table using the following formula: df = (rows – 1) × (columns – 1).

Once we have calculated the degrees of freedom (df) and the chi-squared value (χ2), we can use the χ2 table (often at the back of a statistics book) to check if our value for χ2 is higher than the critical value given in the table. If it is, then our result is significant at the level given.
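Putting these steps together, here is a hedged Python sketch (scipy is assumed to be available for the critical value; the 2×2 observed table is hypothetical, with expected counts derived from its margins as described above):

```python
from scipy.stats import chi2

# Hypothetical 2x2 observed table and the expected counts implied
# by its row and column totals (not the tables from the figures).
observed = [[25, 15], [15, 45]]
expected = [[16, 24], [24, 36]]

# Sum (O - E)^2 / E over every cell
chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)  # (rows - 1) x (columns - 1)
critical = chi2.ppf(0.95, df)  # critical value at the 0.05 significance level

print(f"chi-square = {chi_sq:.3f}, df = {df}, critical value = {critical:.3f}")
print("significant" if chi_sq > critical else "not significant")
```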

Interpretation

The chi-square statistic tells you how much difference exists between the observed count in each table cell and the count you would expect if there were no relationship at all in the population.

Small Chi-Square Statistic: If the chi-square statistic is small and the p-value is large (usually greater than 0.05), this often indicates that the observed frequencies in the sample are close to what would be expected under the null hypothesis.

The null hypothesis usually states no association between the variables being studied or that the observed distribution fits the expected distribution.

In theory, if the observed and expected values were equal (no difference), then the chi-square statistic would be zero — but this is unlikely to happen in real life.

Large Chi-Square Statistic: If the chi-square statistic is large and the p-value is small (usually less than 0.05), then the conclusion is often that the data does not fit the model well, i.e., the observed and expected values are significantly different. This often leads to the rejection of the null hypothesis.

How to Report

To report a chi-square output in an APA-style results section, use the following template:

χ²(degrees of freedom, N = sample size) = chi-square statistic value, p = p value.

Figure: SPSS output for a chi-square test of independence between gender and post-graduation education plans.

In the case of the above example, the results would be written as follows:

A chi-square test of independence showed that there was a significant association between gender and post-graduation education plans, χ2 (4, N = 101) = 54.50, p < .001.

APA Style Rules

  • Do not use a zero before a decimal when the statistic cannot be greater than 1 (proportion, correlation, level of statistical significance).
  • Report exact p values to two or three decimals (e.g., p = .006, p = .03).
  • However, report p values less than .001 as “ p < .001.”
  • Put a space before and after a mathematical operator (e.g., minus, plus, greater than, less than, equals sign).
  • Do not repeat statistics in both the text and a table or figure.

p-value Interpretation

You test whether a given χ² is statistically significant by comparing it against a table of chi-square distributions, according to the degrees of freedom for your test: for a goodness-of-fit test, the number of categories minus 1; for a contingency table, (rows – 1) × (columns – 1). The chi-square test assumes that you have at least 5 expected observations per category.

If you are using SPSS, the output will include an exact p-value.

For a chi-square test, a p-value that is less than or equal to the .05 significance level indicates that the observed values are significantly different from the expected values.

Thus, low p-values (p < .05) indicate a likely difference between the theoretical population and the collected sample. You can conclude that a relationship exists between the categorical variables.

Remember that p-values do not indicate the odds that the null hypothesis is true but rather provide the probability that one would obtain the sample distribution observed (or a more extreme distribution) if the null hypothesis were true.

A level of confidence sufficient to accept the null hypothesis can never be reached, so conclusions are stated as either failing to reject the null hypothesis or rejecting it in favor of the alternative, depending on the calculated p-value.

The six steps below show you how to analyze your data using a chi-square goodness-of-fit test in SPSS (Steps 4–6 apply when you have not hypothesized equal expected proportions).

Step 1: Select Analyze > Nonparametric Tests > Legacy Dialogs > Chi-square… from the top menu.

Step 2: Move the variable indicating categories into the “Test Variable List:” box.

Step 3: If you want to test the hypothesis that all categories are equally likely, click “OK.”

Step 4: Otherwise, specify the expected count for each category by first clicking the “Values” button under “Expected Values.”

Step 5: Then, in the box to the right of “Values,” enter the expected count for category one and click the “Add” button. Now enter the expected count for category two and click “Add.” Continue in this way until all expected counts have been entered.

Step 6: Then click “OK.”
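If you would rather script the test than click through SPSS, a minimal goodness-of-fit sketch in Python with scipy might look like this (the die-roll counts are made up for illustration):

```python
from scipy.stats import chisquare

# Hypothetical counts for the six faces of a die rolled 120 times.
observed = [15, 24, 18, 21, 17, 25]

# With no f_exp argument, chisquare assumes equal expected frequencies
# (120 / 6 = 20 per category), i.e. the "all categories equally likely"
# hypothesis from Step 3 above.
stat, p_value = chisquare(observed)
print(f"chi-square = {stat:.3f}, p = {p_value:.3f}")
```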

The five steps below show you how to analyze your data using a chi-square test of independence in SPSS Statistics.

Step 1: Open the Crosstabs dialog (Analyze > Descriptive Statistics > Crosstabs).

Step 2: Select the variables you want to compare using the chi-square test. Click one variable in the left window and then click the arrow at the top to move the variable. Select the row variable and the column variable.

Step 3: Click Statistics (a new pop-up window will appear). Check Chi-square, then click Continue.

Step 4: (Optional) Check the box for Display clustered bar charts.

Step 5: Click OK.

Goodness-of-Fit Test

The Chi-square goodness of fit test is used to compare a randomly collected sample containing a single categorical variable to a larger population.

This test is most commonly used to compare a random sample to the population from which it was potentially collected.

The test begins with the creation of a null and alternative hypothesis. In this case, the hypotheses are as follows:

Null Hypothesis (Ho) : The null hypothesis (Ho) is that the observed frequencies are the same (except for chance variation) as the expected frequencies. The collected data is consistent with the population distribution.

Alternative Hypothesis (Ha) : The collected data is not consistent with the population distribution.

The next step is to create a table of expected frequencies that represents how the data would be distributed if the null hypothesis were exactly correct.

The sample’s overall deviation from this theoretical/expected data will allow us to draw a conclusion, with a more severe deviation resulting in smaller p-values.

Test for Independence

The Chi-square test for independence looks for an association between two categorical variables within the same population.

Unlike the goodness of fit test, the test for independence does not compare a single observed variable to a theoretical population but rather two variables within a sample set to one another.

The hypotheses for a Chi-square test of independence are as follows:

Null Hypothesis (Ho) : There is no association between the two categorical variables in the population of interest.

Alternative Hypothesis (Ha) : There is an association between the two categorical variables in the population of interest.

The next step is to create a contingency table of expected values that reflects how a data set that perfectly aligns the null hypothesis would appear.

The simplest way to do this is to calculate the marginal frequencies of each row and column; the expected frequency of each cell is then the product of the corresponding row total and column total in the observed contingency table divided by the total sample size.

Test for Homogeneity

The Chi-square test for homogeneity is organized and executed exactly the same as the test for independence.

The main difference to remember between the two is that the test for independence looks for an association between two categorical variables within the same population, while the test for homogeneity determines if the distribution of a variable is the same in each of several populations (thus allocating population itself as the second categorical variable).

Null Hypothesis (Ho) : There is no difference in the distribution of a categorical variable for several populations or treatments.

Alternative Hypothesis (Ha) : There is a difference in the distribution of a categorical variable for several populations or treatments.

The difference between these two tests can be a bit tricky to determine, especially in the practical applications of a Chi-square test. A reliable rule of thumb is to determine how the data was collected.

If the data consists of only one random sample with the observations classified according to two categorical variables, it is a test for independence. If the data consists of more than one independent random sample, it is a test for homogeneity.

What is the chi-square test?

The Chi-square test is a non-parametric statistical test used to determine if there’s a significant association between two or more categorical variables in a sample.

It works by comparing the observed frequencies in each category of a cross-tabulation with the frequencies expected under the null hypothesis, which assumes there is no relationship between the variables.

This test is often used in fields like biology, marketing, sociology, and psychology for hypothesis testing.

What does chi-square tell you?

The Chi-square test tells you whether there is a significant association between two categorical variables. If the calculated Chi-square value is above the critical value from the Chi-square distribution, it suggests a significant relationship between the variables, and the null hypothesis of no association is rejected.

How to calculate chi-square?

To calculate the Chi-square statistic, follow these steps:

1. Create a contingency table of observed frequencies for each category.

2. Calculate expected frequencies for each category under the null hypothesis.

3. Compute the Chi-square statistic using the formula: Χ² = Σ [ (O_i – E_i)² / E_i ], where O_i is the observed frequency and E_i is the expected frequency.

4. Compare the calculated statistic with the critical value from the Chi-square distribution to draw a conclusion.


Statistics By Jim

Making statistics intuitive

Chi-Square Test of Independence and an Example

By Jim Frost

The Chi-square test of independence determines whether there is a statistically significant relationship between categorical variables . It is a hypothesis test that answers the question—do the values of one categorical variable depend on the value of other categorical variables? This test is also known as the chi-square test of association.


In this post, I’ll show you how the Chi-square test of independence works. Then, I’ll show you how to perform the analysis and interpret the results by working through the example. I’ll use this test to determine whether wearing the dreaded red shirt in Star Trek is the kiss of death!

If you need a primer on the basics, read my hypothesis testing overview .

Overview of the Chi-Square Test of Independence

The Chi-square test of association evaluates relationships between categorical variables. Like any statistical hypothesis test , the Chi-square test has both a null hypothesis and an alternative hypothesis.

  • Null hypothesis: There are no relationships between the categorical variables. If you know the value of one variable, it does not help you predict the value of another variable.
  • Alternative hypothesis: There are relationships between the categorical variables. Knowing the value of one variable does help you predict the value of another variable.

The Chi-square test of association works by comparing the distribution that you observe to the distribution that you expect if there is no relationship between the categorical variables. In the Chi-square context, the word “expected” is equivalent to what you’d expect if the null hypothesis is true. If your observed distribution is sufficiently different than the expected distribution (no relationship), you can reject the null hypothesis and infer that the variables are related.

For a Chi-square test, a p-value that is less than or equal to your significance level indicates there is sufficient evidence to conclude that the observed distribution is not the same as the expected distribution. You can conclude that a relationship exists between the categorical variables.

When you have smaller sample sizes, you might need to use Fisher’s exact test instead of the chi-square version. To learn more, read my post, Fisher’s Exact Test: Using and Interpreting .

Star Trek Fatalities by Uniform Colors

We’ll perform a Chi-square test of independence to determine whether there is a statistically significant association between shirt color and deaths. We need to use this test because these variables are both categorical. Shirt color can be only blue, gold, or red. Status can be only dead or alive.

The color of the uniform represents each crewmember’s work area. We will statistically assess whether there is a connection between uniform color and the fatality rate. Believe it or not, there are “real” data about the crew from authoritative sources and the show portrayed the deaths onscreen. The table below shows how many crewmembers are in each area and how many have died.

Color | Area | Crewmembers | Deaths
Blue | Science and Medical | 136 | 7
Gold | Command and Helm | 55 | 9
Red | Operations, Engineering, and Security | 239 | 24
Ship’s total | All | 430 | 40

Tip: Because the chi-square test of association assesses the relationship between categorical variables, bar charts are a great way to graph the data. Use clustering or stacking to compare subgroups within the categories.

Figure: Bar chart displaying the fatality rates on Star Trek by uniform color.

Related post: Bar Charts: Using, Examples, and Interpreting

Performing the Chi-Square Test of Independence for Uniform Color and Fatalities

For our example, we will determine whether the observed counts of deaths by uniform color are different from the distribution that we’d expect if there is no association between the two variables.

The table below shows how I’ve entered the data into the worksheet. You can also download the CSV dataset for StarTrekFatalities .

Color | Status | Frequency
Blue | Dead | 7
Blue | Alive | 129
Gold | Dead | 9
Gold | Alive | 46
Red | Dead | 24
Red | Alive | 215

You can use the dataset to perform the analysis in your preferred statistical software. The Chi-squared test of independence results are below. As an aside, I use this example in my post about degrees of freedom in statistics . Learn why there are two degrees of freedom for the table below.
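If your preferred statistical software happens to be Python, a short scipy sketch reproduces the Pearson chi-square result discussed below (the counts come straight from the table above; scipy is assumed to be installed):

```python
from scipy.stats import chi2_contingency

# Observed counts: rows are uniform colors, columns are Dead and Alive.
observed = [
    [7, 129],   # Blue
    [9, 46],    # Gold
    [24, 215],  # Red
]

chi_sq, p_value, df, expected = chi2_contingency(observed)
print(f"chi-square = {chi_sq:.3f}, df = {df}, p = {p_value:.3f}")
# chi-square = 6.189 with df = 2 and p = 0.045, matching the Pearson
# chi-square statistic discussed in this post.
print(expected.round(2))  # expected counts under no association
```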

In our statistical results, both p-values are less than 0.05. We can reject the null hypothesis and conclude there is a relationship between shirt color and deaths. The next step is to define that relationship.

Describing the relationship between categorical variables involves comparing the observed count to the expected count in each cell of the Dead column. I’ve annotated this comparison in the statistical output above.

Statisticians refer to this type of table as a contingency table. To learn more about them and how to use them to calculate probabilities, read my post Using Contingency Tables to Calculate Probabilities .

Related post: Chi-Square Table

Graphical Results for the Chi-Square Test of Association

Additionally, you can use bar charts to graph each cell’s contribution to the Chi-square statistic, which is below.

Surprise! It’s the blue and gold uniforms that contribute the most to the Chi-square statistic and produce the statistical significance! Red shirts add almost nothing. In the statistical output, the comparison of observed counts to expected counts shows that blue shirts die less frequently than expected, gold shirts die more often than expected, and red shirts die at the expected rate.

The graph below reiterates these conclusions by displaying fatality percentages by uniform color along with the overall death rate.

The Chi-square test indicates that red shirts don’t die more frequently than expected. Hold on. There’s more to this story!

Time for a bonus lesson and a bonus analysis in this blog post!

2 Proportions test to compare Security Red-Shirts to Non-Security Red-Shirts

The bonus lesson is that it is vital to include the genuinely pertinent variables in the analysis. Perhaps the color of the shirt is not the critical variable but rather the crewmember’s work area. Crewmembers in Security, Engineering, and Operations all wear red shirts. Maybe only security guards have a higher death rate?

We can test this theory using the 2 Proportions test. We’ll compare the fatality rates of red-shirts in security to red-shirts who are not in security.

The summary data are below. In the table, the events represent the counts of deaths, while the trials are the number of personnel.

Group | Events | Trials
Security | 18 | 90
Not security | 6 | 149

The p-value of 0.000 signifies that the difference between the two proportions is statistically significant. Security has a mortality rate of 20% while the other red-shirts are only at 4%.
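If you want to reproduce this result, here is a hedged sketch using statsmodels (assuming it is installed; the counts come from the table above):

```python
from statsmodels.stats.proportion import proportions_ztest

deaths = [18, 6]       # events: security, not security
personnel = [90, 149]  # trials

z_stat, p_value = proportions_ztest(deaths, personnel)
print(f"z = {z_stat:.2f}, p = {p_value:.5f}")
# Roughly z = 3.98 with p < 0.001, consistent with the significant
# difference between 20% (18/90) and about 4% (6/149) reported above.
```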

Security officers have the highest mortality rate on the ship, closely followed by the gold-shirts. Red-shirts that are not in security have a fatality rate similar to the blue-shirts.

As it turns out, it’s not the color of the shirt that affects fatality rates; it’s the duty area. That makes more sense.

Risk by Work Area Summary

The Chi-square test of independence and the 2 Proportions test both indicate that the death rate varies by work area on the U.S.S. Enterprise. Doctors, scientists, engineers, and those in ship operations are the safest with about a 5% fatality rate. Crewmembers that are in command or security have death rates that exceed 15%!


Reader Interactions


July 18, 2024 at 10:27 am

I read this chi-squared example in your excellent book on hypothesis testing, but there are a couple of things that I can’t quite reconcile:

You described the proportion of observed Red shirt fatalities as being the same as the expected. Relative to the Blue and Gold comparison, the Red shirt fatality rate (24) is much closer to the expected rate (22.23), but it isn’t exactly the same.

How different would it need to be to conclude that it is different, as opposed to just being the least important in a context where we have concluded that there is an association between shirt colour and fatality rates? Would this need to be answered by a series of chi-squared tests (or 2 proportion tests) that considered the combination of one shirt colour compared with the sum of the other shirt colours? I have tried this, with the following p-values resulting from the chi-squared test in Excel:

Red shirt v non-red shirt p-value = 0.554809933

Blue shirt v non-blue shirt p-value = 0.04363293

Gold shirt v non-gold shirt p-value = 0.053533022

This would suggest that if the question was “do gold shirts die more frequently than other colours?”, the answer would be that the data does not rule out the null hypothesis. For blue shirts this test would suggest that the data can rule out the null hypothesis, yet in the full three-colour test Gold contributed the most to the chi-squared statistic.

I have a similar example from my work which looks at the proportions of customers using two different websites and considers the proportions which are new customers, existing customers and customers returned after a long gap (reactivated).

For that test the chi-squared test p value was sufficient to rule out the null hypothesis with reactivated customers contributing the most to the chi-squared statistic. But no individual test which set each customer group against the sum of the others would be considered significant.

Is this comparison of gold shirts versus non-gold shirts, or reactivated customers versus other customers, not a valid use of this test?


February 6, 2024 at 9:55 pm

Hi Jim. I am using R to calculate a chi-square test of independence. I have a value of 1.486444 with a p-value greater than 0.05. My question is how do I interpret the value of 1.486444? Is this a strong association between the two variables or a weak association?


February 6, 2024 at 10:19 pm

You really just look at the p-value. If you assess the chi-square value itself, you need to place the chi-square value in a chi-square distribution with the correct degrees of freedom and use that to calculate the probability. But the p-value does that for you!

In your case, the p-value is greater than your significance level. So, you fail to reject the null hypothesis. You have insufficient evidence to conclude that an association exists between your variables.

Also, it’s important to note that this test doesn’t indicate the strength of association. It only tells you whether your sample data provide sufficient evidence to conclude that an association exists in the population. Unfortunately, you can’t conclude that an association exists.


September 1, 2022 at 5:01 am

Thank you this was such a helpful article.

I’m not sure if you check these comments anymore, yet if you do I did have a quick question for you. I was trying to follow along in SPSS to reproduce your example and I managed to do most of it. I put your data in, used Weight Cases by Frequency of Deaths, and then was able to do the Chi Square analysis that achieved the exact same results as yours.

Unfortunately, I am totally stuck on the next part where you do the 2 graphs, especially the Percentage of Fatalities by Shirt Color. The math makes sense – it’s just e.g., Gold deaths / (Gold deaths + Gold Alive). However, I cannot seem to figure out how to create a bar chart like that in SPSS!? I’ve tried every combination of variables and settings I can think of in the Chart Builder and no luck. I’ve also tried the Compute Variable option with various formulas to create a new column with the death percentages by shirt color but can’t find a way to sum the frequencies. The best I can get is using an IF statement so it only calculates on the rows with a Death statistic, and then I can get the first part: Frequency / ???, but I can’t sum the 2 frequencies of Deaths & Alive per shirt colour to calculate the figure properly. And I’m not sure what other things I can try.

So basically I’m totally stuck at the moment. If by some chance you see this, is there any chance you might please be able to help me figure out how to do that Percentage of Fatalities by Shirt Color bar graph in SPSS? The only way I can see at the moment is to manually type the calculated figures into a new dataset and graph it. That would work but doesn’t seem a very practical way of doing things if this was a large dataset instead of a small example one. Hence I’m assuming there must be a better way of doing this?

Thank you in advance for any help you can give me.

September 1, 2022 at 3:38 pm

Yes, I definitely check these comments!

Unfortunately, I don’t have much experience using SPSS, so I’ll be of limited help with that. There must be some way to do that in SPSS though. Worst case scenario, calculate the percentages by hand or in Excel and then enter them into SPSS and graph them. That shouldn’t be necessary but would work in a pinch.

Perhaps someone with more SPSS experience can provide some tips?


September 18, 2021 at 6:09 pm

Hi. This comment relates to Warren’s post. The null hypothesis is that there is no statistically significant relationship between “Uniform color” and “Status”. During the summing used to calculate the Chi-squared statistic, each of the (6) contributions is included (3 uniform colors x 2 status possibilities). The “Alive” column gives the small contributions that bring the total contribution from 5.6129 up to 6.189. Any reasoning specific to the “Dead” column only begins after the 2-dimensional Chi-squared calculation has been completed.

September 19, 2021 at 12:38 am

Hi Bill, thanks for your clarifications. I got confused about whom you were replying to!

September 17, 2021 at 5:53 pm

The chi-square formula is: χ² = ∑ (O_i – E_i)² / E_i, where O_i = observed value (actual value) and E_i = expected value.

September 17, 2021 at 5:56 pm

Hi Bill, thanks. I do cover the formula and example calculations in my other post on the topic, How Chi-Squared Works .


September 16, 2021 at 6:24 pm

Why is the Pearson Chi-Square statistic not equal to the sum of the contributions to Chi-Square? I get 5.6129. The p-value for that Chi-Square statistic is .0604, which is NOT significant in this century OR the 24th.


September 14, 2021 at 8:25 am

Thank you Jim, excellent concept teaching!


July 15, 2021 at 1:05 pm

Thank you so much for the Star Trek example! As a long-time Trek fan and Stats student, I absolutely love the debunking of the red shirt theory!

July 19, 2021 at 10:19 pm

I’m so glad you liked my example. I’m a life-long Trek fan as well! I found the red shirt question to be interesting. On the one hand, part of the answer is that red shirts comprise just over 50% of the crew, so of course they’ll have more deaths. And then on the other hand, it’s only certain red shirts that actually have an elevated risk, those in security.


May 16, 2021 at 1:42 pm

Got this response from the gentleman who did the calculation using a Chi Square. Would you mind commenting? “The numbers reported are nominate (counting) numbers not ordinate (measurement) numbers. As such chi-square analysis must be used to statistically compare outcomes. Two-sample student t-tests cannot be used for ordinate numbers. Correlations are also not usually used for ordinate numbers and most importantly correlations do NOT show cause and effect.”

May 16, 2021 at 3:13 pm

I agree with the first comment. However, please note that I recommended the 2-sample proportions test and the other person is mentioning the 2-sample t-test. Very different tests! And, I agree that the t-test is not appropriate for the Pfizer data. Basically, he’s saying you have categorical data and the t-test is for continuous data. That’s all correct. And that’s why I recommended the the proportions test.

As for the other part about “correlations do NOT show cause and effect.” That’s not quite correct. More accurately, you’d say that correlations do not NECESSARILY imply causation. Sometimes they do and sometimes they don’t imply causation. It depends on the context in which the data were collected. Correlations DO suggest causation when you use a randomized controlled trial (RCT) for the experiment and data collection, which is exactly what Pfizer did. Consequently, the Pfizer data DO suggest that the vaccine caused a reduction in the proportion of COVID infections in the vaccine group compared to the control group (no vaccine). RCTs are intentionally designed so you can draw causal inferences, which is why the FDA requires them for vaccine and other medical trials.

If you’re interested, I’ve written an article about why randomized controlled trials allow you to make causal inferences .

May 16, 2021 at 12:41 pm

Mr. Jim Frost…You are Da Man!! Thank you!! Yes, this is the same document I have been looking at, just did not know how to interpret Table 9. Sorry, never intended to ask you for medical advice, just wanted to understand the statistics and feel confident that the calculations were performed correctly. You have made my day! Now just a purely statistics question, assuming I have not worn out your patience with my dumb questions… Can you explain the criteria used to determine when a Chi-Square should be used versus a 2-sample proportions test? I think I saw a comment from someone on your website stating that the Chi-Square is often misused in the medical field. Fascinating, fascinating field you are in. Thank you so much for sharing your knowledge and expertise.

May 16, 2021 at 3:00 pm

You bet! That’s why I’m here . . . to educate and clarify statistics and statistical analyses!

The chi-squared test of independence (or association) and the two-sample proportions test are related. The main difference is that the chi-squared test is more general while the 2-sample proportions test is more specific. And, it happens that the proportions test is more targeted at specifically the type of data you have.

The chi-squared test handles two categorical variables where each one can have two or more values. And, it tests whether there is an association between the categorical variables. However, it does not provide an estimate of the effect size or a CI. If you used the chi-squared test with the Pfizer data, you’d presumably obtain significant results and know that an association exists, but not the nature or strength of that association.

The two proportions test also works with categorical data but you must have two variables that each have two levels. In other words, you’re dealing with binary data and, hence, the binomial distribution. The Pfizer data you had fits this exactly. One of the variables is experimental group: control or vaccine. The other variable is COVID status: infected or not infected. Where it really shines in comparison to the chi-squared test is that it gives you an effect size and a CI for the effect size. Proportions and percentages are basically the same thing, but displayed differently: 0.75 vs. 75%.

What you’re interested in answering is whether the percentage (or proportion) of infections amongst those in the vaccinated group is significantly different than the percentage of infections for those in control group. And, that’s the exact question that the proportions test answers. Basically, it provides a more germane answer to that question.

With the Pfizer data, the answer is yes, those in the vaccinated group have a significantly lower proportion of infections than those in the control group (no vaccine). Additionally, you’ll see the proportion for each group listed, and the effect size is the difference between the proportion, which you can find on a separate line, along with the CI of the difference.

Compare that more specific and helpful answer to the one that chi-squared provides: yes, there’s an association between vaccinations and infections. Both are correct but because the proportions test is more applicable to the specific data at hand, it gives a more useful answer.

I see you have an additional comment with questions, so I’m off to that one!

May 15, 2021 at 1:00 pm

Hi Jim, So sorry if my response came off as anything but appreciative of your input. I tried to duplicate your results in your Flu Vaccine article using the 2 Proportion test as you recommended. I was able to duplicate your Estimate for Difference of -0.01942, but I could not duplicate your value for Z, so clearly I am not doing the calculation correctly – even when using Z calculators. So since I couldn’t duplicate your correct results for your flu example, I did not have confidence to proceed to Moderna.

I was able to calculate effectiveness (the hazard ratio that is widely reported), but as I have reviewed the EUA documents presented to the FDA in December 2020, I know that there is no regression analysis, and most importantly, no data to show an antibody response produced by the vaccine. So they are not showing the vaccine was successful in producing an immune response, just giving simplistic proportions of how many got covid and how many didn’t. And as they did not even factor in the number of people who had had covid prior to the vaccine, I just can’t understand how these numbers have any significance at all. I mention the PCR test because it too is under an EUA, and has severe limitations. I would think that those limitations would be statistically significant, as are the symptoms, which can indicate any bacterial or viral infection.

And you state “I’m sure you can find a journal article or documentation that shows the thorough results if you’re interested”. Clearly I am VERY interested, as I love my parents more than life itself, and have seen the VAERS data, and I don’t want them to be the next statistic. But I CAN’T find the thorough results that you say are so easy to find. If I could, I would not be trying to learn to perform statistical calculations. So I went out on a limb, as you are a fellow trekky and seem like a super nice guy, sharing your expertise with others, and thought you might be able to help me understand the statistics so I can help my parents make an informed choice.

We are at a point that children and pregnant women are getting these vaccines. Unhealthy, elderly people in nursing homes (all the people excluded in the trials) are getting these vaccines. I simply ask the question… do these vaccines provide more protection than NOT getting the vaccine? The ENTIRE POPULATION is being forced to get these vaccines. And you tell me “I’m sure you can find a journal article or documentation that shows the thorough results if you’re interested.” I can only ask… how are you NOT interested? This is the most important statistical question of our lifetime, and of your children’s and grandchildren’s lifetime. And I find no physician or statistician able or willing to answer these questions. Respectfully, Chris

May 15, 2021 at 11:00 pm

No worries. On my website, I’m just discussing the statistical nature of Moderna’s study. Of course, everyone is free to make their own determination and decide accordingly.

Figure: Pfizer data analyzed by a two-sample proportions test.

You’re obviously free to question the methods and analysis, but as a statistician, I’m satisfied that Moderna performed an appropriate clinical trial and followed that up with a rigorous and appropriate statistical analysis. In my opinion, they have demonstrated that their vaccine is safe and effective. The only caveat is that we don’t have long-term safety data because not enough time has gone by. However, most side effects for vaccines show up in the first 45 days. That timeframe occurred during the trial and all side effects were recorded.

However, I’m not going to get into a debate about whether anyone should get the vaccine or not. I run a statistics website and that’s the aspect I’m focusing on. There are other places to debate the merits of being vaccinated.

May 14, 2021 at 8:05 pm

Hi Jim, thanks for the reply. I have to admit the detail of all the statistical methods you mention is over my head, but by scanning the document it appears you did not actually calculate the vaccine’s efficacy, just stated how the analysis should be done. I am referring to comments like “To analyze the COVID-19 vaccine data, statisticians will use a stratified Cox proportional hazard regression model to assess the magnitude of the difference between treatment and control groups using a one-sided 0.025 significance level”. And “The full data and analyses are currently unavailable, but we can evaluate their interim analysis report. Moderna (and Pfizer) are still assessing the data and will present their analyses to Federal agencies in December 2020.” I am looking at the December 2020 reports that both Pfizer and Moderna presented to the FDA, and I see no “stratified Cox proportional hazard regression model”, just the simplistic hazard ratio you mention in your paper. I don’t see how that shows the results are statistically significant and not chance. Also the PCR test does not confirm disease, just presence of virus (dead or alive), and virus presence doesn’t indicate disease. And the symptoms are symptoms of any viral or bacterial infection, or cancer. Just sort of surprised to see no statistical analysis in the December 2020 reports. Was hoping you had done the heavy lifting…lol

May 14, 2021 at 11:38 pm

Hi Christine,

You had asked if Chi-square would work for your data and my response was no, but here are two methods that would work. No, I didn’t analyze the Moderna data myself. I don’t have access to their complete data that would allow me to replicate their results. However, in my post, I did calculate the effectiveness, which you can do using the numbers I had, but not the significance.

Based on the data you indicated you had, I’d recommend the two-sample proportions test that I illustrate in the flu vaccine post. That won’t replicate the more complex analyses but is doable with the data that you have.

The Cox proportional hazard regression model analyzes the hazard ratio. The hazard ratio is the outcome measure in this context. They’re tied together, and it’s the regression analysis that indicates significance. I’d imagine you’d have to read a thorough report to get the nitty-gritty details. I got the details of their analysis straight from Moderna.

I’m not sure what your point is about the PCR test. But, I’m just reporting how they did their analysis.

Moderna, Pfizer, and the others have done the “heavy lifting.” When I wrote the post about the COVID vaccination, it was before it was approved for emergency use. By this point, I’m sure you can find a journal article or documentation that shows the thorough results if you’re interested.

May 14, 2021 at 2:56 pm

Hi Jim, my parents are looking into getting the Pfizer vaccine, and I was wondering if I could use a chi-square analysis to see if it’s statistically effective. From the EUA document, 17411 people got the Pfizer vaccine, and of those people, 8 got covid and 17403 did not. Of the control group of 17511 that did not get the vaccine, 162 got covid and 17349 did not. My calculations show this is not statistically significant, but I wasn’t sure if I did my calculation correctly, or if I can even use a chi-square for this data. Can you help? PS. As a Trekky family, I love your analysis…but we all know it’s the new guy with a speaking part that gets axed…lol

May 14, 2021 at 3:28 pm

There are several ways you can analyze the effectiveness. I write about how they assessed the Moderna vaccine’s effectiveness , which uses a special type of regression analysis.

The other approach is to use a two-sample proportions test. I don’t write about that in the COVID context, but I show how it works for flu vaccinations. The same ideas apply to COVID vaccinations. You’re comparing the proportion of infections in the control group to the treatment group. Hence, a two-sample proportions test.

A chi-square analysis won’t get you where you want to go. It would tell you if there is an association, but it’s not going to tell you the effect size.

I’d read those two posts that I wrote. They’ll give you a good insight for possible ways to analyze the data. I also show how they calculate effectiveness for both the COVID and flu shots!

I hope that helps!


April 9, 2021 at 2:49 am

thank you so much for your response and advice! I will probably go for the logistic regression then 🙂

All the best for you!

April 10, 2021 at 12:39 am

You’re very welcome! Best of luck with your study! 🙂

April 7, 2021 at 4:18 am

thank you so much for your quick response! This actually helps me a lot and I also already thought about doing a binary logistic regression. However, my supervisor wanted me to use a chi-square test, as he thinks it is easier to perform and less work. So now I am struggling to decide which option would be more feasible.

Coming back to the chi-square test – could I create a new variable which differentiates between the four experimental conditions and use this as a new ID? Or can I use the DV to weight the frequencies in the chi-square test? – I did that once in an analysis using a continuous DV as weight. Yet, I am not sure if or how that works with a binary variable. Do you have an idea what would work best in the case of a chi-square test?

Thank you so much!!

April 8, 2021 at 11:25 pm

You’re very welcome!

I don’t think either binary logistic regression or chi-square is more or less work than the other. However, chi-square won’t give you the answers you want. You can’t do interaction effects with chi-square. You won’t get nice odds ratios, which are a much more intuitive way to interpret the results than chi-square, at least in my opinion. With chi-square, you don’t get a p-value/significance for each variable, just the overall analysis. With logistic regression, you get p-values for each variable and the interaction term if you include it.

I think you can do chi-square analyses with more than one independent variable. You’d essentially have a three dimensional table rather than a two-dimensional table. I’ve never done that myself so I don’t have much advice to offer you there. But, I strongly recommend using logistic regression. You’ll get results that are more useful.

April 6, 2021 at 10:59 am

thank you so much for this helpful post!

April 6, 2021 at 5:36 am

thank you for this very helpful post. Currently, I am working on my master’s thesis and I am struggling with identifying the right way to test my hypothesis as in my case I have three dummy variables (2 independent and 1 dependent).

The experiment was on the topic of advice taking. It was a 2×2 between-subjects design manipulating the source of advice to be a human (0) or an algorithm (1) and the task to be easy (0) or difficult (1). Then, I measured whether the participants followed (1) or did not follow (0) the advice. Now, I want to test if there is an interaction effect. In the easy task I expect that the participants rather follow the human advice and in the difficult task the participants rather follow the algorithmic advice.

I want to test this using a chi-square independence test, but I am not sure how to do that with three variables. Should I rather use the variable “Follow/Notfollow” as a weight or should I combine two of the variables so that I have a new variable with four categories, e.g. Easy.Human, Easy.Algorithm, Difficult.Human, Difficult.Algorithm or Human.Follow, Human.NotFollow, Algorithm.Follow, Algorithm.NotFollow

I am not sure, if this is scientifically correct. I would highly appreciate your help and your advice.

Thank you so much in advance! Best, Anni

April 7, 2021 at 1:58 am

I think using binary logistic regression would be your best bet. You can use your dummy DV with that type, and having two dummy IVs also works. You can also include an interaction term, which isn’t possible in chi-square tests. This model would tell you whether source of advice, difficulty of task, and their interaction relate to the probability of participants following the advice.
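To make that suggestion concrete, here is a hedged sketch of such a model in Python with statsmodels; the data frame, column names, and outcome values are entirely invented for illustration and are not the study’s data:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical 2x2 between-subjects data. source: 0 = human, 1 = algorithm;
# difficult: 0 = easy task, 1 = difficult task;
# followed: 1 = followed the advice, 0 = did not.
df = pd.DataFrame({
    "source":    [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1] * 5,
    "difficult": [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1] * 5,
    "followed":  [1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0] * 5,
})

# source * difficult expands to both main effects plus their interaction,
# which is the hypothesized effect here.
model = smf.logit("followed ~ source * difficult", data=df).fit()
print(model.summary())  # includes a p-value for the interaction term
```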


March 29, 2021 at 12:43 pm

Hi Jim, I want to thank you for all the content that you have posted online. It has been very helpful for me to apply simple principles of statistics at work. I wanted your thoughts on how to approach the following problem, which appeared to be slightly different from the examples that you shared above.

We have two groups – a test group (exposed to an ad for brand A) and a control group (not exposed to any ads for brand A). We asked both groups a question: Have you heard of brand A? The possible answers were Y/N. We then did a t-test to determine if the answers were significantly different for the test and control groups (they were).

We asked both groups a follow-up question as well: How likely are you to buy any of the following brands in the next 3 months? The options were as follows (any one could be picked; B, C & D are competing brands with A): 1. A, 2. B, 3. C, 4. D.

We wanted to check if the responses we received from both groups were statistically different. Based on my reading, it seemed like the Chi-Square test was the right one to run here. However, I wasn’t too sure what the categorical variables would be in this case and how we could run the Chi-square test here. Would like to get your inputs on how to approach this. Thanks

March 29, 2021 at 2:53 pm

For the first question, I’d typically recommend a 2-sample proportions test. You have two groups and the outcome variable is binary, which is good for proportions. Using a 2-sample proportions test will tell you whether the proportion of individuals who have heard of Brand A differs by the two groups (ads and no ads). You could use the chi-squared test of independence for this case but I recommend the proportions test because it’s designed specifically for this scenario. The procedure can also estimate the effect size and a CI for the effect size (depending on your software). A t-test is not appropriate for these data.

For the next question, yes, the chi-square test is good choice as long as they can only pick one of the options. Maybe, which brand are you most likely to purchase in the next several months. The categories must be mutually exclusive to use chi-square. One variable could be exposed to ad with yes and no as levels. The other would be the purchase question with A, B, C, D as levels. That gives you a 2 X 4 table for your chi-squared test of independence.


March 29, 2021 at 5:08 am

I don’t see the relationship between the table of shirt color and status and the tabulated statistics. Sam

March 29, 2021 at 3:39 pm

I show the relationship several ways in this post. The key is to understand how the actual counts compare to the expected counts. The analysis calculates the expected counts under the assumption that there is no relationship between the variables. Consequently, when there are differences between the actual and expected counts, a relationship potentially exists.

In the Tabulated Statistics output, I circle and explain how the actual counts compare to the expected counts. Blue uniforms have fewer deaths than expected while Gold uniforms have more deaths than expected. Red uniforms equal the expected amount, although I explore that in more detail later in the post. You can also see these relationships in the graph titled Percentage of Fatalities.

Overall, the results show the relationship between uniform color and deaths and the p-value indicates that this relationship is statistically significant.


February 20, 2021 at 8:51 am

Suppose you have two variables: checking out books and means of getting to the central library. How might you formulate the null hypothesis and alternative hypothesis for the independence test? Please answer, anyone.

February 21, 2021 at 3:15 pm

In this case, the null hypothesis states that there is no relationship between means to get to the library and checking out a book. The alternative hypothesis states that there is a relationship between them.


November 18, 2020 at 12:39 pm

Hi there, I’m just wondering if it would be appropriate to use a Chi-square test in the following scenario:
– A data set of 1000 individuals.
– Calculate Score A for all 1000 individuals; results are continuous numerical data, e.g. 2.13, 3.16, which then allow individuals to be placed in categories: low risk (3.86).
– Calculate Score B for the same 1000 individuals; results are discrete numerical data, e.g. 1, 6, 26, 4, which then allow individuals to be placed in categories: low risk (26).
– I then want to compare the two scoring systems A & B: (1) to see if the individuals are scoring similarly on both scores; (2) I have reason to believe one of the scores overestimates the risk, and I’d like to test this.

Thank you, I haven’t been able to find any similar examples and it’s stressing me out 🙁


November 13, 2020 at 1:53 pm

Would you be able to advise?

My organization is sending out 6 different emails to employees, in which they have to click on a link in the email. We want to see if one variation in language might get a higher click rate for the link. So we have 6 between-subjects conditions, and the response can either be ‘clicked on the link’ or ‘did NOT click on the link’.

Is this a Chi-Square test of independence? Also, how would I know where the difference lies, if the test is significant? (i.e., what is the non-parametric equivalent of running an ANOVA and follow-up pairwise comparisons?)

Thanks Jim!


October 15, 2020 at 11:05 pm

I am working on the press coverage of civil-military relations in the Pakistani press from 2008 to 2018. I want to check whether there is a difference in coverage between two tenures, i.e. 2008 to 2013 and 2013 to 2018. Secondly, I want to check the difference in coverage between two types of newspapers, i.e. English newspapers and Urdu newspapers. Furthermore, I also want to check the category-wise difference in coverage across the tenure 2008 to 2018.

I have divided my data into three different distributions: 1 is pro-civilian, 2 is pro-military and 3 is neutral.


October 4, 2020 at 4:07 am

Hi, thank you so much for this. I would like to ask: if the study is about whether factors such as pricing, marketing, and brand affect the intention of the buyer to purchase the product, can I use the chi-square test for the statistical treatment? And if not, may I ask what statistical treatment you would suggest? Thank you so much again.

October 3, 2020 at 2:51 pm

Jim, Thank you for the post. You displayed a lot of creativity linking the two lessons to Star Trek. Your website and ebook offerings are very inspiring to me. Bill

October 4, 2020 at 12:53 am

Thanks so much, Bill. I really appreciate the kind words and I’m happy that the website and ebooks have been helpful!


September 29, 2020 at 7:10 am

Thank you for your explanation. I am trying to help my son with his final school year investigation. He has raw data which he collected from 21 people of varying experience. They all threw a rugby ball at a target, and the accuracy, time of ball in the air and experience (rated from 1-5) were all recorded. He has calculated the speed and the displacement, and used correlation to compare speed versus accuracy and experience versus accuracy. He needs to incrementally increase the difficulty of the maths he uses in his analysis and he was thinking of the Chi-Square test as a next step; however, from your explanation above, the current form of his data would not be suitable for this test. Is there a way of re-arranging the data so that we can use the Chi-Square test? Thanks!

September 30, 2020 at 4:33 pm

Hi Rhonwen,

The chi-squared test of independence looks for correlation between categorical variables. From your description, I’m not seeing a good pair of categorical variables to test for correlation. To me, the next step for this data appears to be regression analysis.


September 12, 2020 at 5:37 pm

Thank you for the detailed teaching! I think this explains chi square much better than other websites I have found today. Do you mind sharing which software you use to get Expected Count and contribution to Chi square? Thank you for your help.


August 22, 2020 at 1:06 pm

Good day Jim! I was wondering what kind of data analysis I should use for research on knowledge, attitudes and practices? Looking forward to your reply! Thank you!


June 25, 2020 at 8:43 am

Very informative and easy to understand. Thank you so much, sir.


June 2, 2020 at 11:03 am

Hi I wanted to know how the significance probability can be calculated if the significance level wasn’t given. Thank you

June 3, 2020 at 7:39 pm

Hi, you don’t need to know the significance level to be able to calculate the p-value. For calculating the p-value, you must know the null hypothesis, which we do for this example.

However, I do use a significance level of 0.05 for this example, making the results statistically significant.


May 26, 2020 at 5:55 am

What summary statistics can I use to describe the graph of categorical data? Good presentation, by the way. Very insightful.

May 26, 2020 at 8:39 pm

Hi Michael,

For categorical data like the type in this example, which is in a two-way contingency table, you’d often use counts or percentages. A bar chart is often a good choice for graphing counts or percentages by multiple categories. I show an example of graphing data for contingency tables in my Introduction to Statistics ebook .


May 25, 2020 at 10:27 am

Thank you for your answer. I saw online that bar graphs can be used to visualise the data (I guess it would be the percentage of deaths in my case) with 95% CI intervals for the error bars. Is this also applicable if I only have a 2×2 contingency table? If not, what could be my error bar?

May 26, 2020 at 8:59 pm

Hi John, you can obtain CIs for proportions, which is basically a percentage. And, bar charts are often good for graphing contingency tables.
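As a small sketch of such an interval (statsmodels’ proportion_confint, assuming it is installed; the counts are invented for illustration):

```python
from statsmodels.stats.proportion import proportion_confint

# Hypothetical cell: 24 deaths out of 120 participants in one group.
low, high = proportion_confint(24, 120, alpha=0.05, method="wilson")
print(f"proportion = {24 / 120:.1%}, 95% CI: [{low:.1%}, {high:.1%}]")
```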

May 24, 2020 at 9:34 am

Hi! So I am working on this little project where I am trying to find a relationship between sex and mortality brought about by this disease, so my variables are: sex (male or female) and status (dead or alive). I am new to statistics so I do not know much. Is there any way to check the normality of categorical data? There is a part wherein our data must be checked for normality, but I am not sure if this applies to categorical data. Thank you for your answer!

May 24, 2020 at 4:23 pm

The normal distribution is for continuous data. You have discrete data values–two binary variables to be precise. So, the normal distribution is not applicable to your data.


May 21, 2020 at 11:26 pm

Hi Jim, this was really helpful. I am in the midst of my proposal on a research to determine the association between burnout and physical activity among anaesthesia trainees.

They are both categorical variables: physical activity – 3 categories (high, moderate, low); burnout – 2 categories (high and low).

How do I calculate my sample size for my study?

May 22, 2020 at 2:13 pm

Hi Jaishree,

I suggest you download a free sample size and power calculation program called G*Power . Then do the following:

  • In G*Power, under Test Family, choose, χ². Under Statistical test, choose Goodness-of-fit tests: Contingency tables.
  • In Effect size w, you’ll need to enter a value: 0.1 = weak, 0.3 = medium, and 0.5 = large. That’s based on subject-area knowledge.
  • In β/α ratio, that’s the ratio of the Type II error rate/Type I error rate. They have a default value of 1, but that seems too low. 2-3 might be more appropriate but you can try different values to see how this affects the results.
  • Then you need to enter your sample size and DF. Read my post about Degrees of Freedom , which includes a section about calculating it for chi-square tests.
  • Click Calculate.

Experiment and adjust values to see how that changes the output. You want to find a sample size that produces sufficient power while incorporating your best estimates of the other parameters (effect size, etc.).
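For readers who prefer scripting, statsmodels offers a comparable calculation; here is a hedged sketch (the effect size, alpha, and power mirror the discussion above, and n_bins is chosen so that n_bins − 1 equals the design’s degrees of freedom):

```python
from statsmodels.stats.power import GofChisquarePower

# Medium effect (w = 0.3), 5% significance, 80% power, df = 2
# (a 3x2 design has (3 - 1) x (2 - 1) = 2 degrees of freedom).
n = GofChisquarePower().solve_power(
    effect_size=0.3, alpha=0.05, power=0.8, n_bins=3
)
print(f"required total sample size: {n:.0f}")
```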


May 16, 2020 at 10:55 am

Learned so much from this post!! This was such a clear example that it is the first time for me that some statistic tests really make sense to me. Thank you so much for sharing your knowledge, Jim!!


May 5, 2020 at 11:46 am

The information that you have given here has been so useful to me; I really understand it much better now. So, thank you very much! Just a quick question: how did you graph the contribution to the chi-square statistic? I’ve been using Stata to do some data analysis and I’m not sure how I would be able to create a graph like that for my own data. Any insight you can give would be extremely useful.

May 6, 2020 at 1:30 am

I used Minitab statistical software for the graphs. I think graphs often bring the data to life more than just a table of numbers.


March 20, 2020 at 2:38 pm

I have the results of two Exit Satisfaction Surveys related to two cohorts (graduates of 2017-18 and graduates of 2018-19). The information I received was just the “number” of ratings on each of the 5 points on the Likert scale (e.g., 122 respondents Strongly Agreed to a given item). I changed the raw ratings into percentages for comparison, e.g., for Part A of the survey (Proficiency and Knowledge in my major field), I calculated the minimum and maximum percentages on the Strongly Agree point and did the same for the other points on the scale. My questions are (1) can I report the range of percentages on each point on the scale for each item, or is it better to report an overall agreement/disagreement? and (2) what are the best statistics to compare the satisfaction of the two cohorts in the same survey? The 2017-18 cohort included 126 graduates, and the 2018-19 cohort included 296.

I checked out your Introduction to Statistics book that I purchased, but I couldn’t decide about the appropriate statistics for the analysis of each of the surveys as well as comparison of both cohorts.

My sincere thanks in advance for your time and advice,

All the best, Ellie


March 20, 2020 at 7:30 am

Thank you for an excellent post! I myself will soon perform a Chi-square test of independence on survey responses with two variables, and now think it might be good to start with a 2-proportions test (is a Z-test with 2 proportions what you use in this example?). Since you don’t discuss whether the Star Trek data meet the assumptions of the two tests you use, I wonder if they share approximately the same assumptions? I have already made certain that my data may be used with the Chi-square (my data are, by the way, not necessarily normally distributed, and have unknown mean and variance); can I therefore be comfortable with using a 2-proportions Z-test too? I hope you have the time to help me out here!


February 18, 2020 at 8:53 am

Excellent post. By the way, is it similar to what they call the Test of Association that uses a contingency table? The way they compute the expected value is (row total × column total) / (sample total). And to check if there is a relationship between two variables, check if the calculated chi-squared value is greater than the critical value of the chi-squared distribution. Is it just the same?

February 20, 2020 at 11:09 am

Hi Hephzibah,

Yes, they’re the same test–test of independence and test of association. I’ll add something to that effect to the article to make that more clear.


January 6, 2020 at 9:24 am

Jim, thanks for creating and publishing this great content. In the initial chi-square test for independence we determined that shirt color does have a relationship with death rate. The Pearson chi-square measurement is 6.189; is this number meaningful? How do we interpret this in plain English?

January 6, 2020 at 3:09 pm

There’s really no direct interpretation of the chi-square value. That’s the test statistic, similar to the t-value in t-tests and the F-value in F-tests. These values are placed in the chi-square probability distribution that has the specified degrees of freedom (df = 2 for this example). By placing the value into the probability distribution, the procedure can calculate probabilities, such as the p-value. I’ve been meaning to write a post that shows how this works for chi-squared tests. I show how this works for t-tests and for F-tests in one-way ANOVA. Read those to get an idea of the process. Of course, this chi-squared test uses chi-squared as the test statistic and probability distribution.

I’ll write a post soon about how this test works, both in terms of calculating the chi-square value itself and then using it in the probability distribution.
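In the meantime, the core mechanic fits in one line: place the statistic in the upper tail of the chi-square distribution with the right degrees of freedom. A quick R check using the 6.189 statistic and df = 2 from this example:

    # p-value for a Pearson chi-square of 6.189 with 2 degrees of freedom
    pchisq(6.189, df = 2, lower.tail = FALSE)
    # returns about 0.045, which is below the 0.05 significance level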


January 5, 2020 at 7:28 am

Would Chi-squared test be the statistical test of choice, for comparing the incidence rates of disease X between two states? Many thanks.

January 6, 2020 at 1:20 am

Hi Michaela,

It sounds like you’d need to use a two-sample proportions test. I show an example of this test using real data in my post about the effectiveness of flu vaccinations. The reason you’d need to use a proportions test is that your observed data are presumably binary (diseased/not diseased).

You could use the chi-squared test, but I think for your case the results are easier to understand using a two-sample proportions test.


June 3, 2019 at 6:57 pm

Let’s say the expected salary for a position is 20,000 dollars. In our observed salaries we have various figures a little above and below 20,000, and we want to do a hypothesis test. These salaries are ratio data, so does that mean we cannot use chi-square? Do we have to convert? How? In fact, when you run a chi-square on the salary data, the chi-square value turns out to be very high, sort of off the chi-square critical value chart.

June 3, 2019 at 10:28 pm

Chi-square analysis requires two or more categorical (nominal) variables. Salary is a continuous (ratio) variable. Consequently, you can’t use chi-square.

If you have the one continuous variable of salary and you want to determine whether the difference between the mean salary and $20,000 is statistically significant or not, you’d need to use a one-sample t-test. My post about the different forms of t-tests should be helpful for you.

April 13, 2019 at 4:23 am

I don’t know how to thank you for your detailed informative reply. And I am happy that a specialist like you found this study interesting yoohoo 🙂

As to your comment on how we (me and my graduate student whose thesis I am directing) tracked the errors from Sample Writing 1 to 5 for each participant, we did it manually through a close content analysis. I had no idea of a better alternative, since going through 25 writing samples required meticulous comparison for each participant. I advised my student to tabulate the number, frequency, and type of errors for each participant separately so we could keep track of their (lack of) improvement depending on the participant’s proficiency level.

Do you have any suggestion to make it more rigorous?

Very many thanks, Ellie

April 10, 2019 at 11:52 am

Hi, Jim. I first decided to choose chi-square to analyze my data but now I am thinking of poisson regression since my dependent variable is ‘count.’. I want to see if there is any significant difference between Grade 10 students’ perceptions of their writing problems and the frequency of their writing errors in the five paragraphs they wrote. Here is the detailed situation:

1. Five sample paragraphs were collected from 5 students at 5 proficiency levels based on their total marks in the English final exam in the previous semester (from Outstanding to Poor).
2. The students participated in an interview and expressed their perceptions of their problem areas in writing.
3. The students submitted their paragraphs every 2 weeks during the semester.
4. The paragraphs were marked based on the school’s marking rubrics.
5. Errors were categorized under five components (e.g., grammar, word choice, etc.).
6. Paragraphs were compared for measuring the students’ improvement by counting errors manually in each and every paragraph.
7. The students’ errors were also compared to their perceived problem areas to study the extent of their awareness of their writing problems. This comparison showed that students were not aware of a major part of their errors, while their perceived errors were not necessarily observed in their writing samples.
8. Comparison of Paragraphs 1 and 5 for each student showed a decrease in the number of errors in some language components while some errors still persisted.
9. I’m also interested to see if proficiency level has any impact on students’ perceptions of their real problem areas and the frequency of their errors in each language category.

My question is which test should be used to answer Qs 7 and 8? As to Q9, one of the dependent variables is count and the other one is nominal. One correlation I’m thinking is eta squared (interval-nominal) but for the proficiency-frequency I’m not sure.

My sincere apologies for this long query and many thanks for any clues to the right stats.

April 11, 2019 at 12:25 am

That sounds like a very interesting study!

I think that you’re correct to use some form of regression rather than chi-square. The chi-squared test of independence doesn’t work with counts within an observation. Chi-squared looks at the multiple characteristics of an observation and essentially places it in a basket for that combination. For example, you have a red shirt/dead basket and a red shirt/alive basket. The procedure looks at each observation and places it into one of the baskets. Then it counts the observations in each basket.

What you have are counts (of errors) within each observation. You want to understand the IVs that relate to those counts. That’s a regression thing. Now, which form of regression? Because it involves counts, Poisson regression is a good possibility. You might also read up on negative binomial regression, which is related. Sometimes count data don’t meet certain requirements of the Poisson distribution, but you can still use negative binomial regression. For more information, look on pages 321-322 of my ebook that you just bought! 🙂 I talk a bit about regression with counts.

And, there’s a chance that you might be able to use OLS regression. That depends on how you’re handling the multiple assessments and the average number of errors. The Poisson distribution begins to approximate the normal distribution at around a mean of 25-ish. If the numbers of errors tend to fall around there or higher, OLS might be the ticket! If you’re summing multiple observations together, that might help in this regard.

I don’t understand the design of how you’re tracking changes in the number of errors over time, and how you’ll model that. You might include lagged values of errors to explain current errors, along with other possible IVs.

I found point number 7 to be really interesting. Is it that the blind spot allows those errors to persist in greater numbers, and that awareness of errors reduces the numbers of those types? Your interpretation of that should be very interesting!

Oh, and for the nominal dependent variable, use nominal logistic regression (p. 319-320)!

I hope this helps!


March 27, 2019 at 11:53 am

Thanks for your clear posts. Could you please give some insight, as in the t-test and F-test posts, into how we can calculate a chi-square test statistic value and how to convert it to a p-value?

March 29, 2019 at 12:26 am

I have that exact topic in mind for a future blog post! I’ll write one up similar to the t-test and F-test posts in the near future. It’s too much to do in the comments section, but soon an entire post for it! I’ll aim for sometime in the next couple of months. Stay tuned!


November 16, 2018 at 1:47 pm

This was great. 🙂


September 21, 2018 at 10:47 am

Thanks, I have learnt a lot.


February 5, 2018 at 4:26 pm

Hello, thanks for the nice tutorial. Can you please explain how the “Expected count” is calculated in the table “Tabulated statistics: Uniform color, Status”?

February 5, 2018 at 10:25 pm

Hi Shihab, that’s an excellent question!

You calculate the expected value for each cell by first multiplying the column proportion by the row proportion that are associated with each cell. This calculation produces the expected proportion for that cell. Then, you take the expected proportion and multiply it by the total number of observations to obtain the expected count. Let’s work through an example!

I’ll calculate the expected value for wearing a Blue uniform and being Alive. That’s the top-left cell in the statistical output.

At the bottom of the Alive column, we see that 90.7% of all observations are alive. So, 0.907 is the proportion for the Alive column. The output doesn’t display the proportion for the Blue row, but we can calculate that easily. We can see that there are 136 total counts in the Blue row and there are 430 total crew members. Hence, the proportion for the Blue row is 136/430 = 0.31627.

Next, we multiply 0.907 * 0.31627 = 0.28685689. That’s the expected proportion that should fall in that Blue/Alive cell.

Now, we multiply that proportion by the total number of observations to obtain the expected count for that cell: 0.28685689 * 430 = 123.348

You can see in the statistical output that it has been rounded to 123.35.

You simply repeat that procedure for the rest of the cells.
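In R, the same arithmetic can be done for a whole row of cells at once. A minimal sketch; only the margins are needed, and the Dead column total of 40 is inferred here from the 430 crew members and the 90.7% alive figure (390 alive):

    # Expected count for each cell = (row total * column total) / grand total
    row_totals <- c(Blue = 136)              # Blue uniform row total from the output
    col_totals <- c(Alive = 390, Dead = 40)  # inferred margins: 390 + 40 = 430
    n <- 430

    outer(row_totals, col_totals) / n
    #       Alive   Dead
    # Blue 123.35 12.651

    # With the full table 'tab', chisq.test(tab)$expected returns every expected count.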


January 18, 2018 at 2:29 pm

very nice, thanks


January 1, 2018 at 8:51 am

Amazing post!! In the tabulated statistics section, you ran a Pearson Chi Square and a Likelihood Ratio Chi Square test. Are both of these necessary and do BOTH have to fall below the significance level for the null to be rejected? I’m assuming so. I don’t know what the difference is between these two tests but I will look it up. That was the only part that lost me:)

January 2, 2018 at 11:16 am

Thanks again, Jessica! I really appreciate your kind words!

When the two p-values are in agreement (e.g., both significant or insignificant), that’s easy. Fortunately, in my experience, these two p-values usually do agree. And, as the sample size increases, the agreement between them also increases.

I’ve looked into what to do when they disagree and have not found any clear answers. This paper suggests that as long as all expected frequencies are at least 5, you should use the Pearson Chi-Square test. When any expected frequency is less than 5, the article recommends an adjusted Chi-square test, which is neither of the displayed tests!

These tests are most likely to disagree when you have borderline results to begin with (near your significance level), and particularly when you have a small sample. Either of these conditions alone makes the results questionable. If these tests disagree, I’d take it as a big warning sign that more research is required!



December 7, 2017 at 8:18 am

A good presentation. My experience with researchers in health sciences and clinical studies is that very often people do not bother about the hypotheses (null and alternate) but run after a p-value, more so with Chi-Square test of independence!! Your narration is excellent.


December 7, 2017 at 4:08 am

Helpful post. I can understand now


December 6, 2017 at 9:47 pm

Excellent Example, Thank you.

December 6, 2017 at 11:24 pm

You’re very welcome. I’m glad it was helpful!




SPSS Tutorials: Chi-Square Test of Independence


Sample Data Files

Our tutorials reference a dataset called "sample" in many examples. If you'd like to download the sample dataset to work through the examples, choose one of the files below:

  • Data definitions (*.pdf)
  • Data - Comma delimited (*.csv)
  • Data - Tab delimited (*.txt)
  • Data - Excel format (*.xlsx)
  • Data - SAS format (*.sas7bdat)
  • Data - SPSS format (*.sav)

The Chi-Square Test of Independence determines whether there is an association between categorical variables (i.e., whether the variables are independent or related). It is a nonparametric test.

This test is also known as:

  • Chi-Square Test of Association.

This test utilizes a contingency table to analyze the data. A contingency table (also known as a cross-tabulation , crosstab , or two-way table ) is an arrangement in which data is classified according to two categorical variables. The categories for one variable appear in the rows, and the categories for the other variable appear in columns. Each variable must have two or more categories. Each cell reflects the total count of cases for a specific pair of categories.

There are several tests that go by the name "chi-square test" in addition to the Chi-Square Test of Independence. Look for context clues in the data and research question to determine which form of the chi-square test is being used.

Common Uses

The Chi-Square Test of Independence is commonly used to test the following:

  • Statistical independence or association between two categorical variables.

The Chi-Square Test of Independence can only compare categorical variables. It cannot make comparisons between continuous variables or between categorical and continuous variables. Additionally, the Chi-Square Test of Independence only assesses associations between categorical variables, and cannot provide any inferences about causation.

If your categorical variables represent "pre-test" and "post-test" observations, then the chi-square test of independence is not appropriate . This is because the assumption of the independence of observations is violated. In this situation, McNemar's Test is appropriate.

Data Requirements

Your data must meet the following requirements:

  • Two categorical variables.
  • Two or more categories (groups) for each variable.
  • There is no relationship between the subjects in each group.
  • The categorical variables are not "paired" in any way (e.g. pre-test/post-test observations).
  • Expected frequencies for each cell are at least 1.
  • Expected frequencies should be at least 5 for the majority (80%) of the cells.

The null hypothesis ( H 0 ) and alternative hypothesis ( H 1 ) of the Chi-Square Test of Independence can be expressed in two different but equivalent ways:

H 0 : "[ Variable 1 ] is independent of [ Variable 2 ]" H 1 : "[ Variable 1 ] is not independent of [ Variable 2 ]"

H 0 : "[ Variable 1 ] is not associated with [ Variable 2 ]" H 1 :  "[ Variable 1 ] is associated with [ Variable 2 ]"

Test Statistic

The test statistic for the Chi-Square Test of Independence is denoted Χ 2 , and is computed as:

$$ \chi^{2} = \sum_{i=1}^{R}{\sum_{j=1}^{C}{\frac{(o_{ij} - e_{ij})^{2}}{e_{ij}}}} $$

\(o_{ij}\) is the observed cell count in the i th row and j th column of the table

\(e_{ij}\) is the expected cell count in the i th row and j th column of the table, computed as

$$ e_{ij} = \frac{\text{row } i \text{ total} \times \text{col } j \text{ total}}{\text{grand total}} $$

The quantity ( o ij - e ij ) is sometimes referred to as the residual of cell ( i , j ), denoted \(r_{ij}\).

The calculated Χ 2 value is then compared to the critical value from the Χ 2 distribution table with degrees of freedom df = ( R - 1)( C - 1) and the chosen significance level. If the calculated Χ 2 value is greater than the critical Χ 2 value, then we reject the null hypothesis.
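The full comparison is easy to script. A minimal R sketch (the 2x2 counts are placeholders; substitute your own contingency table):

    # Chi-square test of independence "by hand" for any contingency table
    tab <- matrix(c(20, 30,
                    40, 10), nrow = 2, byrow = TRUE)  # placeholder counts

    expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
    chi_sq   <- sum((tab - expected)^2 / expected)
    df       <- (nrow(tab) - 1) * (ncol(tab) - 1)
    critical <- qchisq(0.95, df)  # critical value at the 0.05 significance level

    chi_sq > critical  # TRUE means reject the null hypothesis of independence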

Data Set-Up

There are two different ways in which your data may be set up initially. The format of the data will determine how to proceed with running the Chi-Square Test of Independence. At minimum, your data should include two categorical variables (represented in columns) that will be used in the analysis. The categorical variables must include at least two groups. Your data may be formatted in either of the following ways:

If you have the raw data (each row is a subject):

[Screenshot: a Data View window showing one row per subject (cases 1-5 and 430-435 of the sample dataset), with columns ids, Smoking, and Gender.]

  • Cases represent subjects, and each subject appears once in the dataset. That is, each row represents an observation from a unique subject.
  • The dataset contains at least two nominal categorical variables (string or numeric). The categorical variables used in the test must have two or more categories.

If you have frequencies (each row is a combination of factors):

An example of using the chi-square test for this type of data can be found in the Weighting Cases tutorial.

[Screenshot: a Data View window with three columns (ClassRank, PickedAMajor, and Freq) and six rows, one per combination of categories.]

  • Each row in the dataset represents a distinct combination of the categories.
  • The value in the "frequency" column for a given row is the number of unique subjects with that combination of categories.
  • You should have three variables: one representing each category, and a third representing the number of occurrences of that particular combination of factors.
  • Before running the test, you must activate Weight Cases, and set the frequency variable as the weight.
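For comparison, the R analogue of the Weight Cases step is to tabulate with the frequency column as the cell weight. A sketch using the column names from the screenshot description above (the counts themselves are invented for illustration):

    # Frequency-format data: one row per combination of categories
    freq_data <- data.frame(
      ClassRank    = c("Freshman", "Freshman", "Sophomore", "Sophomore", "Junior", "Junior"),
      PickedAMajor = c("Yes", "No", "Yes", "No", "Yes", "No"),
      Freq         = c(30, 70, 55, 45, 70, 30)  # invented counts
    )

    # xtabs() weights each row by Freq, just as SPSS's Weight Cases does
    tab <- xtabs(Freq ~ ClassRank + PickedAMajor, data = freq_data)
    chisq.test(tab)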

Run a Chi-Square Test of Independence

In SPSS, the Chi-Square Test of Independence is an option within the Crosstabs procedure. Recall that the Crosstabs procedure creates a contingency table or two-way table , which summarizes the distribution of two categorical variables.

To create a crosstab and perform a chi-square test of independence, click  Analyze > Descriptive Statistics > Crosstabs .

[Screenshot: the Crosstabs dialog window, with the labeled elements A-F described below.]

A Row(s): One or more variables to use in the rows of the crosstab(s). You must enter at least one Row variable.

B Column(s): One or more variables to use in the columns of the crosstab(s). You must enter at least one Column variable.

Also note that if you specify one row variable and two or more column variables, SPSS will print crosstabs for each pairing of the row variable with the column variables. The same is true if you have one column variable and two or more row variables, or if you have multiple row and column variables. A chi-square test will be produced for each table. Additionally, if you include a layer variable, chi-square tests will be run for each pair of row and column variables within each level of the layer variable.

C Layer: An optional "stratification" variable. If you have turned on the chi-square test results and have specified a layer variable, SPSS will subset the data with respect to the categories of the layer variable, then run chi-square tests between the row and column variables. (This is not equivalent to testing for a three-way association, or testing for an association between the row and column variable after controlling for the layer variable.)

D Statistics: Opens the Crosstabs: Statistics window, which contains fifteen different inferential statistics for comparing categorical variables.

In the Crosstabs: Statistics window, check the box next to Chi-square.

To run the Chi-Square Test of Independence, make sure that the Chi-square box is checked.

E Cells: Opens the Crosstabs: Cell Display window, which controls which output is displayed in each cell of the crosstab. (Note: in a crosstab, the cells are the inner sections of the table. They show the number of observations for a given combination of the row and column categories.) There are three options in this window that are useful (but optional) when performing a Chi-Square Test of Independence:

[Screenshot: the Crosstabs: Cell Display window.]

1 Observed : The actual number of observations for a given cell. This option is enabled by default.

2 Expected : The expected number of observations for that cell (see the test statistic formula).

3 Unstandardized Residuals : The "residual" value, computed as observed minus expected.

F Format: Opens the Crosstabs: Table Format window, which specifies how the rows of the table are sorted.

[Screenshot: the Crosstabs: Table Format window.]

Example: Chi-square Test for 3x2 Table

Problem Statement

In the sample dataset, respondents were asked their gender and whether or not they were a cigarette smoker. There were three answer choices: Nonsmoker, Past smoker, and Current smoker. Suppose we want to test for an association between smoking behavior (nonsmoker, current smoker, or past smoker) and gender (male or female) using a Chi-Square Test of Independence (we'll use α = 0.05).

Before the Test

Before we test for "association", it is helpful to understand what an "association" and a "lack of association" between two categorical variables looks like. One way to visualize this is using clustered bar charts. Let's look at the clustered bar chart produced by the Crosstabs procedure.

This is the chart that is produced if you use Smoking as the row variable and Gender as the column variable (running the syntax later in this example):

[Clustered bar chart of smoking behavior by gender.]

The "clusters" in a clustered bar chart are determined by the row variable (in this case, the smoking categories). The color of the bars is determined by the column variable (in this case, gender). The height of each bar represents the total number of observations in that particular combination of categories.

This type of chart emphasizes the differences within the categories of the row variable. Notice how within each smoking category, the heights of the bars (i.e., the number of males and females) are very similar. That is, there are an approximately equal number of male and female nonsmokers; approximately equal number of male and female past smokers; approximately equal number of male and female current smokers. If there were an association between gender and smoking, we would expect these counts to differ between groups in some way.

Running the Test

  • Open the Crosstabs dialog ( Analyze > Descriptive Statistics > Crosstabs ).
  • Select Smoking as the row variable, and Gender as the column variable.
  • Click Statistics . Check Chi-square , then click Continue .
  • (Optional) Check the box for Display clustered bar charts .

The first table is the Case Processing summary, which tells us the number of valid cases used for analysis. Only cases with nonmissing values for both smoking behavior and gender can be used in the test.

Case Processing Summary table for the crosstab of smoking by gender. There are 402 valid cases (92.4%) and 33 cases with missing values on one or both variables (7.6%).

The next tables are the crosstabulation and chi-square test results.

Crosstabulation between smoking and gender, based on 402 valid cases.

The key result in the Chi-Square Tests table is the Pearson Chi-Square.

  • The value of the test statistic is 3.171.
  • The footnote for this statistic pertains to the expected cell count assumption (i.e., expected cell counts are all greater than 5): no cells had an expected count less than 5, so this assumption was met.
  • Because the test statistic is based on a 3x2 crosstabulation table, the degrees of freedom (df) for the test statistic is $$ df = (R - 1)*(C - 1) = (3 - 1)*(2 - 1) = 2*1 = 2 $$.
  • The corresponding p-value of the test statistic is p = 0.205.

Decision and Conclusions

Since the p-value is greater than our chosen significance level ( α = 0.05), we do not reject the null hypothesis. Rather, we conclude that there is not enough evidence to suggest an association between gender and smoking.

Based on the results, we can state the following:

  • No association was found between gender and smoking behavior (Χ²(2) = 3.171, p = 0.205).

Example: Chi-square Test for 2x2 Table

Let's continue the row and column percentage example from the Crosstabs tutorial, which described the relationship between the variables RankUpperUnder (upperclassman/underclassman) and LivesOnCampus (lives on campus/lives off-campus). Recall that the column percentages of the crosstab appeared to indicate that upperclassmen were less likely than underclassmen to live on campus:

  • The proportion of underclassmen who live off campus is 34.8%, or 79/227.
  • The proportion of underclassmen who live on campus is 65.2%, or 148/227.
  • The proportion of upperclassmen who live off campus is 94.4%, or 152/161.
  • The proportion of upperclassmen who live on campus is 5.6%, or 9/161.

Suppose that we want to test the association between class rank and living on campus using a Chi-Square Test of Independence (using α = 0.05).

The clustered bar chart from the Crosstabs procedure can act as a complement to the column percentages above. Let's look at the chart produced by the Crosstabs procedure for this example:

[Clustered bar chart of class rank by living on campus.]

The height of each bar represents the total number of observations in that particular combination of categories. The "clusters" are formed by the row variable (in this case, class rank). This type of chart emphasizes the differences within the underclassmen and upperclassmen groups. Here, the differences in the number of students living on campus versus living off-campus are much starker within the class rank groups.

  • Select RankUpperUnder as the row variable, and LiveOnCampus as the column variable.
  • (Optional) Click Cells . Under Counts, check the boxes for Observed and Expected , and under Residuals, click Unstandardized . Then click Continue .

The first table is the Case Processing summary, which tells us the number of valid cases used for analysis. Only cases with nonmissing values for both class rank and living on campus can be used in the test.

The case processing summary for the crosstab of class rank by living on campus. There were 388 valid cases (89.2%) and 47 cases with missing values of one or both variables (10.8%).

The next table is the crosstabulation. If you elected to check off the boxes for Observed Count, Expected Count, and Unstandardized Residuals, you should see the following table:

The crosstabulation of class rank by living on campus.

With the Expected Count values shown, we can confirm that all cells have an expected value greater than 5.

Computation of the expected cell counts and residuals (observed minus expected) for the crosstabulation of class rank by living on campus.
  Off-Campus On-Campus Total
Underclassman

Row 1, column 1

$$ o_{\mathrm{11}} = 79 $$

$$ e_{\mathrm{11}} = \frac{227*231}{388} = 135.147 $$

$$ r_{\mathrm{11}} = 79 - 135.147 = -56.147 $$

Row 1, column 2

$$ o_{\mathrm{12}} = 148 $$

$$ e_{\mathrm{12}} = \frac{227*157}{388} = 91.853 $$

$$ r_{\mathrm{12}} = 148 - 91.853 = 56.147 $$

row 1 total = 227
Upperclassmen

Row 2, column 1

$$ o_{\mathrm{21}} = 152 $$

$$ e_{\mathrm{21}} = \frac{161*231}{388} = 95.853 $$

$$ r_{\mathrm{21}} = 152 - 95.853 = 56.147 $$

Row 2, column 2

$$ o_{\mathrm{22}} = 9 $$

$$ e_{\mathrm{22}} = \frac{161*157}{388} = 65.147 $$

$$ r_{\mathrm{22}} = 9 - 65.147 = -56.147 $$

row 2 total = 161
Total col 1 total = 231 col 2 total = 157 grand total = 388

These numbers can be plugged into the chi-square test statistic formula:

$$ \chi^{2} = \sum_{i=1}^{R}{\sum_{j=1}^{C}{\frac{(o_{ij} - e_{ij})^{2}}{e_{ij}}}} = \frac{(-56.147)^{2}}{135.147} + \frac{(56.147)^{2}}{91.853} + \frac{(56.147)^{2}}{95.853} + \frac{(-56.147)^{2}}{65.147} = 138.926 $$
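The arithmetic can also be checked in R with the counts from the crosstab above (correct = FALSE turns off Yates' continuity correction so the output matches the Pearson chi-square):

    # Class rank (rows) by living on campus (columns)
    tab <- matrix(c( 79, 148,
                    152,   9),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(Rank   = c("Underclassman", "Upperclassman"),
                                  Campus = c("Off-Campus", "On-Campus")))

    chisq.test(tab, correct = FALSE)
    # X-squared = 138.93, df = 1, p-value < 2.2e-16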

We can confirm this computation with the results in the Chi-Square Tests table:

The table of chi-square test results, based on the crosstab of class rank by living on campus. The Pearson chi-square test statistic is 138.926 with 1 degree of freedom and a p-value less than 0.001.

The row of interest here is Pearson Chi-Square and its footnote.

  • The value of the test statistic is 138.926.
  • Because the crosstabulation is a 2x2 table, the degrees of freedom (df) for the test statistic is $$ df = (R - 1)*(C - 1) = (2 - 1)*(2 - 1) = 1 $$.
  • The corresponding p-value of the test statistic is so small that it is cut off from display. Instead of writing "p = 0.000", we instead write the mathematically correct statement p < 0.001.

Since the p-value is less than our chosen significance level α = 0.05, we can reject the null hypothesis, and conclude that there is an association between class rank and whether or not students live on-campus.

  • There was a significant association between class rank and living on campus ( Χ 2 (1) = 138.9, p < .001).

Using the Chi-Square Statistic in Research

Understanding the Chi-Square Test: A Simple Guide

What’s the Chi-Square Test For? The Chi-Square test helps us figure out if two things we’re interested in (like voter intent and political party membership) are related or just a coincidence. In technical terms, it tests if there’s a significant relationship between two categorical variables—things you can put into categories, like types of fruit or movie genres.

Starting Point: The Null Hypothesis The test starts with a basic assumption called the null hypothesis, which suggests that there’s no connection between the variables in the larger population. They’re independent. For example, it would assume that voter intent doesn’t depend on political party membership.

How Does It Work? Imagine you have a big table (called a crosstabulation or bivariate table) that shows how different categories, like voter intent and political party, overlap. Each cell in the table shows the count of how many people or things fall into each combined category.

The Chi-Square test looks at the numbers in this table in two steps:

Expected vs. Observed : First, it calculates what the numbers in each cell of the table would be if there were no relationship between the variables—these are the expected counts. Then, it compares these expected counts to the actual counts (observed) in your data.

The Chi-Square Statistic : Using these comparisons, it calculates a number (the Chi-Square statistic). If this number is big enough (based on a critical value from the Chi-Square distribution), it suggests that the observed counts are too different from the expected counts to be just a coincidence. This means there’s likely a significant relationship between the variables.

Example Question

“Is there a significant relationship between voter intent and political party membership?”

Using the Chi-Square test, we can analyze data from surveys or polls to see if voter intent really varies by political party, or if any patterns we see could just be random.

Key Takeaways

The Chi-Square test is a handy tool for exploring relationships between categorical variables. By comparing what we observe in the real world to what we would expect if there were no relationship, it helps us understand if our variables are truly independent or if there’s something more going on.


The calculation of the Chi-Square statistic is quite straightforward and intuitive:

$$ \chi^{2} = \sum{\frac{(f_o - f_e)^{2}}{f_e}} $$

where \(f_o\) = the observed frequency (the observed counts in the cells) and \(f_e\) = the expected frequency if NO relationship existed between the variables

As depicted in the formula, the Chi-Square statistic is based on the difference between what is actually observed in the data and what would be expected if there was truly no relationship between the variables.

How is the Chi-Square statistic run in SPSS and how is the output interpreted?

The Chi-Square statistic appears as an option when requesting a crosstabulation in SPSS. The output is labeled Chi-Square Tests; the Chi-Square statistic used in the Test of Independence is labeled Pearson Chi-Square. This statistic can be evaluated by comparing the actual value against a critical value found in a Chi-Square distribution (where degrees of freedom is calculated as (# of rows − 1) × (# of columns − 1)), but it is easier to simply examine the p-value provided by SPSS. To make a conclusion about the hypothesis with 95% confidence, the value labeled Asymp. Sig. (which is the p-value of the Chi-Square statistic) should be less than .05 (which is the alpha level associated with a 95% confidence level).

Is the p -value (labeled Asymp. Sig.) less than .05?  If so, we can conclude that the variables are not independent of each other and that there is a statistical relationship between the categorical variables.
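The same decision rule can be scripted outside SPSS. A hedged R sketch; the counts are invented to reproduce the 17.2% and 6.5% opposition rates quoted below, not taken from the actual output:

    # Invented counts matching the quoted opposition percentages
    tab <- matrix(c(25, 120,   # Fundamentalist: 25/145 = 17.2% oppose
                    10, 145),  # Liberal:        10/155 =  6.5% oppose
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("Fundamentalist", "Liberal"),
                                  c("Oppose", "Favor")))

    res <- chisq.test(tab, correct = FALSE)  # Pearson chi-square, as labeled in SPSS
    res$p.value        # corresponds to the "Asymp. Sig." column
    res$p.value < .05  # TRUE -> conclude the variables are related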

[SPSS crosstab and Chi-Square Tests output for fundamentalism by views on teaching sex education.]

In this example, there is an association between fundamentalism and views on teaching sex education in public schools.  While 17.2% of fundamentalists oppose teaching sex education, only 6.5% of liberals are opposed.  The p -value indicates that these variables are not independent of each other and that there is a statistically significant relationship between the categorical variables.

What are special concerns with regard to the Chi-Square statistic?

There are a number of important considerations when using the Chi-Square statistic to evaluate a crosstabulation.  Because of how the Chi-Square value is calculated, it is extremely sensitive to sample size – when the sample size is too large (~500), almost any small difference will appear statistically significant.  It is also sensitive to the distribution within the cells, and SPSS gives a warning message if cells have fewer than 5 cases. This can be addressed by always using categorical variables with a limited number of categories (e.g., by combining categories if necessary to produce a smaller table).



Chi-Square Goodness of Fit Test | Formula, Guide & Examples

Published on May 24, 2022 by Shaun Turney. Revised on June 22, 2023.

A chi-square (Χ 2 ) goodness of fit test is a type of Pearson’s chi-square test . You can use it to test whether the observed distribution of a categorical variable differs from your expectations.

Example: A dog food company has developed three new flavors and wants to know which one dogs prefer. You recruit a random sample of 75 dogs and offer each dog a choice between the three flavors by placing bowls in front of them. You expect that the flavors will be equally popular among the dogs, with about 25 dogs choosing each flavor.

The chi-square goodness of fit test tells you how well a statistical model fits a set of observations. It’s often used to analyze genetic crosses .

Table of contents

  • What is the chi-square goodness of fit test?
  • Chi-square goodness of fit test hypotheses
  • When to use the chi-square goodness of fit test
  • How to calculate the test statistic (formula)
  • How to perform the chi-square goodness of fit test
  • When to use a different test
  • Frequently asked questions about the chi-square goodness of fit test

A chi-square (Χ 2 ) goodness of fit test is a goodness of fit test for a categorical variable . Goodness of fit is a measure of how well a statistical model fits a set of observations.

  • When goodness of fit is high , the values expected based on the model are close to the observed values.
  • When goodness of fit is low , the values expected based on the model are far from the observed values.

The statistical models that are analyzed by chi-square goodness of fit tests are distributions . They can be any distribution, from as simple as equal probability for all groups, to as complex as a probability distribution with many parameters.


The chi-square goodness of fit test is a hypothesis test . It allows you to draw conclusions about the distribution of a population based on a sample. Using the chi-square goodness of fit test, you can test whether the goodness of fit is “good enough” to conclude that the population follows the distribution.

With the chi-square goodness of fit test, you can ask questions such as: Was this sample drawn from a population that has…

  • Equal proportions of male and female turtles?
  • Equal proportions of red, blue, yellow, green, and purple jelly beans?
  • 90% right-handed and 10% left-handed people?
  • Offspring with an equal probability of inheriting all possible genotypic combinations (i.e., unlinked genes)?
  • A Poisson distribution of floods per year?
  • A normal distribution of bread prices?
Observed and expected frequencies of dogs’ flavor choices

Flavor | Observed | Expected
Garlic Blast | 22 | 25
Blueberry Delight | 30 | 25
Minty Munch | 23 | 25

To help visualize the differences between your observed and expected frequencies, you also create a bar graph:

[Bar graph comparing the observed and expected frequencies of the dogs’ flavor choices.]

The president of the dog food company looks at your graph and declares that they should eliminate the Garlic Blast and Minty Munch flavors to focus on Blueberry Delight. “Not so fast!” you tell him.

You explain that your observations were a bit different from what you expected, but the differences aren’t dramatic. They could be the result of a real flavor preference or they could be due to chance.


Like all hypothesis tests, a chi-square goodness of fit test evaluates two hypotheses: the null and alternative hypotheses. They’re two competing answers to the question “Was the sample drawn from a population that follows the specified distribution?”

  • Null hypothesis ( H 0 ): The population follows the specified distribution.
  • Alternative hypothesis ( H a ):   The population does not follow the specified distribution.

These are general hypotheses that apply to all chi-square goodness of fit tests. You should make your hypotheses more specific by describing the “specified distribution.” You can name the probability distribution (e.g., Poisson distribution) or give the expected proportions of each group.

  • Null hypothesis ( H 0 ): The dog population chooses the three flavors in equal proportions ( p 1 = p 2 = p 3 ).
  • Alternative hypothesis ( H a ): The dog population does not choose the three flavors in equal proportions.

The following conditions are necessary if you want to perform a chi-square goodness of fit test:

  • You want to test a hypothesis about the distribution of one categorical variable. If your variable is continuous, you can convert it to a categorical variable by separating the observations into intervals. This process is known as data binning.
  • The sample was randomly selected from the population.
  • There are a minimum of five observations expected in each group.

The dog food example meets these conditions:

  • You want to test a hypothesis about the distribution of one categorical variable: the dog food flavor.
  • You recruited a random sample of 75 dogs.
  • There were a minimum of five observations expected in each group. For all three dog food flavors, you expected 25 observations of dogs choosing the flavor.

The test statistic for the chi-square (Χ 2 ) goodness of fit test is Pearson’s chi-square:

$$ \chi^{2} = \sum{\frac{(O - E)^{2}}{E}} $$

where:

  • \(\chi^{2}\) is the chi-square test statistic
  • \(\sum\) is the summation operator (it means “take the sum of”)
  • \(O\) is the observed frequency
  • \(E\) is the expected frequency

The larger the difference between the observations and the expectations ( O − E in the equation), the bigger the chi-square will be.

To use the formula, follow these five steps:

Step 1: Create a table

Create a table with the observed and expected frequencies in two columns.

Flavor | Observed | Expected
Garlic Blast | 22 | 25
Blueberry Delight | 30 | 25
Minty Munch | 23 | 25

Step 2: Calculate O − E

Add a new column called “ O −  E ”. Subtract the expected frequencies from the observed frequencies.

Flavor | Observed | Expected | O − E
Garlic Blast | 22 | 25 | 22 − 25 = −3
Blueberry Delight | 30 | 25 | 5
Minty Munch | 23 | 25 | −2

Step 3: Calculate ( O − E ) 2

Add a new column called “( O −  E ) 2 ”. Square the values in the previous column.

Flavor | Observed | Expected | O − E | (O − E)²
Garlic Blast | 22 | 25 | −3 | (−3)² = 9
Blueberry Delight | 30 | 25 | 5 | 25
Minty Munch | 23 | 25 | −2 | 4

Step 4: Calculate ( O − E ) 2 / E

Add a final column called “( O − E )² /  E “. Divide the previous column by the expected frequencies.

Flavor | Observed | Expected | O − E | (O − E)² | (O − E)² / E
Garlic Blast | 22 | 25 | −3 | 9 | 9/25 = 0.36
Blueberry Delight | 30 | 25 | 5 | 25 | 1
Minty Munch | 23 | 25 | −2 | 4 | 0.16

Step 5: Calculate Χ 2

Add up the values of the previous column. This is the chi-square test statistic (Χ 2 ).

Flavor | Observed | Expected | O − E | (O − E)² | (O − E)² / E
Garlic Blast | 22 | 25 | −3 | 9 | 0.36
Blueberry Delight | 30 | 25 | 5 | 25 | 1
Minty Munch | 23 | 25 | −2 | 4 | 0.16

Χ² = 0.36 + 1 + 0.16 = 1.52
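Those five steps collapse to a single line in R, using the observed and expected frequencies from the table above:

    observed <- c(22, 30, 23)  # Garlic Blast, Blueberry Delight, Minty Munch
    expected <- c(25, 25, 25)

    sum((observed - expected)^2 / expected)
    # returns 1.52, the chi-square test statistic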

The chi-square statistic is a measure of goodness of fit, but on its own it doesn’t tell you much. For example, is Χ 2 = 1.52 a low or high goodness of fit?

To interpret the chi-square goodness of fit, you need to compare it to something. That’s what a chi-square test is: comparing the chi-square value to the appropriate chi-square distribution to decide whether to reject the null hypothesis .

To perform a chi-square goodness of fit test, follow these five steps (the first two steps have already been completed for the dog food example):

Step 1: Calculate the expected frequencies

Sometimes, calculating the expected frequencies is the most difficult step. Think carefully about which expected values are most appropriate for your null hypothesis .

In general, you’ll need to multiply each group’s expected proportion by the total number of observations to get the expected frequencies.

Step 2: Calculate chi-square

Calculate the chi-square value from your observed and expected frequencies using the chi-square formula.

\begin{equation*}X^2 = \sum{\dfrac{(O-E)^2}{E}}\end{equation*}

Step 3: Find the critical chi-square value

Find the critical chi-square value in a chi-square critical value table or using statistical software. The critical value is calculated from a chi-square distribution. To find the critical chi-square value, you’ll need to know two things:

  • The degrees of freedom ( df ): For chi-square goodness of fit tests, the df is the number of groups minus one.
  • Significance level (α): By convention, the significance level is usually .05.

Step 4: Compare the chi-square value to the critical value

Compare the chi-square value to the critical value to determine which is larger.

Χ² = 1.52

Critical value = 5.99
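Instead of a printed table, the critical value can be pulled from software; for example, in R:

    # Upper-tail critical value for alpha = .05 with df = 3 groups - 1 = 2
    qchisq(p = 0.95, df = 2)
    # returns 5.991465, the 5.99 used above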

Step 5: Decide whether to reject the null hypothesis

  • If the Χ² value is greater than the critical value, the data allow you to reject the null hypothesis and provide support for the alternative hypothesis.
  • If the Χ² value is less than the critical value, the data don’t allow you to reject the null hypothesis and don’t provide support for the alternative hypothesis.

Here, Χ² = 1.52 is less than the critical value of 5.99, so you can’t reject the null hypothesis that the dog population chooses the three flavors in equal proportions.

Whether you use the chi-square goodness of fit test or a related test depends on what hypothesis you want to test and what type of variable you have.

When to use the chi-square test of independence

There’s another type of chi-square test, called the chi-square test of independence .

  • Use the chi-square goodness of fit test when you have one categorical variable and you want to test a hypothesis about its distribution .
  • Use the chi-square test of independence when you have two categorical variables and you want to test a hypothesis about their relationship .

When to use a different goodness of fit test

The Anderson–Darling and Kolmogorov–Smirnov goodness of fit tests are two other common goodness of fit tests for distributions.

  • Use the Anderson–Darling or the Kolmogorov–Smirnov goodness of fit test when you have a continuous variable (that you don’t want to bin).
  • Use the chi-square goodness of fit test when you have a categorical variable (or a continuous variable that you want to bin).



You can use the CHISQ.TEST() function to perform a chi-square goodness of fit test in Excel. It takes two arguments, CHISQ.TEST(observed_range, expected_range), and returns the p value .

You can use the chisq.test() function to perform a chi-square goodness of fit test in R. Give the observed values in the “x” argument, give the expected values in the “p” argument, and set “rescale.p” to true. For example:

chisq.test(x = c(22,30,23), p = c(25,25,25), rescale.p = TRUE)

Chi-square goodness of fit tests are often used in genetics. One common application is to check if two genes are linked (i.e., if the assortment is independent). When genes are linked, the allele inherited for one gene affects the allele inherited for another gene.

Suppose that you want to know if the genes for pea texture (R = round, r = wrinkled) and color (Y = yellow, y = green) are linked. You perform a dihybrid cross between two heterozygous ( RY / ry ) pea plants. The hypotheses you’re testing with your experiment are:

  • Null hypothesis (H 0 ): The offspring have an equal probability of inheriting all possible genotypic combinations. This would suggest that the genes are unlinked.
  • Alternative hypothesis (H a ): The offspring do not have an equal probability of inheriting all possible genotypic combinations. This would suggest that the genes are linked.

You observe 100 peas:

  • 78 round and yellow peas
  • 6 round and green peas
  • 4 wrinkled and yellow peas
  • 12 wrinkled and green peas

To calculate the expected values, you can make a Punnett square. If the two genes are unlinked, the probability of each genotypic combination is equal.

     | RY   | ry   | Ry   | rY
RY | RRYY | RrYy | RRYy | RrYY
ry | RrYy | rryy | Rryy | rrYy
Ry | RRYy | Rryy | RRyy | RrYy
rY | RrYY | rrYy | RrYy | rrYY

The expected phenotypic ratios are therefore 9 round and yellow: 3 round and green: 3 wrinkled and yellow: 1 wrinkled and green.

From this, you can calculate the expected phenotypic frequencies for 100 peas:

Phenotype | Observed | Expected
Round and yellow | 78 | 100 × (9/16) = 56.25
Round and green | 6 | 100 × (3/16) = 18.75
Wrinkled and yellow | 4 | 100 × (3/16) = 18.75
Wrinkled and green | 12 | 100 × (1/16) = 6.25

Phenotype | O | E | O − E | (O − E)² | (O − E)² / E
Round and yellow | 78 | 56.25 | 21.75 | 473.06 | 8.41
Round and green | 6 | 18.75 | −12.75 | 162.56 | 8.67
Wrinkled and yellow | 4 | 18.75 | −14.75 | 217.56 | 11.60
Wrinkled and green | 12 | 6.25 | 5.75 | 33.06 | 5.29

Χ² = 8.41 + 8.67 + 11.60 + 5.29 = 33.97

Since there are four groups (round and yellow, round and green, wrinkled and yellow, wrinkled and green), there are three degrees of freedom .

For a test of significance at α = .05 and df = 3, the Χ 2 critical value is 7.82.

Χ² = 33.97

Critical value = 7.82

The Χ² value is greater than the critical value.

The Χ 2 value is greater than the critical value, so we reject the null hypothesis that the population of offspring have an equal probability of inheriting all possible genotypic combinations. There is a significant difference between the observed and expected genotypic frequencies ( p < .05).

The data support the alternative hypothesis that the offspring do not have an equal probability of inheriting all possible genotypic combinations, which suggests that the genes are linked.
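The whole worked example reduces to a single chisq.test() call in R, passing the observed counts and the 9:3:3:1 expected proportions:

    observed <- c(78, 6, 4, 12)  # round/yellow, round/green, wrinkled/yellow, wrinkled/green
    chisq.test(x = observed, p = c(9, 3, 3, 1) / 16)
    # X-squared = 33.97, df = 3, p-value = 2.0e-07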

The two main chi-square tests are the chi-square goodness of fit test and the chi-square test of independence .

A chi-square distribution is a continuous probability distribution . The shape of a chi-square distribution depends on its degrees of freedom , k . The mean of a chi-square distribution is equal to its degrees of freedom ( k ) and the variance is 2 k . The range is 0 to ∞.



Stats and R

Chi-Square Test of Independence by Hand


Introduction



Chi-square tests of independence test whether two qualitative variables are independent, that is, whether there exists a relationship between two categorical variables. In other words, this test is used to determine whether the values of one of the 2 qualitative variables depend on the values of the other qualitative variable.

If the test shows no association between the two variables (i.e., the variables are independent), it means that knowing the value of one variable gives no information about the value of the other variable. On the contrary, if the test shows a relationship between the variables (i.e., the variables are dependent), it means that knowing the value of one variable provides information about the value of the other variable.

This article focuses on how to perform a Chi-square test of independence by hand and how to interpret the results with a concrete example. To learn how to do this test in R, read the article “ Chi-square test of independence in R ”.

The Chi-square test of independence is a hypothesis test so it has a null ( \(H_0\) ) and an alternative hypothesis ( \(H_1\) ):

  • \(H_0\) : the variables are independent, there is no relationship between the two categorical variables. Knowing the value of one variable does not help to predict the value of the other variable
  • \(H_1\) : the variables are dependent, there is a relationship between the two categorical variables. Knowing the value of one variable helps to predict the value of the other variable

The Chi-square test of independence works by comparing the observed frequencies (so the frequencies observed in your sample) to the expected frequencies if there was no relationship between the two categorical variables (so the expected frequencies if the null hypothesis was true).

If the difference between the observed frequencies and the expected frequencies is small, we cannot reject the null hypothesis of independence: the data are consistent with the two variables being unrelated. On the other hand, if the difference between the observed frequencies and the expected frequencies is large, we can reject the null hypothesis of independence and thus we can conclude that the two variables are related.

The threshold between a small and a large difference is a value that comes from the Chi-square distribution (hence the name of the test). This value, referred to as the critical value, depends on the significance level \(\alpha\) (usually set equal to 5%) and on the degrees of freedom. The critical value can be found in the statistical table of the Chi-square distribution. More on the critical value and the degrees of freedom later in the article.

For our example, we want to determine whether there is a statistically significant association between smoking and being a professional athlete. Smoking can only be “yes” or “no” and being a professional athlete can only be “yes” or “no”. The two variables of interest are qualitative variables so we need to use a Chi-square test of independence, and the data have been collected on 28 persons.

Note that we chose binary variables (binary variables = qualitative variables with two levels) for simplicity, but the Chi-square test of independence can also be performed on qualitative variables with more than two levels. For instance, if the variable smoking had three levels: (i) non-smokers, (ii) moderate smokers and (iii) heavy smokers, the steps and the interpretation of the results of the test would be the same as with two levels.

Our data are summarized in the contingency table below reporting the number of people in each subgroup, totals by row, by column and the grand total:

              Non-smoker   Smoker   Total
Athlete       14           4        18
Non-athlete   0            10       10
Total         14           14       28

Remember that for the Chi-square test of independence we need to determine whether the observed counts are significantly different from the counts that we would expect if there was no association between the two variables. We have the observed counts (see the table above), so we now need to compute the expected counts under the assumption that the variables are independent. These expected frequencies are computed for each subgroup one by one with the following formula:

\[\text{exp. frequencies} = \frac{\text{total # of obs. for the row} \cdot \text{total # of obs. for the column}}{\text{total number of observations}}\]

where obs. correspond to observations. Given our table of observed frequencies above, below is the table of the expected frequencies computed for each subgroup:

              Non-smoker             Smoker                 Total
Athlete       (18 * 14) / 28 = 9     (18 * 14) / 28 = 9     18
Non-athlete   (10 * 14) / 28 = 5     (10 * 14) / 28 = 5     10
Total         14                     14                     28

Note that the Chi-square test of independence should only be done when the expected frequencies in all groups are equal to or greater than 5. This assumption is met for our example as the minimum expected frequency is 5. If the condition is not met, Fisher's exact test is preferred.

Talking about assumptions, the Chi-square test of independence requires that the observations are independent. This is usually not tested formally, but rather verified based on the design of the experiment and on the good control of experimental conditions. If you are not sure, ask yourself if one observation is related to another (if one observation has an impact on another). If not, it is most likely that you have independent observations.

If you have dependent observations (paired samples), McNemar's test or Cochran's Q test should be used instead. McNemar's test is used when we want to know if there is a significant change in two paired samples (typically in a study with a measure before and after on the same subject) when the variables have only two categories. Cochran's Q test is an extension of McNemar's test to more than two related measures.

We have the observed and expected frequencies. We now need to compare these frequencies to determine if they differ significantly. The difference between the observed and expected frequencies, referred to as the test statistic and denoted \(\chi^2\), is computed as follows:

\[\chi^2 = \sum_{i, j} \frac{\big(O_{ij} - E_{ij}\big)^2}{E_{ij}}\]

where \(O\) represents the observed frequencies and \(E\) the expected frequencies. We use the square of the differences between the observed and expected frequencies to make sure that negative differences are not compensated by positive differences. The formula looks more complex than it really is, so let's illustrate it with our example. We first compute the term for each subgroup one by one according to the formula:

  • in the subgroup of athlete and non-smoker: \(\frac{(14 - 9)^2}{9} = 2.78\)
  • in the subgroup of non-athlete and non-smoker: \(\frac{(0 - 5)^2}{5} = 5\)
  • in the subgroup of athlete and smoker: \(\frac{(4 - 9)^2}{9} = 2.78\)
  • in the subgroup of non-athlete and smoker: \(\frac{(10 - 5)^2}{5} = 5\)

and then we sum them all to obtain the test statistic:

\[\chi^2 = 2.78 + 5 + 2.78 + 5 = 15.56\]

The test statistic alone is not enough to conclude that the two variables are independent or dependent. As previously mentioned, this test statistic (which in some sense summarizes the difference between the observed and expected frequencies) must be compared to a critical value to determine whether the difference is large or small. One cannot tell whether a test statistic is large or small without putting it in perspective with the critical value.

If the test statistic is above the critical value, it means that such a difference between the observed and expected frequencies would be unlikely if the null hypothesis were true. On the other hand, if the test statistic is below the critical value, such a difference is plausible under the null hypothesis. If the difference is plausible, we cannot reject the hypothesis that the two variables are independent; otherwise we can conclude that there exists a relationship between the variables.

The critical value can be found in the statistical table of the Chi-square distribution and depends on the significance level, denoted \(\alpha\), and the degrees of freedom, denoted \(df\). The significance level is usually set equal to 5%. The degrees of freedom for a Chi-square test of independence are found as follows:

\[df = (\text{number of rows} - 1) \cdot (\text{number of columns} - 1)\]

In our example, the degrees of freedom is thus \(df = (2 - 1) \cdot (2 - 1) = 1\) since there are two rows and two columns in the contingency table (totals do not count as a row or column).

We now have all the necessary information to find the critical value in the Chi-square table ( \(\alpha = 0.05\) and \(df = 1\) ). To find the critical value we need to look at the row \(df = 1\) and the column \(\chi^2_{0.050}\) (since \(\alpha = 0.05\) ) in the picture below. The critical value is \(3.84146\) . 1

[Chi-square table: critical value for \(\alpha\) = 5% and df = 1]

Now that we have the test statistic and the critical value, we can compare them to check whether the null hypothesis of independence of the variables is rejected or not. In our example,

\[\text{test statistic} = 15.56 > \text{critical value} = 3.84146\]

As with many statistical tests, when the test statistic is larger than the critical value, we can reject the null hypothesis at the specified significance level.

In our case, we can therefore reject the null hypothesis of independence between the two categorical variables at the 5% significance level.

\(\Rightarrow\) This means that there is a significant relationship between the smoking habit and being an athlete or not. Knowing the value of one variable helps to predict the value of the other variable.

Thanks for reading.

I hope the article helped you to perform the Chi-square test of independence by hand and interpret its results. If you would like to learn how to do this test in R, read the article “ Chi-square test of independence in R ”.

As always, if you have a question or a suggestion related to the topic covered in this article, please add it as a comment so other readers can benefit from the discussion.

For readers that prefer to check the \(p\) -value in order to reject or not the null hypothesis, I also created a Shiny app to help you compute the \(p\) -value given a test statistic. ↩︎


Sociology 3112

The Chi-Square Test for Independence

Learning Objectives

  • Understand the characteristics of the chi-square distribution
  • Carry out the chi-square test and interpret its results
  • Understand the limitations of the chi-square test

Chi-Square Distribution: a family of asymmetrical, positively skewed distributions, the exact shape of which is determined by their respective degrees of freedom

Observed Frequencies: the cell frequencies actually observed in a bivariate table

Expected Frequencies: the cell frequencies that one might expect to see in a bivariate table if the two variables were statistically independent

The primary use of the chi-square test is to examine whether two variables are independent or not. What does it mean to be independent, in this sense? It means that the two factors are not related. Typically in social science research, we're interested in finding factors that are dependent upon each other: education and income, occupation and prestige, age and voting behavior. By ruling out independence of the two variables, the chi-square test can be used to assess whether two variables are, in fact, dependent. More generally, we say that one variable is "not correlated with" or "independent of" the other if an increase in one variable is not associated with an increase in the other. If two variables are correlated, their values tend to move together, either in the same or in the opposite direction. Chi-square examines a special kind of correlation: that between two nominal variables.

The Chi-Square Distribution

The chi-square distribution, like the t distribution, is actually a series of distributions, the exact shape of which varies according to their degrees of freedom. Unlike the t distribution, however, the chi-square distribution is asymmetrical, positively skewed and never approaches normality. The graph below illustrates how the shape of the chi-square distribution changes as the degrees of freedom (k) increase:

[Graph: the chi-square distribution for increasing degrees of freedom (k)]

The Chi-Square Test

Earlier in the semester, you familiarized yourself with the five steps of hypothesis testing: (1) making assumptions, (2) stating the null and research hypotheses and choosing an alpha level, (3) selecting a sampling distribution and determining the test statistic that corresponds with the chosen alpha level, (4) calculating the test statistic and (5) interpreting the results. Like the t tests we discussed previously, the chi-square test begins with a handful of assumptions, a pair of hypotheses, a sampling distribution and an alpha level and ends with a conclusion obtained via comparison of an obtained statistic with a critical statistic. The assumptions associated with the chi-square test are fairly straightforward: the data at hand must have been randomly selected (to minimize potential biases) and the variables in question must be nominal or ordinal (there are other methods to test the statistical independence of interval/ratio variables; these methods will be discussed in subsequent chapters). Regarding the hypotheses to be tested, all chi-square tests have the same general null and research hypotheses. The null hypothesis states that there is no relationship between the two variables, while the research hypothesis states that there is a relationship between the two variables. The test statistic follows a chi-square distribution, and the conclusion depends on whether or not our obtained statistic is greater than the critical statistic at our chosen alpha level.

In the following example, we'll use a chi-square test to determine whether there is a relationship between gender and getting in trouble at school (both nominal variables). Below is the table documenting the raw scores of boys and girls and their respective behavior issues (or lack thereof):

Gender and Getting in Trouble at School

  Got in Trouble Did Not Get in Trouble Total
Boys 46 71 117
Girls 37 83 120
Total 83 154 237

To examine statistically whether boys got in trouble in school more often, we need to frame the question in terms of hypotheses. The null hypothesis is that the two variables are independent (i.e. no relationship or correlation) and the research hypothesis is that the two variables are related. In this case, the specific hypotheses are:

H0: There is no relationship between gender and getting in trouble at school
H1: There is a relationship between gender and getting in trouble at school

As is customary in the social sciences, we'll set our alpha level at 0.05.

Next we need to calculate the expected frequency for each cell. These values represent what we would expect to see if there really were no relationship between the two variables. We calculate the expected frequency for each cell by multiplying the row total by the column total and dividing by the total number of observations. To get the expected count for the upper left cell, we would multiply the row total (117) by the column total (83) and divide by the total number of observations (237). (83 x 117)/237 = 40.97. If the two variables were independent, we would expect 40.97 boys to get in trouble. Or, to put it another way, if there were no relationship between the two variables, we would expect the students who got in trouble to be distributed across the two genders in proportion to each gender's share of the sample.

We do the same thing for the other three cells and end up with the following expected counts (in parentheses next to each raw score):

  Got in Trouble Did Not Get in Trouble Total
Boys 46 (40.97) 71 (76.03) 117
Girls 37 (42.03) 83 (77.97) 120
Total 83 154 237

With these sets of figures, we calculate the chi-square statistic as follows:

\[\chi^2 = \sum \frac{(O - E)^2}{E}\]

For each cell, we square the difference between the observed frequency and the expected frequency (observed frequency – expected frequency) and divide that number by the expected frequency. Then we add all of the terms (there will be four, one for each cell) together, like so:

\[\chi^2 = \frac{(46 - 40.97)^2}{40.97} + \frac{(71 - 76.03)^2}{76.03} + \frac{(37 - 42.03)^2}{42.03} + \frac{(83 - 77.97)^2}{77.97} = 0.62 + 0.33 + 0.60 + 0.32 = 1.87\]

After we've crunched all those numbers, we end up with an obtained statistic of 1.87. ( Please note: a chi-square statistic can't be negative because nominal variables don't have directionality. If your obtained statistic turns out to be negative, you might want to check your math.) But before we can come to a conclusion, we need to find our critical statistic, which entails finding our degrees of freedom. In this case, the number of degrees of freedom is equal to the number of columns in the table minus one multiplied by the number of rows in the table minus one, or (r-1)(c-1). In our case, we have (2-1)(2-1), or one degree of freedom.

Finally, we compare our obtained statistic to our critical statistic found on the chi-square table posted in the "Files" section on Canvas. We also need to reference our alpha, which we set at .05. As you can see, the critical statistic for an alpha level of 0.05 and one degree of freedom is 3.841, which is larger than our obtained statistic of 1.87. Because the critical statistic is greater than our obtained statistic, we can't reject our null hypothesis.

The Limitations of the Chi-Square Test

There are two limitations to the chi-square test about which you should be aware. First, the chi-square test is very sensitive to sample size. With a large enough sample, even trivial relationships can appear to be statistically significant. When using the chi-square test, you should keep in mind that "statistically significant" doesn't necessarily mean "meaningful." Second, remember that the chi-square can only tell us whether two variables are related to one another. It does not necessarily imply that one variable has any causal effect on the other. In order to establish causality, a more detailed analysis would be required.

Main Points

  • The chi-square distribution is actually a series of distributions that vary in shape according to their degrees of freedom.
  • The chi-square test is a hypothesis test designed to test for a statistically significant relationship between nominal and ordinal variables organized in a bivariate table. In other words, it tells us whether two variables are independent of one another.
  • The obtained chi-square statistic essentially summarizes the difference between the frequencies actually observed in a bivariate table and the frequencies we would expect to see if there were no relationship between the two variables.
  • The chi-square test is sensitive to sample size.
  • The chi-square test cannot establish a causal relationship between two variables.

Carrying out the Chi-Square Test in SPSS

To perform a chi square test with SPSS, click "Analyze," then "Descriptive Statistics," and then "Crosstabs." As was the case in the last chapter, the independent variable should be placed in the "Columns" box, and the dependent variable should be placed in the "Rows" box. Now click on "Statistics" and check the box next to "Chi-Square." This test will provide evidence either in favor of or against the statistical independence of two variables, but it won't give you any information about the strength or direction of the relationship.

After looking at the output, some of you are probably wondering why SPSS provides you with a two-tailed p-value when chi-square is always a one-tailed test. In all honesty, I don't know the answer to that question. However, all is not lost. Because two-tailed tests are always more conservative than one-tailed tests (i.e., it's harder to reject your null hypothesis with a two-tailed test than it is with a one-tailed test), a statistically significant result under a two-tailed assumption would also be significant under a one-tailed assumption. If you're highly motivated, you can compare the obtained statistic from your output to the critical statistic found on a chi-square chart.

  • Using the World Values Survey data, run a chi-square test to determine whether there is a relationship between sex ("SEX") and marital status ("MARITAL"). Report the obtained statistic and the p-value from your output. What is your conclusion?
  • Using the ADD Health data, run a chi-square test to determine whether there is a relationship between the respondent's gender ("GENDER") and his or her grade in math ("MATH"). Again, report the obtained statistic and the p-value from your output. What is your conclusion?

StatAnalytica

Step-by-step guide to hypothesis testing in statistics


Hypothesis testing in statistics helps us use data to make informed decisions. It starts with an assumption or guess about a group or population—something we believe might be true. We then collect sample data to check if there is enough evidence to support or reject that guess. This method is useful in many fields, like science, business, and healthcare, where decisions need to be based on facts.

Learning how to do hypothesis testing in statistics step-by-step can help you better understand data and make smarter choices, even when things are uncertain. This guide will take you through each step, from creating your hypothesis to making sense of the results, so you can see how it works in practical situations.

What is Hypothesis Testing?


Hypothesis testing is a method for determining whether data supports a certain idea or assumption about a larger group. It starts by making a guess, like an average or a proportion, and then uses a small sample of data to see if that guess seems true or not.

For example, if a company wants to know if its new product is more popular than its old one, it can use hypothesis testing. They start with a statement like “The new product is not more popular than the old one” (this is the null hypothesis) and compare it with “The new product is more popular” (this is the alternative hypothesis). Then, they look at customer feedback to see if there’s enough evidence to reject the first statement and support the second one.

Simply put, hypothesis testing is a way to use data to help make decisions and understand what the data is really telling us, even when we don’t have all the answers.

Importance Of Hypothesis Testing In Decision-Making And Data Analysis

Hypothesis testing is important because it helps us make smart choices and understand data better. Here’s why it’s useful:

  • Reduces Guesswork : It helps us see if our guesses or ideas are likely correct, even when we don’t have all the details.
  • Uses Real Data : Instead of just guessing, it checks if our ideas match up with real data, which makes our decisions more reliable.
  • Avoids Errors : It helps us avoid mistakes by carefully checking if our ideas are right so we don’t make costly errors.
  • Shows What to Do Next : It tells us if our ideas work or not, helping us decide whether to keep, change, or drop something. For example, a company might test a new ad and decide what to do based on the results.
  • Confirms Research Findings : It makes sure that research results are accurate and not just random chance so that we can trust the findings.

Here’s a simple guide to understanding hypothesis testing, with an example:

1. Set Up Your Hypotheses

Explanation: Start by defining two statements:

  • Null Hypothesis (H0): This is the idea that there is no change or effect. It’s what you assume is true.
  • Alternative Hypothesis (H1): This is what you want to test. It suggests there is a change or effect.

Example: Suppose a company says their new batteries last an average of 500 hours. To check this:

  • Null Hypothesis (H0): The average battery life is 500 hours.
  • Alternative Hypothesis (H1): The average battery life is not 500 hours.

2. Choose the Test

Explanation: Pick a statistical test that fits your data and your hypotheses. Different tests are used for various kinds of data.

Example: Since you’re comparing the average battery life, you use a one-sample t-test .

3. Set the Significance Level

Explanation: Decide how much risk you’re willing to take if you make a wrong decision. This is called the significance level, often set at 0.05 or 5%.

Example: You choose a significance level of 0.05, meaning you’re okay with a 5% chance of being wrong.

4. Gather and Analyze Data

Explanation: Collect your data and perform the test. Calculate the test statistic to see how far your sample result is from what you assumed.

Example: You test 30 batteries and find they last an average of 485 hours. You then calculate how this average compares to the claimed 500 hours using the t-test.

5. Find the p-Value

Explanation: The p-value tells you the probability of getting a result as extreme as yours if the null hypothesis is true.

Example: You find a p-value of 0.0001. This means there’s a very small chance (0.01%) of getting an average battery life of 485 hours or less if the true average is 500 hours.

6. Make Your Decision

Explanation: Compare the p-value to your significance level. If the p-value is smaller, you reject the null hypothesis. If it’s larger, you do not reject it.

Example: Since 0.0001 is much less than 0.05, you reject the null hypothesis. This means the data suggests the average battery life is different from 500 hours.

7. Report Your Findings

Explanation: Summarize what the results mean. State whether you rejected the null hypothesis and what that implies.

Example: You conclude that the average battery life is likely different from 500 hours. This suggests the company’s claim might not be accurate.

Hypothesis testing is a way to use data to check if your guesses or assumptions are likely true. By following these steps—setting up your hypotheses, choosing the right test, deciding on a significance level, analyzing your data, finding the p-value, making a decision, and reporting results—you can determine if your data supports or challenges your initial idea.

Understanding Hypothesis Testing: A Simple Explanation

Hypothesis testing is a way to use data to make decisions. Here’s a straightforward guide:

1. What is the Null and Alternative Hypotheses?

  • Null Hypothesis (H0): This is your starting assumption. It says that nothing has changed or that there is no effect. It’s what you assume to be true until your data shows otherwise. Example: If a company says their batteries last 500 hours, the null hypothesis is: “The average battery life is 500 hours.” This means you think the claim is correct unless you find evidence to prove otherwise.
  • Alternative Hypothesis (H1): This is what you want to find out. It suggests that there is an effect or a difference. It’s what you are testing to see if it might be true. Example: To test the company’s claim, you might say: “The average battery life is not 500 hours.” This means you think the average battery life might be different from what the company says.

2. One-Tailed vs. Two-Tailed Tests

  • One-Tailed Test: This test checks for an effect in only one direction. You use it when you’re only interested in finding out if something is either more or less than a specific value. Example: If you think the battery lasts longer than 500 hours, you would use a one-tailed test to see if the battery life is significantly more than 500 hours.
  • Two-Tailed Test: This test checks for an effect in both directions. Use this when you want to see if something is different from a specific value, whether it's more or less. Example: If you want to see if the battery life is different from 500 hours, whether it's more or less, you would use a two-tailed test. This checks for any significant difference, regardless of the direction. (A short sketch contrasting the two p-value calculations follows this list.)

3. Common Misunderstandings

  • Clarification: Hypothesis testing doesn’t prove that the null hypothesis is true. It just helps you decide if you should reject it. If there isn’t enough evidence against it, you don’t reject it, but that doesn’t mean it’s definitely true.
  • Clarification: A small p-value shows that your data is unlikely if the null hypothesis is true. It suggests that the alternative hypothesis might be right, but it doesn’t prove the null hypothesis is false.
  • Clarification: The significance level (alpha) is a set threshold, like 0.05, that helps you decide how much risk you’re willing to take for making a wrong decision. It should be chosen carefully, not randomly.
  • Clarification: Hypothesis testing helps you make decisions based on data, but it doesn’t guarantee your results are correct. The quality of your data and the right choice of test affect how reliable your results are.

Benefits and Limitations of Hypothesis Testing

Benefits

  • Clear Decisions: Hypothesis testing helps you make clear decisions based on data. It shows whether the evidence supports or goes against your initial idea.
  • Objective Analysis: It relies on data rather than personal opinions, so your decisions are based on facts rather than feelings.
  • Concrete Numbers: You get specific numbers, like p-values, to understand how strong the evidence is against your idea.
  • Control Risk: You can set a risk level (alpha level) to manage the chance of making an error, which helps avoid incorrect conclusions.
  • Widely Used: It can be used in many areas, from science and business to social studies and engineering, making it a versatile tool.

Limitations

  • Sample Size Matters: The results can be affected by the size of the sample. Small samples might give unreliable results, while large samples might find differences that aren’t meaningful in real life.
  • Risk of Misinterpretation: A small p-value means the results are unlikely if the null hypothesis is true, but it doesn’t show how important the effect is.
  • Needs Assumptions: Hypothesis testing requires certain conditions, like data being normally distributed. If these aren't met, the results might not be accurate.
  • Simple Decisions: It often results in a basic yes or no decision without giving detailed information about the size or impact of the effect.
  • Can Be Misused: Sometimes, people misuse hypothesis testing, tweaking data to get a desired result or focusing only on whether the result is statistically significant.
  • No Absolute Proof: Hypothesis testing doesn’t prove that your hypothesis is true. It only helps you decide if there’s enough evidence to reject the null hypothesis, so the conclusions are based on likelihood, not certainty.

Final Thoughts 

Hypothesis testing helps you make decisions based on data. It involves setting up your initial idea, picking a significance level, doing the test, and looking at the results. By following these steps, you can make sure your conclusions are based on solid information, not just guesses.

This approach lets you see if the evidence supports or contradicts your initial idea, helping you make better decisions. But remember that hypothesis testing isn’t perfect. Things like sample size and assumptions can affect the results, so it’s important to be aware of these limitations.

In simple terms, using a step-by-step guide for hypothesis testing is a great way to better understand your data. Follow the steps carefully and keep in mind the method’s limits.

What is the difference between one-tailed and two-tailed tests?

 A one-tailed test assesses the probability of the observed data in one direction (either greater than or less than a certain value). In contrast, a two-tailed test looks at both directions (greater than and less than) to detect any significant deviation from the null hypothesis.

How do you choose the appropriate test for hypothesis testing?

The choice of test depends on the type of data you have and the hypotheses you are testing. Common tests include t-tests, chi-square tests, and ANOVA. For more details about ANOVA, you may read Complete Details on What is ANOVA in Statistics. It's important to match the test to the data characteristics and the research question.

What is the role of sample size in hypothesis testing?  

Sample size affects the reliability of hypothesis testing. Larger samples provide more reliable estimates and can detect smaller effects, while smaller samples may lead to less accurate results and reduced power.

Can hypothesis testing prove that a hypothesis is true?  

Hypothesis testing cannot prove that a hypothesis is true. It can only provide evidence to support or reject the null hypothesis. A result can indicate whether the data is consistent with the null hypothesis or not, but it does not prove the alternative hypothesis with certainty.

