How to Calculate Income Differences Among Regions with ANOVA

History

The development of ANOVA dates back to the early 20th century, when the British statistician Ronald A. Fisher devised a method for comparing the means of different groups. Fisher's influential research helped ANOVA become widely used in the statistical community.

Definition

ANOVA (analysis of variance) is a statistical test that researchers use to compare the means of two or more groups and determine whether the differences between those groups are statistically significant.

Formula

The ANOVA formula involves calculating the Total Sum of Squares (TSS), the Between-Group Sum of Squares (BSS), and the Within-Group Sum of Squares (WSS). Each captures a different aspect of the data's variability: TSS measures the overall variability, BSS the variability between the group means, and WSS the variability within individual groups.
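For reference, the standard textbook definitions can be written as follows. The notation is introduced here only for illustration: x_{ij} is the income of worker j in group i, n_i is the number of workers in group i, \bar{x}_i is that group's mean, \bar{x} is the overall mean, k is the number of groups, and N is the total number of workers. The last line shows how the sums of squares combine into the F-statistic used later.

```latex
\begin{aligned}
\text{TSS} &= \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left(x_{ij} - \bar{x}\right)^2 \\
\text{BSS} &= \sum_{i=1}^{k} n_i \left(\bar{x}_i - \bar{x}\right)^2 \\
\text{WSS} &= \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left(x_{ij} - \bar{x}_i\right)^2 \\
\text{TSS} &= \text{BSS} + \text{WSS}, \qquad
F = \frac{\text{BSS}/(k-1)}{\text{WSS}/(N-k)}
\end{aligned}
```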

ANOVA Calculation example

Assume that you are an economist trying to determine whether the mean wages of workers differ significantly across three regions (North, South, and East). You have collected income data for 100 workers from each region.

To find out if there is a significant difference in the average worker salary across the three locations, you would conduct an ANOVA as follows:

The mean income for all workers

1. To calculate the overall mean income, we add up the incomes of all workers and divide by the total number of workers (300 in this case).

Mean Income = (Income of worker 1 + Income of worker 2 + … + Income of worker 300) / 300
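Step 1 can be sketched in a few lines of Python. The data here are hypothetical toy values (3 workers per region instead of 100, incomes in thousands) chosen only to keep the arithmetic easy to follow; the variable names are illustrative.

```python
# Hypothetical toy data: 3 workers per region (the article's example uses 100),
# incomes in thousands, chosen only to keep the arithmetic small.
incomes = {
    "North": [40, 42, 44],
    "South": [50, 52, 54],
    "East": [45, 47, 49],
}

# Step 1: pool every worker's income and divide by the total number of workers.
all_incomes = [x for region in incomes.values() for x in region]
overall_mean = sum(all_incomes) / len(all_incomes)
print(overall_mean)  # 47.0
```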

The mean income for each region

2. To calculate the mean income for each region, we add up the income of all workers in the region and divide it by the number of workers in that region (100 in this case).

Mean Income for North = (Income of worker 1 + Income of worker 2 + … + Income of worker 100) / 100
Mean Income for South = (Income of worker 101 + Income of worker 102 + … + Income of worker 200) / 100
Mean Income for East = (Income of worker 201 + Income of worker 202 + … + Income of worker 300) / 100
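A minimal sketch of step 2, again assuming hypothetical toy data with 3 workers per region. Each region's mean uses only that region's own workers and its own count.

```python
# Hypothetical toy data (3 workers per region, incomes in thousands).
incomes = {
    "North": [40, 42, 44],
    "South": [50, 52, 54],
    "East": [45, 47, 49],
}

# Step 2: sum each region's incomes and divide by that region's number of workers.
region_means = {region: sum(xs) / len(xs) for region, xs in incomes.items()}
print(region_means)  # {'North': 42.0, 'South': 52.0, 'East': 47.0}
```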

Deviation of each worker’s income

3. To calculate the deviation of each worker’s income from the mean income for their region, we subtract the mean income for their region from their actual income.

Deviation for Worker 1 = Income of Worker 1 – Mean Income for North
Deviation for Worker 101 = Income of Worker 101 – Mean Income for South
Deviation for Worker 201 = Income of Worker 201 – Mean Income for East
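Step 3 as a short Python sketch, assuming the same kind of hypothetical toy data. Each worker is compared with the mean of their own region, not the overall mean; within each region the deviations sum to zero.

```python
# Hypothetical toy data (3 workers per region, incomes in thousands).
incomes = {
    "North": [40, 42, 44],
    "South": [50, 52, 54],
    "East": [45, 47, 49],
}
region_means = {region: sum(xs) / len(xs) for region, xs in incomes.items()}

# Step 3: subtract the worker's own regional mean from their income.
deviations = {
    region: [x - region_means[region] for x in xs]
    for region, xs in incomes.items()
}
print(deviations["North"])  # [-2.0, 0.0, 2.0]
```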

Squared deviation of each worker’s income

4. To calculate the squared deviation of each worker’s income, we square the deviation calculated in step 3.

Squared Deviation for Worker 1 = (Deviation for Worker 1)^2
Squared Deviation for Worker 101 = (Deviation for Worker 101)^2
Squared Deviation for Worker 201 = (Deviation for Worker 201)^2
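A sketch of step 4 under the same assumption of hypothetical toy data. Squaring keeps positive and negative deviations from cancelling each other out.

```python
# Hypothetical toy data (3 workers per region, incomes in thousands).
incomes = {
    "North": [40, 42, 44],
    "South": [50, 52, 54],
    "East": [45, 47, 49],
}
region_means = {region: sum(xs) / len(xs) for region, xs in incomes.items()}

# Step 4: square each worker's deviation from their regional mean.
squared_deviations = {
    region: [(x - region_means[region]) ** 2 for x in xs]
    for region, xs in incomes.items()
}
print(squared_deviations["South"])  # [4.0, 0.0, 4.0]
```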

The sum of squared deviations for each region

5. To calculate the sum of squared deviations for each region, we add up the squared deviations of all workers in that region.

Sum of Squared Deviations for North = Squared Deviation for Worker 1 + Squared Deviation for Worker 2 + … + Squared Deviation for Worker 100
Sum of Squared Deviations for South = Squared Deviation for Worker 101 + Squared Deviation for Worker 102 + … + Squared Deviation for Worker 200
Sum of Squared Deviations for East = Squared Deviation for Worker 201 + Squared Deviation for Worker 202 + … + Squared Deviation for Worker 300
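Step 5 as a Python sketch, assuming hypothetical toy data. The helper name `sum_squared_deviations` is illustrative. Adding the three regional totals gives the Within-Group Sum of Squares (WSS).

```python
# Hypothetical toy data (3 workers per region, incomes in thousands).
incomes = {
    "North": [40, 42, 44],
    "South": [50, 52, 54],
    "East": [45, 47, 49],
}

def sum_squared_deviations(xs):
    """Sum of squared deviations of xs from their own mean (hypothetical helper)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)

# Step 5: one sum per region; their total is the within-group sum of squares.
ss_by_region = {region: sum_squared_deviations(xs) for region, xs in incomes.items()}
wss = sum(ss_by_region.values())
print(ss_by_region)  # {'North': 8.0, 'South': 8.0, 'East': 8.0}
print(wss)  # 24.0
```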

Variance for each region

6. To find the variance for each region, divide the sum of its squared deviations by the number of workers in that region minus one.

Variance for North = Sum of Squared Deviations for North / (100 – 1)
Variance for South = Sum of Squared Deviations for South / (100 – 1)
Variance for East = Sum of Squared Deviations for East / (100 – 1)
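A sketch of step 6, assuming hypothetical toy data. It also cross-checks the hand-rolled result against Python's `statistics.variance`, which uses the same n – 1 denominator (the sample variance).

```python
import statistics

# Hypothetical toy data (3 workers per region, incomes in thousands).
incomes = {
    "North": [40, 42, 44],
    "South": [50, 52, 54],
    "East": [45, 47, 49],
}

def sample_variance(xs):
    """Sum of squared deviations divided by (number of workers - 1)."""
    mean = sum(xs) / len(xs)
    ss = sum((x - mean) ** 2 for x in xs)
    return ss / (len(xs) - 1)  # divide, not multiply, by n - 1

variances = {region: sample_variance(xs) for region, xs in incomes.items()}
print(variances)  # {'North': 4.0, 'South': 4.0, 'East': 4.0}

# Cross-check against the standard library's sample variance.
assert all(variances[r] == statistics.variance(xs) for r, xs in incomes.items())
```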

F-statistic

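7. To calculate the F-statistic, divide the variance between regions by the variance within regions. The variance between regions is the between-group sum of squares (for each region, the number of workers times the squared difference between that region's mean income and the overall mean income, summed over the regions) divided by the number of regions minus one (3 – 1 = 2). Because every region here has the same number of workers, the variance within regions can be obtained by adding the variances for each region and dividing by the number of regions; in general, it is the within-group sum of squares divided by the total number of workers minus the number of regions (300 – 3 = 297).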

F-statistic = Variance between regions / Variance within regions
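Step 7 can be sketched end to end in plain Python, assuming hypothetical toy data (3 regions, 3 workers each). It computes the between- and within-group sums of squares, divides each by its degrees of freedom, and takes their ratio.

```python
# Hypothetical toy data (3 workers per region, incomes in thousands).
incomes = {
    "North": [40, 42, 44],
    "South": [50, 52, 54],
    "East": [45, 47, 49],
}

groups = list(incomes.values())
k = len(groups)                         # number of regions
n_total = sum(len(g) for g in groups)   # total number of workers
overall_mean = sum(sum(g) for g in groups) / n_total
means = [sum(g) / len(g) for g in groups]

# Between-group sum of squares: region size times squared distance of the
# region's mean from the overall mean.
bss = sum(len(g) * (m - overall_mean) ** 2 for g, m in zip(groups, means))
# Within-group sum of squares: squared deviations from each region's own mean.
wss = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

ms_between = bss / (k - 1)        # variance between regions
ms_within = wss / (n_total - k)   # variance within regions (pooled)
f_statistic = ms_between / ms_within
print(bss, wss, f_statistic)  # 150.0 24.0 18.75
```

Because the toy regions are equal in size, `ms_within` (4.0) equals the plain average of the three regional variances, as the step describes. If SciPy is installed, `scipy.stats.f_oneway(*groups)` should report the same F-statistic.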

Identifying Significance

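8. To determine whether the average worker income differs significantly across the three regions, compare the calculated F-statistic with a critical value from an F-distribution table. The critical value depends on the chosen significance level (commonly 0.05) and on two degrees of freedom: the number of regions minus one (3 – 1 = 2) and the total number of workers minus the number of regions (300 – 3 = 297). If the F-statistic is greater than the critical value, we reject the null hypothesis that there is no difference in average worker income across the regions.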

If the F-statistic is less than or equal to the critical value, we fail to reject the null hypothesis and conclude that there is insufficient evidence of a significant difference in average worker income across the regions.
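A table lookup is the usual route, but with exactly three groups the numerator has 2 degrees of freedom, and the F(2, d) distribution has a closed-form upper tail: P(F > c) = (1 + 2c/d)^(-d/2). The sketch below inverts that formula to get the critical value without SciPy. It only holds for 2 numerator degrees of freedom, and the helper names and the F value of 18.75 are hypothetical, for illustration only.

```python
def f_critical_df1_2(alpha, df2):
    """Upper-tail critical value of F(2, df2). Valid ONLY for 2 numerator df."""
    # Solve (1 + 2c/df2) ** (-df2/2) == alpha for c.
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)

def f_pvalue_df1_2(f, df2):
    """Upper-tail p-value P(F > f) for F(2, df2), same closed form."""
    return (1 + 2 * f / df2) ** (-df2 / 2)

alpha = 0.05
k, n_total = 3, 9              # hypothetical toy example: 3 regions, 9 workers
df2 = n_total - k
f_statistic = 18.75            # hypothetical F value for illustration

critical = f_critical_df1_2(alpha, df2)
reject_null = f_statistic > critical
print(round(critical, 2), reject_null)             # 5.14 True
print(round(f_pvalue_df1_2(f_statistic, df2), 4))  # 0.0026

# With 100 workers in each of 3 regions, df2 = 300 - 3 = 297.
print(round(f_critical_df1_2(alpha, 297), 2))      # 3.03
```

For four or more groups this shortcut no longer applies; a general F-distribution routine (for example SciPy's `scipy.stats.f`) or a printed table is needed.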

Interpretation

9. If we reject the null hypothesis, we can infer that the average worker income in at least one region differs significantly from the others. ANOVA by itself does not tell us which region differs or in which direction.

To identify which regions have significantly different average incomes, we can compare each region's mean income with the others using post hoc tests, such as Tukey's HSD test or pairwise comparisons with a Bonferroni correction.

ANOVA Interpretation

If the test reveals a significant difference, the average worker income in at least one region differs from the others. If it does not, there is insufficient evidence to conclude that average worker income varies across the regions.

In short, if the F-statistic is higher than the critical value, readers can infer that the difference in average worker income across the three regions is statistically significant.
