Monday, April 24, 2017

Correlation and Spatial Autocorrelation

Introduction:

The purpose of this assignment is to become familiar with correlation and spatial autocorrelation using SPSS and GeoDa. The first part of this assignment uses census tracts in Milwaukee to look at the correlations between different fields, such as the white population and the number of retail employees. The second part of this assignment uses election and population data in Texas to look at the spatial autocorrelation of voter turnout, Democratic voters, and Hispanic populations.

Part One:

For the first part of this assignment, different attributes of the census tracts in Milwaukee, Wisconsin are compared to see if there is a correlation between the attributes. A table that shows the correlation between the different attributes can be found below in table 1. This table is a result from the SPSS software and data that was provided by the instructor.

table 1. Table of correlations between attributes for Milwaukee census tracts

The attributes that this table compares are the number of manufacturing employees, the number of retail employees, the number of finance employees, the white population, the black population, the Hispanic population, and the median household income. In this table, if an entry has two stars, the correlation between the two variables is statistically significant at the 0.01 level (99% confidence). Significance says how unlikely it is that the correlation is due to chance; the strength of the relationship is given by the size of the coefficient itself. The variables that have these two-star correlations are the number of manufacturing employees with all the other variables, the number of retail employees with all other variables except the Hispanic population, the number of finance employees with all the other variables, the white population with all other variables except the Hispanic population, the black population with all other variables, the Hispanic population with all other variables except the number of retail employees, the white population, and median household income, and the median household income with everything except the Hispanic population. For all of these entries, a clear positive or negative trend can be seen between the two variables.

The entries that have only one star are significant at the 0.05 level (95% confidence). These entries are the white population compared to the Hispanic population and the Hispanic population compared to median household income. A correlation is still visible for these entries, but the evidence for it is not quite as strong as for the two-star entries. If there is no star next to an entry, then there is no statistically significant correlation between the variables. The only entry without a star is the Hispanic population compared to the number of retail employees, which means that the data show no clear relationship between these two variables.
The table shows not only the strength of each correlation but also its direction. The direction can be either positive or negative. If the correlation is positive, then when one variable goes up the other variable tends to go up as well. If the correlation is negative, then when one variable goes up the other variable tends to go down. All of the correlations in the table above are positive except for the black population compared to all other variables and the Hispanic population compared to the number of finance employees, the black population, and median household income. This means, for example, that tracts with a larger Hispanic population tend to have a smaller black population, and vice versa. The data suggest that black and Hispanic populations are not doing very well economically in the Milwaukee area, because both have a negative correlation with median household income: tracts with larger black or Hispanic populations tend to have lower median household incomes. By contrast, the correlation between the white population and the median household income is strongly positive, which means that tracts with a larger white population tend to have a higher median household income.
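
To make the idea of direction concrete, here is a minimal Python sketch of the Pearson correlation coefficient, the statistic that a correlation table of this kind typically reports. The function name `pearson_r` and the sample values are made up for illustration; they are not the Milwaukee data and this is not how SPSS is implemented internally.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical tract-level values, invented for illustration only.
pct_group = [10, 20, 30, 40, 50]      # percent of tract population
median_income = [60, 52, 47, 38, 30]  # thousands of dollars

r = pearson_r(pct_group, median_income)
print(round(r, 3))  # negative: income falls as the percentage rises
```

A value of r close to -1 or 1 indicates a strong relationship, while the sign gives the direction. The stars in the SPSS table come from a separate significance test that this sketch does not perform.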

Part Two:

Introduction:

For this part of the assignment, the Texas Election Commission (TEC) provided data about the 1980 and 2016 presidential elections and wants analysis done on the patterns. They want to know if there is clustering of voting patterns in the state, as well as of voter turnout. They also want to know if the election patterns have changed over the 36 years. In addition to the election data, population data is analyzed to see if there is clustering of Hispanic populations in Texas.

Methodology:

For this assignment, the election data for 1980 and 2016 was provided by the TEC. The population data needed to be downloaded separately from the US Census website, along with a shapefile of Texas. The population table from the US Census is very cluttered, so all the fields can be deleted except for the geo id field and the percent of Hispanic population field. Once the table is simplified, it can be joined to the Texas shapefile and exported as a new feature. Next, the election data can be joined to the new feature and exported to create a feature that has all the desired attributes. Once this feature is complete, it needs to be exported and saved as a shapefile. Next, the shapefile is opened in GeoDa, where a new spatial weights file is created in the "weights manager" with an id variable selected. Once this is complete, the "Moran's scatter plot" button can be clicked, with the scatter plot and the LISA cluster map selected as outputs. After this process is done running, a scatter plot and a cluster map appear that show the spatial autocorrelation of the chosen variable in Texas.
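
As a rough illustration of what the scatter plot step computes, the sketch below calculates a global Moran's I in plain Python. It assumes simple binary contiguity weights (1 for neighbors, 0 otherwise); GeoDa builds its weights from the shapefile and may standardize them differently, so its numbers will not necessarily match this simplified version. The function name `morans_i`, the four-area layout, and the values are all hypothetical.

```python
def morans_i(values, neighbors):
    """Global Moran's I with binary contiguity weights.

    values    -- list with one attribute value per area
    neighbors -- dict mapping each area index to a list of neighbor indices
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Numerator: cross-products of deviations over every neighboring pair.
    cross = sum(dev[i] * dev[j] for i in neighbors for j in neighbors[i])
    # S0: total weight, which for binary weights is the number of links.
    s0 = sum(len(js) for js in neighbors.values())
    return (n / s0) * cross / sum(d * d for d in dev)

# Four hypothetical areas in a row: 0 - 1 - 2 - 3.
row = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

clustered = morans_i([1, 1, 5, 5], row)    # similar values sit together
alternating = morans_i([1, 5, 1, 5], row)  # every neighbor is dissimilar
print(clustered, alternating)
```

The clustered layout gives a positive value and the alternating layout gives a negative one, which is the same reading that applies to the Moran's I values reported for the Texas variables.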

Results:

The results from this process are a cluster map and a scatter plot for each of the variables. The first variable that a scatter plot and cluster map were made from was the voter turnout in 1980. The map and scatter plot can be seen below in map 1 and graph 1.

map 1. Texas voter turnout for 1980

graph 1. Texas voter turnout for 1980

The red areas are counties that have high voter turnout and are surrounded by counties that also have high voter turnout. The light red areas are counties that have high voter turnout but are surrounded by counties that have low voter turnout. The light blue areas are counties that have low voter turnout surrounded by counties that have high voter turnout. The blue areas are counties that have low voter turnout and are surrounded by counties that also have low voter turnout. Looking at the cluster map, map 1, the north and central parts of Texas have clusters of high voter turnout and the south and east sides of the state have clusters of low voter turnout. For most of the state, there is no significant clustering. Graph 1 shows a scatter plot for the voter turnout in 1980 and provides the Moran's I value. Moran's I measures how much the data is grouped on a scale from -1 to 1. A value near 1 means similar values are clustered together, a value near 0 means the pattern is spatially random, and a value near -1 means the values are dispersed, with dissimilar values next to each other. The Moran's I value for this scatter plot is 0.468, which shows that the voter turnout in 1980 did have some clustering, but not to a great extent.

The next cluster map and scatter plot were made from the voter turnout in 2016. The map and scatter plot can be seen below in map 2 and graph 2. The map colors are the same as above.

map 2. Texas voter turnout for 2016

graph 2. Texas voter turnout for 2016

Looking at this map, a pattern emerges. There is a lot of clustering of low voter turnout along the south edge of the state, with a small cluster on the northwest edge of the state. The north and central parts of the state have clusters of higher voter turnout. The Moran's I value for this scatter plot is 0.287, which is lower than the Moran's I value for 1980. This means that there is even less clustering in 2016 than in 1980. When map 1 and map 2 are compared, the change in voter turnout can be seen. The cluster at the south end of the state stays about the same but does decrease a little bit. The cluster of high voter turnout in the middle of the state gets smaller, and there appear to be far fewer counties that have high voter turnout. The cluster of low voter turnout on the west side of the state is gone entirely and the west side of the state has a lot lower voter turnout than it did in 1980.
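
The four color categories used in the cluster maps can be sketched in Python as follows. This is a simplified illustration only: it labels each area by comparing its own value and the average of its neighbors to the overall mean, and it leaves out the significance test that decides which counties are actually colored on a LISA map. The function name `quadrant_labels`, the five-area layout, and the turnout values are hypothetical.

```python
def quadrant_labels(values, neighbors):
    """Label each area as High-High, High-Low, Low-High or Low-Low.

    The first word describes the area itself and the second describes
    the average of its neighbors, both relative to the overall mean.
    Every area must have at least one neighbor.
    """
    mean = sum(values) / len(values)
    labels = []
    for i, v in enumerate(values):
        # Spatial lag: the average value of the neighboring areas.
        lag = sum(values[j] for j in neighbors[i]) / len(neighbors[i])
        own = "High" if v > mean else "Low"
        around = "High" if lag > mean else "Low"
        labels.append(own + "-" + around)
    return labels

# Five hypothetical counties in a row with invented turnout percentages.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
turnout = [70, 72, 30, 28, 75]

for county, label in enumerate(quadrant_labels(turnout, chain)):
    print(county, label)
```

In terms of the map colors, High-High corresponds to red, High-Low to light red, Low-High to light blue, and Low-Low to blue.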

The second set of maps and scatter plots deals with the percent of Democratic voters in each county in 1980 and 2016. Map 3 and graph 3 below show the percent of Democratic vote in each county in Texas in 1980.

map 3. Percent of democratic vote by county in 1980

graph 3. Percent of democratic vote by county in 1980

The colors represent the same categories as in the previous maps, but instead of voter turnout, the map shows the percent of Democratic vote. This map shows the clusters of counties that had a high percent of Democratic votes and the clusters of counties that had a low percent of Democratic votes. The south and east parts of the state had a large number of counties clustered together that had a high percentage of Democratic votes. The north and west edges of the state had a large number of counties clustered together that had a low percentage of Democratic votes. The Moran's I value for graph 3 is 0.575, which shows that there is a good amount of clustering, although it is still well short of the maximum of 1.

The next map and scatter plot are on the percent of Democratic vote in Texas in 2016. Below are map 4 and graph 4, which show the spatial autocorrelation for this attribute. As above, the colors represent the same categories.

map 4. Percent of democratic vote in 2016

graph 4. Percent of democratic vote in 2016

Using this map, a pattern can be seen. This map shows that there is heavy clustering of counties that had a high Democratic vote percentage on the south edge of the state and heavy clustering of counties that had a low Democratic vote percentage in the north and central parts of the state. The Moran's I value for graph 4 is 0.685. This means that there is even more clustering in 2016 than in 1980 in terms of percent of Democratic vote. Comparing these maps, the change in Democratic voters can be seen. From 1980 to 2016, the south has gained a lot more clustering of high percent of Democratic voters, especially towards the west. The north stays the same, but the central part of the state shifts to the east. Also, the high cluster area on the east edge of the state disappears completely.

The last map and scatter plot show the percent of Hispanic population per county in Texas. The map and scatter plot can be seen below in map 5 and graph 5.

map 5. Percent Hispanic population

graph 5. Percent Hispanic population

This map shows a strong pattern of where counties with similar percentages of Hispanic population are clustered. There is a high clustering of counties with a high percentage of Hispanic population on the south and west edges of the state and a high clustering of counties with a low percentage of Hispanic population to the north and east. The Moran's I value for graph 5 is 0.778. This means that there is a lot of clustering; in fact, it is the highest Moran's I value for all the data and has the most clustering. When map 5 is compared to maps 2 and 4, a correlation can be seen between the Hispanic population and both the voter turnout and the percent of Democratic voters. Table 2 shows the correlation matrix for this data. The percent_1 column is the percent Hispanic.

table 2. Correlation matrix for Texas data

This table shows that there is a positive correlation between the percent of Hispanic population and the percent of Democratic voters. This means that when there is a higher percentage of Hispanic population, the percent of Democratic votes goes up, and vice versa. There is also a negative correlation between the percent of Hispanic population and the voter turnout for 2016 and 1980. This means that when the percent of Hispanic population goes up, the voter turnout goes down.

Conclusion:

In conclusion for part 2 of this assignment, it seems that when the percent of Hispanic population goes up in a county, the voter turnout for that county goes down and the percent of Democratic voters goes up. This correlation leads to the assumption that Hispanic populations tend to vote for Democratic candidates if they turn out to vote at all, although county-level data cannot show how individual voters behaved.
