Monday, February 20, 2017

Descriptive Statistics and Mean Centers

Part 1:

The first part of this assignment compares two cycling teams and uses statistics to determine which team to invest in. The two teams are team Astana and team Tobler, and the times from their last race are listed below in table 1.
table 1. Team Astana and Team Tobler individual race times in minutes
With this data, several descriptive statistics can be calculated to get a better idea of what the race times show. The statistics applied to this data are range, mean, median, mode, kurtosis, skewness, and standard deviation. Below, these terms are defined and then applied to the data.

  • Range: This is the spread between the extremes of the data. It is found by taking the difference between the highest and lowest values.
  • Mean: This is the average of the data. It is found by adding all the values together and dividing by the total number of values. It is heavily influenced by outliers.
  • Median: This is the middle value of the data set once the values are sorted. It differs from the mean because it depends on the position of the values rather than their size. If there is an odd number of values, the median is the middle value; if there is an even number of values, the two middle values are added together and divided by 2. This measure is more resistant to outliers than the mean.
  • Mode: This is the most common value in the data set. At least two observations must share the same value for the data set to have a mode.
  • Kurtosis: This refers to the shape of the graphed data. Kurtosis is a measure of how peaked or flat the graph is compared to a normal distribution. A positive kurtosis means the graph is relatively peaked, which is called leptokurtic. A negative kurtosis means the graph is relatively flat, which is called platykurtic.
  • Skewness: This, like kurtosis, refers to the shape of the graphed data. Skewness is a measure of how symmetrical the graph is. A skewness between -1 and 1 means that the distribution is close enough to symmetrical to be treated as normal or acceptable. If skewness is positive, the tail of the graph stretches to the right and there are large outliers affecting the data. If skewness is negative, the tail stretches to the left and there are small outliers affecting the data.
  • Standard Deviation: This measures how spread out the data is around the mean. For normally distributed data, nearly all observations fall within 3 standard deviations above or below the mean, and the standard deviations fall on equal intervals from the mean. About 68% of all observations lie within 1 standard deviation of the mean, about 95% within 2 standard deviations, and about 99.7% within 3 standard deviations. If the graph is flatter, the data is more spread out and the standard deviation is larger; if the graph is more peaked, the data is closer together and the standard deviation is smaller. A population standard deviation is found by subtracting the mean from each individual observation, squaring those differences, and adding them together. That sum is then divided by the total number of observations, and the square root of the result is the standard deviation. If the data is only a sample and the whole population is not known, the same steps are followed except that the sum of the squared differences is divided by the total number of observations minus one.
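The definitions above can be sketched in a few lines of Python. This is only an illustration: the numbers in `example` are made up and are not the race times from table 1, and the function name `describe` is a hypothetical helper. Skewness and kurtosis here use the plain moment-based formulas; spreadsheet tools often apply a small-sample correction, so their values may differ slightly.

```python
import math
import statistics
from collections import Counter


def describe(values):
    """Compute the descriptive statistics defined in the list above."""
    n = len(values)
    mean = statistics.fmean(values)

    # Sum of squared differences between each observation and the mean.
    squared_diffs = sum((v - mean) ** 2 for v in values)
    pop_sd = math.sqrt(squared_diffs / n)           # whole population: divide by n
    sample_sd = math.sqrt(squared_diffs / (n - 1))  # sample: divide by n - 1

    # A mode only exists if some value occurs at least twice.
    counts = Counter(values)
    top = max(counts.values())
    modes = sorted(v for v, c in counts.items() if c == top) if top >= 2 else []

    # Moment-based skewness and excess kurtosis (a normal curve scores 0).
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n

    return {
        "range": max(values) - min(values),
        "mean": mean,
        "median": statistics.median(values),
        "modes": modes,
        "population_sd": pop_sd,
        "sample_sd": sample_sd,
        "skewness": m3 / pop_sd ** 3,
        "excess_kurtosis": m4 / pop_sd ** 4 - 3,
    }


# Made-up values for illustration only, not the data from table 1.
example = [2, 4, 4, 4, 5, 5, 7, 9]
result = describe(example)
for name, value in result.items():
    print(name, value)
```

With these made-up values the skewness is positive (the 9 stretches the tail to the right) and the excess kurtosis is slightly negative, so the shape would be described as mildly right-skewed and slightly platykurtic.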
Above are the definitions of the statistics used to better understand the data that was given. Below is a table (table 2) that shows the value of each of these measures for both teams, as well as the work done by hand to calculate standard deviation (figure 1).
table 2. statistics applied to team Astana and team Tobler race times (time in hours, minutes)  
figure 1. standard deviation calculations by hand for team Astana and team Tobler 
Based on this data, the team that should be invested in is team Astana. This team was chosen because it has the better average time and the fastest individual rider. Assuming that a team wins by having the best average time, it makes sense to go with the team that has the better average. Team Astana also has the fastest individual, so it would have the rider most likely to win the individual prize. This means that the owner of team Astana would get 25% of the $300,000 as well as 35% of the $400,000, instead of the $0 that the owner of team Tobler would get. The most important team statistic is the mean because that is most likely what the teams will be judged on.
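The owner's payout can be worked out with a short calculation. This sketch assumes the individual prize is $300,000 and the team prize is $400,000, and that team Astana wins both; the helper name `owner_cut` is hypothetical.

```python
def owner_cut(prize_dollars, owner_percent):
    """Owner's share of a prize, in whole dollars (integer arithmetic)."""
    return prize_dollars * owner_percent // 100


# Assumed prize amounts and owner percentages, taken from the paragraph above.
individual_cut = owner_cut(300_000, 25)  # 25% of the individual prize
team_cut = owner_cut(400_000, 35)        # 35% of the team prize
total_payout = individual_cut + team_cut

print(individual_cut, team_cut, total_payout)
```

Under those assumptions the owner would receive $75,000 from the individual prize and $140,000 from the team prize, for $215,000 in total.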

Part 2:

The second part of the assignment looks at geographic mean centers and weighted geographic mean centers in Wisconsin. A mean center takes the coordinates (X, Y) for a series of points and finds the mean (average) of the X values and the Y values separately. Once both means are found, the resulting coordinate pair can be plotted to show the mean center. A weighted mean center works the same way, except that each point's coordinates are multiplied by a weight (here, county population) and the sums are divided by the total weight. For this assignment, the first mean center that was found was for all the counties in Wisconsin. The coordinates used for this are the geographic center of each county. This is shown by the green dot on the map below (map 1). The second mean center that was found for this assignment was weighted by the population in 2000. The purple dot on the map below (map 1) shows the mean center for Wisconsin weighted by county population. The third mean center that was found was weighted by the population in 2015. This is shown by the blue dot on the map below (map 1).
map 1. weighted and geographic mean centers by county in Wisconsin; 2000, 2015
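As a rough sketch of how the two kinds of mean center are calculated, the Python below uses four made-up points and populations. They are not actual Wisconsin county centroids or census counts, and the function names are hypothetical; GIS software provides its own mean center tools that do this work.

```python
def mean_center(points):
    """Unweighted mean center: average the X and Y values separately."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    return (mean_x, mean_y)


def weighted_mean_center(points, weights):
    """Weighted mean center: each coordinate is multiplied by its weight,
    and the sums are divided by the total weight."""
    total = sum(weights)
    mean_x = sum(w * x for (x, _), w in zip(points, weights)) / total
    mean_y = sum(w * y for (_, y), w in zip(points, weights)) / total
    return (mean_x, mean_y)


# Made-up centroids on a square; the southeast point has a large population.
centroids = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)]
populations = [100, 700, 100, 100]

geographic = mean_center(centroids)
weighted = weighted_mean_center(centroids, populations)
print(geographic)  # center of the square
print(weighted)    # pulled toward the heavily populated southeast point
```

In this toy example the unweighted center sits in the middle of the square, while the weighted center moves east and south toward the point with the largest population, which is the same effect a very populous county has on a state-wide weighted mean center.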


The geographic mean center is the green dot and represents the unweighted spatial center of Wisconsin based on its counties. It falls near the center of Wisconsin, as expected. The next point is the mean center weighted by county population in 2000. This dot is shifted well to the southeast. This is because the high population of Milwaukee County gives that county much more weight than the other counties, so it drags the dot toward it. The last point is the mean center weighted by county population in 2015. Here, the dot shifts to the west and a little to the north relative to 2000. This could be because Milwaukee is losing some of its population, or because more people are moving to west-central Wisconsin around Eau Claire and to the areas close to Minneapolis.
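The direction of a shift between two mean centers can be checked numerically instead of by eye. The coordinates below are invented for illustration (they are not the centers from map 1), and the sketch assumes a projected coordinate system where X increases to the east and Y increases to the north.

```python
import math


def describe_shift(old_center, new_center):
    """Return the offset, distance, and compass direction between two centers."""
    dx = new_center[0] - old_center[0]
    dy = new_center[1] - old_center[1]
    north_south = "north" if dy > 0 else "south" if dy < 0 else ""
    east_west = "east" if dx > 0 else "west" if dx < 0 else ""
    return {
        "dx": dx,
        "dy": dy,
        "distance": math.hypot(dx, dy),
        "direction": (north_south + east_west) or "no change",
    }


# Invented coordinates standing in for the 2000 and 2015 weighted centers.
center_2000 = (560.0, 380.0)
center_2015 = (555.0, 382.0)

shift = describe_shift(center_2000, center_2015)
print(shift)
```

Here the X offset is larger than the Y offset, which matches the kind of shift described above: mostly to the west and a little to the north.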








