Most statistical analysis involves at least two variables, a dependent variable and an independent variable, and tests the hypothesis that changes in the independent variable are associated with changes in the dependent variable.
1. A public agency or company may want to see whether the sex of an employee affects promotion, for example because of a possible lawsuit or complaint.
2. A police department may be concerned about whether race affects attitudes toward the police. Perhaps Black, Hispanic, or Hmong residents are more hostile toward the police than white residents. The answer will affect issues like training and community relations.
3. Public health department: does the use of chemical fertilizers in farming affect the quality of drinking water?
These are all policy-related issues that involve a hypothesis about the relationship between two variables:
Gender and promotion: women are promoted more slowly than men. Race and attitude toward police: Black residents have less favorable attitudes toward police. Chemical fertilizers and water quality: counties where fertilizer use is low have purer water.
Education and poverty example: handout from the Minneapolis Star Tribune.
A contingency table is a way to display the relationship between two variables in tabular form.
Example: men get promoted faster than women, ceteris paribus.
See Tables 6-1 to 6-4 on pages 136-137.
38% of men get promoted in YR 1 vs. 17% for women.
70% of men get promoted by YR 2 vs. 46% for women.
We have a hypothesis that gender affects promotion.
Promotion = f(Sex), or in null form: sex has NO effect on promotion.
A contingency table is set up like a frequency distribution table. In this case, we suspect that sex is the independent variable, so we set the table up to assess the effect of the independent variable (sex) on the dependent variable (promotion).
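As a rough sketch of how such a table can be built, the Python below counts cases in each cell and converts them to percentages within each group of the independent variable. The cell counts are invented (100 men and 100 women, chosen so the year-1 percentages echo the 38% and 17% quoted above); the actual counts in the textbook tables are not reproduced here, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical records of (sex, year-1 outcome). The counts are invented for
# illustration; only the resulting percentages echo the figures in the notes.
records = (
    [("Men", "Promoted")] * 38 + [("Men", "Not promoted")] * 62
    + [("Women", "Promoted")] * 17 + [("Women", "Not promoted")] * 83
)

def contingency_table(rows):
    """Count observations in each (independent, dependent) cell."""
    return Counter(rows)

def column_percentages(counts):
    """Percentage of each independent-variable group in each outcome."""
    totals = Counter()
    for (group, _outcome), n in counts.items():
        totals[group] += n
    return {
        (group, outcome): 100.0 * n / totals[group]
        for (group, outcome), n in counts.items()
    }

counts = contingency_table(records)
percents = column_percentages(counts)

# Independent variable (sex) across the columns, dependent variable down the rows.
print(f"{'':14}{'Men':>8}{'Women':>8}")
for outcome in ("Promoted", "Not promoted"):
    men = percents[("Men", outcome)]
    women = percents[("Women", outcome)]
    print(f"{outcome:14}{men:>7.0f}%{women:>7.0f}%")
```

Percentages are computed within each category of the independent variable, so the comparison is read across the row: the share of men promoted vs. the share of women promoted.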
Advantage of Contingency Table: Easy to set up, easy to understand.
Disadvantage: doesn't allow for testing of statistical significance. Doesn't allow us to precisely MEASURE the association between the two variables.
MEASURES OF ASSOCIATION - allow us to measure the degree of association numerically, in a single statistic.
Example: Covariance. Measures how closely two variables X and Y vary together.
COVx,y = SIGMA [ (Xi - XBAR) (Yi - YBAR) ] / N, where SIGMA is the sum over all N observations.
The calculation is based on "deviations from the mean." When an observation of X (Xi) is above its mean (XBAR):
a) Is Y above its mean? If so, that observation pushes COV in the positive direction. Positive relationship between Y and X.
b) Is Y below its mean? If so, that observation pushes COV in the negative direction. Negative association between Y and X.
Also, if X is far above its mean, is Y far above its mean or not? COV can be large or small, positive or negative.
COV can range from -infinity to +infinity.
Problem: the COV statistic is not easily interpreted. Not scaled. What does COV = 100 mean? Hard to tell.
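A minimal Python sketch of the covariance formula above, using N in the denominator as in these notes (the sample version divides by N - 1 instead). The data are made up for illustration: x is years of education and y is income, first in thousands of dollars and then in dollars, to show how the units change the size of COV.

```python
def covariance(xs, ys):
    """Population covariance: average product of deviations from the means."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and the same length")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n

# Hypothetical data, invented for illustration.
x = [10, 12, 14, 16, 18]   # years of education
y = [20, 25, 35, 40, 55]   # income in thousands of dollars

cov_thousands = covariance(x, y)
print(cov_thousands)       # 34.0

# Same incomes expressed in dollars: same relationship, COV is 1000 times larger.
y_dollars = [v * 1000 for v in y]
cov_dollars = covariance(x, y_dollars)
print(cov_dollars)         # 34000.0

# Reversing y makes high x go with low y, so COV turns negative.
print(covariance(x, list(reversed(y))))   # -34.0
```

The relationship between x and y is identical in the first two cases; only the unit of measurement changed. That is why the size of COV by itself is hard to interpret.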
To overcome the shortcoming of the COV, we can use the correlation coefficient, rho.
rho = COVx,y / (sigma_x * sigma_y), where sigma_x and sigma_y are the standard deviations of X and Y.
We divide the COVx,y by the product of the standard deviation of X and the standard deviation of Y. This calculation forces the value of rho to be between -1 and +1.
rho = -1 means perfect negative correlation between X and Y.
rho = +1 means perfect positive correlation between X and Y.
rho = 0 means no linear statistical relationship (no correlation) between X and Y.
The correlation coefficient measures the degree of co-movement between two variables; it is a simple measure of statistical association.
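The rescaling can be sketched in Python as follows. This is an illustration with made-up data, not a reference implementation; it uses N in the denominator for both the covariance and the standard deviations, so the N's cancel in the ratio.

```python
import math

def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and the same length")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / n)
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined when a variable does not vary")
    return cov / (sd_x * sd_y)

# Hypothetical data, invented for illustration.
x = [1, 2, 3, 4, 5]
y_up = [2 * v + 1 for v in x]      # rises exactly in step with x
y_down = [-3 * v for v in x]       # falls exactly in step with x

rho_pos = correlation(x, y_up)                          # approximately +1
rho_neg = correlation(x, y_down)                        # approximately -1
rho_scaled = correlation([100 * v for v in x], y_up)    # units changed, rho unchanged

# rho picks up linear association only: here y = x**2 depends on x, yet rho is 0.
x_sym = [-2, -1, 0, 1, 2]
rho_zero = correlation(x_sym, [v ** 2 for v in x_sym])

print(round(rho_pos, 6), round(rho_neg, 6), round(rho_scaled, 6), round(rho_zero, 6))
```

Unlike COV, rho does not change when X or Y is measured in different units, which is what makes it comparable across pairs of variables.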
Application: Portfolio theory. "The lower the degree of correlation between two stocks, the __________ the benefits of combining those stocks in a portfolio." Diversification.
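One way to check the statement numerically is the standard two-asset portfolio variance formula, w1^2 sd1^2 + w2^2 sd2^2 + 2 w1 w2 rho sd1 sd2. The formula is not given in these notes and is brought in here as an assumption for illustration; the weights and standard deviations (a 50/50 split, 20% volatility for each stock) are made up.

```python
import math

def portfolio_std(w1, sd1, sd2, rho):
    """Standard deviation of a two-asset portfolio with weights w1 and 1 - w1."""
    w2 = 1.0 - w1
    variance = (w1 * sd1) ** 2 + (w2 * sd2) ** 2 + 2 * w1 * w2 * rho * sd1 * sd2
    # Guard against a tiny negative value caused by floating-point rounding.
    return math.sqrt(max(variance, 0.0))

# Hypothetical inputs: equal weights, each stock with a 20% standard deviation.
for rho in (1.0, 0.5, 0.0, -0.5, -1.0):
    risk = portfolio_std(0.5, 0.20, 0.20, rho)
    print(f"rho = {rho:+.1f}  portfolio std = {risk:.4f}")
```

Comparing the printed portfolio standard deviations across values of rho shows how the risk of the combined portfolio changes with the correlation between the two stocks.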
In EViews, to calculate the correlation coefficient between variables X and Y, type: cor X Y. To calculate the covariance, type: cov X Y.
Or select Quick/Group Statistics/Correlations or Quick/Group Statistics/Covariances.