We discuss several measurement and data collection issues:
a) Levels of Measurement
b) Criteria for Good Measures
c) Types of Data
LEVELS OF MEASUREMENT
Natural/Hard sciences - measurements are exact and precise - grams, meters, seconds, etc.
Social science/Policy research - measurements are often imprecise or inexact.
Four Basic Levels of Measurement in the Social Sciences:
1. Nominal Level of Measurement - Qualitative Measures or Classification. There is no ranking, ordering, or quantitative component (unlike age, weight, IQ, or income).
Examples: Sex: M/F
Household: Married-No Kids, Married-Children, Single Parent, Single, etc.
Marital Status: M/S/D
For statistical purposes, if we assign numbers to the different groups, the numbers have no inherent meaning or order.
Example: Dummy Variables M = 1 F = 0, or vice versa.
Marital Status: 1=S, 2=M, 3=D
The numbers don't mean anything; they are just for convenience and help sort data into categories. Nominal classifications are a lower level of measurement than the other types: they don't reflect a precise amount of a property (e.g., income or age), so certain statistical procedures are inappropriate. For example, it wouldn't make sense to calculate the "average," "mean," or "range," and we also wouldn't talk about "variance" or "dispersion."
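A minimal sketch of this point, using the marital-status codes above (1=S, 2=M, 3=D) with made-up data: the "mean" of the codes is computable but meaningless, while a frequency count is the appropriate summary for nominal data.

```python
from collections import Counter

# Hypothetical sample, coded 1 = Single, 2 = Married, 3 = Divorced
statuses = [1, 2, 2, 3, 1, 2]

# The arithmetic "mean" of nominal codes is meaningless:
mean_code = sum(statuses) / len(statuses)   # ~1.83 -- no such status exists

# The appropriate summary is a count per category:
counts = Counter(statuses)
print(counts)  # Counter({2: 3, 1: 2, 3: 1})
```

Relabeling the categories (say, 1=D, 2=S, 3=M) would change the "mean" but not the counts, which is exactly why the mean is inappropriate here.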
2. Ordinal Measurement - Ranking or ordering where the numbers DO mean something.
Example: Teaching evaluations: "This course was taught well." or "Instructor was prepared for class."
1 = Strongly agree
2 = Agree
3 = Neutral
4 = Disagree
5 = Strongly Disagree
See book page 47 for another example.
Ordinal measures are ORDERED from highest to lowest: Max to Min, Strongly Agree to Strongly Disagree, etc.
Another example: book page 39, a survey on attitudes toward nuclear power. Four questions; Agree = 1, Disagree = 0 for each question. Scores range from 0 to 4: Max = 4 (strongly favor nuclear power), Min = 0 (strongly oppose nuclear power).
Caution: these scores are not "multiplicative." Consider a score of 4 vs. a score of 2: we can't say the person with a 4 is twice as much in favor of nuclear power as the person with a 2, just that they favor it MORE. We don't assume the intervals on our 0-4 scale are exactly equal.
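The 0-4 attitude index above can be sketched in a few lines (hypothetical responses; Agree = 1, Disagree = 0 per question):

```python
# Responses to the four nuclear-power questions (hypothetical)
respondent_a = [1, 1, 1, 1]  # agrees with all four items
respondent_b = [1, 0, 1, 0]  # agrees with two items

score_a = sum(respondent_a)  # 4
score_b = sum(respondent_b)  # 2

# Ordinal interpretation only: A favors nuclear power MORE than B,
# but NOT "twice as much" -- the intervals are not assumed equal.
print(score_a > score_b)  # True
```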
3. Interval level of measurement - assumes fixed and equal intervals based on some accepted standard - weight, height, time, money. Pounds, inches, years/age, years of education, dollars, numbers of deaths, etc. would all be examples of interval level measures.
Interval measures convey the most information because we can classify, order, and rank data on a clearly delineated scale. They are precise numerical/quantitative measures and allow us to use the most powerful statistical techniques.
Interval measures ARE multiplicative. 20 years of education is twice as much as 10 years. $1m is half as much as $2m, etc.
4. Ratio scales of measurement - fixed endpoints with zero as one point. Percentage is an example of a common ratio scale. Examples: percent unemployment (un rate), percent female, percent urban, poverty rate, GDP/capita, etc.
Or crime rate per 1,000 people. Traffic accidents per mile driven. MPG. GPA. Student/Teacher Ratio.
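A quick sketch of a ratio-scale measure, using hypothetical city figures for the crime-rate example above:

```python
# Hypothetical figures
crimes = 450
population = 90_000

crime_rate = crimes / population * 1000   # crimes per 1,000 residents
print(crime_rate)  # 5.0

# Ratio scales have a true zero, so ratios are meaningful:
# a rate of 10.0 per 1,000 is genuinely twice a rate of 5.0.
```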
Since both Interval and Ratio measures are precise quantitative measures and are multiplicative, they convey the most information, and we can use the most powerful statistical techniques to summarize and analyze them, as we will see in Ch 8 and 9.
Exercise 3-1 on page 40.
CRITERIA FOR MEASUREMENT - Assessment of our measurement. Quality evaluation.
1. Validity - A valid measure is one that measures what it is supposed to measure; a valid measure lacks BIAS. There is no definitive test of whether a measure is valid. We have to rely on accepted research practices and the consensus of the profession.
Example: Inflation as a measure of cost-of-living. CPI has been criticized recently. Presidential commission: CPI/Inflation overstates the TRUE cost of living by 1-1.5% Due to: 1) using a fixed basket of goods and 2) inability to fully and accurately measure quality improvements.
Or controversy about teaching evaluations: How accurately do they measure teaching effectiveness??
a. Face Validity - Common sense test. Has this measure been used before? Is it accepted in the profession? Does "percent of children receiving free lunch" measure the level of poverty in a school district?
b. Content Validity - Is the measure comprehensive for what we are trying to measure? Do the fifteen questions on the teaching evaluation forms cover what we are trying to measure - teaching effectiveness?
c. Predictive Ability - does the measure have predictive power? Example: GPA, class rank and test scores for admittance to college, grad school, law school, etc.
SAT - predictive power for college? GMAT - predictive power for grad B-school? GRE - grad school, LSAT - law school, etc. Grades - predictive power for job performance
2. Reliability - Is the measurement reliable and stable over time? Often it is not. Example: crime rates differ over time due to changes in administration, variations in reporting across government agencies, the amount of crime that goes unreported, etc. How do we count multiple crimes? If two people commit robbery and murder on one victim, how many crimes is that? In the Oklahoma City bombing, is that one crime or 168? The Uniform Crime Report might not be reliable over time.
How is validity related to reliability? Reliability = precision, Validity = bias. Example: scale always weighs ten pounds over, no matter what. It is precise but biased. Or it is reliable/consistent, but it is NOT valid since it is inaccurate and biased.
A scale can be reliable without being valid. Another example: a gun that groups its shots tightly but always shoots to the right of the target. It is consistent and precise, but biased, i.e., NOT valid.
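The biased-scale example above can be made concrete with hypothetical readings: the readings cluster tightly (reliable) but sit about ten pounds above the true weight (not valid).

```python
# Hypothetical: true weight and five readings from a scale
# that always reads about 10 lbs over.
true_weight = 150.0
readings = [160.1, 159.9, 160.0, 160.2, 159.8]

mean_reading = sum(readings) / len(readings)
bias = mean_reading - true_weight        # ~10 lbs: a validity problem
spread = max(readings) - min(readings)   # ~0.4 lbs: very reliable

print(round(bias, 1), round(spread, 1))  # 10.0 0.4
```

Reliability (small spread) and validity (small bias) are separate properties; this scale scores well on the first and badly on the second.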
Reliability can especially be a problem for survey data. Example in the book, page 42-43.
How to test for reliability? One method: Test/Retest Method. Example: Verbal Ability Assessment. Give two tests to the same children and see if the results are correlated. Correlation coefficient is a precise measure of co-movement or association: -1 to +1.
3. Comprehensibility - another consideration. Are your measurements/variables understandable and comprehensible to the audience? Audience could be fellow academics, newspaper readers, legislators, jury, judge, bureaucrats, members of the public, professor, etc. Make sure that they can understand your measure.
4. Cost - consideration. Amount/quality of data may depend on budget. Example: census data, only every ten years. Survey/Interview data: how many people in sample? Might depend on budget/cost. And if data is already available somewhere, it would not be cost-effective to replicate data collection.
5. Completeness - Do the 10-15 questions on a course evaluation completely cover the important factors for teaching effectiveness?
There is an enormous amount of data that is available, especially now with the Internet. Before you embark on an expensive and extensive data collection project, check to see if the data is already available.
In most research projects, the actual stat analysis part of the project only takes seconds on a computer. In many cases, getting the data is the hard part - time consuming. Whatever can be done to save time collecting data is valuable.
Pages 56-61 in text show Internet data sources.
SURVEY DATA
Surveys are used to evaluate citizen satisfaction with public services, for example. Survey data is the most costly type of data to collect, so it should only be used when necessary.
1. Mail questionnaires. Advantage: cheapest survey method. Disadvantages: 1) low response rate and 2) responders and non-responders may differ, so your results may be biased; you violate the assumption of randomness. Maybe unemployed people are more likely to have the time to respond.
2. Phone surveys. 97% of households have phones, so it is easy to get a random sample. Disadvantage: More costly than mail, you need to hire people to make phone calls. Advantages: 1) Better response rate and 2) more complete information (people might skip questions by mail). Used extensively for polling (Gallup, etc.), Consumer Confidence Index, etc.
3. Face-to-face Interviews. Advantage: best response rate. Disadvantages: 1) costly and 2) people are reluctant to let an interviewer into their house. Rarely used, except for consulting projects or in-house research.