Chapter 2 - Hypotheses and Research Design

We will discuss how to scientifically approach applied research using generally accepted methods. For example, suppose we wanted to investigate the following issues:

1. Do gun control laws reduce crime?  Or do concealed weapon permits decrease violent crime?
2. Do HMOs provide better health care than fee-for-service plans?
3. Do lower speed limits reduce accidents/deaths?
4. Does educational choice improve academic performance?
5. Do issues influence voting behavior more than party?
6. Do minimum wage laws increase unemployment?
7. What effect does not completing high school have on income?

Each of these issues is different, but to answer these questions we would take a scientific approach using accepted statistical methods.

Part of the scientific approach that is common to all applied statistical research is the idea of: DEPENDENT VS. INDEPENDENT VARIABLES.

Independent variables are the potentially causal, or explanatory variables that influence the Dependent variable in a systematic, predictable way.

Cause (independent) - Effect (dependent)

Examples: DEPENDENT = f (INDEPENDENT VARS)                [f = function of]

                             +          +         +           -
Corn Yield = f (fertilizer, rain, sunshine, weeds, etc.)

Demand GM cars = f (price of GMs, income, interest rates, advertising, price of gas, price of Fords/Toyotas, unemployment rate, consumer confidence, tariffs, etc.)

                                   +
Traffic Deaths = f (Speed limits) DRAW GRAPH

                                -
Crime Rate = f (Law Enforcement) DRAW GRAPH
 

Note: Causality is very tricky. Warnings:

1. Causality can be bi-directional. There may not be a clear distinction between dep and indpt vars.

Example: Income = f (EDUC); and EDUC = f (INC)

2. All we can really prove is statistical correlation, or statistical relationships, or statistical association.

Proving statistical association is not the same as proving causality. Two vars can move together due to chance, or there could be bi-directional causality. We can never PROVE causality. There is NO statistical test for causality!

3. Post Hoc Ergo Propter Hoc - Temporal precedence doesn't prove causality. Event A now - Event B later. A doesn't necessarily "cause" B.

Examples: Day off work, rains.
 

Once we establish that there is a potential statistical relationship between a Dependent and an Independent Variable, we next state the relationship formally in the form of a TESTABLE hypothesis.

Examples of testable hypotheses:

1. Decreasing the speed limit will decrease traffic fatalities.
2. Increasing the number of police will reduce the crime rate.
3. Providing tax breaks for new industry will increase job formation.

Each statement implies a causal relationship: an indpt var is hypothesized to affect a dependent variable.

Independent Vars - speed limit, number of police, tax breaks
Dept Vars - traffic fatalities, crime rate, jobs

In policy analysis, the policy is usually the independent variable, the dependent variable is the intended or predicted or desired effect.

What effect does the minimum wage law have on unemployment of teenagers?
What effect does NAFTA have on U.S. farm income?
What effect do vouchers have on academic performance?
What effect do tax breaks have on the number of new businesses in Genesee or Oakland County?

A hypothesis has to be in a testable format, as above. In each case we can measure the dependent and independent vars and we can perform a statistical test of the hypothesis. We could accept or reject each hypothesis.

For reasons we will cover later, most hypotheses are stated as a Null Hypothesis (Ho). "Null" meaning "no effect."

Examples:
1. Speed limits have NO effect on traffic fatalities.
2. The number of police has NO effect on crime.
3. Tax breaks have NO effect on jobs.
4. There is NO difference between M/F salaries.
5. Race has NO effect on traffic stops/tickets/searches.
 

In each case, we might expect to REJECT the Ho if there is a statistical relationship between the two variables.

Example: We would expect to reject the Ho that speed limits have NO effect, and find that speed limits DO have an effect.  We would expect to reject the Null, and find that the number of police DOES have an effect on crime, etc.

In applied research, we have to eventually think in terms of stating relationships between dependent and independent vars in the form of a testable hypothesis.

Other Issues:

Unit of analysis or level of analysis.

Important concept: Units can be individuals, cities, counties, states, countries, schools, school districts, families, households, companies, years, months, quarters, etc.

The dependent and independent variables are measures of some properties or characteristics of the units.

Examples:
Education (variable) is a property of individuals (level of analysis).
Population (variable) is a property of a city or county or state or country (levels of analysis).
Income (variable) could be a property of an individual, or a state or a country or a company, etc. (level of analysis).

We have to determine the appropriate level/unit of analysis in our applied research. It should usually be obvious: individuals, school districts, county, states, etc.  Ask the question: I am going to go out and get data on WHAT?

Example: you want to study the effect of vouchers on educational performance, what would be the possible Units of Analysis?????
 

Control Variables - One of the important considerations in statistical research is the very critical issue of trying to control for all of the possible independent effects and look at the one critical variable under consideration. This comes under the concept of control variables. Ceteris paribus - all other things equal....

Examples: After controlling for education (years), quality of education (Ivy League), years of continuous work experience, marital status, race, we find that women make 95% of what men make.

We try to isolate the wage differential between men and women in the labor force, controlling for all possible variables that would affect income.

For example, it is statistically flawed to compare the average wage of males and females and make a statistical inference from that data. To say that on average, women make only 74% of what men make, and then assume that much of that difference is due to sex discrimination, is not statistically valid. We are violating the ceteris paribus condition - holding everything constant.

There could be reasons that women make less than men besides discrimination - there are possible differences in education, number of years continuous experience, etc. that are explanatory variables.

We have to be concerned about "spurious relationships" - statistical association that is not based on true causality.

Income Differentials between M/F = f (Sex Discrimination)

Income Differentials between M/F = f (Sex Discrimination, Educ, Experience, Marital Status, etc.)

Important Point: We need to try to have a well-specified model that includes all relevant variables. A well specified model controls for all of the possible factors that affect the dependent variable.
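
The wage example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the workers, the two experience bands and the wages are all invented, and comparing within bands is a simple stand-in for the regression-style controls discussed later. The point is that a raw average comparison and a "ceteris paribus" comparison can give very different answers from the same data.

```python
from statistics import mean

# Hypothetical data, invented for illustration: (sex, experience band, hourly wage).
workers = (
    [("M", "high", 30.0)] * 30 + [("M", "low", 20.0)] * 10
    + [("F", "high", 28.5)] * 10 + [("F", "low", 19.0)] * 30
)

def wage_ratio(rows):
    """Average female wage divided by average male wage."""
    female = mean(w for sex, _, w in rows if sex == "F")
    male = mean(w for sex, _, w in rows if sex == "M")
    return female / male

# Raw comparison: ignores experience entirely.
raw_ratio = wage_ratio(workers)

# Controlled comparison: compare men and women within the same experience band.
controlled = {
    band: wage_ratio([r for r in workers if r[1] == band])
    for band in ("high", "low")
}

print(f"raw ratio: {raw_ratio:.2f}")
for band, ratio in controlled.items():
    print(f"ratio within {band}-experience band: {ratio:.2f}")
```

In this made-up sample the raw ratio is about 0.78, but within each experience band it is 0.95. Most of the raw gap comes from the different experience mix of the two groups, which is exactly what a control variable is meant to remove.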

Example: Smoking research tries to control for all of the relevant factors that affect health - diet, exercise, weight, health history, age, marital status, presence of second hand smoke, alcohol consumption, etc. - and then isolate the effects of smoking only. Two groups that are identical except that one group smokes. All other variables are being controlled for.

For any hypothesis, there are potentially hundreds of control variables.  How to decide which ones to use?  Most well-specified equations might have from 4 - 10 independent or control variables, based on 1) what makes most sense and 2) which variables have data that is available.

Some general guides to selecting control variables:

1. Demographic factors: age, education, income, race, sex, socio-economic status, etc. are very often relevant to the issue under consideration, and are very often used in research.  Example: testing for sex or racial discrimination in the labor market - many demographic control variables are used to control for differences in income that are caused by demographic factors.  Or you are testing for the effects of increased gun control laws on homicide rates over time or across states or counties - you would want to control for other factors that might contribute to differences in homicide rates over time or across states or counties: degree of urbanization, age distribution of population, poverty rate, etc.

Goal of a well-specified model is that you could say: after controlling for differences in age, income and urbanization, we find that stricter gun control laws adopted at the state level result in an X% change in homicide rates.  (Unit of analysis: ??)

2. Historical Trends / Time: Sometimes time is an important control variable.  Example: after Policy X is implemented, Variable Y goes up or down significantly.  Possible conclusion: Policy X achieved the intended result.  Another possibility: Variable Y was already increasing/decreasing, there was already an historical trend in place, so Policy X may not have been as effective as advocates suggested.

Example: Suppose that some states (or counties) pass laws that make it easier for people to carry concealed weapons and homicide rates go down.  Conclusion: increased gun ownership lowers crime.  Alternative conclusion: Homicide rates were already declining over the previous ten year period, so that the change in policy did not have the suggested effect.

3. Political and Economic Factors that might have an effect on the dependent variable under investigation.  For example, when looking at the effect of a change in a state's gun policy on crime, we would want to consider economic, legal and political factors that might have an effect on crime: unemployment, homelessness, state sentencing guidelines, availability of public defenders in a state, funding for public safety, changes in administration or public office (new sheriff, governor, mayor, police chief, etc.), or recent changes in public policy, etc.

4. Other Factors - Weather, seasonal factors, holidays, technological change, change in lifestyles, etc. can all influence the dependent variable.  For example, weather can affect crime and traffic accidents, so you wouldn't want to compare crime or traffic accidents in December with crime or traffic accidents during the following July, after a change in policy in June.  One alternative to deal with seasonal influences on variables, is to "seasonally adjust" the data, to account for seasonality (EVIEWS will do this procedure).  Examples of technological change affecting crime rates or traffic accidents would be cell phones, ATM machines, Internet, etc.
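
The historical trend problem in guide 2 above can be sketched as follows. This is a minimal illustration, not a full statistical test: the homicide rates are invented, the policy year is arbitrary, and the helper fit_line is a hypothetical name for a plain least-squares fit. The naive before/after comparison credits the policy with a large drop, while the trend-adjusted comparison shows no departure from the trend that was already in place.

```python
# Hypothetical annual homicide rates (per 100,000), invented for illustration.
# The rate falls by 0.5 per year throughout; the policy starts in year 10.
years = list(range(20))
rates = [12.0 - 0.5 * t for t in years]
policy_year = 10

pre = [(t, y) for t, y in zip(years, rates) if t < policy_year]
post = [(t, y) for t, y in zip(years, rates) if t >= policy_year]

def fit_line(points):
    """Ordinary least squares slope and intercept for (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Naive before/after comparison of average rates.
naive_change = (sum(y for _, y in post) / len(post)
                - sum(y for _, y in pre) / len(pre))

# Trend-adjusted comparison: project the pre-policy trend forward and
# measure how far the actual post-policy rates depart from it.
slope, intercept = fit_line(pre)
departures = [y - (intercept + slope * t) for t, y in post]
trend_adjusted_change = sum(departures) / len(departures)

print(f"naive change: {naive_change:.2f}")
print(f"change relative to prior trend: {trend_adjusted_change:.2f}")
```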
 

OPERATIONALIZING THE HYPOTHESES

Translation: Explicitly expressing the null hypothesis in a measurable and testable format.

Example: Issue? Do gun control laws reduce crime? To operationalize this question, we have to express it in testable form, in the form of a testable hypothesis.

Ho: "More gun control laws will result in less crime."

Problems? Too general, not specific enough.

Which gun control laws? We need to look at a specific proposal/legislation. Banning guns, banning hand guns, gun buy backs, tougher gun registration, banning gun shows, mandatory safety locks, waiting periods, etc.

And which crime rates would we expect to be lower?  If handgun ownership becomes illegal, then overall crime rates could go up if people are charged with illegal gun ownership.  Adding a new gun law that people can be charged with could increase overall crime rates?

Alternative Hos that are more specific and "operationalize" the hypothesis of interest:

1. Passing stricter gun registration laws for handguns in MI will lower homicide rates in MI.

2. Passing laws that require longer waiting periods for handgun ownership in MI will lower armed robbery rates in MI.

These are explicitly stated testable hypotheses, they are operational, meaning testable and measurable.

Examples of hypotheses that are NOT operational - see page 15 in book.

Each of these hypotheses contains vague terms that are not MEASURABLE.

"Increase employee cooperation" - how to measure?
"Make the budget process more efficient." - how to measure?
"Improve employee morale and efficiency." - how to measure?

Example: instead of "improve employee morale" we could instead say "reduce employee turnover," which IS measurable.

Main point: We need to express issues in the form of a hypothesis that is MEASURABLE AND TESTABLE; then the issue and hypothesis will be OPERATIONAL.
 

RESEARCH DESIGN

The process of planning the exact format of the research project, according to the scientific method and generally accepted research methods.

We start with the "Experimental Design" which is the most theoretically perfect approach - laboratory study where we have complete control over the conditions of the experiment.  We can create a "control" group, like with rats or mice.  Two groups that are identical except for one variable - cyclamates, e.g., to see if cyclamates cause cancer.  Experimental Design means a situation where we can easily control the "ceteris paribus" condition. All things equal - two groups of identical rats except for one variable - cyclamates.

In social science area, we rarely have complete control over the ceteris paribus condition. We usually can't conduct laboratory-type experiments like in the hard sciences (physics, chemistry, biology, medicine, pharmacology, etc.).  For example, we can't impose gun control laws in MI in 1997, see what happens over the next three years, then go back to 1997 and NOT impose gun laws, wait three years and compare the exact differences.  That would really be the only way to completely control for all other variables except ONE.   Even though the experimental design is not usually realistic for social sciences, it is important to understand why it is the theoretically ideal format (you can control "ceteris paribus").  We have to make compromises doing research in economics, political science, public policy, public administration, etc. since we don't have the luxury of the ideal "experimental design" (laboratory) approach.  Ideal research format to control for "ceteris paribus": randomization, measurement at more than one point in time, and control groups.

Real world example to help us examine the relevant issues in Research Design:

Analysis of the effectiveness of federal Head Start programs - where low income households are eligible to send children to a pre-school program for several years before kindergarten.  To operationalize the issue and express it in a testable form, we could state:

Sample Ho: Verbal abilities of children after one year in a Head Start program are greater than those of children NOT in a Head Start program.

Assume that we have conventional, standardized tests to measure verbal abilities and assume that there is agreement on what a Head Start program is, etc. There could be issues here about standardized tests, but we will ignore that for now, and assume that verbal ability is accurately MEASURABLE.

Possible, but incorrect approach, would be to test before and after Head Start. Violates the ceteris paribus condition of holding all other variables constant, except Head Start participation.

Threats to Internal Validity (Book) = violating Ceteris Paribus.  Internally Valid experiment is one where the ceteris paribus condition prevails.

What are the Threats to Internal Validity in the Head Start example of just testing before and after one year? Violations of Ceteris Paribus include:

1. Maturation - over a one year period, most children would naturally increase their verbal abilities, whether in Head Start or not.  We need to control for the natural improvement over a year that CANNOT be attributed to Head Start.

2. Selection Bias - assume that you compared Head Start children to non-Head Start children on a before and after basis.  There might be a "selection bias" because Head Start participation is not random - it might be that concerned parents are most likely to enroll their children in Head Start leading to above average performance, due to Parental Concern, and NOT necessarily Head Start participation.

Rats - chosen at random, Rats don't volunteer to be part of one group or the other. Statistical validity requires random selection, not Self-selection, to avoid "selection bias."

If Head Start is voluntary, then the group that volunteers to participate might be different than the "control" group that doesn't participate. We need to make sure the control group and the non-control group have exactly the same characteristics for internal validity.

3. Sample mortality - drop out problem. Some children may drop out of Head Start.  Not necessarily a problem, except that dropping out may not be RANDOM.  Those that drop out may be the ones that aren't learning/developing well.  If the weakest children academically are those most likely to drop out, then the results will be biased upward.  On the other hand, if the brightest children were most likely to drop out because they were bored, then the results would also be biased.

Again, rats don't drop out of a research project, so sample mortality is not necessarily a problem there.  Maybe a rat would die occasionally during the experiment, but we would hope it was random, and not dependent on whether it was in the control group or not.

4. Testing - Perhaps there is some increase in test scores due to learning how to take the test.  For the first pre-test, the students may be nervous and unfamiliar with the test and the environment; For the post-test, they may be more confident and familiar with the test and more comfortable with the environment, which could result in an upward bias in post-test scores for reasons NOT due to Head Start.

5. Instrumentation - was the second test comparable to the first test? If not there may be a problem with instrumentation - variation in the test instrument. If the second test was easier, or harder, or if the time limit changed, then the results could be biased.  Was it the same test giver for both tests, and were they consistent in their approach?

6. History - possible positive (or negative) external influences during the testing period.  Examples: children watched Sesame Street during the school year, increasing verbal ability. Other possible activities besides Head Start that may have contributed to verbal ability, computer programs like Reader Rabbit, Jump Start, etc.

7. Regression to the mean. The tendency over time for variables to regress towards a historical, long-run mean or average. Example - if both parents are tall, the children will tend to be taller than average, but shorter than the parents. If both parents are short, the children tend to be shorter than average, but taller than their parents. There is a statistical tendency of "regression toward the mean."  If there weren't a regression towards the mean, then we would have a world of giants and midgets.

Therefore, if the pre-test scores of Head Start children are way below the national average, there would be a tendency for the post-test scores to be higher.

Book - if a student gets a 0 on the first test, it is likely that the second test would be higher, even without any education.  If a student gets a 100% on the first test, then it is likely that on the second test that they would get less than 100%.
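
Regression to the mean (threat 7 above) is easy to see in a simulation. The sketch below assumes a simple model that is not from the text: each test score is a stable "ability" plus independent luck on the day, and the numbers (mean 70, standard deviation 10) are arbitrary. No program and no learning is simulated, yet the children who scored lowest on the pre-test score noticeably higher on the post-test.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Simulated (not real) scores: each test = stable ability + independent luck.
n = 10_000
ability = [rng.gauss(70, 10) for _ in range(n)]
pre_test = [a + rng.gauss(0, 10) for a in ability]
post_test = [a + rng.gauss(0, 10) for a in ability]  # no program, no learning

# Select the children with the lowest 10% of pre-test scores.
order = sorted(range(n), key=lambda i: pre_test[i])
lowest = order[: n // 10]

def average(values):
    return sum(values) / len(values)

low_pre = average([pre_test[i] for i in lowest])
low_post = average([post_test[i] for i in lowest])
overall = average(pre_test)

print(f"lowest group, pre-test average:  {low_pre:.1f}")
print(f"lowest group, post-test average: {low_post:.1f}")
print(f"overall pre-test average:        {overall:.1f}")
```

The low group moves up toward the overall average without reaching it, which is why a before-and-after study of a group chosen for its low scores will show a "gain" even if the program does nothing.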

Important part of applied research is to overcome these potential threats to internal validity - violations of the ceteris paribus condition. We want to follow the scientific method as closely as possible, and conform to accepted standards of statistical validity.
 

For example, to strictly conform to the experimental design idea (scientific method) we should proceed as follows:

1. From a list of all eligible children for Head Start, based on age, parents' income, other qualifying criteria, we should RANDOMLY divide the children between the Treatment/Experimental group (Head Start) and the control group (no Head Start).  50% Head Start, 50% no Head Start. This would ensure randomization and conform to the Ceteris Paribus condition.

Both groups would be tested before and after one year to assess the effectiveness of Head Start in raising verbal ability. The improvement between pretest and posttest would measure the gain in verbal ability.
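
The random 50/50 split described above can be sketched like this. The roster of 100 numbered children and the function name assign_groups are hypothetical; the only idea taken from the text is that chance, not the parents or the researcher, decides who goes into which group.

```python
import random

def assign_groups(eligible_ids, seed=None):
    """Randomly split eligible children into treatment and control halves."""
    rng = random.Random(seed)
    shuffled = list(eligible_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical roster of 100 eligible children, identified by number only.
eligible = range(1, 101)
treatment, control = assign_groups(eligible, seed=42)

print(len(treatment), "children in Head Start (treatment)")
print(len(control), "children in the control group")
```

The seed is fixed here only so the example is repeatable; in a real study the assignment would be drawn once and recorded.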
 
 

Example:
                  Control          Experimental (Treatment: HEAD START)
Pre-test            65                     70
Post-test           82                     90
Gain            +17 points             +20 points

If we just looked at pretest/posttest of the Experimental group, we may have been impressed with a 20 point increase for Head Start children.  But we now have the control group to compare as a benchmark.  However, it is very possible that the 3 point difference is NOT statistically significant; we would need a statistical test to test for statistical significance (we will cover this in Chapter 7).  A 3 point difference can happen easily by CHANCE; notice the 5 point difference in Pre-Test scores of the two RANDOMLY selected groups (65 vs. 70).   Conclusion might be: Head Start students score slightly higher than non-Head Start students, but the difference in performance is NOT statistically significant.
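
The arithmetic behind the comparison is short enough to write out. The scores below are the ones in the example table; the variable names are only for illustration. Note that this gives the size of the difference in gains, not whether it is statistically significant.

```python
# Scores taken from the example table above.
scores = {
    "control": {"pre": 65, "post": 82},
    "head_start": {"pre": 70, "post": 90},
}

# Gain for each group, then the difference between the two gains.
gains = {group: s["post"] - s["pre"] for group, s in scores.items()}
net_effect = gains["head_start"] - gains["control"]

print(gains)
print("difference in gains:", net_effect)
```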

Note: SP500 used as benchmark for investment performance.

The advantage of the experimental design is that we have minimized the violations of the ceteris paribus by having a randomly selected control and experimental group.

Random selection makes sure that both groups have the same characteristics except for the variable under consideration - Head Start. Random selection eliminates Selection Bias. The potential problems of History, Maturation, and Instrumentation are minimized by having a control group. For example, both groups will be equally affected by External Influences like watching Sesame Street.

However, having the control group does not solve the potential problem of Testing - improvements due to learning how to take the test, which would be a factor for both the Control Group and the Treatment Group.

A strategy to control for the problem of Testing is to have 4 groups instead of 2. 2 Control groups and 2 Experimental groups. In the two control groups, one will take the pretest and one will not. Same for the experimental group. See page 21.

We also still have not fully solved the potential problem of attrition, or sample mortality. We would need to keep detailed records of the students who drop out of the program. One possible approach is to completely eliminate all students who drop out from the sample, in other words we would ignore their pretest score. It would not be appropriate to include only their pretest score, because those who drop out might be consistently either behind or ahead of the average.
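
A minimal sketch of the record-keeping described above, assuming each child has a record with a pre-test score and a post-test score that is missing for dropouts. The five records are invented. The gain is computed only for children with both scores, and the last step checks whether the dropouts looked different at the pre-test, which would suggest that dropping out was not random.

```python
# Hypothetical records, invented for illustration. post=None marks a dropout.
records = [
    {"id": 1, "pre": 60, "post": 78},
    {"id": 2, "pre": 72, "post": 91},
    {"id": 3, "pre": 55, "post": None},
    {"id": 4, "pre": 68, "post": 85},
    {"id": 5, "pre": 50, "post": None},
]

completers = [r for r in records if r["post"] is not None]
dropouts = [r for r in records if r["post"] is None]

def average(values):
    return sum(values) / len(values)

# Gain computed only on children with both scores.
mean_gain = average([r["post"] - r["pre"] for r in completers])

# Record-keeping check: do dropouts differ from completers at the pre-test?
pre_gap = (average([r["pre"] for r in dropouts])
           - average([r["pre"] for r in completers]))

print(f"mean gain among completers: {mean_gain:.1f}")
print(f"dropout minus completer pre-test average: {pre_gap:.1f}")
```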

This example illustrates the use of the Experimental Design approach in policy research. It also illustrates many of the real world problems with ever being able to accurately design a true laboratory-type experiment. We never have as much control over the variables in the real world as in the laboratory.

Rats are excellent subjects, human beings are difficult to work with.

Another problem with the study of Head Start is that it illustrates the real problem of random samples. From a pure statistical viewpoint selection should always be random - 100 rats, 50 in each group at random.

In the Head Start example, there is a legal and ethical problem in creating the control group, because the control group would consist of children eligible for Head Start who want to participate but are turned away in order to be in the control group. That could result in lawsuits, bad publicity, etc.

On the other hand, we can't force people who are eligible for Head Start to participate in the program as part of our study. And we wouldn't necessarily want to provide incentives, like a cash bonus, because that could lead to selection bias.

And so we have to rely on voluntary participation, but that approach has selection bias, because those parents who willingly volunteer may be more supportive of learning in general. In that case, those children may show improvement, but it is hard to separate the effects of Head Start and the positive effects of parental support.

So the human element makes pure experimental design very difficult in the area of social science research, because we a) can't exclude eligible participants from Head Start and b) can't force eligible participants to be in Head Start, therefore we cannot conform strictly to the scientific method.

In addition, a program like Head Start is always changing and evolving and adapting to new ideas, new technology, etc. so that the "Head Start" program is not constant. Since the program is continually changing, it presents a challenge to experimental design, which assumes a certain consistency in the variable under consideration.

For example, if we study cyclamates, we assume that the chemical structure of cyclamates is constant and stable. In contrast, the variable Head Start is not constant and stable.

And there is another potential research problem with Generalization, or a Threat to External Validity. Are our results specific to our sample group or are the results Generalizable to the entire population?

For example, if we have results from a study of Head Start in Flint, MI, are those results applicable to Chicago and/or the rest of the country?

Are the results from a study on the effects of cyclamates on rats applicable to humans?

Are the results of a study on college students applicable to older people, and vice-versa?

Are results on the success/failure of bilingual education for Hispanic students applicable to Vietnamese students?

Generalization is not always a problem, but it can be. For example, if the sample size is very small, our findings might be biased by a small sample size. In general, the larger the sample the better. Depends on costs, budget, time, etc. We should be aware of the problem.

Ex pede Herculem.

Experimental Design is the ideal applied research approach, but we have to be aware of the potential problems of coercion, exclusion, etc.

Examples of actual approaches to Experimental Design (page 22-25):

1. Testing the effectiveness of work release programs.  Florida Division of Corrections allowed prisoners from a fairly homogeneous group to be randomly assigned to work release program. 2/3 of those prisoners eligible were assigned to work release (Treatment/Experimental Group) for between 6-24 months, and the other 1/3 were kept in prison as a control group.  Prisoners were tracked over time for recidivism factors like further arrests, convictions, incarceration, severity of crimes, etc. to see if work release was effective at reducing recidivism.  Finding: work release did not have a POS or NEG effect on recidivism.

Experimental Design was possible because of the cooperation of the Dept of Corrections.

2. Testing the effects of different penalties for moving traffic violations:
a. Tickets payable by mail
b. Tickets requiring court appearance
c. Warning ticket only
d. Standard ticket - pay ticket at city clerk's office

Selection bias was eliminated by having four different ticket books - officers would get a different ticket book each day and all tickets that day would be one of the four types.  Ticket books would rotate.  Assured randomization.

Cooperation of the police department made experiment possible.

Suppose that you cannot structure a controlled experiment, you can't use Experimental Design. What are the alternatives? Quasi-Experimental Designs, Modified Designs.

1. Time Series Design:

Two basic types of data:

a. Cross-sectional (latitudinal) Data. Observations across individuals, counties, states, countries, etc. at the same point in time. Examples: Census, income across 1000 individuals, income across counties, state or country, etc. at the same point in time.

b. Time series (longitudinal) Data. Observations of the same variable over time. Examples: monthly unemployment rates in Michigan from 1980-1997, monthly inflation rates, monthly stock returns, annual per capita income in Michigan from 1900-1997.

c. Cross-sectional time series. Monthly inflation (time series) in the G-7 countries (cross-sectional) from 1980-1990. Annual per capita incomes in Michigan counties from 1980-1990.
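
The three data types can be pictured as three ways of slicing one table. In the sketch below the unit names ("County A", "County B"), the years and the values are placeholders invented for illustration, and the helper names are hypothetical. A cross-sectional time series holds one value per (unit, year) pair; fixing the year gives a cross section, and fixing the unit gives a time series.

```python
# Placeholder values, invented for illustration (not real statistics).
# Cross-sectional time series: one value per (unit, year) pair.
panel = {
    ("County A", 1980): 9.1, ("County A", 1981): 9.8, ("County A", 1982): 10.2,
    ("County B", 1980): 11.4, ("County B", 1981): 12.0, ("County B", 1982): 12.9,
}

def cross_section(data, year):
    """All units at one point in time."""
    return {unit: v for (unit, y), v in data.items() if y == year}

def time_series(data, unit):
    """One unit followed over time, in year order."""
    return [(y, v) for (u, y), v in sorted(data.items()) if u == unit]

print(cross_section(panel, 1981))
print(time_series(panel, "County B"))
```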

Time series design - you look at the movement of the variable under consideration over time both before and after a change in policy or some event, and then see if there is an obvious change in the pattern after the implementation of some policy or program or event.

Example: What effect did the Fed govt. raising the speed limit to 65 m.p.h. in 1987 have on traffic fatalities?  We could look at traffic fatalities annually from 1980-2000, and then try to determine the effect of raising the speed limit to 65 m.p.h. on fatalities (Note: the unit of analysis is YEARS).

In figure 2-2 on page 26, we can see a time series graph of traffic fatalities per 1m people from 1980-2000.  We can conclude that the time series evidence shows that the 65 mph speed limit increased traffic fatalities, since there was a downward trend before 1987 and an upward trend after 1987.  But what if people were driving more around that time? That would be a threat to internal validity, a violation of "ceteris paribus."  Solution: change the dependent variable from Fatalities/1m people to Fatalities/mile driven to control for increased (or decreased) driving.
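
The effect of changing the dependent variable can be shown with a small calculation. The figures below are invented for illustration and are not real traffic data: fatalities rise by 10% between the two years, but so do miles driven. Measured per million people the fatality rate goes up; measured per billion miles driven it does not change.

```python
# Hypothetical figures, invented for illustration (not real traffic data).
# Each row: (year, fatalities, population in millions, miles driven in billions)
data = [
    (1986, 40_000, 200.0, 2_000.0),
    (1988, 44_000, 200.0, 2_200.0),
]

def per_million_people(fatalities, population_millions):
    return fatalities / population_millions

def per_billion_miles(fatalities, miles_billions):
    return fatalities / miles_billions

per_capita = {yr: per_million_people(f, pop) for yr, f, pop, _ in data}
per_mile = {yr: per_billion_miles(f, miles) for yr, f, _, miles in data}

print("fatalities per 1m people:    ", per_capita)
print("fatalities per 1bn miles:    ", per_mile)
```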

Suppose that the time series pattern looked like either A, B or C on page 26, our conclusion would be different. In the case of pattern A, fatalities were already increasing, and therefore the law had no significant effect.

In case B, fatalities were increasing, but the law accelerated the rate of increase, so the change in speed limit did have an effect.

In case C, the pattern seems to be fluctuating up and down, and it would be hard to conclude that the law changed the cyclical, fluctuating pattern.

Caveat: Graphs convey a lot of useful information, but graphical analysis by itself is not true statistical analysis, you are just "eyeballing" the data graphically, sometimes called "inter-ocular least squares."

Caveat: possible instrumentation problem. How exactly is the variable traffic fatality measured? And is the measure consistent over time 1980-2000? Do only fatalities at the accident scene count? What about people who die later from injuries? What about people who are brain-dead and in a coma for a year? We need to be sure that the measurement of the dependent variable is consistent over time. Concern of time series research - what has changed over time that needs to be accounted/controlled for?

Another way to isolate the effects of a policy change/legislation using time series analysis is to use a neighboring county or state if possible as a control group. If the speed limit is changed in one state only, we could use a neighboring state as a control group. In the speed limit example, the whole country was affected by the 65 mph limit, so we would have to use Canada as a control group, for example. Or Germany.  Enactment of seat belt laws has varied by state, so states could be used as a control group in this situation.

The choice of a control group should be based on trying to find a control group as close to the experimental group as possible. Using Canada or Germany would make sense for the U.S., since many of the characteristics of the economies are the same. We would want to be cautious about using Mexico, Slovakia, China or Zimbabwe since there are many demographic and economic differences.  Using a neighboring state or county would also make sense in regional or local studies.

An example of a time series design is given on pages 28-29 in the text.
 

2. Before-and-after-with-control-group design approach.

Time series data is most readily available from governmental units or companies, rather than individuals: Unemployment rates, population, crime rates, stock prices, sales, etc.  When working with individuals, we don't have time series data and we can't always create a control group through a process of random selection, so we try to generate/find a control group with the same characteristics as the experimental group.

Example - Head Start. The experimental group is those children who have voluntarily enrolled in Head Start, say 100 students. We then try to find a group of 100 students NOT enrolled in Head Start that have the same age/sex/race/socio-economic status mix as those in Head Start.  Since verbal abilities differ according to sex and age, we would want to replicate the Head Start group as closely as possible, and get the same age/sex mix.

After we identify the control group, we carry out before and after testing as before in the Experimental design.
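Finding a control group with the same mix can be sketched as a matching exercise: for each child in the experimental group, pick one not-yet-used child from a pool of non-participants with the same characteristics. The records, field names, and function below are hypothetical, and only age and sex are matched to keep the example short.

```python
def build_matched_control(experimental, pool, keys=("age", "sex")):
    """For each experimental-group child, take one unused pool child
    whose values on `keys` are identical. Returns (control, unmatched)."""
    # Group the pool by profile, e.g. (4, "F"), preserving pool order.
    available = {}
    for person in pool:
        profile = tuple(person[k] for k in keys)
        available.setdefault(profile, []).append(person)

    control = []
    unmatched = []
    for child in experimental:
        profile = tuple(child[k] for k in keys)
        candidates = available.get(profile, [])
        if candidates:
            control.append(candidates.pop(0))  # each pool child is used once
        else:
            unmatched.append(child)  # no comparable child in the pool
    return control, unmatched


# Hypothetical records, invented for illustration only.
head_start = [
    {"id": "E1", "age": 4, "sex": "F"},
    {"id": "E2", "age": 4, "sex": "M"},
    {"id": "E3", "age": 5, "sex": "F"},
]
non_participants = [
    {"id": "P1", "age": 5, "sex": "M"},
    {"id": "P2", "age": 4, "sex": "F"},
    {"id": "P3", "age": 5, "sex": "F"},
    {"id": "P4", "age": 4, "sex": "F"},
]

control, unmatched = build_matched_control(head_start, non_participants)
print("Control group:", [p["id"] for p in control])
print("No match found for:", [c["id"] for c in unmatched])
```

Children left unmatched show where the pool fails to replicate the experimental group. Note that matching on observed traits does nothing about traits we did not measure.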

Disadvantage: the selection process is not random, so the control group and experimental group may not be the same.

Advantage: we can carry out an empirical study of Head Start without forcing anyone to participate and without denying participation to anyone.

Reasonable compromise. Traded purity for practicality.

Key: choosing the proper control group.
 

3. Before-and-after-design with NO control group.

In the Head Start example, we would do pretesting and posttesting without a control group.

Problem: it would be hard to distinguish gains in reading due to Head Start from gains due to other factors.

Might work better for academic programs with older children or adults. If we looked at the effects of remedial reading programs for prisoners, then we could isolate the effects better.  All children learn a lot between the ages of 4 and 5, whereas most adults don't necessarily learn a lot just because they are a year older.  A 40 year old prisoner who is illiterate wouldn't be learning to read better just because he/she was a year older, for example.

See page 31 for an example of Before-and-After Design with no control group.
 

4. After-only-with-comparison.

No pretest, only posttest, using a control group. Head Start program with control group, comparing posttest scores only.

If Head Start students score 75 on the posttest and the control group scores 82, what is the interpretation? Difficult to say. How close was the experimental group to the control group?
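To see why the posttest scores alone are hard to interpret, consider two sets of starting scores that an after-only design never observes. The posttest scores of 75 and 82 come from the example above; the pretest scores and names below are hypothetical.

```python
def gain(pretest, posttest):
    """Improvement from pretest to posttest."""
    return posttest - pretest


HEAD_START_POST = 75  # posttest score from the example above
CONTROL_POST = 82     # posttest score from the example above

# Two hypothetical sets of unobserved starting points.
scenarios = {
    "Head Start group started far behind": {"head_start_pre": 60, "control_pre": 80},
    "Head Start group started ahead": {"head_start_pre": 74, "control_pre": 70},
}

results = {}
for name, pre in scenarios.items():
    head_start_gain = gain(pre["head_start_pre"], HEAD_START_POST)
    control_gain = gain(pre["control_pre"], CONTROL_POST)
    results[name] = (head_start_gain, control_gain)
    print(name, "-> Head Start gain:", head_start_gain,
          "| control gain:", control_gain)
```

The same 75 versus 82 result is consistent with Head Start students gaining much more than the control group (15 versus 2) or much less (1 versus 12). Without a pretest, everything depends on how close the two groups were to begin with.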

Even with these difficulties, this design is the one most commonly used by political scientists and sociologists in applied research.  In many cases, an evaluation is called for after the program is started, so there is no opportunity to do pretesting.

Survey research is common in after-only-with-comparison design. For example, surveys can be given to users and non-users of public services for comparison.

Example: study of citizen attitudes toward police and effectiveness of a police outreach program. Survey those who had contact with police and those who didn't. You could control for factors such as age, race and income that might affect attitudes toward police.

Problem with survey data: Relying on people to give you truthful, accurate information.

Example: Drug-use survey among high school or college students. Some drug-using students may be afraid to tell the truth. Some non-drug-using students might say they use heroin weekly just to be funny.

An innovative way to create a control group and do an After-Only-With-Comparison is illustrated on pages 34-35.  Study of pedestrian fatalities in NYC.  Looked at 50 pedestrian fatalities over a 6 month period. To create a control group, the researchers, with the aid of a police officer, went to the exact scene of the accident at the same time and on the same day of the week as the accident, and they surveyed the first four pedestrians to walk by who matched the sex of the victim.  Data was collected on the people's residence, age, marital status, occupation, race, sobriety (alcohol test), weather conditions, etc.

Comparing the victims to the control group (200 people), researchers found that the pedestrians killed were older (by 17 years), more often foreign born, more often Manhattan residents, less often married, more often of a lower socioeconomic status, and more likely to have alcohol in their blood in high concentrations.
 

5. After-only design.  No control group is used, so it is the weakest experimental design, but is very common.

Example: user survey to assess attitudes toward a public service or facility.

Survey of people using a city park to assess their attitudes toward the facility.

Survey of people downtown to assess their attitude toward safety.

Survey of people using public transportation to assess their attitude.

Survey of people using Post Office to determine Customer Satisfaction Percentage. "90% of postal customers rate postal service as above average to excellent."

Course evaluations of professors.

For some issues, the after-only design may be appropriate. You are not trying to assess changes or programs. In fact, maybe you are using it to show that no program change is necessary. If people are satisfied, why change?

See page 36, Table 2-3, for a summary of the different Research Designs. 

Issues in Research Design: Practicality vs. purity, cost, availability of data, confidentiality, ethical considerations, legality, etc.  Main point: Experimental Design is theoretically the best, but usually not feasible/practical/cost-effective/ethical/legal for research in social sciences, so compromises have to be made, trying to conform as closely as possible to the Scientific Method.