Why do evaluation researchers in crime and justice choose

non-experimental methods?1

CYNTHIA LUM*,** College of Criminal Justice, Northeastern University, Boston, MA 02115, USA

* corresponding author: E-mail:

SUE-MING YANG Department of Criminology, University of Maryland, College Park, MD 20742, USA

Abstract. Despite the general theoretical support for the value and use of randomized controlled

experiments in determining ‘what works’ in criminal justice interventions, they are infrequently used in

practice. Reasons often given for their rare use include that experiments present practical difficulties and

ethical challenges or tend to over-simplify complex social processes. However, there may be other

reasons why experiments are not chosen when studying criminal justice-related programs. This study

reports the findings of a survey of criminal justice evaluation researchers as to their methodological

choices for research studies they were involved in. The results suggest that traditional objections to

experiments may not be as salient as initially believed and that funding agency pressure as well as

academic mentorship may have important influences on the use of randomized controlled designs.

Key words: criminal justice evaluation, evaluation research, experiments, scientific validity, what works

No question has simultaneously dominated practice and research in criminology

and criminal justice more than ‘what works?’ in reducing crime, recidivism, and crime-related risk factors. Since a number of highly influential reports in the 1970s and 1980s indicated a grim future for many criminal justice programs and policies (see Kelling et al. 1974; Lipton et al. 1975; Spelman and Brown 1984), the push

towards evaluating criminal justice interventions to find effective treatments has

defined the role of a number of researchers. This emphasis on program

effectiveness can be seen in systematic reviews of programs (most notably,

Sherman et al.’s 1997 report to Congress), the increased use of meta-analyses to

draw more parsimonious conclusions from the plethora of evaluation research (see,

e.g., Andrews et al. 1990; Cox et al. 1995; Dowden et al. 2003; Lipsey and Wilson

1993; Logan and Gaes 1993; Lösel and Köferl 1989; Prendergast et al. 2000;

Wilson 2000, 2001; Wilson et al. 2000, 2001; Whitehead and Lab 1989), and the

establishment of the Campbell Collaboration,2 an organization which advocates for

higher quality research and evidence-based policy (see Farrington and Petrosino

2001; Petrosino et al. 2001).

**In August 2005, Dr. Lum’s affiliation will change to George Mason University.

Journal of Experimental Criminology (2005) 1: 191–213 © Springer 2005

A natural development from this ‘what works’ pursuit has been the assessment of the quality of these evaluations. The believability of evaluation research depends not

only on the theoretical sense of what is being evaluated but also upon the evaluation’s methodological quality. For example, we are cautious of the results of studies which evaluate the effectiveness of drug treatment or incarceration using a sample of individuals who most likely would not re-offend even without any intervention. Ensuring that scientifically valid approaches are used when evaluating the effects of treatment is imperative when asserting that a treatment or policy ‘works’ or ‘doesn’t work’ in reducing crime, criminality, or crime-related risk factors (Cook and Campbell 1979; Farrington 2003a; Farrington and Petrosino 2001; Shadish

et al. 2002; Sherman et al. 1997; Weisburd and Petrosino, forthcoming).

Broadly, scientific validity emphasizes that the methodology used in an evaluation of a criminal justice intervention maintains certain standards that contribute to greater believability in asserted conclusions. Although different types of scientific validity have been articulated (see Cook and Campbell 1979; Farrington 2003a), scholars have argued that internal and external validity are especially important to methodological quality (Farrington 2003a; Farrington and Petrosino 2001; Shadish et al. 2002). Specifically, external validity refers to “the generalizability of causal relationships across different persons, places, times, and operational definitions of interventions and outcomes” (Farrington 2003a: 54). External validity can be maximized by choosing random samples from a population

(Farrington 2003a), replicating the treatment on different samples and conditions,

and continually evaluating the intervention (McCord 2003). Internal validity

refers to an evaluator’s ability to determine whether the intervention did in fact

cause a change in the outcome measured or that treatment effects can be clearly

distinguished from other effects (Shadish et al. 2002: 97). Internal validity is

often maximized through the use of the experimental design as the evaluation method.


An experimental design establishes internal validity by randomly allocating a population of interest (or sample thereof) into different conditions, treatments, or programs to isolate the effects of those conditions from other possible factors that may contribute to group differences. Random allocation of treatment programs ensures that there is no systematic bias that divides subjects into treatment and control groups (Campbell and Stanley 1963; Farrington and Petrosino 2001). Specifically, random allocation allows for the assumption of equivalence between treatment and comparison groups, a necessary condition to ‘rule out’ other confounding factors that might explain differences between groups after treatment (Weisburd 2003).

Thus, as Cook (2003) emphasizes, random allocation provides an appropriate

counterfactual in the control group, showing what would happen had the treatment

not been administered. Therefore, when carefully designed and implemented, a

randomized controlled experiment is regarded as highly useful in contributing to

the believability of the results of evaluation research (Boruch et al. 2000a; Burtless

1995; Cook 2003; Sherman 2003; Weisburd 2000, 2001).

The use of experiments has been supported not only on these scientific and statistical grounds in determining ‘what works’; as Cook (2003) points out, empirical evidence also indicates that real differences can exist between the results of

experiments and non-experiments. In criminal justice, Weisburd et al. (2001) found that non-experimental evaluations tended to

result in more positive or ‘it works’ findings compared to experimental evaluations, perhaps leading to false conclusions about program effectiveness (see Gordon and Morse, 1975, who reported similar findings in social research generally). Furthermore, meta-analysts have found differences in the size or magnitude of effects

depending on the evaluation method used. Effect sizes in experimental evaluations

can be larger (see Wilson et al. 2001), smaller (see Wilson et al. 2000), or without

significant difference (see Lipsey and Wilson 1993; Whitehead and Lab 1989)

compared to non-experiments. Some have also justified the importance of

experimental over non-experimental methods on other grounds, including that it

is unethical not to use randomized experiments to discover whether a program is

effective or harmful (Boruch 1976; McCord 2003; Weisburd 2003) or that

experimentation benefits policy and normative practice (Boruch et al. 2000a; Cook

2003). Clearly there is, at least in theory, justification for the use of randomized

experiments in evaluating the effects of social programs.

The choice to use experiments in criminal justice evaluations

Despite this general methodological justification for the use of the randomized

controlled experiment, this type of design is infrequently used when evaluating

criminal justice interventions (Shepherd 2003). In Sherman et al. (2002), the most

comprehensive collection to date of criminal justice evaluations in the United

States, of the 657 evaluations listed and summarized, 84% used non-experimental

methods to draw conclusions about treatments while only 16% used an experimental

methodology. A variety of reasons have been hypothesized and well documented

that might account for this large discrepancy.

The most common arguments against the use of randomized experiments often

involve practical or ethical concerns (for a review of some of these arguments, see

Boruch 1976; Clarke and Cornish 1972; Cook 2003; Farrington 1983; Shepherd

2003; Stufflebeam 2001; Weisburd 2000). Experimentation is seen as difficult to

conduct in non-clinical settings due to a variety of problems, such as implementation issues (Boruch 1976; Petersilia 1989), difficulty convincing practitioners to participate (Feder et al. 2000), or ethical and moral dilemmas in treating some

individuals and not others based on a random allocation scheme (Boruch et al.

2000b; Clarke and Cornish 1972). Many of these arguments do not challenge

experimentation in theory, but rather recognize the limitations of randomized

controlled experiments in practice. Others have also argued that the use of

experimental designs may be inadequate for capturing the complex social or research environment, and some have challenged their limited use or their ability to maximize

methodological quality (see Burtless 1995; Clarke and Cornish 1972; Heckman and

Smith 1995; Pawson and Tilley 1994, 1997; Stufflebeam 2001).


While concerns about practicality, ethics, and over-simplification point specifically to the experimental methodology itself, factors beyond methodological considerations might also influence researcher decisions. For

example, early academic experiences, including the influence of mentors and

academic advisors, may influence the methodological choices that a researcher

makes in his or her academic career. Literature on academic socialization indicates

that many of these factors can influence the success, productivity, and quality of a

scientist’s research (Corcoran and Clark 1984; Reskin 1979). Generally, academic

mentorship is believed to be a positive influence on researcher productivity, job

placement, and career development (Cameron and Blackburn 1981; Clark and

Corcoran 1986; Raul and Peterson 1992). Along similar lines, mentorship might

have important influences on the methodological choices a researcher makes. It

may be the case that evaluation researchers who have worked with mentors or

advisors on experiments in the past tend to go on to conduct experiments in the future.


The academic discipline from which researchers come or the formal academic

training a researcher has received may also contribute to various methodological

choices that evaluation researchers make when studying criminal justice interventions. Those conducting criminal justice evaluations come from a variety of

disciplines and backgrounds; perhaps the reluctance to use experimentation stems from scientific biases within certain academic disciplines. Wanner et al. (1981)

found differences across academic disciplines generally in terms of research

productivity. Differences might also exist in theoretical and methodological norms

as well as what is considered ‘scientific’ across different disciplines (Kuhn 1970). In terms of criminal justice evaluations, the most obvious differences might be

reflected in biases toward certain subject matters. Psychologists may examine the

effects of programs which attempt to affect risk factors, early childhood development,

or psychological treatment in prisons, while those from the field of education may be

more concerned about school-related programs. Those trained in criminology or

criminal justice may focus on programs in traditional criminal justice institutions.

Related to these choices might be differences in the disciplinary biases, nature,

mechanisms, and subjects of research, which may also help to shape the choice of

research method. For example, the reliance on experiments is much more common

in psychology than sociology, criminology, or education because of the traditional

use of laboratory experiments in psychological research. In some cases, as Palmer

and Petrosino (2003) discovered, the replacement of psychologists with other social

scientists in research agencies can also contribute to the decline in the use of

experiments in evaluation research conducted by those agencies.


Studies have also indicated that government funding and external pressures can

influence research methodology. In a historical account of the use of the randomized

controlled experiment in criminal justice, Farrington (2003b) suggested that James

Stewart of the National Institute of Justice (NIJ) was highly influential during the


1980s in advocating the use of experiments for projects receiving NIJ funding.

Garner and Visher (2003), when reviewing the funding awarded by NIJ in the

1990s, found that the number of awards and the total amount of funds awarded to research using randomized experiments declined over that period.3 Palmer and

Petrosino (2003) also have emphasized the importance of funding agencies on the

choices of research methods when analyzing the changes and transition of the

California Youth Authority (CYA). They found that when the National Institute of

Mental Health (NIMH) was the main funding agency of CYA, the support of

randomized trials by NIMH constituted a powerful incentive for the CYA to use

them (Palmer and Petrosino 2003: 240). However, Palmer and Petrosino also

discovered that when the Law Enforcement Assistance Administration (LEAA)

became CYA’s new funding agency, this changed the research orientation of CYA

as LEAA encouraged the use of short-term, quick analysis over long-term experiments (Palmer and Petrosino 2003: 243–244).

While many of these arguments have been either hypothesized or made on a case-by-case basis, how salient are they? Cook (2003) r
