CHAPTER 14 Generalization
· Discuss the issues created by generalizing research results to other populations, including potential problems using college students as research participants.
· Discuss issues to consider regarding generalization of research results to other cultures and ethnic groups.
· Describe the potential problem of generalizing to other experimenters and suggest possible solutions.
· Discuss the importance of replications, distinguishing between exact replications and conceptual replications.
· Distinguish between narrative literature reviews and meta-analyses.
IN THIS CHAPTER, WE WILL CONSIDER THE ISSUE OF GENERALIZATION OF RESEARCH FINDINGS. When a single study is conducted with a particular sample and procedure, can the results then be generalized to other populations of research participants, or to other ways of manipulating or measuring the variables? Recall that internal validity refers to the ability to infer that there is a causal relationship between variables. External validity is the extent to which findings may be generalized.
GENERALIZING TO OTHER POPULATIONS
Even though a researcher may randomly assign participants to experimental conditions, rarely are participants randomly selected from the general population. As noted earlier, the individuals who participate in psychological research are usually selected because they are available, and the most available population consists of college students—or more specifically, first- and second-year students enrolled in the introductory psychology course to satisfy a general education requirement. They may also be from a particular college or university, may be volunteers, or may be mostly males or mostly females. So, are our research findings limited to these types of subjects, or can we generalize our findings to a more general population? After considering these issues, we will examine the larger issue of culture and how research findings can be generalized to different cultural groups.
Smart (1966) found that college students were studied in over 70% of the articles published between 1962 and 1964 in the Journal of Experimental Psychology and the Journal of Abnormal and Social Psychology. Sears (1986) reported similar percentages in 1980 and 1985 in a variety of social psychology journals; Arnett (2008) found that 67% of the articles in the 2007 volume of the Journal of Personality and Social Psychology used college student samples. The potential problem is that such studies use a highly restricted population. Sears points out that most of the students are first-year students and sophomores taking the introductory psychology class. They therefore tend to be young and to possess the characteristics of emerging adults: a sense of self-identity that is still developing, social and political attitudes that are in a state of flux, a high need for peer approval, and unstable peer relationships. They also tend to be intelligent, with high cognitive abilities. Thus, what we know about “people in general” may actually be limited to a highly select and unusual group. Indeed, Peterson (2001) found that students, as a group, are more homogeneous than nonstudent samples. That is, students are more similar to each other than adults are similar to other adults in the general population.
Research by Henry (2008) illustrates how the use of college students may affect the external validity of research on prejudice. In his sample of articles from 1990 to 2005, an increasing percentage of studies used college students as participants. Further, in looking at the actual results of studies on prejudice that compared college students with adults, he reported a variety of differences between adults and college students. For example, college students were less conservative and rated women and ethnic minorities more favorably.
Researchers usually must ask people to volunteer to participate in their research. At many colleges, introductory psychology students are required either to volunteer for research or to complete an alternative project. If you are studying populations other than college students, you are even more dependent on volunteers—for example, asking people at a homeowners’ association meeting to participate in a study of marital interaction or conducting research on the Internet in which people must go to your web page and then agree to participate in the study, or conducting a telephone survey of county residents to determine health care needs. In all these cases, external validity of the findings may be limited because the data from volunteers may be different from what would be obtained with a more general sample. Some research indicates that volunteers differ in various ways from nonvolunteers. In their comprehensive study on the topic, Rosenthal and Rosnow (1975) reported that volunteers tend to be more highly educated, of a higher socioeconomic status, more in need of approval, and more social.
Further, different kinds of people volunteer for different kinds of experiments. In colleges, there may be a sign-up board with the titles of many studies listed or a web page that manages research participants and volunteer opportunities for the university. Different types of people may be drawn to the study titled “problem solving” than to the one titled “interaction in small groups.” Available evidence indicates that the title does influence who signs up (Hood & Back, 1971; Silverman & Margulis, 1973).
Another important consideration arises when asking participants to volunteer for online surveys and experiments. Researchers can find potential participants through online survey design services. Psychologists are increasingly using Amazon Mechanical Turk (Jacquet, 2011), a website for recruiting people to work on many types of tasks including participating in research for a specified payment. This sort of sampling strategy has important implications for external validity. While the online sample is more diverse than the typical college student sample, there are still generalization issues because Internet users represent a unique demographic. The Pew Research Center’s Internet and American Life Project (Pew Internet, 2010) found that living in an urban/suburban area, being college educated, being younger, and having a higher income are all related to reporting more time online. Thus, by asking for volunteers for an online survey, researchers are sampling from a particular demographic that may not generalize well to the population of interest.
Sometimes, researchers use only males or only females (or a very disproportionate ratio of males to females) simply because this is convenient or the procedures seem better suited to a particular gender. Given the possible differences between males and females, however, the results of such studies may not be generalizable (Denmark, Russo, Frieze, & Sechzer, 1988). Denmark et al. provide an example of studies on contraception practices that use only females because of stereotypical assumptions that only females are responsible for contraception. They also point out several other ways that gender bias may arise in psychological research, including confounding gender with age or job status and selecting response measures that are gender-stereotyped. The solution is to be aware of possible gender differences and include both males and females in our research investigations. Moreover, it is important to recognize the ways that males and females might differentially interpret independent variable manipulations or questions asked in a questionnaire.
The location that participants are recruited from can also have an impact on a study’s external validity. Participants in one locale may differ from participants in another locale. For example, students at UCLA may differ from students at a nearby state university, who in turn may differ from students at a community college. People in Iowa may differ from people in New York City. Thus, a finding obtained with the students in one type of educational setting or in one geographic region may not generalize to people in other settings or regions. In fact, studies have explored how personality traits like extraversion (the tendency to seek social stimulation) and openness to new experiences vary across geographic areas. Rentfrow, Gosling, and Potter (2008) looked at geographic differences in personality traits among citizens of various U.S. states and found extraversion to vary by state. People in midwestern states tended to be more extraverted than people in northeastern states, and people in western states tended to be more open to new experiences. Thus, a study conducted in one location may not generalize well to another, particularly if the variables in question are related to location in some way.
Whether theories and research findings generalize across cultures is a critically important issue. Some observers of current psychological research have been very critical of the types of samples employed in behavioral research. Based on analyses of published research by Arnett (2008) and others, Henrich, Heine, and Norenzayan (2010) contend that psychology is built on the study of WEIRD (Western, Educated, Industrialized, Rich, Democratic) people. In many cases, research samples consist primarily of college students from the United States, other English-speaking countries, and Europe. Ultimately, researchers wish to discover aspects of human behavior that have universal applications but in fact cannot generalize beyond their limited samples. This is, at its heart, a critique of the external validity of behavioral research: Does our human behavioral research generalize to all humans, or is it really a study of the WEIRD?
Clearly, if psychologists want to understand human behavior, they must understand human behavior across and among cultures (Henrich et al., 2010; Miller, 1999). Miller described research on self-concept by Kitayama, Markus, Matsumoto, and Norasakkunkit (1997) to illustrate the benefits of incorporating culture into psychological theory. Traditional theories of self-concept are grounded in the culture of the United States and Western Europe; the “self” is an individualistic concept where people are independent from others and self-enhancement comes from individual achievements. Kitayama and his colleagues take a broader, cultural perspective: In contrast to the U.S. meaning of self, in other cultures the “self” is a collective concept in which self-esteem is derived from relationships with others. Often, Japanese engage in self-criticism, which can be seen as relationship-maintaining, whereas Americans work to maintain and enhance self-esteem—thus, very different activities contribute to a positive self-concept in the two cultures (Kitayama et al., 1997). This is a very common theme in research that incorporates culture in psychological processes: “The significance of self-esteem, however, may be much more specific to a culture than has typically been supposed in the literature” (p. 1262).
Much of this cultural research centers on identifying similarities and differences that may exist in personality and other psychological characteristics, as well as ways that individuals from different cultures respond to the same environments (Matsumoto, 1994). Research by Kim, Sherman, and Taylor (2008) provides another example of the limits of external validity across cultural groups. This research focused on how people from different cultures use social support to cope with stress. In reviewing the research on the topic, they concluded that Asians and Asian Americans might benefit from different styles of social support as compared with European Americans. For example, Asian Americans are more likely to benefit from support that does not involve the sort of intense disclosure of personal stressful events and feelings that is the hallmark of support in many European American groups. Rather, they suggest that Asians and Asian Americans may benefit more from support that comes with the comforts of proximity (being with close friends) rather than sharing.
These examples all focused on differences among cultures. Many studies also find similarities across cultures. Evolutionary psychologists, for instance, often conduct studies in different cultural groups because they are looking for similarities across cultures in order to see if a particular behavior or attitude can be tied to our evolutionary past. For example, Singh, Dixson, Jessop, Morgan, and Dixson (2010) wanted to see if a particular aspect of beauty that is tied to greater reproductive success—namely waist-to-hip ratio (e.g., the ratio for a 25-inch waist and 35-inch hips is .71), which is related to sex hormones and thus fertility—would be seen as attractive across cultures. Diverse groups from Africa, Samoa, Indonesia, and New Zealand evaluated photographs of females with small and large waist-to-hip ratios. The researchers found that indeed, low waist-to-hip ratio among females was seen as more attractive across all these groups. In this example, the results obtained in one culture do generalize to other cultures.
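The waist-to-hip ratio mentioned above is simply the waist measurement divided by the hip measurement. As a minimal sketch (the function name `waist_to_hip_ratio` is an illustrative assumption, not something taken from the Singh et al. study), the arithmetic for the 25-inch waist and 35-inch hips example can be checked as follows:

```python
def waist_to_hip_ratio(waist_inches: float, hip_inches: float) -> float:
    """Return the waist measurement divided by the hip measurement."""
    if hip_inches <= 0:
        raise ValueError("hip measurement must be positive")
    return waist_inches / hip_inches


# The example from the text: a 25-inch waist and 35-inch hips.
ratio = waist_to_hip_ratio(25, 35)
print(f"{ratio:.2f}")  # 25 / 35 = 0.714..., which prints as 0.71
```

Because the ratio is unitless, the same function works for measurements in centimeters, as long as both values use the same unit.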
We noted earlier that about 7% of psychological research is conducted with nonhuman animals. Almost all of this research is done with rats, mice, and birds. Most research with other species is conducted to study the behavior of those animals directly to gather information that may help with the survival of endangered species and increase our understanding of our bonds with nonhuman animals such as dogs, cats, and horses.
The basic research that psychologists conduct with nonhuman animals is usually done with the expectation that the findings can be generalized to humans. This research is important because the research problems that are addressed require procedures such as long-term observation that could not be done with human samples. We do expect that we can generalize because our underlying biological and behavioral patterns are shared. In fact, the value of studying nonhuman animals has been demonstrated by research that does apply to humans. These applications include the biological bases of memory, food preferences, sexual behavior, choice behavior, and drug addictions. The American Psychological Association has prepared a brochure on animal research.
In Defense of College Students
It is easy to criticize research on the basis of subject characteristics, yet criticism by itself does not mean that results cannot be generalized. Although we need to be concerned about the potential problems of generalizing from unique populations such as college students (cf. Sears, 1986), we should also keep several things in mind when thinking about this issue. First, criticisms of the use of any particular type of subject, such as college students, in a study should be backed with good reasons that a relationship would not be found with other types of subjects. College students, after all, are human, and researchers should not be blamed for not worrying about generalization to a particular type of subject if there is no good reason to do so. Moreover, college student bodies are increasingly diverse and increasingly representative of the society as a whole (although college students will always be characterized as having the ability and motivation to pursue a college degree). Second, replication of research studies provides a safeguard against the limited external validity of a single study. Studies are replicated at other colleges using different mixes of students, and many findings first established with college students are later applied to other populations, such as children, aging adults, and people in other countries. It is also worth noting that Internet samples are increasingly used in many types of studies. Although such studies raise their own issues of external validity, they frequently complement studies based on college student samples.
GENERALIZING ACROSS METHODS
The person who actually conducts the experiment is the source of another external validity problem. In most research, only one experimenter is used, and rarely is much attention paid to the personal characteristics of the experimenter (McGuigan, 1963). The main goal is to make sure that any influence the experimenter has on subjects is constant throughout the experiment. There is always the possibility, however, that the results are generalizable only to certain types of experimenters.
Some of the important characteristics of experimenters have been discussed by Kintz and his colleagues (Kintz, Delprato, Mettee, Persons, & Schappe, 1965). These include the experimenter’s personality and gender and the amount of practice in the role of experimenter. A warm, friendly experimenter will almost certainly produce different results from a cold, unfriendly experimenter. Participants also may behave differently with male and female experimenters. It has even been shown that rabbits learn faster when trained by experienced experimenters (Brogden, 1962)! The influence of the experimenter may depend as well on the characteristics of the participants. For example, participants seem to perform better when tested by an experimenter of the other sex (Stevenson & Allen, 1964).
One solution to the problem of generalizing to other experimenters is to use two or more experimenters. A fine example of the use of multiple experimenters is a study by Rubin (1975), who sent several male and female experimenters to the Boston airport to investigate self-disclosure. The experimenters revealed different kinds of information about themselves to both male and female travelers and recorded the passengers’ self-disclosures in return. One interesting result was that women tended to reveal more about themselves to male experimenters, and men tended to reveal more about themselves to female experimenters.
Pretests and Generalization
Researchers are often faced with the decision of whether to give a pretest. Intuitively, pretesting seems to be a good idea. The researcher can be sure that the groups are equivalent on the pretest, and it is often more satisfying to see that individuals changed their scores than it is to look only at group means on a posttest. A pretest also enables the researcher to assess mortality (attrition) effects when it is likely that some participants will withdraw from an experiment. If you give a pretest, you can determine whether the people who withdrew are different from those who completed the study.
Pretesting, however, may limit the ability to generalize to populations that did not receive a pretest (cf. Lana, 1969). Simply taking the pretest may cause subjects to behave differently than they would without the pretest. Recall that a Solomon four-group design (Solomon, 1949) can be used in situations in which a pretest is desirable but there is concern over the possible impact of taking the pretest. In the Solomon four-group design, half of the participants are given the pretest; the other half receive the posttest only. That is, the same experiment is conducted with and without the pretest. Mortality effects can be assessed in the pretest conditions. Also, the researcher can examine whether there is an interaction between the independent variable and the pretest: Are posttest scores on the dependent variable different depending on whether the pretest was given? Sometimes, researchers find that it is not feasible to conduct the study with all four groups in a single experiment. In this case, the first study can include the pretest; the study can be replicated later without the pretest.
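The structure of the Solomon four-group design can be sketched in code. The example below is only an illustration under stated assumptions: the function name `assign_solomon_groups`, the cell labels, and the choice of 40 participants are all hypothetical, and a real study would follow its own randomization procedure. It randomly assigns participants to the four cells formed by crossing pretest (given or not) with the experimental condition (treatment or control).

```python
import random
from itertools import product


def assign_solomon_groups(participant_ids, seed=0):
    """Randomly split participants across the four Solomon cells.

    The cells cross pretest (given or not given) with the experimental
    condition (treatment or control). All labels are illustrative.
    """
    cells = list(product(("pretest", "no_pretest"), ("treatment", "control")))
    shuffled = list(participant_ids)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    groups = {cell: [] for cell in cells}
    # Deal the shuffled participants out to the cells in turn,
    # which keeps the cell sizes as equal as possible.
    for index, pid in enumerate(shuffled):
        groups[cells[index % len(cells)]].append(pid)
    return groups


groups = assign_solomon_groups(range(40))
for (pretest, condition), members in groups.items():
    print(pretest, condition, len(members))
```

With posttest scores collected in all four cells, comparing the treatment effect in the pretest cells with the treatment effect in the no-pretest cells indicates whether taking the pretest changed how participants responded to the independent variable.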
Generalizing from Laboratory Settings
Research conducted in a laboratory setting has the advantage of allowing the experimenter to study the impact of independent variables under highly controlled conditions. The internal validity of the research is the primary consideration. The question arises, however, as to whether the artificiality of the laboratory setting limits the ability to generalize what is observed in the laboratory to real-life settings.
Mook (1983) articulated one response to the artificiality issue: Generalization to real-life settings is not relevant when the purpose of the study was to investigate causal relationships under carefully controlled conditions. Mook is concerned that a “knee-jerk” criticism of laboratory research on the basis of external validity is too common. Good research is what is most important.
Another response to the laboratory artificiality criticism is to examine the results of field experiments. Recall that in a field experiment, the researcher manipulates the independent variable in a natural setting—a factory, a school, or a street corner, for example.
Anderson, Lindsay, and Bushman (1999) asked whether laboratory and field experiments that examine the same variables do in fact produce the same results. To answer this question, they found 38 pairs of studies for which a laboratory investigation had a field experiment counterpart. The studies were drawn from a variety of research areas including aggression, helping, memory, leadership style, and depression. Results of the laboratory and field experiments were in fact very similar—the effect size of the independent variable on the dependent variable was very similar in the two types of studies. Thus, even though lab and field experiments are conducted in different settings, the results are complementary rather than contradictory. When findings are replicated using multiple methods, our confidence in the external validity of the findings increases.
SUPPORTING GOOD EXTERNAL VALIDITY
It may seem as if no research can possibly be generalizable! In some ways, this is true. Furthermore, it can be very difficult to understand the extent to which a study is generalizable; external validity is an aspect of a study that we try to assess, but cannot truly know. How, then, can we support good external validity? There are a few ways that external validity can be supported.
The key way that external validity can be supported is related to a study’s methodology. Using a census or a random sample will generally produce better external validity than using a nonrandom sample. This, of course, is not always possible. Next, we will explore a few other ways in which external validity can be supported.
Generalization as a Statistical Interaction
The problem of generalization can be thought of as an interaction in a factorial design. An interaction occurs when a relationship between variables exists under one condition but not another or when the nature of the relationship is different in one condition than in another. Thus, if you question the generalizability of a study that used only males, you are suggesting that there is an interaction between gender and the independent variable. Suppose, for example, that a study examines the relationship between crowding and aggression among males and reports that crowding is associated with higher levels of aggression. You might then question whether the results are generalizable to females.
The figure shows four potential outcomes of a hypothetical study on crowding and aggression that tested both males and females. In each graph, the relationship between crowding and aggression for males has been maintained. In Graph A, there is no interaction—the behavior of males and females is virtually identical. Thus, the results of the original all-male study could be generalized to females. In Graph B, there is also no interaction; the effect of crowding is identical for males and females. However, in this graph, males are more aggressive than females. Although such a difference is interesting, it is not a factor in generalization because the overall relationship between crowding and aggression is present for both males and females.
Graphs C and D do show interactions. In both, the original results with males cannot be generalized to females. In Graph C, there is no relationship between crowding and aggression for females. In Graph D, the interaction tells us that a positive relationship between crowding and aggression exists for males but that a negative relationship exists for females. As it turns out, Graph D describes the results of several studies on this topic (cf. Freedman, Levy, Buchanan, & Price, 1972).
Outcomes of a hypothetical experiment on crowding and aggression
Note: The presence of an interaction indicates that the results for males cannot be generalized to females.
Researchers can address issues of external validity that stem from the use of different populations by including subject type as a variable in the study. By including variables such as gender, age, or ethnic group in the design of the study, the results may be analyzed to determine whether there are interaction effects like the ones illustrated in the figure.
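The four outcomes described for Graphs A through D can be expressed numerically. The sketch below uses made-up cell means, chosen only to match the patterns described in the text (they are not data from any study), and a hypothetical helper named `interaction_contrast`. The contrast is the effect of crowding for males minus the effect of crowding for females; a value of zero means the crowding effect is the same for both groups, so there is no interaction.

```python
def interaction_contrast(cell_means):
    """Crowding effect for males minus crowding effect for females.

    cell_means maps (gender, crowding_level) to a mean aggression score.
    A result of 0 means the effect of crowding is the same in both groups.
    """
    male_effect = cell_means[("male", "high")] - cell_means[("male", "low")]
    female_effect = cell_means[("female", "high")] - cell_means[("female", "low")]
    return male_effect - female_effect


# Hypothetical mean aggression scores, invented to mirror Graphs A-D.
male = {("male", "low"): 3, ("male", "high"): 7}
graphs = {
    "A": {**male, ("female", "low"): 3, ("female", "high"): 7},  # identical lines
    "B": {**male, ("female", "low"): 1, ("female", "high"): 5},  # parallel lines
    "C": {**male, ("female", "low"): 3, ("female", "high"): 3},  # flat for females
    "D": {**male, ("female", "low"): 7, ("female", "high"): 3},  # reversed for females
}

for name, means in graphs.items():
    contrast = interaction_contrast(means)
    verdict = "no interaction" if contrast == 0 else "interaction"
    print(f"Graph {name}: contrast = {contrast} ({verdict})")
```

This is a descriptive check on cell means only. In an actual study, the interaction would be evaluated with an appropriate significance test, such as the interaction term in a factorial analysis of variance.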
The Importance of Replications
Replication of research is a way of overcoming any problems of generalization that occur in a single study. There are two types of replications to consider: exact replications and conceptual replications.
Exact replications An exact replication is an attempt to replicate precisely the procedures of a study to see whether the same results are obtained. A researcher who obtains an unexpected finding will frequently attempt a replication to make sure that the finding is reliable. If you are starting your own work on a problem, you may try to replicate a crucial study to make sure that you understand the procedures and can obtain the same results. Often, exact replications occur when a researcher builds on the findings of a prior study. For example, suppose you are intrigued by Singh et al.’s (2010) research on waist-to-hip ratio that was mentioned previously. Singh et al. report that males rate females with a ratio of .70 as most attractive. In your research, you might replicate the procedures used in the original study and expand on the original research. For example, you might study this phenomenon with males similar to those in the original sample as well as males from different cultures or age groups. When you replicate the original research findings using very similar procedures, your confidence in the external validity of the original findings is increased.
The “Mozart effect” provides us with an interesting example of the importance of replications. In the original study by Rauscher, Shaw, and Ky (1993), college students listened to 10 minutes of a Mozart sonata. These students then showed better performance on a spatial-reasoning measure drawn from the Stanford-Binet Intelligence Scale than students exposed to a relaxation tape or simple silence. This finding received a great deal of attention in the press as people quickly generalized it to the possibility of increasing children’s intelligence with Mozart sonatas. In fact, one state governor began producing Mozart CDs to distribute in maternity wards, and entrepreneurs began selling Mozart kits to parents over the Internet. Over the next few years, however, there were many failures to replicate the Mozart effect (see Steele, Bass, & Crook, 1999). Failures to replicate may occur because the exact conditions for producing the effect were not used. In this case, Rauscher and Shaw (1998) responded to the many replication failures by precisely describing the conditions necessary to produce the Mozart effect. However, Steele et al. (1999) and McCutcheon (2000) were unable to obtain the effect even though they followed the recommendations of Rauscher and Shaw. Research on the Mozart effect continues. Some recent findings suggest that the effect is limited to music that also increases arousal; it is this arousal that can cause better performance following exposure to the music (Thompson, Schellenberg, & Husain, 2001). Bangerter and Heath (2004) present a detailed analysis of the development of the research on the Mozart effect.
A single failure to replicate does not reveal much, though; it is unrealistic to assume, on the basis of a single failure to replicate, that the previous research is necessarily invalid. Failures to replicate share the same problems as nonsignificant results, discussed earlier. A failure to replicate could mean that the original results are invalid, but it could also mean that the replication attempt was flawed. For example, if the replication is based on the procedure as reported in a journal article, it is possible that the article omitted an important aspect of the procedure. For this reason, it is usually a good idea to write to the researcher to obtain detailed information on all of the materials that were used in the study.
Several scientific societies are encouraging systematic replications of important scientific findings. The journal Perspectives on Psychological Science (published by the Association for Psychological Science) is sponsoring the publication of Registered Research Replications. Multiple groups of researchers will undertake replications of important studies using procedures that are made public before initiating the research. When completed, all of the replications will be described in a single report. In addition to the Psychological Science initiative, the …