Textbook Site for:
Understanding Research Methods and Statistics: An Integrated Introduction for Psychology, Second Edition
Gary W. Heiman - Buffalo State College
Capstone Chapter: Examples of Designing and Evaluating Research

Introduction
Topic 1: Attribution of Arousal
Topic 2: Time Perception
Topic 3: Fear of Success in Females
Topic 4: Creativity

Additional Research Topics


Introduction

The following provides students with the opportunity to practice their knowledge by working through research examples in a structured way. It may also provide some ideas for studies they may want to pursue on their own or for class projects.

For each topic, questions appear at intervals that put students in the role of the researcher who designs, evaluates, and replicates a specific study. To get the most out of these passages, students should not merely read along passively. Instead, they should actively and thoughtfully participate as if they really were the researcher conducting the study.

The first four topics are examined in depth, focusing on a specific study and the additional studies that developed from it. Then, five other research topics are briefly reviewed that suggest various designs. Every research design contains flaws, so for each study discussed there are other approaches a researcher might take.

The numerous details in evaluating and designing studies that were discussed throughout the text boil down to the following key questions:

1. Purpose of study? Is it to test a causal hypothesis, to demonstrate a correlation, or to describe participants or a behavior?

2. Type of behavior studied? Does it involve a subject characteristic, a response to concrete stimuli, or a response to a social interaction?

3. Type of design? Is the study a true experiment or a quasi-experiment? Does it involve a single-subject, correlational, or observational design? Should it be conducted in the field or laboratory?

4. Type of participants and sampling? Will you generalize to a specific population? What selection criteria are needed? How will you sample the population? Will participants be representative?

5. Control of subject variables? What confounding subject variables are present? Should you counterbalance with a between-subjects design or use a within-subjects design?

6. How to manipulate variables? Should you vary instructions or stimuli? Should you employ confederates? Do you have a strong, consistent, and valid manipulation? Do you need a manipulation check? Are there confounding variables present?

7. How to measure variables? Will you examine overt behaviors or self-reports? Will participants be tested individually? Are the scoring criteria sensitive? Are you reliably and validly scoring participants' typical behavior?

8. Procedural problems? Is there experimental realism? Are there order effects, demand characteristics, or other biases present? Should you conduct a pilot study?

9. Materials needed? What materials and apparatus do you need for consistent and comparable stimuli and for reliable measurements?

10. Ethical problems? Are you harming participants or violating their rights? Is deception justified? Have you obtained informed consent and provided a debriefing?

11. Statistical analysis? What is your N (and ns)? Have you maximized power and minimized error variance? Do your scores fit a parametric or nonparametric procedure? Have you employed a between-subjects or a within-subjects design?

12. Validity of conclusions? Do the results clearly confirm your hypothesis? Are there alternative hypotheses that reduce your internal validity? Do you have external, construct, ecological, and temporal validity?


Topic 1: Attribution of Arousal

Schachter and Singer (1962) proposed that when people become physiologically aroused by one stimulus, their emotional response to another stimulus is heightened. In particular, this notion has been applied to the topic of romantic love and sexual attraction. For example, like Romeo and Juliet, some couples exhibit greater attraction for each other when their parents produce arousal by interfering with the relationship (Driscoll, Davis & Lipetz, 1972). Researchers have proposed that the heightened physiological arousal due to an extraneous source is "misattributed" (misdiagnosed) as an especially strong romantic attraction toward a member of the opposite sex. Dutton and Aron (1974) tested this proposal in several studies, asking whether participants would misattribute arousal due to fear as being heightened sexual attraction to a member of the opposite sex.

What are the hypothesis and purpose of this study?

What are the variables to be studied, and which design is appropriate?

How would you elicit the dependent behavior?

What sampling would you use, and should you control subject variables?

The hypothesis is that greater attraction to a member of the opposite sex occurs when participants simultaneously experience fear from some other source. Since causation is the issue here, you should conduct a true experiment. The independent variable is amount of fear. The dependent variable is the amount of attraction a participant experiences. Although you might perform a field experiment, let's first discuss a laboratory setting. (See Experiment 3 in Dutton & Aron, 1974.)

This study clearly calls for a confederate who will be the object of attraction. Thus, the manipulation is to create conditions of different levels of fear in participants while a confederate is present. The dependent measure is a participant's attraction to the confederate. Because a person's attraction is not likely to change from one condition to the next, a between-subjects design with a single trial per participant is appropriate.

Participants are tested individually, since in a group they might be inhibited and not show misattribution. To keep the inherent attractiveness of the confederate constant, one confederate serves for all conditions. Regarding subject variables, Dutton and Aron employed a female confederate, thus requiring the testing of male participants. Also, participants should not know the confederate and ideally should have a minimum of other romantic entanglements (engaged or married people may resist feeling, or at least reporting, any attraction). Participants should be the same age as the confederate, and you might limit or balance race and nationality. Otherwise, there is no unique population here, so a random sample from available college students is sufficient.

You must now define how to manipulate participants' fear. You could arrange for an "accident" that raises anxiety or concoct an "experiment" that directly harms participants to make them fearful. But all you really need to do is threaten them with something that will make them fearful. At the same time, the aversive treatment must seem realistic within the context of a scientific experiment, and also allow you to manipulate only the amount of fear. You might threaten to hit participants or to embarrass them, for example, but you might bring about effects in addition to fear, such as anger or refusal to participate. Instead, Dutton and Aron capitalized on the reputation of psychology experiments by telling participants that they would be electrically shocked as part of the study.

How would you create the conditions to manipulate fear?

How would you confirm that the intended effect occurred?

Dutton and Aron manipulated fear by threatening participants with different levels of shock. The shock was described as either small (low fear condition) or large (high fear condition). To produce a strong manipulation, they described the small shock as "a minor tingle which some people actually find pleasant" and the large shock as "quite painful." (You might add a control group threatened with no shock.)

A manipulation check is required to ensure that the procedure actually influences participants' fear, so at some point you will actually measure participants' anxiety level as a function of their assigned shock level.

How would you measure the dependent variable?

Instead of observing an overt behavior from which to infer a participant's attraction, Dutton and Aron asked their participants how sexually attracted they were to the confederate. But rather than asking "Is she attractive?" they achieved greater sensitivity by presenting a 5-point Likert scale with the questions "How much would you like to ask her out?" and "How much would you like to kiss her?" They also included an additional, less threatening measure of arousal by having participants describe an ambiguous picture in a projective test. Each description was examined for sexual content using content analysis and, for reliability, was scored by two scorers. Included in the questionnaire was the manipulation check in the form of the question: "How do you feel about being shocked?" Participants responded with ratings, and the researchers interpreted "greater dislike" as indicating "greater fear."

How would you ensure experimental realism?

A deceptive cover story is needed so that participants encounter the confederate and are assigned a shock level in a convincing and realistic way. Dutton and Aron introduced the confederate to the participant as a second "subject." Both were ostensibly there for an experiment in which two people would be tested simultaneously, to study the effects of punishment (shock) on learning. The researcher then tossed a coin supposedly to determine which of the two would receive the high or low shock condition (but actually to randomly assign the real participants to their conditions). At this point, since it was not necessary to actually administer the shock, the experimenters had participants complete the questionnaire.
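The coin toss described above is a form of random assignment. A coin toss alone can leave the groups unequal in size, so the following is a minimal sketch of a swapped-in alternative, shuffled equal-sized assignment, assuming 40 participants with hypothetical IDs and a fixed seed used only so the example is reproducible:

```python
import random


def assign_conditions(participant_ids, conditions=("low fear", "high fear"), seed=None):
    """Randomly assign participants to conditions in equal numbers.

    Assumes len(participant_ids) is a multiple of len(conditions).
    """
    rng = random.Random(seed)
    per_condition = len(participant_ids) // len(conditions)
    # One slot per participant, with each condition repeated equally often.
    slots = [c for c in conditions for _ in range(per_condition)]
    rng.shuffle(slots)
    return dict(zip(participant_ids, slots))


# Hypothetical participant IDs P01..P40 (illustration only).
ids = [f"P{i:02d}" for i in range(1, 41)]
assignment = assign_conditions(ids, seed=1)
```

Each participant's condition is still determined by chance, but the design ends up with the same n in both conditions.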

How would you address demand characteristics?

To reduce reactivity and social desirability, Dutton and Aron presented the questionnaire under the guise that personality characteristics and feelings between participants have an important influence in this type of learning study. The confederate and real participant were taken to separate cubicles while the experimenter "set up the shock equipment." For face validity, the questionnaire contained several "filler" questions about the participants, along with the attraction and fear questions and the projective test.

What procedural or ethical problems must be addressed?

Because any subtle differences in the behaviors or demeanor of the confederate may alter her attractiveness, her behavior must be "scripted" so that she acts the same way with all participants. Keeping where she sits constant is also important because participants need to see her and yet not be too near or far away. Keeping her "blind" to the hypothesis is advisable, because she might otherwise emit subtle cues that could confound conditions. Of course, the experimenter must behave consistently as well. A pilot study to practice and de-bug the procedure is definitely needed.

Ethically there is a problem if participants feel coerced into receiving shock and experiencing fear. Dutton and Aron solved this problem by telling participants they would be shocked and then obtaining informed consent, thus giving participants the opportunity to leave before the study continued. Although the researcher lied to participants because they were not actually shocked, the lie would cause them to expect more harm than they experienced, which is ethically a much better situation than causing them to expect less harm than they experienced.

Select N and diagram the study.

What statistical procedures will you perform?

Because of the very controlled setting and the small variability of rating scores, Dutton and Aron had sufficient power to obtain significant results with 20 participants per condition. A diagram of the study is shown in Table 23.1.

Ratings from the "date" and "kiss" questions were averaged together, producing each participant's score. These scores are ratio scores, and they seem to fit the requirements of a parametric procedure. The study involves a between-subjects design, and with two conditions, an independent samples t-test is appropriate. (If you included a control group or other levels, or if you included additional factors, you would perform a between-subjects ANOVA.) With higher ratings indicating greater attraction, Dutton and Aron obtained an overall mean attraction rating of 2.8 in the low fear condition, which differed significantly from the mean rating of 3.5 in the high fear condition.
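The independent samples t-test named above can be sketched as follows. This is a minimal pooled-variance version. The ratings are made up for illustration and are not Dutton and Aron's data; the function name `independent_t` is likewise hypothetical.

```python
import math
import statistics


def independent_t(group_a, group_b):
    """Pooled-variance independent-samples t-test; returns (t, df)."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    # statistics.variance uses N - 1 in the denominator (unbiased estimate).
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_error, df


# Made-up mean attraction ratings (average of the "date" and "kiss" items).
high_fear = [4.0, 3.5, 3.0, 4.5, 3.5, 3.0, 4.0, 3.5]
low_fear = [3.0, 2.5, 3.0, 2.0, 3.5, 2.5, 3.0, 2.5]

t, df = independent_t(high_fear, low_fear)
# Compare abs(t) with the two-tailed critical value for df = 14 at alpha = .05,
# which is 2.145 in a standard t table.
significant = abs(t) > 2.145
```

With real data you would substitute each participant's averaged rating and look up the critical value for your own df.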

Another analysis is also required: The manipulation check must confirm that the shock conditions produced high and low anxiety levels. Using participants' rating scores from the question about "disliking the shock" produces a between-subjects design with parametric data, so the independent samples t-test is again appropriate. Dutton and Aron found that the mean dislike rating for the high shock group was significantly larger than for the low shock group, suggesting that their manipulation altered participants' fear levels as intended.

What conclusions can you draw from this study?

What issues of validity need to be addressed?

Significantly higher attraction scores in the high fear group confirm the hypothesis that attraction is heightened by fear from an extraneous source. One important concern, however, is whether demand characteristics have limited internal validity for concluding that greater fear causes greater attraction. Maybe the participants were dishonest because of reactivity, social desirability, and experimenter expectations. (After all, it doesn't take a genius to realize that if a "psychologist" tells you that a shock is very unpleasant, you should then indicate that you dislike the idea of being shocked!) It is therefore possible that participants were not more or less fearful. Also, you must ask whether there is external validity for concluding that this relationship occurs in other settings. Clearly, participating in an experiment is not the usual context in which people think about their sexual attraction to another person. This setting is very contrived, and everything depends on how convincing the confederate and experimenter were.

How would you replicate this study under more natural conditions?

You need a field experiment to test this hypothesis in a more naturalistic setting. Dutton and Aron sought a setting in which participants were made anxious by a natural fear-arousing stimulus or activity. The idea was to catch participants after they had experienced some positive yet fear-arousing event, so any thrill-seeking activity would suffice. For example, you could test at a bungee-jumping event or at a roller-coaster ride. Dutton and Aron selected a narrow, wobbly footbridge suspended high above a scenic canyon to create the experimental condition of high fear. Nearby was a wide, solid, and sturdy bridge over the canyon, which served as the "control bridge" for the low (no) fear condition.

How would you conduct this experiment?

You might strategically place the confederate so that participants walk by her, and then question them to determine their attraction to her. However, participants might not even notice her. Dutton and Aron solved this problem with the following procedure: After a male had crossed the bridge, a female interviewer approached him to answer a questionnaire for a "study" about the effects of scenic attractions on creativity. Among the filler items was a brief projective test, which was later scored for sexual imagery by two trained raters. In addition, participants were offered the interviewer's phone number, so they could later call to discuss the study. Whether participants accepted the phone number and whether they called were taken as indications of greater attraction to the interviewer. As a control, a male interviewer also tested some male participants and offered his phone number.

How would you analyze these results?

Dutton and Aron employed three dependent variables here. First, the sexual imagery scores from the projective test produced an inter-rater reliability coefficient of +.87. Each participant's average score was a ratio score, so the data were entered into two between-subjects t-tests using the conditions of high and low fear. When the interviewer was female, participants who crossed the scary bridge provided significantly greater sexual imagery than did participants who crossed the control bridge. When the interviewer was male, no significant difference was found.
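One common way to obtain an inter-rater reliability coefficient is to correlate the two raters' scores across participants. The sketch below assumes a Pearson correlation, which the passage does not specify, and uses made-up sexual-imagery scores. It also assumes that each participant's score is the mean of the two raters' scores.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Made-up imagery scores for eight stories, one list per rater.
rater_1 = [1, 3, 2, 4, 5, 2, 3, 4]
rater_2 = [2, 3, 2, 4, 4, 1, 3, 5]

reliability = pearson_r(rater_1, rater_2)

# Assumed scoring rule: each participant's score is the mean of the two ratings.
participant_scores = [(a + b) / 2 for a, b in zip(rater_1, rater_2)]
```

A coefficient near +1 indicates that the raters ranked and spaced the stories similarly, which is what justifies averaging their scores.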

Second, the scores for accepting the interviewer's telephone number and for actually calling consist of two yes-no, categorical or nominal variables. Dutton and Aron found that 9 out of the 18 high-fear participants who took the female interviewer's phone number actually called her. Only 2 out of the 16 low-fear participants called. The male interviewer received 2 calls out of the 7 high fear participants who took his number, and 1 call out of 6 from the low-fear group. To determine whether calling rates for the control and experimental groups differed significantly, Dutton and Aron performed one chi square procedure for the female interviewer and one for the male interviewer. They reported significant differences only for the female interviewer.
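The calling frequencies reported above can be arranged in a 2 x 2 table (bridge condition by called or did not call) for each interviewer. The following is a sketch of a Pearson chi square computed on those counts without a continuity correction. Whether Dutton and Aron applied a correction is not stated here, so the values need not match their published statistics.

```python
def chi_square_2x2(table):
    """Pearson chi square (no continuity correction) for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


# Rows: high fear, low fear. Columns: called, did not call.
female_interviewer = [[9, 9], [2, 14]]  # 9 of 18 versus 2 of 16
male_interviewer = [[2, 5], [1, 5]]     # 2 of 7 versus 1 of 6

CRITICAL_05_DF1 = 3.841  # critical chi square for df = 1, alpha = .05

chi2_female = chi_square_2x2(female_interviewer)  # about 5.44
chi2_male = chi_square_2x2(male_interviewer)      # about 0.26
```

On these counts only the female-interviewer table exceeds the critical value, in line with the pattern reported. Note that the expected frequencies in the male-interviewer table are small, so the chi square approximation is rough there.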

What procedural and control problems exist?

First, you must confirm that the two bridges actually produce high and low fear, respectively. Because directly asking participants about their fear might have caused suspicion, Dutton and Aron relied on a pilot study in which other, similar participants who crossed each bridge answered a questionnaire and confirmed the effect of the bridges.

A second problem is that random sampling was not possible here, because only those participants who actually crossed a bridge and who volunteered to complete the questionnaire were tested. (Also, only those of a certain age who were unaccompanied by a female were approached.) Most critical is the fact that the participants themselves decided which bridge to cross. Given that participants were not randomly assigned to the conditions, this study involves a quasi-experimental design. Thus, in addition to the bridge they crossed, participants may have differed along all sorts of other variables. In particular, participants who crossed the scary bridge were probably more adventurous and thus perhaps more likely to call the interviewer and to project more sexual imagery in their stories. In a replication, Dutton and Aron created a more comparable control group by selecting participants who had crossed the scary bridge, but who then loitered about until (presumably) the fear had dissipated. They found results similar to those reported above. Nonetheless, it is inappropriate to say that greater fear caused the higher attraction scores.

What additional research on this topic would you suggest?

A general question concerns the construct validity of arguing that participants actually misattribute or misinterpret extraneous arousal as heightened sexual attraction. Do they not know that they were scared by the bridge or electric shock? If participants identify the actual source of their physiological arousal, then they are not misattributing it and some other factor is responsible for heightened sexual attraction. Because of this question, several alternative explanations for these results have been proposed (see Allen, Kenrick, Linder & McCall, 1989).

Conceptual replications of misattribution effects would also be appropriate. For example, Cohen, Waugh, and Place (1989) observed couples entering and leaving a movie theater, and noted that more touching occurred after a suspenseful movie than after a dull one. White and Kight (1984) demonstrated heightened attraction due to misattribution of arousal from physical exercise (running in place). However, misattribution has not always been successfully replicated (e.g., Kenrick, Cialdini & Linder, 1979). To extend this research, consider that most studies involve male participants. But would the same results occur with female participants and a male confederate? Would they occur with homosexual male or female participants and a same-sex confederate? Also, studies have been conducted regarding the effects of alcohol consumption on sexual arousal (e.g., McCarty, Diamond & Kaye, 1982), and on the dynamics of people meeting in bars, reacting to "opening lines," and so on (e.g., Cunningham, 1989). Both topics would seem relevant to the situation where fear and sexual attraction converge. Further, little evidence is available regarding misattribution of other emotions, such as anger. And, finally, it is unclear whether this process works in reverse such that increased sexual attraction might be misattributed and result in an increased response to a fearful situation.


Topic 2: Time Perception

Have you ever taken a long car trip and noticed that the drive home seemed to take less time than the drive to your destination, even though on the clock both trips took the same amount of time? This experience may be an example of the observation that the more a time interval is "filled" with stimuli, the longer in retrospect it seems to have lasted. As you go toward your destination, the scenery and sights are novel, so that the travel time is mentally "filled" with many interesting stimuli. Then, although only 30 minutes have elapsed, they are overestimated as "feeling like 40." On the return trip, however, you've seen all the sights, so the interval is mentally unfilled. Then the elapsed 30 minutes feel like 30 or perhaps even 20 minutes.

Based on such observations, Ornstein (1969) hypothesized that people judge the duration of an interval using their memory for the stimuli that occurred during the interval. When the memory is in some sense "larger," the interval is perceived as longer. He therefore set out to show that the more stimuli a person encounters during an interval, the longer the interval is judged in retrospect.

What are the hypothesis and purpose of this study?

What are the variables to be studied, and which design is appropriate?

How would you elicit the dependent behavior?

What sampling would you use, and should you control subject variables?

The purpose of this study is to test the hypothesis that filling an interval with more stimuli causes it to be perceived as lasting longer. Because any extraneous event during an interval helps to "fill" it and thus confounds the study, you should conduct a controlled laboratory experiment. In particular, you can create a true experiment by randomly assigning participants to conditions of the independent variable, which is the number of stimuli filling an interval. The dependent behavior is a participant's estimate of the duration of the interval. Presumably, time perception is similar in all normal humans, so you can randomly sample from available college students.

When studying any cognitive process, you are likely to find large individual differences, but you can control subject variables by performing a repeated-measures design, testing the same participants under each condition. In our case, however, once participants know that they will be estimating an interval, they may count or otherwise mentally time it. One solution to this problem is to employ repeated measures but also to disguise and de-emphasize the time-estimate response. Ornstein, for example, buried the request for a time estimate in a questionnaire that participants completed after each interval. Alternatively, you might employ a between-subjects design so that you can truly surprise participants with a request to estimate the interval after it is over. Although this approach is better because it guarantees that participants are not prepared for a time estimate, let's assume that you adopt Ornstein's repeated-measures design.

How would you define your conditions and institute needed controls?

The obvious approach is to directly vary the number of stimuli presented to participants during an interval. Ornstein varied the number of tones that participants heard. In similar studies, Schiffman and Bobko (1977) varied the number of visual stimuli participants viewed, and Kowal (1987) varied the number of musical notes in a melody. Taking a somewhat different approach, Hicks, Miller, and Kinsbourne (1976) varied participants' activities by having them sort cards and perform different mental tasks. Let's assume that you employ the tones used by Ornstein.

You must also define the duration of the interval. Although there is research literature on the perception of very brief, millisecond intervals, Ornstein defined an ecologically realistic interval of 9 minutes and 20 seconds. (You wouldn't want an interval as obvious as 60 seconds or 5 minutes, because participants would be likely to guess these.) The key, then, is to create a strong manipulation by filling the interval with a substantially different number of tones in each condition. Ornstein created three conditions, with the tones occurring at the rate of 40, 80, or 120 tones per minute throughout the interval. The duration of the tones was constant regardless of the condition, but the pauses between them were varied so that the tones occurred regularly throughout the interval.

How would you measure the dependent variable?

Here, direct estimates of the interval's duration from participants are needed. Thus, Ornstein asked participants to estimate the duration in minutes and seconds. This score is objective and participants can easily record their estimates. Alternatively, if you believed that participants would be unable to translate their subjective impressions of time into these terms reliably, you could ask them to estimate the interval nonverbally by drawing a line to represent its length (Mulligan & Schiffman, 1979).

What procedure for testing participants and what materials are needed?

As long as the experimenter does not communicate expectations or create undue pressure and thus reactivity, the researcher can be present to test participants. For reliability, you can produce an audio tape recording of the three intervals, using electronic clocks and tone generators to create the stimuli. To eliminate distractions, you can play the recording over headphones, at a constant volume for all participants. (Be sure to select only people with normal hearing.)

To create a realistic situation and prevent participants from forming hypotheses that might bias their estimates, you need a cover story to "explain" why they are listening to recorded tones. For example, in your instructions you might say that you're studying the relaxing effect of these stimuli, and that participants are to merely relax and sit quietly during the interval. In this way you also minimize any extraneous stimulation that might otherwise "fill" the interval and influence time estimates. Further, this "relaxation therapy" could require the removal of any jewelry, to ensure that participants do not look at their watches during the interval. For consistency, all participants should estimate an interval immediately after being exposed to it, without any distractions between the interval and participants' responses. Thus, you might tell participants that, immediately after the interval, they should turn over the paper in front of them to answer the questions provided there. What they will find on the reverse side is a questionnaire that includes a question asking them to estimate the interval's length in minutes and seconds, along with other distracting questions regarding participants' thoughts and relaxation responses. These "other" questions not only add credence to your cover story but also allow you to determine what participants were doing during the interval and to obtain a manipulation check of whether they attended to the tones. Afterwards, in your debriefing, stress that participants are not to tell other potential participants about the time-estimation task.

A major procedural problem concerns the order effects produced by repeated exposure to all three conditions: Whether an interval is relatively filled may depend on which intervals were previously heard. Likewise, listening to over 9 minutes of tones is a long, boring task, so fatigue effects are likely. However, by testing each third of your participants under the order of conditions ABC, BCA, or CAB, you can adequately counterbalance for order.
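The three orders ABC, BCA, and CAB are the cyclic rotations of the condition list, so each condition appears once in each ordinal position. A minimal sketch of generating the orders and spreading participants evenly across them, assuming hypothetical participant IDs (in practice you would shuffle the IDs first so that order is assigned at random):

```python
def rotation_orders(conditions):
    """Return the cyclic rotations of a sequence, e.g. 'ABC' -> ['ABC', 'BCA', 'CAB']."""
    return [conditions[i:] + conditions[:i] for i in range(len(conditions))]


def assign_orders(participant_ids, conditions="ABC"):
    """Cycle through the rotation orders so each is used equally often."""
    orders = rotation_orders(conditions)
    return {pid: orders[i % len(orders)] for i, pid in enumerate(participant_ids)}


orders = rotation_orders("ABC")

# Hypothetical participant IDs P01..P09 (illustration only).
ids = [f"P{i:02d}" for i in range(1, 10)]
schedule = assign_orders(ids)
```

This scheme balances the position of each condition (first, second, third). It does not include every possible sequence, so it is a partial rather than complete counterbalancing.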

What is the prediction of the study, and the N to be tested?

Diagram the study, and determine the statistical procedures to be performed.

The prediction is that intervals containing more tones will be estimated as having lasted longer. Because of possibly large error variance due to obtaining estimates in minutes and seconds, a relatively large N is needed. Based on similar time perception research, you can assume that approximately 50 participants per condition will provide substantial power.

Time-estimate scores are ratio scores that meet the requirements of a parametric procedure. Since this study has three levels of one within-subjects factor, you perform a one-way within-subjects ANOVA of the design shown in Table 23.2.

After computing a significant F, you perform post hoc comparisons to determine which means differ significantly. Ornstein found that the interval with 120 tones per minute was judged to be significantly longer than that with 80, and that both of these were judged to be longer than that with 40 tones per minute. Unfortunately, he did not report eta squared to indicate the effect size of his manipulation, so we do not know how consistently his varying the number of tones determined time estimates.
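The one-way within-subjects ANOVA and eta squared can be sketched as below. The time estimates (in seconds) are made up for four hypothetical participants and are not Ornstein's data. Eta squared is computed here as the sum of squares for conditions divided by the total sum of squares, which is one common definition; other sources report a partial eta squared instead.

```python
def within_subjects_anova(scores):
    """One-way within-subjects (repeated-measures) ANOVA.

    scores: one row per participant, one column per condition.
    Returns F, its degrees of freedom, eta squared, and the sums of squares.
    """
    n = len(scores)       # number of participants
    k = len(scores[0])    # number of conditions
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    cond_means = [sum(row[j] for row in scores) / n for j in range(k)]
    subj_means = [sum(row) / k for row in scores]

    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_cond = n * sum((m - grand_mean) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand_mean) ** 2 for m in subj_means)
    # Error is what remains after removing condition and participant effects.
    ss_error = ss_total - ss_cond - ss_subj

    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    f_ratio = (ss_cond / df_cond) / (ss_error / df_error)
    return {
        "F": f_ratio,
        "df": (df_cond, df_error),
        "eta_squared": ss_cond / ss_total,
        "ss": {"total": ss_total, "conditions": ss_cond,
               "subjects": ss_subj, "error": ss_error},
    }


# Made-up estimates in seconds; columns are 40, 80, and 120 tones per minute.
estimates = [
    [500, 560, 620],
    [520, 600, 640],
    [480, 540, 600],
    [540, 580, 660],
]
result = within_subjects_anova(estimates)
```

Removing the between-participants sum of squares from the error term is what gives the within-subjects design its added power when individual differences are large.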

The above analysis merely deals with whether the subjective impression of an interval increases as it becomes more filled. The estimates may increase as predicted, but none of them may have any resemblance to the actual duration of the interval. To see a participant's time estimate in relation to the interval's actual duration, you can subtract the actual duration of the interval from each estimate. Any positive difference indicates that participants have overestimated the duration, saying that it seems longer than it actually was. Any negative difference indicates that participants have underestimated the duration, saying that it seems shorter than it actually was. An analysis of the means of scores obtained using this procedure indicates the effect that filling an interval has on errors when estimating time.
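The difference score described above is simply the estimate minus the actual duration, both expressed in seconds. A short sketch using Ornstein's 9-minute-20-second interval and three made-up estimates:

```python
ACTUAL_SECONDS = 9 * 60 + 20  # 9 min 20 s = 560 s


def to_seconds(minutes, seconds):
    """Convert a minutes-and-seconds estimate to seconds."""
    return minutes * 60 + seconds


def estimation_error(estimate_seconds, actual_seconds=ACTUAL_SECONDS):
    """Positive values are overestimates; negative values are underestimates."""
    return estimate_seconds - actual_seconds


# Made-up (minutes, seconds) estimates from three hypothetical participants.
estimates = [(10, 30), (9, 20), (8, 0)]
errors = [estimation_error(to_seconds(m, s)) for m, s in estimates]
# errors is [70, 0, -80]: one overestimate, one exact, one underestimate
```

The mean of these error scores in each condition is what you would then analyze.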

What conclusions can you draw from this study?

What are the limitations on your conclusions?

Unless unknown confoundings are present, this finding confirms the hypothesis that a greater number of stimuli in an interval causes the interval to be perceived as longer. A limitation is that such tone-filled intervals are never encountered in the real world, so you have limited external validity beyond this study. You also have reduced generalizability, because your ultimate purpose is to understand the general perception of the passage of any time interval. With this latter problem in mind, consider the next question.

What factor could be added to this study to increase its generalizability?

The effect of varying the stimuli in an interval should be tested with intervals of different lengths, so that the findings are not tied to just one interval. Specifically, you could create a second factor by presenting both the condition containing the above interval (to replicate Ornstein's study) and another condition containing a different-sized interval. Let's say that you include the interval of 4 minutes 40 seconds, which is one-half the size of Ornstein's (but you could add intervals of any size that make sense).

How would you analyze this design, and what may it indicate?

Because you now have the two factors of number of tones per minute and the length of the interval, a two-way ANOVA is appropriate. However, you should not analyze participants' actual estimates of the duration of the interval. If their estimates bear any resemblance to reality, then the two interval sizes would automatically produce differences in time estimates (you'd expect about a 4-minute difference). To equate the different lengths of the intervals, again subtract the actual duration from participants' estimates and look at their estimation errors. Let's say that you obtain the mean difference scores shown in Table 23.3.

The average error in the time estimates in each cell is positive, so participants consistently overestimated all intervals. The main effect of increasing the number of tones (as seen by comparing the column means) still tests the original hypothesis that filling the interval with more stimuli increases its perceived duration: With more tones, overestimates increase, indicating that the interval is perceived as increasingly longer than it actually was. The main effect of duration (as seen by comparing the row means) indicates that, overall, the two intervals produced differences in estimation error. Of most interest will be whether there is a significant interaction.

Graph this interaction, labeling the X axis as the conditions of number of tones per minute.

How will a significant or nonsignificant interaction be interpreted?

If the interaction is significant, you can conclude that the way in which estimates change with more filled intervals depends on the duration of the interval. Note that this conclusion would limit your generalizability of the filled-interval hypothesis, since you would find that it applies differently depending on the interval's length. A nonsignificant interaction, however, would suggest that the influence of increasing the number of tones is similar (that is, parallel) for both short and long intervals.
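
To see what "parallel" means in terms of cell means, the following minimal sketch computes the row means, the column means, and the gap between the two intervals at each level of tones. The cell means are invented for illustration and are not the values in Table 23.3; the three tone levels are also an assumption. The sketch only describes the pattern, and the two-way ANOVA is still needed to decide whether an interaction is significant.

```python
from statistics import mean

# Invented mean estimation errors (seconds); rows are interval lengths,
# columns are hypothetical levels of tones per minute.
cell_means = {
    "short": {"low": 5.0, "medium": 10.0, "high": 15.0},
    "long": {"low": 10.0, "medium": 25.0, "high": 40.0},
}


def row_means(cells):
    """Mean error for each interval length, collapsing across tone levels."""
    return {row: mean(cols.values()) for row, cols in cells.items()}


def column_means(cells):
    """Mean error for each tone level, collapsing across interval lengths."""
    columns = next(iter(cells.values())).keys()
    return {col: mean(cells[row][col] for row in cells) for col in columns}


def row_gaps(cells, row_a, row_b):
    """Difference between two rows at each column (row_b minus row_a)."""
    return {col: cells[row_b][col] - cells[row_a][col] for col in cells[row_a]}


gaps = row_gaps(cell_means, "short", "long")
# If the gap is the same at every tone level, the two lines are parallel.
lines_are_parallel = len(set(gaps.values())) == 1

print("Row means:", row_means(cell_means))
print("Column means:", column_means(cell_means))
print("Gap between intervals at each tone level:", gaps)
print("Parallel lines:", lines_are_parallel)
```

Here the gap between the long and short intervals widens as tones increase, which is the kind of nonparallel pattern that an interaction describes.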

So far the results seem to support the hypothesis that participants judge an interval as longer when they have a "larger" memory for the contents of the interval. However, there are at least two additional explanations to consider when conceptualizing the constructs underlying a participant's memory of an interval. First, the greater number of tones per minute also creates an interval containing more complex stimulation. Perhaps a person's time perception is actually determined by the degree of complexity characterizing the overall event that fills an interval, not merely by the amount of stimulation.

How might you manipulate the complexity of the stimuli during an interval, and what variable must you keep constant?

For a given interval size, the variable to keep constant is the number of stimuli presented during the interval. You then vary the complexity of the stimuli presented. (In fact, Ornstein [1969] conducted a second study in which participants viewed drawings of geometric shapes that varied in their complexity.) You may also vary the complexity of the response that participants make to a stimulus, as in Brown (1985), who had participants physically trace patterns that varied in complexity. These studies showed that increased complexity did increase the perceived duration of the interval.

The second explanation to consider when conceptualizing memory for an interval is that given either a larger number of stimuli or more complex stimulation, the result may be to produce a less organized memory for the contents of the interval. Stimuli that are simple to us---that we understand---are organized in our memory; stimuli that are complex are not. Thus, anytime we organize the stimuli in memory, the interval may seem shorter, while with less organization it may seem longer. Mulligan and Schiffman (1979) tested this hypothesis by manipulating whether participants could make the stimuli more meaningful and organized. The procedure adopted for making a stimulus more or less meaningful was discussed in a previous chapter.

How would you manipulate the meaningfulness of stimuli occurring in an interval?

There are a number of ways you might give more or less meaning to a stimulus. Believe it or not, Mulligan and Schiffman (1979) adopted the procedure discussed in Chapter 23, in which participants were presented "droodles" (the cartoons with the funny, disambiguating captions). Participants viewed a droodle for a fixed period, either with or without a caption, and then estimated the viewing interval. Estimates were shorter when a caption was provided, presumably because it allowed better organization and simplification of the components of the droodle in memory.

Current research suggests that the perceived duration of an interval is influenced by the nature of the mental stimulation and processing occurring during the interval. There are, however, two paradoxes reported in this literature. First, it follows that the more stimuli we can recall from an interval, the more our memory is filled and thus the longer the interval is judged to be. (For example, Ornstein found that hearing neutral words that were well recalled produced longer estimates than hearing unpleasant words that were poorly recalled.) Paradoxically, increased organization of stimuli is usually a prime cause of increased retention. Thus, contrary to Mulligan and Schiffman (1979), increased organization of droodles should lead to better retention and longer estimates. These researchers found that it led to shorter estimates, and produced no differences in retention of the droodles! To further study this phenomenon, cognitive psychologists have many ways of producing differences in a participant's retention and organization. You might, for example, ask participants different types of questions about a word (Craik & Tulving, 1975) or you might make an ambiguous paragraph easier to understand by presenting a title (Bransford & Johnson, 1972).

The above research involves retrospective judgments, in which participants consider the interval only after it has ended. The second paradox occurs when participants are aware at the start of the interval that they will estimate it (making a "prospective" estimate). Here the amount of stimulation in the interval has the opposite effect, such that the more an interval is filled, the shorter the interval is judged to be (Hicks, Miller & Kinsbourne, 1976; but see Brown, 1985). In this setting, "time flies when you're having fun" because the fun mentally fills the interval, so time passes quickly. Conversely, "a watched pot never boils" because we ignore ongoing events, so the interval is mentally unfilled and time drags by. In fact, Cahoon and Edmunds (1980) had participants watch a pot of water come to a boil. They found that the more that participants concentrated on waiting for the water to boil, the longer the interval was judged to be. One possible explanation for this paradox is that when making a prospective time estimate, participants do not rely on the amount of stimulation experienced during the interval. Instead, they rely directly on their experience of time. Time estimates become shorter when more stimuli are encountered because greater stimulation directs participants' attention away from their experience of time. To test whether less attention to time makes it pass rapidly, you can take any established procedure for consuming more or less of a participant's attention and incorporate it into a study in which participants estimate an interval's duration.


Topic 3: Fear of Success in Females

Some people argue that the most insidious effect of racism and sexism is that the victims of such biases are conditioned by society to behave in certain ways that match the stereotype. For example, Horner (1972) proposed that women who avoid striving for success in school or occupations may be motivated by "fear of success," or FOS. Given the society of the 1970s, it was reasonable to suggest that women were conditioned by society to avoid seeking success because they had learned to fear the negative consequences associated with it. In particular, they may have anticipated being perceived as "unfeminine" and socially undesirable if they were successful. Men, Horner proposed, generally exhibit less FOS because, for them, there are fewer negative consequences associated with success. Horner conducted a study to demonstrate that men and women exhibit different levels of FOS.

What are the hypothesis and purpose of this study?

What are the variables to be studied, and which design is appropriate?

What sampling would you use, and should you control subject variables?

The purpose of this study is to demonstrate the existence of FOS. Because it would be premature to test variables that cause FOS, you should turn to a correlational, descriptive study. The basic proposal is that men and women exhibit different levels of FOS, so at most the design is a quasi-experiment, with the quasi-independent variable consisting of the conditions of male or female. The dependent behavior is FOS, presumably a relatively stable characteristic. The study does not require a specific field setting, so conducting it in a laboratory provides the best control. Presumably, FOS is found throughout the general population, so you can select participants randomly from available college students. Note that the quasi-independent variable of gender is a between-subjects factor, which may be confounded by many subject variables, including race, age, college experience, and so on. To control these variables, you might either limit the population to a more homogeneous group or produce matched groups of males and females.

How would you measure the dependent variable?

The key is to operationally define and then measure behaviors that are construct-valid and actually reflect FOS. You might design a situation to cause participants to strive for success, but if they don't, you cannot be sure that this result reflects FOS. But you can measure participants' motives more directly using their self-reports from interviews or questionnaires. Closed-end questions, however, would be unworkable here, because (1) this is exploratory research, so you do not necessarily know the appropriate questions to ask, (2) participants may be unaware of their FOS, and (3) demand characteristics may prevent them from giving honest responses. To avoid such problems, Horner (1972) employed a "projective" test from which she could infer high or low FOS from participants' responses. Female participants completed a story that began "After first-term finals, Anne finds herself at the head of her medical school class." Males completed a corresponding story about "John." Then, in the completed stories, Horner looked for content that suggested a participant's concern with FOS.

What procedure, instructions, and materials are needed?

To ensure reliability, participants completed the story in writing. To focus participants on dealing with the success of the characters, Horner asked participants to comment on Anne's or John's reactions to their grades, other people's reactions, and their past and future lives. Because only one story was completed per participant, order effects were not a problem.

By counterbalancing the gender of the researcher within each group, you could control for experimenter-produced demand characteristics. (A double-blind procedure would probably not be necessary if you can minimize the interaction between participants and researcher.) Also note that participants can be tested in groups. To minimize participants' reactivity to divulging personal information, you might frame the experiment as a creativity test, English composition test, or as a test of knowledge about social interactions. The materials you would use consist of a sheet of paper with the appropriate description of John or Anne typed across the top. (By providing more sheets you imply that participants can or should write more.)

The design is ethically acceptable because participants are knowing volunteers, and informed consent can provide straightforward information about the task and situation. However, the hypothesis would not be conveyed until the debriefing.

How would you determine each participant's score and ensure a powerful design?

The major problem with a projective, open-ended question concerns the reliable scoring of participants' responses. Horner performed a content analysis, operationally defining FOS to be present when a participant described Anne or John as having negative feelings or negative experiences as a result of achieving high grades. Horner found, for example, that Anne was described as the stereotypical lonely, unattractive "egghead" who was disliked (and even beaten up by classmates when the grades were published). Multiple scorers would be appropriate here and you should check for high inter-rater reliability. A relatively large N is warranted because of potentially high variability in participant responses and scoring. With so simple a task, you could easily test upwards of 50 people per condition.

You might think that, after scoring the stories, you could assign each participant a score reflecting the number of FOS images in his or her story. However, you do not know that a greater number of FOS images in a story indicates a greater degree of FOS. (Some participants may simply write a more detailed story.) Horner solved this problem by viewing FOS as an all-or-none characteristic. She categorized each participant's story as either exhibiting or not exhibiting FOS, and then determined whether more females exhibited FOS than males.
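
When two scorers each categorize every story as FOS or non-FOS, one common way to check their agreement is Cohen's kappa, which corrects the raw percentage of agreement for the agreement expected by chance. The text does not say which index Horner used, so treat kappa as an assumed choice here. The codes below are invented (1 = FOS present, 0 = FOS absent).

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who categorized the same stories."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("Both raters must score the same, nonempty set of stories.")

    # Proportion of stories on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Agreement expected by chance, from each rater's category proportions.
    categories = set(rater_a) | set(rater_b)
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )

    if expected == 1:
        # Both raters used a single identical category for every story.
        return 1.0
    return (observed - expected) / (1 - expected)


# Invented codes for ten stories from two hypothetical scorers.
scorer_1 = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
scorer_2 = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1]

kappa = cohens_kappa(scorer_1, scorer_2)
print(f"Cohen's kappa = {kappa:.2f}")
```

In this invented example the scorers agree on 8 of 10 stories, but because half of that agreement would be expected by chance, kappa is lower than the raw agreement rate.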

What statistical analysis should you perform?

This is a between-subjects design with nonparametric, nominal, or categorical data, so performing a chi square procedure is appropriate. You could perform a one-way procedure to determine whether, among those with FOS stories, there is a greater frequency of females than males, but technically you should compute the two-way chi square to determine whether the frequency of FOS and non-FOS is independent of participants' gender. This design and the results of Horner's study are shown in Table 23.4.

Here, a significant result indicates that the frequency with which participants produced an FOS story depends on whether they are male or female. Horner reported that females produced a significantly greater number of FOS stories, although, unfortunately, she did not compute the phi correlation to indicate the strength or consistency of this relationship.
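
As a rough sketch of the two-way chi square and the phi coefficient for a 2 x 2 design, consider the following. The frequencies are invented for illustration and are not Horner's results from Table 23.4. The obtained chi square is compared with 3.841, the critical value for 1 degree of freedom at the .05 level.

```python
from math import sqrt

CRITICAL_CHI2_DF1_05 = 3.841  # critical chi square, df = 1, alpha = .05


def chi_square_2x2(table):
    """Return (chi square, phi) for a 2 x 2 table of observed frequencies.

    `table` is ((a, b), (c, d)): rows are gender, columns are FOS / non-FOS.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected frequency if FOS is independent of gender.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    phi = sqrt(chi2 / n)
    return chi2, phi


# Invented frequencies: (FOS stories, non-FOS stories) for each gender.
observed = (
    (30, 20),  # females
    (10, 40),  # males
)

chi2, phi = chi_square_2x2(observed)
print(f"chi square = {chi2:.2f}, phi = {phi:.2f}")
print("Significant at .05:", chi2 > CRITICAL_CHI2_DF1_05)
```

Phi ranges from 0 to 1 in a 2 x 2 table, so it conveys how consistently FOS category is associated with gender, which the chi square value alone does not.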

What conclusions can you draw, and are they valid?

Because this is a quasi-experiment, you cannot confidently conclude that gender causes the differences in FOS stories. Rather, you can only speculate as to how FOS develops in males and females and how it operates to influence behavior. It is reasonable to expect that other participants in similar settings would produce similar results, so there is some external validity for this specific relationship. But you won't know whether FOS actually operates in the real world, whether it actually motivates people, or what all its components are, so ecological, construct, and broad external validity are limited. (Then again, this is just the first study.)

Even your tentative conclusions hinge, however, on whether there are any confoundings between the two gender conditions. Again, look carefully at Table 23.4.

What is the major confounding between the above conditions?

The difference in FOS scores between the conditions may be due to the sex of the participants in each group or it may be due to the sex of the character in each story: In short, the sex of John and Anne is confounded with the quasi-independent variable of sex of subject. To test this alternative hypothesis, Monahan, Kuhn, and Shaver (1974) had both males and females complete stories about both John and Anne. They found that women and men showed greater FOS when describing Anne! Thus, there is something about completing the Anne story that produces greater FOS imagery. What is it about Anne's success in medical school that causes both males and females to imagine her as having negative feelings and experiences because of that success?

What bias is built into the stimuli?

The bias is that Anne is in medical school, stereotypically a "male" situation. Thus, Horner's study was also confounded because the two conditions differ in terms of whether the sex of the character fits the stereotype of the school. This conclusion was confirmed by Cherry and Deaux (1978), who had males and females describe John and Anne as being either at the head of their medical-school class or at the head of their nursing-school class. In the medical-school setting, 60% of the Anne stories indicated FOS, compared to only 30% of the John stories. But in the nursing-school setting the shoe was on the other foot: 64% of the John stories but only 30% of the Anne stories indicated FOS. Further, both men and women provided more negative descriptions of John in nursing school and of Anne in medical school.

What do these results indicate about the construct of FOS?

If FOS exists, Horner did not measure it. Essentially, her study lacked construct validity: Participants never actually stated that they feared success, and, instead of reflecting some intrinsic motivation, their negative descriptions depended entirely on the context of the story. Apparently, we are all aware of stereotypes and tend to predict unpleasant consequences for someone who violates a stereotype. In particular, when describing someone who is successful at something that is uncommon for their gender, we expect the social consequences of that success to be negative.

Recall that research explanations are supposed to be parsimonious; that is, they should not be complicated by unnecessary constructs. Because the above study (and similar research) has not demonstrated that FOS is necessary for explaining behavior, the construct has largely been discarded.

What suggestions do you have for further research?

To retain the construct of FOS, you would have to measure it in an objective, valid, and unconfounded manner. For example, you might develop a closed-end questionnaire to measure FOS and also seek to identify a concrete behavior that reflects avoiding success in real life. Then you could show both concurrent and predictive validity by determining the correlation between questionnaire scores and actual avoidance of success. Also, since very young children would not be expected to have learned the negative consequences of violating sex-role stereotypes, you might eliminate the bias in Horner's study by testing young children for FOS. And finally, because society and sex stereotypes presumably have changed since the early 1970s, a replication of the above FOS studies could determine whether there is a generational or "cohort" effect between Horner's participants and those of today.


Topic 4: Creativity

Creativity is often examined in terms of problem solving. In this context, being creative is defined as making new, uncommon associations between ideas to solve a problem. Isen, Daubman, and Nowicki (1987) suggested that one variable that influences creative problem solving is mood or "affect." When experiencing positive affect, participants may organize information into broader, more all-inclusive categories, such that they combine highly divergent information. Because of this new organization, participants may see unusual connections or novel associations, thus facilitating problem solving.

What are the hypothesis and purpose of this study?

What are the variables to be studied, and which design is appropriate?

What procedure and participants would you employ?

The purpose here is to show that more positive affect causes greater creativity, so a laboratory experiment involving college students is appropriate. Any manipulation that influences a participant's mood is appropriate. For example, you could present different mood-inducing words or vary the amount of reward that participants receive. In a series of studies, Isen et al. (1987) employed several such procedures, including showing participants a five-minute comedy film (consisting of television "bloopers"), a neutral control film (about the normal curve!), or a negative film (a documentary on World War II Nazi concentration camps). Let's say you choose to employ this procedure. Accordingly, your independent variable is the type of film shown to participants. (Technically, a subject's mood is an intervening variable, which, presumably, is influenced by the type of film viewed.) Because of potential carry-over effects, the films should be presented as a between-subjects factor.

The dependent variable is a participant's score on creative problem solving. Isen et al. employed a number of "brain-teaser" problems to measure creative problem solving, among them the Remote Associates Test (Mednick, Mednick, & Mednick, 1964). Here, each test question contains three words for which there is one "remote" association. The questions are presented in a questionnaire, and participants fill in the word that provides the association. For example, given the stimuli mower, atomic, and foreign, the correct answer is power (since it may occur logically with each stimulus word).

How would you ensure that the manipulation worked?

How would you ensure reliable measurement of creativity?

A manipulation check is necessary to be sure that the films had the desired effect. Isen et al. accomplished this check by telling participants that the film they were about to see was being pretested for another experiment. Then, after the film was over, they asked participants either to rate the pleasantness of several unfamiliar, neutral words or to rate statements describing how the film made them feel. Participants' responses confirmed that the films had produced the intended differences in mood. Following these tasks, participants performed the Remote Associates Test, under the cover story that norms were being established for another study.

To reliably measure creativity, Isen et al. had participants complete 21 Remote Associates questions. To control the difficulty of the questions, the researchers used pilot data to select 7 easy, 7 medium, and 7 difficult items. A participant's score was the total number of questions correctly answered. Further, to reliably demonstrate the relationship, the researchers tested between 50 and 100 participants in each of their studies, roughly counterbalancing for sex of participant.

Diagram this study, and select the statistical procedures you should use.

The study is diagrammed as shown in Table 23.5.

Such scores fit a parametric procedure, and the one-way, between-subjects ANOVA is appropriate. If the F is significant, post hoc comparisons and eta squared are computed. Based on Isen et al. (1987), the number of items correctly solved should be significantly higher when participants experience the positive-mood condition than when they experience the neutral condition. Negative mood should not, however, produce a significant difference in creative problem solving compared to the neutral condition.
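
The computations behind the one-way, between-subjects ANOVA and eta squared can be sketched as follows. The scores are invented (five hypothetical participants per film condition) and are not taken from Isen et al.; the obtained F would still be compared with the critical value for the appropriate degrees of freedom, and post hoc comparisons are not shown.

```python
from statistics import mean


def one_way_anova(groups):
    """Return F, its degrees of freedom, and eta squared for independent groups."""
    all_scores = [score for scores in groups.values() for score in scores]
    grand_mean = mean(all_scores)

    # Variability of the condition means around the grand mean.
    ss_between = sum(
        len(scores) * (mean(scores) - grand_mean) ** 2
        for scores in groups.values()
    )
    # Variability of scores around their own condition mean.
    ss_within = sum(
        (score - mean(scores)) ** 2
        for scores in groups.values()
        for score in scores
    )

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_obtained = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return {
        "F": f_obtained,
        "df_between": df_between,
        "df_within": df_within,
        "eta_squared": eta_squared,
    }


# Invented numbers of Remote Associates items solved in each film condition.
scores_by_film = {
    "positive": [12, 14, 13, 15, 11],
    "neutral": [9, 10, 8, 11, 7],
    "negative": [8, 10, 9, 7, 11],
}

result = one_way_anova(scores_by_film)
print(
    f"F({result['df_between']}, {result['df_within']}) = {result['F']:.2f}, "
    f"eta squared = {result['eta_squared']:.2f}"
)
```

Eta squared here is the proportion of the total variability in creativity scores that is associated with the film conditions.
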
To obtain greater information from the above design, you could analyze an additional, hidden factor.

What additional factor can be analyzed?

What statistical procedure should be used, and what will it show?

The additional factor to be analyzed is the difficulty of the Remote Associates items. Instead of computing an overall total-correct score per participant, you can examine the number correct for each participant when answering the easy, medium, and difficult questions, respectively. In doing so, you create a two-way mixed design. You still have the between-subjects factor containing the three levels of mood, but you also have the repeated-measures factor consisting of easy, medium, or difficult questions. This results in a 3 x 3 mixed ANOVA. From this, you will see the main effect of the mood conditions as well as the main effect of the difficulty of problems. Most interestingly, you will see the interaction between difficulty and mood. This interaction will indicate whether the relationship between a person's mood and his or her creative problem solving depends on how difficult the problems are. Notably, when Isen et al. (1987) performed a similar analysis, they found that, regardless of mood level, the difficult questions were so difficult they tended to produce a floor effect, while the easy items were so easy they tended to produce a ceiling effect. Therefore, the interaction of mood and difficulty was not significant.

Although Isen et al. found that positive mood conditions produced higher creativity scores, there is a potentially serious problem in inferring that participants truly experienced the different moods. Consider the order in which participants performed the various tasks described previously.

What demand characteristics might have biased this study?

After presenting a film, Isen et al. conducted a manipulation check to determine participants' moods. But imagine you are a participant who watches a brief film that is rather obviously light-hearted or depressing, and that you are then asked to rate the pleasantness of words or, worse, to describe how the film made you feel. Wouldn't you suspect that the film was supposed to produce a positive or negative mood? This procedure may have communicated experimenter expectations about the mood that participants were supposed to indicate, and the participants may simply have complied. If this were the case, the researchers could not identify the cause of the differences in creativity, because they could not be sure that the intended differences in mood even existed.

To solve this problem, the researchers might present the manipulation check after the creativity test, but having a difficult or easy time in performing the creativity test might change participants' mood from what it was at the beginning of the test. Likewise, they might use different stimuli, but any manipulation that is strong enough to influence mood may also communicate experimenter expectations or bias participants. Thus, a better design might be to describe the relationship between mood and creativity, but without actively manipulating mood so that demand characteristics are avoided.

A further problem is that the above manipulation check, at best, showed that the average mood score differed between the conditions, but there was undoubtedly variability in the moods of participants within each condition. Therefore, this procedure is not a very precise or sensitive way of examining how specific differences in mood relate to differences in creativity.

Given these criticisms, when designing an extension of this research you might seek a description of the relationship between participants' specific mood level and their creativity, but without actively manipulating mood.

What other design might you employ to study this relationship?

What are the hypothesis and purpose of this study?

How would you define and measure the variables?

Given the goals described above, the obvious choice is a correlational study: After measuring participants' present mood when they enter the study, you can correlate these scores with participants' scores on a subsequent creativity test. The hypothesis is that mood and creativity are related. The purpose is to show that the relationship exists, but you are no longer trying to show that a positive mood causes greater creativity.

To perform a more literal replication of the Isen et al. procedure, you might again operationally define creativity in terms of participants' ability to solve Remote Associates problems. To measure mood, you could again have participants rate the pleasantness of uncommon, emotionally neutral words. (By checking the literature, you can obtain previously used lists of such words.) Your operational definition here is that the higher the pleasantness ratings of the words, the better a participant's overall mood.

How would you create the rating scale for items measuring mood?

Because rating a neutral word as very pleasant or unpleasant may seem an unnatural task for participants, the rating scale can be anchored with only the words pleasant and unpleasant. Likewise, because participants may have a hard time making fine discriminations in the pleasantness of a word, it is appropriate to provide only a 5-point scale. With an odd number of points, you allow participants a middle or neutral rating, which in this context is also appropriate. In your instructions, however, stress that participants should consider all points on the scale. A pilot study is called for here to determine whether the words are neutral, whether their pleasantness can be rated, and whether the scale is appropriate. You would ask participants to rate each word on these dimensions and then include only those words in the mood test that are consistently judged to meet these criteria.

How would you construct the mood questionnaire?

To reflect a participant's mood reliably, you would want to get a sufficient number of words rated---say, 24 words total. To cancel out any potential response biases, you can counterbalance the placement of pleasant and unpleasant at the left end of the scale for half of the words and at the right end for the other half.

You might intermix the mood questions with the Remote Associates Test and present them as one questionnaire, but this arrangement could be very confusing for participants. A better approach is to keep the two measurement procedures separate, with individual printed instructions for each.

How would you administer the mood and creativity tests?

There are two reasons that you might have all participants complete the mood questionnaire before completing the Remote Associates Test. First, you would avoid any possible influence that the Remote Associates Test might have on a participant's mood. (Remember, your goal in this design is to not manipulate or influence participants' mood.) Second, although your emphasis is definitely not on causality, you can strengthen such an inference by having what you believe is the causal variable occur first.

What participants will you select, and how will you sample?

It is most important to obtain participants who may differ greatly in their moods and in their abilities to solve the Remote Associates Test. Since observing this relationship across a wide range of scores both provides more information and increases statistical power, you want to avoid a restricted range of scores on either variable. A target N of 100 participants would provide sufficient power. The hypothesized relationship applies to the general population, so you can randomly select college students. Doing so also allows you the advantage of collecting data in a quiet, controlled laboratory setting. If you had reason to believe that college students would produce a restricted range of scores, however, you might instead conduct a field study in several randomly selected public locations or a mailed survey, using systematic and/or stratified random sampling techniques.

What specific steps will you follow to analyze the data?

First, you must determine each participant's score on each test. For the Remote Associates Test, you total the number of correct answers. For the mood test, you initially code all questions so that a higher rating always indicates that the word is rated as more pleasant, implying a more positive affect. Then, each participant's mood score can be either the total or the mean of the ratings that he or she selected.
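
The recoding step can be sketched as follows, assuming the 5-point scale described earlier and assuming that the highest number sits at the right end of the scale. Under that assumption, words presented with "pleasant" at the left end are reverse-scored as 6 minus the rating. The function names and the ratings are hypothetical.

```python
from statistics import mean

SCALE_MIN = 1
SCALE_MAX = 5  # assumed 5-point rating scale


def recode(rating, pleasant_on_left):
    """Recode a rating so that a higher number always means more pleasant."""
    if not SCALE_MIN <= rating <= SCALE_MAX:
        raise ValueError(f"Rating {rating} is outside the scale.")
    if pleasant_on_left:
        # Reverse-score: 1 becomes 5, 2 becomes 4, and so on.
        return SCALE_MAX + SCALE_MIN - rating
    return rating


def mood_score(ratings, pleasant_on_left_flags):
    """Mean recoded pleasantness rating for one participant."""
    recoded = [
        recode(rating, flag)
        for rating, flag in zip(ratings, pleasant_on_left_flags)
    ]
    return mean(recoded)


# Invented ratings for four words; True marks a word whose scale had
# "pleasant" at the left end.
ratings = [1, 5, 2, 4]
pleasant_on_left = [True, False, True, False]

print("Mood score:", mood_score(ratings, pleasant_on_left))
```

Using the mean rather than the total keeps a participant's score on the original 1-to-5 scale, which can make it easier to interpret.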

Both of these variables reflect ratio scales of measurement that seem approximately normally distributed. Implicitly, you have assumed a linear relationship between the variables, so you would compute the Pearson correlation coefficient using each participant's pair of scores. Next, you would determine whether the coefficient is significant. Your prediction is that there will be a positive correlation between mood and creativity, with higher pleasantness ratings associated with higher creativity scores. Because you are predicting a specific relationship, you perform a one-tailed test of significance. If r is significant, you can compute the linear regression equation, and then graph the regression line to summarize this relationship. Since you have essentially hypothesized that participants' mood predicts their creativity, word ratings would be the predictor (X) variable and Remote Associates scores the criterion (Y) variable. You would also compute the standard error of the estimate. This calculation communicates the "average" amount that participants' creativity scores differ from the score that was predicted using their mood score and the regression equation. You would also compute r² to describe the proportion of variance in creativity scores that is accounted for by the relationship with mood. The larger this statistic, the more important mood is for understanding differences in creativity.
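
These steps can be sketched together in one short function. The five pairs of scores are invented, and two choices are assumptions rather than the text's prescribed formulas: the standard error of the estimate uses N - 2 in the denominator (some textbooks use N), and significance is framed as a t statistic with N - 2 degrees of freedom that you would compare with a one-tailed critical value from a table.

```python
from math import sqrt
from statistics import mean


def correlation_and_regression(x, y):
    """Pearson r, the regression of Y on X, and related summary statistics."""
    n = len(x)
    if n < 3 or n != len(y):
        raise ValueError("Need at least three paired scores.")

    mean_x, mean_y = mean(x), mean(y)
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    r = sp_xy / sqrt(ss_x * ss_y)
    slope = sp_xy / ss_x
    intercept = mean_y - slope * mean_x

    # Squared differences between actual and predicted creativity scores.
    ss_residual = sum(
        (yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y)
    )
    standard_error = sqrt(ss_residual / (n - 2))  # assumes the N - 2 convention

    # t statistic for testing r, with n - 2 degrees of freedom.
    t_obtained = r * sqrt(n - 2) / sqrt(1 - r ** 2)

    return {
        "r": r,
        "r_squared": r ** 2,
        "slope": slope,
        "intercept": intercept,
        "standard_error_of_estimate": standard_error,
        "t": t_obtained,
        "df": n - 2,
    }


# Invented scores: mean pleasantness rating (X) and items solved (Y).
mood = [2.0, 2.5, 3.0, 3.5, 4.0]
creativity = [8, 9, 12, 11, 15]

stats = correlation_and_regression(mood, creativity)
print(f"r = {stats['r']:.2f}, r squared = {stats['r_squared']:.2f}")
print(f"Predicted Y = {stats['intercept']:.2f} + {stats['slope']:.2f}(X)")
print(f"Standard error of the estimate = {stats['standard_error_of_estimate']:.2f}")
print(f"t({stats['df']}) = {stats['t']:.2f}")
```

A sample this small is used only to keep the arithmetic visible; the study itself calls for a much larger N.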

What are the important issues of validity in this study?

The first issue is content validity. You want to be sure that pleasantness ratings actually and only reflect the pleasantness that participants attribute to the words. Any flaws in the rating task, or any words that participants have experienced in a way that biases them, will mean that you are not measuring pleasantness as intended. Likewise, the Remote Associates Test must measure participants' ability to make remote associations. If, for example, participants do not know a word's definition, then they will give an incorrect answer for that question because of a variable having nothing to do with making associations.

Your other major concern is construct validity. You have defined creativity as the ability to associate diverse elements in an uncommon way. But creativity may be more than this. Also, your operational definition of being creative is "producing the correct remote associate." But participants who are creative and ingenious might see a unique association that the test does not anticipate, so they would be scored as incorrect. Likewise, the word-pleasantness ratings may reflect a participant's mood, but then again they may not. It is possible to be in a very poor mood and still think a word has a pleasant ring to it (aardvark springs to mind). Also, there are many aspects to a person's mood (anger, elation, sadness) and you cannot know which, if any, are reflected by these ratings.

For these reasons, you might select any number of published, objective (closed-end) mood tests from the literature (e.g., the "Profile of Mood States"; Schackman, 1983). An established mood test is usually supported by considerable research showing its validity (e.g., McNair, Lorr & Droppleman, 1984). Further, such tests often contain subscales, each of which measures a certain component of mood: The items in one scale would measure the factor of "depression," those in another "anxiety," in yet another "vigor," and so on. In fact, if you had employed such a test in the above study, you could use a participant's score from each subscale as one measure of mood, which you could then correlate with creativity. (In this case, the techniques of "multiple correlation and multiple regression" as well as "partial correlation" would be appropriate.) Of course, the above concerns about validity also apply to the Isen et al. (1987) experiments discussed initially. The advantages of your correlational study are that it conceptually replicates their laboratory study while reducing their demand characteristics, and, especially if conducted as a field study, it adds to the external validity and generalizability of the relationship between mood and creativity.

In designing this correlational study, you assumed that if participants performed the Remote Associates Test first, their performance might influence their subsequent mood scores. As an additional research question, you could actually test whether word pleasantness ratings are changed by the Remote Associates Test.

How would you design this study and create the stimuli?

You would create a pretest-posttest design, with participants performing a word-pleasantness rating test once before the Remote Associates Test and then once after. To prevent participants from merely reproducing earlier ratings, you would create alternate forms, providing two different sets of words to be rated. You would need to demonstrate high test-retest reliability between the two sets and also to counterbalance their use, using each set as the pretest or posttest for one-half of the participants. Then you would examine the difference between the overall (mean) pleasantness ratings before and after the Remote Associates Test.
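The counterbalancing of the two word sets can be sketched as follows. This is a minimal illustration: the labels "Set A" and "Set B," the participant numbers, and the function name are assumptions made for the example.

```python
import random


def counterbalance(participant_ids, seed=None):
    """Randomly assign half of the participants to each order of the two word sets."""
    ids = list(participant_ids)
    rng = random.Random(seed)   # a seed makes the assignment reproducible
    rng.shuffle(ids)
    half = len(ids) // 2

    orders = {}
    for pid in ids[:half]:
        orders[pid] = ("Set A", "Set B")   # (pretest form, posttest form)
    for pid in ids[half:]:
        orders[pid] = ("Set B", "Set A")
    return orders


# Hypothetical sample of 20 participants numbered 1 to 20
assignment = counterbalance(range(1, 21), seed=42)
for pid in sorted(assignment):
    print(pid, assignment[pid])
```

Shuffling before splitting keeps the order of the forms from being tied to the order in which participants signed up. With an odd number of participants, one order would receive one extra person.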

However, if you merely test a group of participants before and after the Remote Associates Test, you will encounter the problems of a one-group pretest-posttest design, having no idea why pleasantness ratings might change between the two testings. After all, you are measuring mood as it changes from moment to moment. Therefore, you should also test a control group, measuring mood twice using the same intervening interval as that for the experimental group, but without the Remote Associates Test. Then you would analyze the pleasantness ratings in a 2 x 2 mixed-design ANOVA for the between-subjects factor of experimental-control group and the repeated-measures factor of pretest and posttest. If the interaction is significant, then differences in the pretest and posttest mood scores depend on, and are thus influenced by, whether participants perform the intervening Remote Associates Test.
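To see what the interaction is testing, note that in the special case of two groups and two testings, the interaction F is equivalent to the square of an independent-samples t-test performed on each participant's change score (posttest minus pretest). The sketch below uses that shortcut, so it computes only the interaction; a full ANOVA would also report the two main effects. The function name and the ratings are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance


def interaction_f(exp_pre, exp_post, ctl_pre, ctl_post):
    """Return the group-by-testing interaction F and its degrees of freedom."""
    # Change score (posttest minus pretest) for each participant
    exp_change = [post - pre for pre, post in zip(exp_pre, exp_post)]
    ctl_change = [post - pre for pre, post in zip(ctl_pre, ctl_post)]
    n1, n2 = len(exp_change), len(ctl_change)

    # Pooled variance of the change scores across the two groups
    pooled_var = ((n1 - 1) * variance(exp_change)
                  + (n2 - 1) * variance(ctl_change)) / (n1 + n2 - 2)

    # Independent-samples t on the change scores; squaring it gives the interaction F
    t = (mean(exp_change) - mean(ctl_change)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t ** 2, (1, n1 + n2 - 2)


# Made-up mean pleasantness ratings for four participants per group
f_value, df = interaction_f(
    exp_pre=[4, 5, 3, 4], exp_post=[6, 8, 4, 6],
    ctl_pre=[4, 4, 5, 3], ctl_post=[4, 5, 5, 2],
)
print(round(f_value, 2), df)
```

A large F here means that the amount of change from pretest to posttest differs between the experimental and control groups, which is exactly the pattern the interaction describes.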


Additional Research Topics

In the research literature, you will find studies that deal with almost every behavior imaginable. Below are some common and not so common research topics with which you can become involved rather easily.


Belief in Astrology
Astrological horoscopes and personality descriptions contain Barnum statements such as "You are generally a happy person, although you sometimes become angry." A person's reactions to such descriptions tell us not only about the popularity of astrology but also about general belief systems. For example, Glick, Gottesman, and Jolton (1989) studied "believers" and "skeptics" to determine how they deal with positive and negative descriptions in horoscopes. The researchers proposed that both types of participants would be impressed with the accuracy of positive descriptions (e.g., "You are intelligent"), presumably because such descriptions affirm positive self-perceptions. But they also suggested that only believers in astrology would accept negative descriptions (e.g., "You are indecisive"), because their faith in astrology overrides their self-perceptions. In addition, the researchers tested whether a description not attributed to astrology would be accepted as more accurate, and whether experiencing positive or negative descriptions would alter participants' belief in astrology.

What type of design would you use to study these factors?
Testing any of these factors first involves a quasi-independent variable, with conditions created by selecting participants who are astrology believers or skeptics. Additional factors then include favorable versus unfavorable personality descriptions, and/or whether or not the descriptions are attributed to astrology. The dependent variable would be participants' rating of the accuracy of the descriptions. In their study, Glick et al. (1989) manipulated these factors as between-subjects factors, so that they could also examine the influence of a particular type of experience on participants' attitude toward astrology. They tested this influence with a pretest-posttest design, measuring participants' attitudes toward astrology before and then after they participated in the above conditions.

The researchers found that (1) skeptics accept positive personality descriptions as more accurate than negative ones, but believers accept a negative description as being equal in accuracy to a positive one, (2) believers and skeptics alike rated astrological descriptions as more accurate than nonastrological descriptions, and (3) skeptics, more so than believers, became more positive in their feelings toward astrology after receiving favorable horoscopes. Glick et al. proposed that skeptics are more open-minded, so they are more likely to change their opinion about astrology after receiving a positive description. Believers, however, seem to ignore the contradictions from a negative description, thus suggesting that believers and skeptics differ in terms of how they test these descriptions and evaluate the evidence that may support them.

What suggestions for additional research can you make?
First, it would be interesting to examine how believers and skeptics test astrological predictions. For example, you might present them with a horoscope that is ultimately confirmed or disconfirmed by what "coincidentally" happens later in an experimental setting. Then you could measure how the outcome of a prediction is evaluated by skeptics and believers, and how this evaluation influences their attitudes toward astrology.
Second, when believers are confronted with a negative statement that disconfirms their self-perceptions, they continue to maintain their original opinion. The implication is that believers in astrology do not apply the correct logic for testing hypotheses. Thus, you could test for differences in logic and hypothesis testing between skeptics and believers.

Serial-Position Effects
A highly reliable finding in the study of memory is that when participants recall a list of words, they tend to recall the first few and the last few items in the list best. Because recall changes as a function of an item's serial position in the list, this effect is called the serial-position effect. Often in these studies, a list consisting of a random string of single digits is spoken to participants, and, for reliability, each participant is tested with a number of lists. To see this effect, we graph the participants' recall scores for the items in a list as a function of where in the list the items occurred. The graph produces a serial-position curve, as shown in Figure 23.2.

Note that the higher recall of the first items in the list is called the "primacy effect" and the higher recall of the final items is called the "recency effect."
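A serial-position curve can be computed from recall data along the following lines. This is only a sketch with made-up data: each record is one seven-item list, with 1 meaning the item at that position was recalled and 0 meaning it was not. The function name and the text-based "graph" are assumptions made for the example.

```python
def serial_position_curve(recall_records):
    """Return the proportion of lists on which each serial position was recalled."""
    list_length = len(recall_records[0])
    n_lists = len(recall_records)
    return [
        sum(record[position] for record in recall_records) / n_lists
        for position in range(list_length)
    ]


# Made-up recall records for four lists of seven items each
records = [
    [1, 1, 0, 0, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 1],
    [1, 1, 0, 0, 0, 1, 1],
    [1, 0, 1, 0, 0, 1, 1],
]

curve = serial_position_curve(records)
for position, proportion in enumerate(curve, start=1):
    # Crude bar graph: one "#" per 10 percent recall
    print(f"{position}: {'#' * round(proportion * 10)} {proportion:.2f}")
```

In these invented data, recall is highest at the first and last positions and lowest in the middle, which is the pattern that the primacy and recency effects describe.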

To study the memory processes that produce these effects, researchers have employed numerous techniques that alter recall of the list. From the impact of a particular manipulation, they infer a characteristic of memory. For example, the final few items in a list are most recently entered into memory, so they may be in some sense "fresher." Some researchers (e.g., Crowder, 1982) have proposed that these items are better recalled because there is an "echo" in memory of the most recent spoken words. The final word in the list is recalled best because no words come after it that might otherwise interfere with its echo.

How would you test this explanation?

Researchers have tested this proposal by speaking an additional word at the end of the list, called a "stimulus suffix." When the lists consist of digits, the suffix for every list is usually the digit zero. So as not to increase the effort that participants must apply to remembering the list, they are told not to remember the zero; it is merely the signal to recall the list, and they are to otherwise "ignore" it. In control conditions, a tone is the recall signal. Because the tone is so different from the list of words, it should not produce interference, so recall of the list in the control group should exhibit the typical high recency. Conversely, because the zero is heard, it should reduce the freshness of the final list items and fill the echo, so recall of the list in the experimental group should exhibit reduced recency relative to the tone condition. In fact, it does, and the lowered recency is called the "suffix effect." (See Turner et al. [1987] for an overview.)

The reason that the suffix disrupts recency is not known. Originally, its effect was thought to be limited to spoken words only. Yet suffix effects have also been found when the zero is silently mouthed by the experimenter or by the participant, and among deaf participants when the list and suffix are presented using sign language. In addition, suffix effects have been shown to occur when spoken words in the list have different vowels (gap, got, gut) but not when they have different consonants (gap, tap, map). Thus, the suffix effect is not restricted to auditory stimuli, yet it does not always work with auditory stimuli. This inconsistency makes the notion of an "echo" very suspect.

Although there have been many explanations of this effect, one that has received virtually no attention concerns the assumption that participants can in fact "ignore" the zero at the end of the list. Participants are presumably equally able to ignore a tone and "zero," and both readily signal that the list is over. Then the suffix has its effect simply because in some way it fills the echo, decreasing the freshness of the final items in the list. But how do participants know that the word zero is the one to ignore unless they first pay attention to it? They must identify this word to recognize that it is not part of the list to be remembered. A tone, by contrast, can be ignored instantly, because it is so obviously different from the preceding list of words. Can (or do) participants ignore the zero? If not, is the zero at first treated as part of the list, thereby confusing participants? Does a list followed by a zero require greater attention than a list followed by a tone, and does this greater attention come at the expense of recalling the list?

How would you study whether participants do ignore the suffix?
You might measure the reaction time of participants to determine whether there are differences in their ability to recognize the end of a list when it is signaled by a tone or zero. If the zero takes longer to recognize, then there is something about it, in terms of the attention it requires, that is different from the control condition. If so, then a confounding has occurred, because the tone and the zero are not equivalent signals for the end of the list.

Alternatively, you might first identify some words that are easy to recognize and some that are difficult to recognize. Then, using these words in place of the zero, you can see whether they alter the recency effect. Or you might give participants a cue to forewarn them that the zero or tone is about to occur. For example, you might ring a bell just prior to the zero. If reading the list aloud, you could change your tone of voice and inflection, or make a movement, to signal that the zero is about to occur. If such cues eliminate the suffix effect, then you have evidence that the zero normally reduces the recency effect because of the attention it demands or the confusion that it produces.

Attractiveness and Height
Evidence suggests that, in our society, taller men are perceived as more honest, competent, and attractive. For example, men described as tall when advertising in the "Personals" column of a newspaper tend to receive more responses (Lynn & Shurgot, 1984). Accordingly, Shepard and Strathman (1989) investigated whether women prefer to date taller men and whether they consider taller men more attractive. They also investigated whether men prefer to date shorter women and whether they consider shorter women more attractive.

How would you study the relationship between height and attraction?

Shepard and Strathman conducted a correlational study, presenting males and females with a questionnaire that asked for the number and height of their recent dates as well as for a rating of each date's attractiveness. Participants were also asked whether they preferred to date a person who was shorter than, taller than, or the same height as themselves. The researchers found that females reported that they dated tall males more frequently than short or medium males. They also reported a preference for taller men, but they did not rate their taller dates as more attractive. Interestingly, short and medium-height males reported that they went out on dates just as often as did tall males. Overall, the males preferred shorter female dates, rated them as more attractive, and dated them more frequently.

What problem do these self-reports present?

These participants' self-reports may have been unreliable for several reasons: (1) the participants might have inaccurately estimated a date's height, especially after the fact; (2) the date's personality and compatibility might have influenced participants' perception of the date's attractiveness; and (3) a "bad" date might have biased participants so that they remembered their dates as shorter or taller than they actually were. Further, as Shepard and Strathman noted, the females reported fewer dates with short or medium-height men, but the males, regardless of their height, reported the same frequency of dates. Since the shorter males were dating somebody, the self-reports of either the males or the females must have been in error.

What suggestions for additional research can you make?

Rather than relying on self-reports about participants' past dating experiences, Shepard and Strathman conducted an additional experiment in which they manipulated height and measured attractiveness. Here, they presented participants with a photograph of the upper bodies of a male and female facing each other. In the photograph, the male was either 5 inches taller, the same height, or 5 inches shorter than the female. Females rated the man as more attractive when he was taller. Males, however, did not rate the female differently depending on her height.

You might also determine why the females' reports of seldom dating short men do not correspond to the high frequency of dating reported by short men. Perhaps the males or females are erroneous in their reports because of pressure to give the socially desirable response. If so, the number of erroneous reports should increase when the researcher strongly implies such demand characteristics through instructions or the task. Also, you might investigate whether the contradictory results occur because of a definitional problem, having to do with the term taller. Stereotypically, most people would agree that a man whose height is 7 feet is "tall," and that one who is 5 feet is "short." However, taller can be a relative personal term, because for a woman whose height is 4 feet 6 inches, a man whose height is 5 feet is "taller." To what extent do males and females use a stereotypic or personal definition when selecting or describing dates?

There is an ecological issue here as well: Women are, on average, shorter than men. The opportunities for dates are greater, then, if women accept taller men and men accept shorter women, so their "preferences" may simply reflect the facts of life. Does the fact that most men are taller than most women lead to a stereotype of the ideal date? As we saw when we discussed Horner (1972), people tend to anticipate negative consequences when violating a sex-role stereotype. To what extent do the above studies measure stereotyped responses instead of actual attraction? And, finally, is it appropriate to conclude that being taller always makes a male more attractive to a woman? Is there a point where taller is not better but becomes "too tall"?

Influence of Color
Research from environmental psychology suggests that the color of walls, furniture, or floors has an influence on various aspects of behavior. For example, "warm" colors (those close to red) are often believed to be arousing, to increase physical performance, and to improve mood. And "cool" colors (those close to blue) are believed to be soothing, to lower performance, and to have a dulling effect on mood.

How would you design a study to test these beliefs?

One obvious way to manipulate the color in the environment is to test participants in rooms that are painted different colors. In each room, you can have participants perform a physical task and/or provide responses that indicate their mood. For example, Kwallek, Lewis, and Robbins (1988) asked participants to type business forms for 20 minutes after placing them in either a red or blue "office," and then had them complete a questionnaire describing their anxiety, mood, and general arousal. After resting in another room, participants either returned to the same colored office or were switched to the different colored office, performed additional typing, and completed another questionnaire. As it turns out, the participants who were moved to the different colored office made significantly more typing errors. And the interaction showed that more errors occurred when participants moved from the blue to the red office than when they moved from red to blue. The mood data indicated that participants remaining in the red office showed greater anxiety and stress, participants remaining in the blue office showed greater depression, and participants who were moved to different colored offices showed the greatest level of general arousal.

Because moving the participants into a different colored room influences their behavior, a problem with this study is that placing participants in the first office constitutes changing rooms in an uncontrolled manner. As an alternative approach, you might include a control condition that allows you to "reverse" the influence of any previous colored room.

How would you design this alternative study?

Hamid and Newport (1989) studied how the color of a room influences behavior using an ABACAB reversal design. The control or baseline, condition A, was a gray room that presumably neutralized the influence of other colors. A pink room was condition B, and a blue room was condition C. The researchers measured hand-strength and mood in six young children after they had experienced each colored room. Greater physical strength and more positive mood were found in the pink conditions.

What suggestions for additional research can you make?

If there is an influence of color on arousal and performance, then it should extend to a variety of colored objects and situations. Would similar effects on typing errors occur depending on the color of the paper in a typewriter or the color of a computer display screen? In the previous chapter we saw the influence of black uniforms on aggression, but the present research suggests that other colors of clothing may also affect behavior. Certainly the idea behind "dressing for success" is that a worker's or job applicant's clothing style influences how that person is perceived and judged by supervisors and co-workers (e.g., Forsythe, 1990). Does wearing reddish or bluish clothes also influence perceptions? We also discussed how participants misattribute their arousal due to fear as being due to sexual attraction. Is it possible that arousal from wall color or the color of clothing could be misattributed as sexual attraction? As an applied topic, the colors in business environments may be important for maximizing worker productivity. Similarly, given that maximum arousal is sought in athletic events, does wall color influence, for example, weight-training success? And, finally, do certain colors play a role when we become overaroused, as discussed below?

Self-Consciousness and "Choking Under Pressure"
An unusual behavior to study is the phenomenon of "choking under pressure." Baumeister (1984) proposed that inferior performance ("choking") occurs when we feel so pressured to perform well that we focus too much attention on the process of performing a task and not enough attention on the outcome of the task. Thus, in a self-fulfilling prophecy, the more we worry that things are going badly, the worse they go. Baumeister also proposed that personality characteristics play a role, such that, for example, a person who is more self-conscious should exhibit greater "choking."

How would you design a study to test these proposals?

In an experimental setting, you could create a task for participants in which you manipulate the amount of pressure they feel. The task should be a simple one in which errors are easily measured. For example, you can have participants quickly trace a pencil maze, solve simple math problems, or perform a reaction-time task in which they press one of several buttons to make a correct response. Consider Heaton and Sigall (1991), who first identified participants along the quasi-independent variable of high or low self-consciousness. For experimental realism, they had participants in each condition form a "team." Then they manipulated the pressure situation through the additional factors of: (1) indicating that the participant's team was behind or ahead, and (2) having participants perform when alone, when watched by their team, or when watched by the opposing team. Choking was measured by the time it took to place variously shaped pegs in their corresponding holes. Participants low in self-consciousness choked depending upon the audience characteristics, while those high in self-consciousness choked when their team was behind.

What suggestions for additional research can you make?
Apparently, people differing in self-consciousness perceive the source of pressure differently. The results for participants with high self-consciousness suggest that they choked because of competitive pressure, so you might manipulate the situation for them in terms of the amount of competition involved. The results for participants with low self-consciousness suggest that they choked because of their need for social approval, so for them you might manipulate the social setting. Also consider the possibility of replicating the above relationship between choking and pressure in a field setting: Perhaps a "stranger" could watch someone playing a game in a video arcade. As extensions of this research, the above results suggest that other personality traits, in terms of whether a person is task or socially oriented, may play a role in choking. Thus, for example, you might correlate participants' scores on these traits with how much they choke or, as above, examine how these traits interact with different conditions that promote choking. Alternatively, given that high pressure is a form of arousal, you might ask whether the color of the room in which a participant is being tested influences choking. Finally, you might ask whether a "lucky charm," such as a lucky shirt, is considered lucky because a participant did not choke when wearing it. (And what color is it?)


Copyright Houghton Mifflin Company. All Rights Reserved.