The problem of dream content analysis validity as shown by a bizarreness scale
Daniel Erlacher - Institute for Sport and Sport Science, University of Heidelberg, Germany
Michael Schredl - Sleep laboratory, Central Institute of Mental Health, Mannheim, Germany
Please cite as: Schredl, M. & Erlacher, D. (2003). The problem of dream content analysis validity as shown by a bizarreness scale. Sleep and Hypnosis, 5, 129-135.
Abstract
Dream content analysis is one of the basic methods in psychological dream research. Whereas reliability issues have been addressed quite often in the literature, the validity of dream content analysis has rarely been studied systematically. The present study investigated the validity of a bizarreness scale, i.e., whether an external judge estimates the number of bizarre elements per dream in the same way as the dreamer herself or himself. As reported previously for dream emotions, a marked underestimation by the external judges was found. Thus, the findings indicate that the written dream report does not yield a complete picture of the original dream experience, and hence the validity of dream content analysis based on written dream reports is limited, at least in several areas. How strongly the validity problem affects the results of content analytic studies, and which dream characteristics are most susceptible to this kind of error, should be investigated in future studies using a methodology similar to that of the present study.
keywords: dream recall frequency, dream content, reliability, stability
Introduction
Dream content analysis is one of the basic methods in psychological dream research (cf. Hall & Van de Castle, 1966; Winget & Kramer, 1979; Domhoff, 1996). The following example illustrates this approach. A psychotherapist speculates that depressive patients dream about rejection more often than comparable healthy controls. A research colleague elicits 100 dream reports from healthy subjects and 100 dream reports from patients with depression and constructs a scale with explicit coding rules that measures rejection of the dreamer within the dream, i.e., the occurrence of at least one situation in which the dreamer is rejected by other dream characters is coded as 1; otherwise a 0 is coded. It is of major importance that the scale is developed without knowledge of the dream material to be analyzed. In the next step, the dream reports are sorted in random order and an external judge rates for every single dream whether rejection, as defined by the content analytic scale, is present or not. After the judging procedure, the dreams are reassigned to the two groups, which are compared statistically in order to test whether depressed patients have reported rejection dreams more often. The advantages of this approach are clearly visible: content differences are reflected quantitatively and are thus accessible to statistical testing. In addition, the subjectivity of the experimenter is minimized by an external judge using a scale with explicit coding rules, and the study is replicable, e.g., by analyzing new dream samples. These are important quality criteria of common scientific practice.
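The procedure in this example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis of any actual study: the codes are invented (30 of 100 patient dreams and 15 of 100 control dreams coded as containing rejection), the names `chi_square_2x2`, `hypothetical_code` and `blinded` are made up for this sketch, and the chi-square test for a 2x2 table is only one plausible choice, since the text does not name a specific test.

```python
import math
import random


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi2, p) with one degree of freedom. For df = 1 the upper tail
    probability equals erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


# Hypothetical material: 100 reports per group, identified by (group, index).
reports = [("depressed", i) for i in range(100)] + [("control", i) for i in range(100)]


def hypothetical_code(report):
    """Stand-in for the judge: 1 = rejection present, 0 = absent (invented codes)."""
    group, index = report
    return 1 if index < (30 if group == "depressed" else 15) else 0


# Step 1: present the reports to the judge in random order (group labels hidden).
rng = random.Random(0)
blinded = reports[:]
rng.shuffle(blinded)
ratings = {report: hypothetical_code(report) for report in blinded}

# Step 2: reassign the coded dreams to their groups and tally them.
counts = {"depressed": [0, 0], "control": [0, 0]}  # [with rejection, without]
for (group, _), code in ratings.items():
    counts[group][0 if code == 1 else 1] += 1

# Step 3: compare the two groups statistically.
chi2, p = chi_square_2x2(*counts["depressed"], *counts["control"])
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

The shuffle only changes the order in which the judge sees the reports; the tally afterwards is unaffected by it, which is the point of rating blind to group membership.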
The influence of subjectivity in judging the dream reports is small if different independent judges rate the same dream material in the same way. The corresponding coefficients are termed interrater reliability and are reported as percentages of exact agreement (e.g., Domhoff, 1996) or as correlation coefficients (e.g., Hauri, Sawyer & Rechtschaffen, 1967). Systematic studies investigating cutoffs above which the coefficient is sufficiently high have, however, not yet been carried out. This kind of reliability, which is related to the application of the scale, should not be confused with the reliability coefficients that are related to the stable measurement of interindividual differences and that play an important role in classical test theory (cf. Rost, 1996). Schredl (1998) has demonstrated that, for specific dream characteristics, up to 20 dream reports should be elicited per person because of the large variability of dream content from dream to dream. This number of dreams would yield a sufficient interitem consistency. Within this context, the measurement of a single dream is analogous to the response to a test item; with an increasing number of items/dreams, the reliability coefficient also increases (e.g., Rost, 1996).
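The two ways of expressing interrater reliability, and the growth of reliability with the number of dreams, can be illustrated with a short Python sketch. All numbers are invented. The Spearman-Brown formula used in `spearman_brown` is a standard result of classical test theory and is brought in here as an assumption; the text itself does not name it, and the single-dream reliability of .20 is purely hypothetical.

```python
import math


def exact_agreement(codes_a, codes_b):
    """Proportion of dreams for which both judges assigned the identical code."""
    matches = sum(1 for a, b in zip(codes_a, codes_b) if a == b)
    return matches / len(codes_a)


def pearson_r(x, y):
    """Pearson product-moment correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(single_reliability, k):
    """Predicted reliability of a score aggregated over k parallel items (dreams)."""
    return k * single_reliability / (1 + (k - 1) * single_reliability)


# Hypothetical codes of two judges for ten dreams (1 = theme present, 0 = absent).
judge_a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
judge_b = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]

agreement = exact_agreement(judge_a, judge_b)   # 9 of 10 dreams coded identically
correlation = pearson_r(judge_a, judge_b)

# Hypothetical single-dream reliability of .20, aggregated over 20 dreams.
aggregated = spearman_brown(0.20, 20)

print(f"exact agreement = {agreement:.2f}")
print(f"Pearson r = {correlation:.3f}")
print(f"reliability for 20 dreams = {aggregated:.3f}")
```

Under these assumed values, a characteristic measured very unreliably by a single dream reaches a coefficient above .80 once 20 dreams per person are aggregated, which is the logic behind treating each dream like a test item.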
In addition to reliability, validity is an important quality criterion. Validity designates the extent to which the measured score is related to the underlying dimension that the instrument/scale is intended to measure. Authors often rely on face validity (e.g., Hall & Van de Castle, 1966; Domhoff, 1996), e.g., an aggression scale like “Are there any aggressive interactions present within the dream?” measures aggression. This seems plausible and appropriate, but another kind of validity problem, outlined in the following using the example of dream emotions, should be considered. Dream content analysis does not aim at the description of the dream report but at the measurement of the subjective dream experience. Several authors (Hobson et al., 1987; Strauch & Meier, 1996; Schredl, 1999) emphasize that the dream report, even if recorded as elaborately as possible, is only a more or less complete recall of the dream experience including emotions, actions, images, thoughts etc. Colors, for example, are rarely reported spontaneously, but specific probing yielded much higher figures (Kahn et al., 1962). The same holds for other dream characteristics such as emotions (Nielsen, Deslauriers & Baylor, 1991; Merritt et al., 1994), tactile sensations (Strauch & Meier, 1996) and bizarre elements (Revonsuo & Salmivalli, 1995). On the other hand, Stern, Saayman and Touyz (1978) demonstrated that specific instructions for recording the dreams (focusing on settings in nature or urban settings) may alter dream content markedly. There is thus a dilemma between eliciting dream reports without any further questioning and giving instructions to record specific dream characteristics in more detail.
Schredl and Doll (1998) analyzed the validity of two content analytic scales measuring emotions based on a sample of 133 home dream reports. The self-rated dream emotions (positive and negative), measured on two four-point scales (none, mild, moderate, strong), were chosen as criteria. The same two scales were used for the external judgement procedure (cf. Schredl, 1999). The judges were instructed to code also emotions that they could infer from the dream action. The second method used in this study was the emotion scale of Hall and Van de Castle (1966), which measures only explicitly mentioned emotions (5 classes: anger, apprehension, happiness, sadness, confusion). In the fictitious dream example “I saw a monster and ran away.” an external judge is not allowed to code any emotions according to the rules of Hall and Van de Castle (1966); using the rating scales of Schredl (1999), a coding of negative emotions is possible and probable. Whereas 0.8% of the dreamers did not report any dream emotions (self-rated), the external judgement applying the rating scales yielded 13.5% dream reports without emotions and, according to the Hall and Van de Castle scale, 57.9% of the dreams were devoid of emotions. I.e., the external judges, relying solely on the dream report without any further information, were not able to extract all emotions experienced by the dreamer (assuming that self-ratings of emotions upon awakening are valid). Interestingly, the correlation coefficients between external judgement and self-rating were high (positive emotions: r = .557; negative emotions: r = .669). With respect to reliability coefficients of about r = .80 (cf. Schredl, 1999), these correlations, which represent criterion validity, can be considered sufficiently high since validity coefficients cannot exceed reliability coefficients. Another study, however, reported much smaller values of criterion validity, e.g., r = .31 (anxiety; Riemann et al., 1985).
To summarize, the validity of dream content analysis based on dream reports might vary considerably with the dream characteristics investigated. One can imagine that the number of dream characters, for example, might be measured validly, but the measurement of dream emotions is much more complicated and less valid (underestimation by external judges despite the relatively high correlation).
Carrying on the work of Schredl and Doll (1998), the present study investigated the validity of a bizarreness scale, i.e., whether an external judge estimates the number of bizarre elements per dream in the same way as the dreamer herself or himself. For measuring bizarreness, a relatively narrow definition in comparison to Hobson et al. (1987) was chosen, since those authors themselves stated that coding improbable events/features is much more difficult than coding events/features which are not possible in waking life. Since the aim of the paper was not to develop a bizarreness scale but to investigate validity issues, a scale as simple as possible was chosen. A detailed definition is presented in the method section. In order to avoid a possible effect of instructions on dream content, the participants received the instruction to evaluate the bizarre elements only after recording the dream. It was expected, parallel to the measurement of dream emotions, that a marked underestimation of the bizarre elements by the external judges would be found despite a relatively high intercorrelation of the two measures.
Method
Measurement instruments
The questionnaire comprised questions regarding sociodemographic variables (age, gender, subject of study) and a seven-point scale measuring dream recall frequency (Schredl, 2002). The scale’s retest reliability is high (r = .83; average interval of 70 days; Schredl, 2002). In addition, the participants were given a form with the following written instruction at the top: “Please record your dream in as much detail and as elaborately as possible. Allow enough time so that you do not forget anything.” The bizarreness scale (see below) and a dream evaluation form were given to the participant in a sealed envelope (see procedure section). The dream evaluation form included the instruction that the dream content must not be altered during the evaluation process.
Bizarreness Scale
The bizarreness scale applied in the present study was based on the research of Hobson et al. (1987) and Revonsuo and Salmivalli (1995). In order to facilitate the judgement process, only elements which are impossible or extremely improbable in waking life were to be coded. In addition, a definition of non-bizarre elements was included.
Definition “Bizarreness” General: Objects/actions/persons etc. are defined as bizarre if they do not exist or are impossible in waking-life reality.
1. Incongruity
- An element inconsistent with waking life (e.g., a dog talking, dreamer has three arms)
- Discrepancy from physical laws (e.g., flying, time travel, for example into the Middle Ages)
- Mismatching features (e.g., standing in a burning house and freezing, failing an important examination and feeling joy)
2. Discontinuity
- Changes of features: elements disappear, appear suddenly or change shape (e.g., dreamer talks to a friend who changes into an animal)
- Impossible or very improbable alterations in familiar settings (e.g., being in the living room and snake heads jut out of the wall)
3. Uncertainty
- Obscure or undetermined elements (e.g., unknown monster, plant-like object emits indefinable noises)
Notice: The examples given in parentheses serve only as illustration and better comprehension of our bizarreness definition.
Definition “Non-Bizarreness” General: Objects/actions/persons etc. are regarded as non-bizarre if they are possible in waking life. These might be extraordinary or improbable but are nevertheless possible.
The following examples, grouped by the categories presented above, depict non-bizarre elements.
1. Incongruity
- Madonna is singing in your living room.
- Wax-coated dumplings are on the lunch table.
2. Discontinuity
- The wallpaper of your living room is green within the dream and not white as in reality.
3. Uncertainty
- You are in an unfamiliar setting like an isolated island.
Procedure and Participants
The participants received the questionnaire, the recording form and a sealed envelope which was to be opened only after recording the dream. After reading the bizarreness and non-bizarreness definitions, the participants evaluated and recorded the bizarre elements that occurred in the dream. In addition, a short explanation of why each element was rated as bizarre was requested. The number of bizarre elements per dream was the variable included in the analyses. The dream reports were then typed and coded separately by two independent judges. These two judges had been trained with 40 dream reports stemming from a different study (Schredl et al., 2003). During this training period, discrepancies were resolved by discussion. The interrater reliability was determined as the Pearson correlation for the number of bizarre elements per dream.
Overall, 46 psychology students (38 women, 8 men) participated. The mean age was 22.0 ± 3.7 years. Each participant contributed only one dream.
Results
On average, the participants recalled dreams on about three mornings per week (2.79 ± 2.17). The mean word count of the dream reports amounted to 176.2 ± 144.6 words. The interrater reliability between Rater 1 and Rater 2 was r = .910 (Pearson correlation for the number of bizarre elements per dream). The exact agreement in recognizing a bizarre element at the same position of the dream text was, however, relatively low (42%) between the two judges. The means of the number of bizarre elements variable are depicted in Table 1. Whereas the correlations between judges and self-rating were very high, a marked underestimation of bizarre elements by the judges was found. In addition, the means of Rater 1 and Rater 2 also differed significantly (t = 2.4, p = .0203).
Table 1. Means and standard deviations (SD) of “Number of bizarre elements per dream” and the comparison with the self-rated value.
| Variable | Mean ± SD | Statistical test¹ |
| Self-rating | 2.54 ± 3.24 | |
| Rater 1 | 1.24 ± 2.65 | t = 5.5, p < .0001 |
| Rater 2 | 0.65 ± 1.18 | t = 5.2, p < .0001 |
| Mean of Rater 1 and Rater 2 | 0.95 ± 1.88 | t = 5.6, p < .0001 |
Note. ¹ t-test for dependent samples against the self-rating (N = 46).
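The combination of a high correlation with a marked mean difference can be made concrete with a small Python sketch of the two statistics involved, the Pearson correlation and the t-test for dependent samples. The eight pairs of values below are invented for illustration and are not the study's data (which are not reproduced here); the function and variable names are likewise assumptions of this sketch.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def paired_t(x, y):
    """t statistic for dependent samples; returns (mean difference, t, df)."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator).
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    t = mean_diff / math.sqrt(var_diff / n)
    return mean_diff, t, n - 1


# Hypothetical number of bizarre elements per dream for eight dreams.
self_rating = [3, 0, 5, 2, 1, 4, 0, 6]
judge_rating = [1, 0, 3, 1, 0, 2, 0, 4]

r = pearson_r(self_rating, judge_rating)
mean_diff, t, df = paired_t(self_rating, judge_rating)

print(f"r = {r:.3f}")
print(f"mean difference = {mean_diff:.2f}, t({df}) = {t:.2f}")
```

In this invented example the judge preserves the rank order of the dreams almost perfectly, so the correlation is high, yet codes fewer elements in nearly every dream, so the mean difference is clearly positive. The correlation alone would hide the underestimation.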
Dream example
“My two sons and I are on wooden terraces of a stadium. The two are quarreling. They stand opposite each other. I do not understand what they are saying. Pierre lays his hand on Marcel’s shoulders (who is much smaller). He is shaking him, Marcel resists the pressure. I shout: ‘Stop.’ In that moment Marcel falls backwards down the steep wooden seat rows and interspaces that lie behind him (several meters). I go down. Marcel is lying there, calm, apparently uninjured, with a sleeping face. I am very excited and consider whether he is still alive. I address him but he does not respond.” The dreamer reported two bizarre elements within this dream sequence. First, she stated that nobody would be unharmed after such a fall. The two external judges also rated this element as bizarre. The second bizarre element is related to the fall itself. The dreamer made two drawings (see Figure 1) in order to illustrate that it is impossible to fall down this way (because of the interspaces). The first drawing, made while recording the dream, was not as illustrative as the second drawing. The judges, to whom only the text was presented, did not code this bizarre element.
Figure 1. Drawings by the dreamer illustrating a bizarre element (see section dream example). Figure 1a, showing the seat rows, was drawn together with the dream report. Figure 1b was drawn along with the rating of the bizarre element: the fall of the dream character Marcel.
Discussion
The results support the hypothesis formulated at the beginning that external judges code fewer bizarre elements than the dreamer herself or himself. In the following, the implications of this finding are discussed.
First, the issue of interrater reliability should be looked at carefully. Despite the high correlation, a marked and significant mean difference between the judges was found, i.e., Rater 1 coded more bizarre elements than Rater 2. Regarding the application of the scale, one should consider a more intense training period in order to minimize discrepancies. In general, when interrater reliability is not computed as exact agreement, it is desirable to report it not only as a correlation coefficient but also by mean comparisons. This has been done very rarely in the literature (cf. Schredl, 1999). It is also of importance that the reliability of the variable used in the analysis, namely “number of bizarre elements per dream”, was sufficiently high despite the relatively low exact agreement of the two judges. Again generally speaking, it seems adequate to compute the interrater reliabilities for the indices presented, e.g., male/female percent, aggression percent (e.g., Domhoff, 1996), in addition to the percentages of exact agreement of the content analytic procedure itself.
How should the differences between self-ratings and external judging be interpreted? First, it seems quite difficult to estimate the reliability of the self-ratings. It might be reduced by misinterpretations of the bizarreness scale or by evaluation errors. The method of computing interrater reliability coefficients is not applicable, but the problem might be minimized by training the participants in the same way as the judges were trained (coding independent dream material and discussing discrepancies). One should keep in mind, however, that focusing on particular dream characteristics before recording (and actually having) the dream might affect dream content considerably (see discussion below).
The aim of dream content analysis is the measurement of the original dream that has been dreamt, and Hobson and Stickgold (1994) summarized their experience: “One lesson that we have learned is that the open ended inquiry into dream mentation that has been typical of most past work (including our own) is grossly inadequate” (p. 10). The present results and the findings of Schredl and Doll (1998) clearly indicate that dream content analysis based on the external judgement of written dream reports yields a marked underestimation of the number of bizarre elements and of dream emotions. I.e., the dream report recorded by the participants in response to an open-ended question (see method section of the present study), together with the subsequent content analysis, is not sufficient to capture all subjective experiences of the dream. On the other hand, it seems plausible that for other dream characteristics, e.g., occurrence of sexual interaction, aggression, death themes, this problem plays only a minor role. Studies of this kind (comparison of self-ratings with external judgements) should therefore be carried out systematically for a variety of dream characteristics.
In order to solve the problem of validity, Hobson and Stickgold (1994) suggested using affirmative phenomenological probes, i.e., asking the dreamer explicitly about the characteristic studied. A problem which is not addressed by the authors is the possible effect of such instructions, e.g., to record dream emotions explicitly (Merritt et al., 1994), on subsequent dreams. Stern, Saayman and Touyz (1978) were able to demonstrate that instructions did affect dream reports. The question whether these instructions result in a more detailed description of the dream experiences (desirable) or in a bias regarding dream contents should be studied in more detail. One might design a study, for example, which uses the methodology of the present study (instructions after recording the dream) and a second sample of participants who receive the instructions right at the beginning of the study. Another idea, applied for example by Leuschner et al. (1994) and discussed by Hobson and Hoffman (1984), is to use drawings made by the dreamer in addition to the written dream report. Coding systems with high interrater reliability coefficients can be used for this purpose (cf. Leuschner et al., 1994).
To summarize, the written dream report does not yield a complete picture of the original dream experience, and hence the validity of dream content analysis based on written dream reports is limited, at least in several areas. How strongly the validity problem affects the results of content analytic studies, and which dream characteristics are most susceptible to this kind of error, should be investigated in future studies using a methodology similar to that of the present study.
References
- Domhoff, G. W. (1996). Finding meaning in dreams: a quantitative approach. New York: Plenum Press.
- Hall, C. S., & Van de Castle, R. L. (1966). The content analysis of dreams. New York: Appleton-Century-Crofts.
- Hauri, P., Sawyer, J., & Rechtschaffen, A. (1967). Dimensions of dreaming: a functional scale for rating dream reports. Journal of Abnormal Psychology, 72, 16-22.
- Hobson, J. A., & Hoffman, S. (1984). Picturing dreaming: Some features of the drawings in a dream journal. In M. Bosinelli & P. Cicogna (Eds.), Psychology of dreaming (pp. 11-30). Bologna: CLUEB.
- Hobson, J. A., Hoffman, S. A., Helfand, R., & Kostner, D. (1987). Dream bizarreness and the activation-synthesis hypothesis. Human Neurobiology, 6, 157-164.
- Hobson, J. A., & Stickgold, R. (1994). Dreaming: a neurocognitive approach. Consciousness and Cognition, 3, 1-15.
- Leuschner, W., Hau, S., Brech, E., & Volk, S. (1994). Disassociation and reassociation of subliminally induced stimulus material in drawings of dreams and drawings of waking free imagery. Dreaming, 4, 1-27.
- Merritt, J. M., Stickgold, R., Pace-Schott, E., Williams, J., & Hobson, J. A. (1994). Emotion profiles in the dreams of men and women. Consciousness and Cognition, 3, 46-60.
- Nielsen, T. A., Deslauriers, D., & Baylor, G. W. (1991). Emotions in dream and waking event reports. Dreaming, 1, 287-300.
- Revonsuo, A., & Salmivalli, C. (1995). A content analysis of bizarre elements in dreams. Dreaming, 5, 169-187.
- Riemann, D., Beyer, J., Wiegand, M., & Berger, M. (1985). A comprehensive manual for scoring manifest dream content. In W. P. Koella, E. Rüther, & H. Schulz (Eds.), Sleep 1984 (pp. 355-357). Stuttgart: Gustav Fischer Verlag.
- Rost, J. (1996). Lehrbuch Testtheorie: Testkonstruktion (Textbook test theory: Test construction). Bern: Huber.
- Schredl, M. (1998). The stability and variability of dream content. Perceptual and Motor Skills, 86, 733-734.
- Schredl, M. (1999). Die nächtliche Traumwelt: Eine Einführung in die psychologische Traumforschung. Stuttgart: Kohlhammer.
- Schredl, M. (2002). Messung der Traumerinnerung: siebenstufige Skala und Daten gesunder Personen. Somnologie, 6, 34-38.
- Schredl, M., & Doll, E. (1998). Emotions in diary dreams. Consciousness and Cognition, 7, 634-646.
- Schredl, M., Wittmann, L., Ciric, P., & Götz, S. (2003). Factors of home dream recall: a structural equation model. Journal of Sleep Research, xx, xxx-xxx.
- Stern, D. A., Saayman, G. S., & Touyz, S. W. (1978). A methodological study of the effect of experimentally induced demand characteristics in research of nocturnal dreams. Journal of Abnormal Psychology, 87, 459-462.
- Strauch, I., & Meier, B. (1996). In search of dreams: results of experimental dream research. Albany: State University of New York Press.
- Winget, C., & Kramer, M. (1979). Dimensions of dreams. Gainesville: University of Florida Press.

