Content of review 1, reviewed on January 21, 2023
The paper explains the basic ideas of natural experiments and provides useful information on the topic. It is as if the paper says, “Hey, psychologists. We may find a good randomized experiment even outside our labs!” Timely and apt. I fully agree that the idea has been rather neglected in psychology and that this paper would be useful for many psychological researchers. Below are a couple of suggestions that the authors may (or may not) consider to improve the manuscript.
Narrow Definition of Experiments. I know many psychological researchers view experiments as randomized experiments (e.g., the APA’s definition of an experiment), but I have long thought this is somewhat misleading. Experiments are rather general: basically, any procedure that helps researchers test a hypothesis can be an experiment. Randomization is one of many options researchers can rely on to reach a plausible causal conclusion; I do not think it is essential for a scientific experiment. Think about someone who investigates the effect of hitting iron while it is hot on the quality of the resulting steel (i.e., its strength). They split the original hot piece into two, hit one while leaving the other as it is, and finally compare the strength of the two pieces. I think this is an experiment even though there is no randomization, and I think this is the case in many scientific labs (maybe not in the psychological sciences, but in many natural sciences). In fact, drawing a causal inference in such cases is possible even without randomization because of unit homogeneity (in the sense of Holland, 1986). My point is that the term “experiments” is a general one that may or may not involve a randomization procedure, whereas in this paper experiments just mean randomized experiments. Later, the authors also use “controlled experiments,” but I feel this is not clear either: one may control the assignment procedure (i.e., randomization) or control other procedures, for example, fairly splitting the iron into two pieces. Controlled experiments need not mean only randomized controlled trials. I am fine with the narrow definition of experiments, but I think it would be better to state the usage of the term clearly at the beginning of the paper.
IV Assumptions and Linearity. The authors consider a linear model for IV estimation and thus do not discuss the complier average treatment effect (CACE) or the local average treatment effect (LATE). I am fine with that; such nonparametric identification could be beyond the scope of this introductory paper. But, then, linearity should be included as one of the IV assumptions. Indeed, Figure 2 shows that the authors consider a linear model, but this is not clearly stated in the main text. Without linearity, IV estimation cannot identify the average treatment effect (it identifies only the CACE or LATE), even if all three assumptions the authors list on p. 17 for IVs are met.
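To make this concrete, here is a minimal numpy simulation (not from the paper; all coefficients and names are invented for illustration) of the linear, constant-effect setting. With a single confounder, OLS is biased, while the simple IV estimand cov(Y, Z)/cov(X, Z) recovers the true coefficient, precisely because the model is linear with a homogeneous effect:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

u = rng.normal(size=n)             # unobserved confounder of X and Y
z = rng.normal(size=n)             # instrument: affects X, not Y directly
x = 0.8 * z + u + rng.normal(size=n)
beta = 1.5                         # true constant (linear) treatment effect
y = beta * x + u + rng.normal(size=n)

# Naive OLS slope is biased by the confounder u.
ols = np.cov(y, x)[0, 1] / np.var(x)

# IV estimand cov(Y, Z) / cov(X, Z) identifies beta under linearity.
iv = np.cov(y, z)[0, 1] / np.cov(x, z)[0, 1]
```

Under effect heterogeneity the same ratio would instead converge to a LATE, which is exactly why linearity belongs on the list of assumptions.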
Also, I found the IV assumptions the authors give a bit strong. For example, under linearity, it is fine if Z (the IV) does not cause X (the treatment), as far as the relevance condition is concerned: it is enough for Z to be associated with X via some unknown confounders. Of course, saying that Z affects X is an easy way to describe the basic feature of an IV, but I think this point is worth mentioning for readers. Also, the other two assumptions, the exclusion restriction and the exogeneity condition, can in fact be considered one assumption. See, for example, Pearl’s (2009) book, Causality, for the graphical criterion for IVs: (i) Z should be d-connected with X, and (ii) Z should be d-separated from Y in a modified DAG in which the causal path X → Y is removed. The second condition embraces the authors’ exclusion restriction and exogeneity conditions. I personally find Pearl’s criterion easier and have wondered what benefit we gain by distinguishing the exclusion restriction from the exogeneity condition.
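The point that Z need only be associated with X can be illustrated with a small simulation (a sketch with invented numbers, not taken from the paper). Here Z does not cause X at all; the two are associated only through an unknown common cause w, and w affects Y only through X, so the IV ratio still recovers the true linear effect:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

w = rng.normal(size=n)              # unknown common cause of Z and X
u = rng.normal(size=n)              # confounder of X and Y
z = 0.7 * w + rng.normal(size=n)    # Z does NOT cause X, merely covaries via w
x = 0.9 * w + u + rng.normal(size=n)
beta = 2.0                          # true constant treatment effect
y = beta * x + u + rng.normal(size=n)   # w reaches Y only through x

# Relevance holds (cov(x, z) != 0) even though Z has no causal effect on X.
iv = np.cov(y, z)[0, 1] / np.cov(x, z)[0, 1]
```

Note the exogeneity part still does the work here: if w also affected Y directly, the ratio would no longer identify beta.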
Randomization vs. As-if Randomization. I was left unconvinced that this distinction is really necessary and useful. The distinction is highlighted in Figure 1, but is it crucial for researchers who want to implement natural experiments to understand? I am not sure what more one learns from the distinction, for example, for IV designs.
Minors:
Figure 2 and its note assume standardized variables. This needs to be mentioned in the text.
Why is “the randomization of offspring genotype during meiosis” merely an as-if randomization and not an actual randomization? I am not so sure about this distinction and its explanation.
© 2023 the Reviewer.
Content of review 2, reviewed on October 03, 2023
Review of “Natural experiments: missed opportunities for causal inference in psychology”
I appreciate that the authors have considered the previous comments and revised the manuscript accordingly. I still hold a favorable view of the paper. Below, I will list just a couple of thoughts that the authors may or may not find useful for further improving the manuscript.
Personally, I find the current introduction a bit too lengthy (almost 8 pages) and full of heterogeneous details. It may be better to restructure the paper so that the introduction is condensed and some of its content is moved elsewhere; for example, the survey results could constitute an independent section following the introduction. Readers would benefit from a clearer structure and more balanced section lengths.
I believe the authors' addition of the IV figure (Fig. 3) was in response to my previous comment, and I appreciate their serious consideration of it. However, I personally find the revision somewhat excessive. Such analytical details may not be necessary, especially compared to other sections. It might suffice to mention that it is acceptable for the IV and the treatment to be merely associated (rather than causally related). My intention for this introductory paper was that correct information be provided, possibly by simply adding one or two sentences, rather than by incorporating too many technical details. That is, although the previous description was incorrect (which is why I commented on it), I do not think the issue itself is crucial, especially for novices.
Related to the above, I believe that the current description of IV designs may not be useful for most psychologists. It focuses primarily on the estimation issue (p. 20) without providing sufficient psychological context; in fact, it presents only one study example, from criminology. I am not certain that psychologists would be motivated to use IV designs based on the current description. I suggest introducing a real study context in which readers can better appreciate the benefits of IV designs. In many experimental studies, noncompliance is a common issue even when the assignment process is randomized. Researchers often address it by checking fidelity and excluding non-compliant cases, but such hasty deletions can cause problems for causal inference. IV designs offer a useful approach to handling noncompliance, and this should be made known to many psychologists.
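To illustrate why per-protocol deletion is problematic and how an IV design handles noncompliance, here is a minimal simulation (all numbers invented; one-sided noncompliance with 30% never-takers who differ systematically at baseline). Dropping the non-compliers biases the comparison, while the Wald/IV estimator (ITT effect on Y divided by ITT effect on treatment receipt) recovers the effect among compliers:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

z = rng.integers(0, 2, size=n)            # randomized assignment
complier = rng.random(size=n) < 0.7       # 30% never-takers (one-sided)
x = z * complier                          # treatment actually received
tau = 1.0                                 # true effect among compliers
baseline = np.where(complier, 0.0, -2.0)  # never-takers differ at baseline
y = baseline + tau * x + rng.normal(size=n)

# "Per-protocol": drop assigned-to-treatment units who did not comply.
pp = y[(z == 1) & (x == 1)].mean() - y[z == 0].mean()   # biased upward

# IV (Wald) estimator: ITT on Y over ITT on X -> complier effect (CACE).
itt_y = y[z == 1].mean() - y[z == 0].mean()
itt_x = x[z == 1].mean() - x[z == 0].mean()
cace = itt_y / itt_x
```

Here the per-protocol contrast is inflated because the z = 0 arm still contains the low-baseline never-takers, whereas the Wald ratio sits close to tau.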
Although they responded to my feedback, I remain unconvinced by the concept of 'as-if' randomization and its distinction from actual randomization. Perhaps the authors should provide a clear definition of it. While I agree that the distinction may be necessary for natural experiments (as they correctly pointed out in their response), I am still uncertain about its importance for IV designs. Although Figures 1 and 2 seem to suggest that it is crucial for IV designs as well, I could not find any such implication in the paper.
Minors:
p. 18, line 30: The description seems to suggest that the balance check is crucial only under as-if randomization. However, I believe it is also important in the true-randomization case.
p. 19, line 7: The phrase "A natural experiment can serve as not only a treatment variable..." is somewhat confusing. They may have meant, "A naturally randomized variable can serve..."
p. 20, line 17: I believe that indeed, all equations here require standardization of variables, not just some.
p. 24, line 44ff: The statement, "This assumption cannot be tested directly; violations could be suggested by discontinuities in the density of the assignment variable around the threshold," may be misleading. In RD designs, a discontinuity in the density of the assignment variable around the threshold is expected. What researchers should focus on are anomalies or irregularities around the cutoff, rather than the discontinuity itself.
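As a toy illustration of what an "anomaly around the cutoff" can look like, here is a small invented simulation in the spirit of McCrary-type density checks: some units sitting just below the cutoff manipulate their score upward (e.g., by retaking a test), which produces tell-tale bunching in the bins adjacent to the threshold:

```python
import numpy as np

rng = np.random.default_rng(3)
r = rng.uniform(-1, 1, size=50_000)   # running / assignment variable
cutoff = 0.0

# Simulated manipulation: half the units just below the cutoff
# nudge their score above it.
just_below = (r > -0.05) & (r < cutoff)
bump = just_below & (rng.random(size=r.size) < 0.5)
r[bump] += 0.05

# Compare counts in narrow bins on each side of the cutoff.
below = np.sum((r > -0.05) & (r < cutoff))
above = np.sum((r >= cutoff) & (r < 0.05))
ratio = above / below   # far from 1 signals bunching / manipulation
```

Without the manipulation step the two bin counts would be roughly equal; it is this kind of irregularity, not a jump per se, that should raise a red flag.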
© 2023 the Reviewer.
References
Grosz, M. P., Ayaita, A., Arslan, R. C., Buecker, S., Ebert, T., Hünermund, P., Müller, S. R., Rieger, S., Zapko-Willmes, A., & Rohrer, J. M. (2024). Natural experiments: Missed opportunities for causal inference in psychology. Advances in Methods and Practices in Psychological Science.
