Content of review 1, reviewed on August 03, 2023
Thank you for the opportunity to review this manuscript. To summarize, this study investigates strategy use in a word list task. It is a close replication of a previous study (conducted by some of the same authors). The main hypotheses are about how strategy use changes over time. More specifically, the study tests whether strategy use predicts better memory performance, whether strategy use becomes more stable when repeating the same task, and whether strategy use increases during the first two task blocks.
Stage 1 Primary Criterion #1: Whether the authors provide a sufficiently clear and detailed description of the methods for another researcher to closely replicate the proposed experimental procedures and analysis pipeline, and to prevent undisclosed flexibility in the experimental procedures or analysis pipeline.
Mostly. There are a couple of points that need clarification (see detailed comments) or where the authors refer to a different manuscript for the information (which I personally think makes it harder to replicate an experiment).
Stage 1 Primary Criterion #2: Whether the manuscript describes a sufficiently valid (i.e. close) and robust (e.g. statistically powerful) replication of the original study methods and rationale to provide an indication of replicability.
Unclear. This study doubles the sample size of the to-be-replicated experiment. However, this decision is not motivated; it would be very helpful to report the effect sizes from the previous study and to conduct, for example, a simulation-based power analysis based on them, to get a better sense of how well-powered this replication is.
Stage 1 Secondary Criterion #1: The logic, rationale, and plausibility of the proposed hypotheses.
The logic, rationale, and plausibility of the proposed hypotheses were clear and seemed plausible.
Stage 1 Secondary Criterion #2: The soundness of the methodology and analysis pipeline.
The methodology and analysis pipeline seemed sound, but please keep in mind that I do not consider myself an expert in Bayesian analyses.
Stage 1 Secondary Criterion #3: Whether the authors have considered sufficient outcome-neutral conditions (e.g. absence of floor or ceiling effects; positive controls; other quality checks) for ensuring that the results obtained are able to test the stated hypotheses.
I imagine that the authors were pretty confident in the difficulty level of their memory task given the results from their previous study. However, the possibility or absence of floor/ceiling effects is not explicitly discussed in the current version of the manuscript. It would be helpful to make these considerations explicit in the manuscript (plus any positive controls that may have been included). Rater correspondence was estimated and is reported.
Detailed comments:
Title/abstract:
From the title and abstract, it was not clear to me that the main question of interest was task strategy evolvement (how strategy use changed over time). You may want to consider making that aspect of the study more apparent in the title and/or abstract.
In addition, I found the abstract too general. After reading it, I only had a very vague sense of what participants had actually done and what hypotheses were tested. Again, stressing the hypotheses about task strategy evolvement here may be helpful in setting up readers’ expectations.
Introduction:
The introduction has one general paragraph and then goes straight into describing the authors’ previous study that they set out to replicate. As someone who is not extremely familiar with other research that has been conducted in this field, I was missing some additional context as to how these data fit into the more general picture/existing literature. Please clarify and potentially add additional context (i.e., previous relevant research that has been conducted on strategy evolvement) to the introduction.
Method:
When first reading the method, I was confused about how this study relates to the paper by Jylkkä et al. (2023). This became clearer as I read on, but I believe it would be helpful if the authors stated right at the beginning of the methods that the data in the current manuscript were collected as part of a bigger study whose results are published elsewhere. At one point, the sample is referred to as the control group (p. 7) – I would remove that wording because it suggests that there is more than one group (which is not the case for this experiment).
There currently is not enough information on how sample size was determined. How large were the effects in the previous study? How confident can you be that the doubled sample size is adequate? Was a power analysis conducted? Please clarify.
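To make the simulation suggestion concrete, here is a minimal sketch of a simulation-based power analysis for a two-group comparison. All parameter values (effect size d, group size n, number of simulations) are illustrative placeholders and not taken from the manuscript or the previous study.

```python
import math
import random


def simulate_power(d, n, crit=1.96, n_sims=2000, seed=1):
    """Monte Carlo power estimate for a two-sample comparison.

    Simulates n_sims experiments with n participants per group, drawn
    from normal distributions separated by d standard deviations, and
    counts how often a Welch-type t statistic exceeds a normal-
    approximation critical value (1.96 for alpha = .05, two-sided).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]  # control condition
        b = [rng.gauss(d, 1.0) for _ in range(n)]    # shifted by effect size d
        ma, mb = sum(a) / n, sum(b) / n
        va = sum((x - ma) ** 2 for x in a) / (n - 1)
        vb = sum((x - mb) ** 2 for x in b) / (n - 1)
        t = (mb - ma) / math.sqrt(va / n + vb / n)
        if abs(t) > crit:
            hits += 1
    return hits / n_sims
```

With the effect size estimated from the original study plugged in as d, a call such as simulate_power(d=0.4, n=80) would indicate roughly how well-powered the doubled sample is under these (simplified) assumptions; the actual mixed-model analyses would of course require simulating the full design.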
This may be a personal preference, but I generally do not think it is helpful when authors refer to a different paper for additional methodological details (as is done here on pp. 6 and 8). This feels especially true for a replication, where the goal should be to make methodological details as explicit as possible. I understand that it is a space saver, but including these details here would make it easier for readers to evaluate the methodological decisions that were made.
For example, I would find it helpful if the 18 to-be-remembered words and their psycholinguistic characteristics had been included. This would make it easier to evaluate the scoring criteria that were used later on – depending on how long the words were, expecting them to be typed correctly without any typos may be an overly harsh criterion, but this is hard to judge when you first have to find a different article to see the stimuli.
Am I understanding it correctly that participants had to remember the same 18 words three times? That is, any change in strategy in Block 2 may be due not only to familiarity with the task but also to familiarity with the words? Please clarify this aspect of the methods.
P. 9: “After investigating the qualitative features of the strategy reports coded as Other strategy, we decided to lump Other strategy together with Manipulation instead of Maintenance, contrary to what Waris, Fellman et al. (2021) had done.“ This decision needs more justification, as it is a free experimenter parameter. Why was this decision made? If this experiment were to be replicated again, how would one decide whether “Other strategy” should be lumped with “Maintenance” or “Manipulation”? Might a decision on a participant-by-participant basis even be necessary/more accurate?
P. 10: What exactly is meant by “raters continued with consensus decision to solve the discrepancies in their codings”? Please make more explicit how a consensus was reached, so that this process can be more easily replicated.
Minor comments:
- References: For two citations, the DOI is not given as a link.
- P. 5, line 43ish: Em dash is not showing correctly.
- P. 7, line 40ish: Double semicolons
- Table 1: Were there no non-binary participants, or was this answer option not available?
- P. 9, line 41ish: Closing bracket missing
- P. 9: “The first reported strategy was coded into one of 8 different strategy categories” – the supplementary table actually includes ten different strategy categories?
Source
© 2023 the Reviewer.
Content of review 2, reviewed on September 20, 2023
I am happy with the changes the authors made. Thank you for editing the title, abstract, and introduction. I also really appreciated the additions to the methods (including the Figure) – this made it a lot easier to follow the methodology that was used.
Minor comment:
- Figure 1: Please define the abbreviation WLL in the figure caption.
Content of review 3, reviewed on December 11, 2023
Overall, I found the results and discussion clear and in line with each other. The only exception to this was that the replication of hypothesis 1—strategy use improves memory performance—was overstated (it was only partially replicated). In addition, I think it would be helpful if some changes were made to the data files and code provided to facilitate future use.
- In the abstract, I found it confusing to use numbers (e.g., (1)) both for references and for numbering the different hypotheses. I suggest using a slightly different format for the numbering of the hypotheses so that the distinction becomes clearer (e.g., 1. or a).
- Abstract: “The first two findings were replicated: use of strategies involving manipulation of the memoranda (Grouping, Visualization, Association, Narrative and Other Strategy) was associated with superior word recall, and strategy changes decreased over the task blocks.” I found it a little surprising that the findings related to the third hypothesis were not specifically mentioned in the abstract.
- Incomplete replication of Hypothesis 1: In the abstract but also in other parts of the manuscript, it is stated that the first hypothesis was replicated—but then in the results, it becomes clear that it was only partially replicated. Only manipulation strategies increased task performance over no strategy use (while maintenance strategies did not result in a performance advantage). Please make this pattern of findings clearer in the abstract and also discussion for clarity (otherwise the statement that the first hypothesis was replicated may be misleading).
- In the results, it would have been very helpful to me if the hypotheses were more clearly matched to the different analyses conducted (e.g., already in the section titles). As it is written now, it is not always clear which analysis informs which hypothesis.
- All figures looked pixelated. Consider using a higher resolution.
- For Figure 5A, the very short y-axis was an unexpected choice for me and may make any changes look bigger than they are. Consider displaying a larger excerpt of the y-axis.
- This is just a note (because no changes should be made to the introduction at this point), but I thought that the explanations about the cognitive routine framework and the task demand hypothesis included in the discussion could have been included already in the introduction (where the information about this was more limited). Nothing to do at this point, but maybe something for the authors to take forward in different manuscripts.
Data:
- Could you come up with more intuitive names for the data files? For example, “Data_variables” could be named “Overview of variables (readme)”.
- For “Stragegy_most_used”, could you include which number maps onto which strategy in the “Data_variable” files?
- It would be helpful to provide additional information on the data file: Does it only include data from participants that were included? Is it the cleaned data set?
- For the code, I was wondering whether it would be helpful to include the actual results or a protocol of the analyses run (e.g., as part of an R Markdown file)? That way, if the same code were run with a different R version and slightly different findings came out, such mismatches could be identified more quickly.
References
Matti, L., Daniel, F., Tilda, E., Liisa, R., Juha, S. 2024. Strategy use and its evolvement in word list learning: a replication study. Royal Society Open Science.
