Review of Evolution of reproductive structures for in-flight mating in thynnine wasps (Hymenoptera: Thynnidae: Thynninae)

Content of review 1, reviewed on August 24, 2020

Reviewer's comments on JEB-2020-00321

This manuscript presents a study of the evolution of the morphology of the matching male and female reproductive structures in thynnine wasps, using 3D geometric morphometrics and phylogenetic comparative methods. This is indeed a remarkable biological system, with many opportunities for addressing interesting evolutionary questions. The study is generally well designed and thorough, providing a newly estimated phylogeny for the group of interest, and putting forward a comprehensive 3D morphological dataset using modern data-acquisition techniques. The authors have combined those (phylogeny and morphological dataset) using cutting-edge phylogenetic comparative methods to address a series of evolutionary questions that can be interesting for a wide public.
That said, I have some very serious concerns regarding the methods implemented for GM data acquisition and evolutionary shape analyses, which at present preclude me from recommending the manuscript for publication. As I realize that many of these concerns might stem from a lack of accurate description, I recommend that the authors address them in detail so that the manuscript can be more easily understood and assessed for publication. I provide a list of these concerns below, separated into major and minor points. Major points focus mostly on the structure of the introduction, and on the methods and results obtained. At present I refrain from commenting in detail on the discussion, as I am not convinced of the methods used for data analyses. However, as I said above, I think this is an interesting system and the data at hand can provide very interesting evolutionary insights, so I'm sure that once the analytical part of the manuscript is cleaned up and better presented, there will be space for an interesting discussion of the results, with relevance for evolutionary biologists.

MAJOR POINTS
INTRODUCTION – While the introduction provides all the information needed to present the interest of this biological system, I felt that it is not very tightly organised and is focused on the organism, rather than using the organism to raise interesting evolutionary questions. I believe that this can be easily fixed by re-reading the introduction with a fresh eye and reordering the ideas so that the evolutionary questions and topics are highlighted, instead of dwelling too much on organism-specific details.
For instance, the transition from the first paragraph (evolution of reproductive systems in insects) to the second (winglessness in the Hymenoptera) is very abrupt, and one has to read almost to the end of the second paragraph and through the third to understand how this information is relevant to reproductive system evolution. Similarly, the paragraph starting in line 75 (regarding the possible role of allometry) seems out of place, as the next one (starting in line 87) goes back to the specifics of the reproductive strategy of thynnines. Furthermore, in several instances the authors use strong terms that refer to very specific evolutionary processes without clearly linking them to the information they present for their system (e.g. sexual conflict, female choice, integration, etc.; this is quite noticeable in the paragraph starting in line 97). I would recommend that the authors rethink the order and manner in which they present the specifics of their system, so that this information is framed by its more general evolutionary interest. That is, try to use the wasp specifics to tell us why and how this system allows you to address important evolutionary questions, and which ones. The information is mostly already there, but the way it is presented is not clear and convincing for a general evolutionary readership.

Finally, although this might be a more minor point, please note that lines 121-125 at the end of the introduction are a huge overstatement of the novelty of this study. Other researchers have used 3D morphometrics, in very creative and novel ways, to study the reproductive structures of insects for over a decade. The work of the group of Mark McPeek is one obvious example, and I was surprised to see that not a single article from their studies – which are very relevant to the topic treated here – is cited. I strongly recommend that the authors tone down this section, and that they review more comprehensively the studies published on insects using 3D morphometrics, even where the detailed techniques used are different. See, e.g.,
McPeek MA, Shen L, Torrey JZ, Farid H. 2008 The tempo and mode of three-dimensional morphological evolution in male reproductive structures. Am. Nat. 171, E158-178. (doi:10.1086/587076)
McPeek MA, Shen L, Farid H. 2009 The correlated evolution of three-dimensional reproductive structures between male and female damselflies. Evolution 63, 73–83. (doi:10.1111/j.1558-5646.2008.00527.x)

METHODS – I am not a specialist in phylogenetic inference techniques (detailed in a Supp. Mat.), so I will not comment on this part. I rather focus on morphological data and analysis in a comparative context. In these sections, I have several methodological doubts which do not allow me to assess the quality of the results, as I believe some of the methods used need to be better explained for the reader to understand what has been done and why. Additionally, some reordering would be necessary to help the reader follow the methods used.

First, I am not clear about the usefulness of the linear measurements dataset presented in lines 178-183. Unless I missed something, this is merely used to present some basic statistics in the results, not linked at all to any of the evolutionary questions presented in the introduction. These data are used for allometry analyses, but these analyses are not clearly presented anywhere in the methods, nor are they linked to the questions (the part on Evolutionary Allometry starting in line 314 only mentions GM-derived data). Looking at Supp. Table 6, I get the impression that the authors examined every allometric model their dataset could support, which is neither hypothesis-driven nor statistically robust (a correction for multiple comparisons should be implemented in this case). I would recommend removing this part, and all linear data, from both the methods and the results. Please see also below for an additional important comment on allometric analyses.
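To illustrate the multiple-comparisons point, here is a minimal sketch of a Holm step-down correction. The p-values are hypothetical and are not taken from Supp. Table 6; the function name `holm_adjust` is likewise only illustrative.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values, in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1,
        # and forced to be no smaller than the previous adjusted value.
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical p-values for six allometric models (NOT from the manuscript).
raw_p = [0.004, 0.030, 0.045, 0.080, 0.200, 0.600]
adj_p = holm_adjust(raw_p)
significant = [p < 0.05 for p in adj_p]
print(adj_p)
print(significant)
```

With these made-up values, three tests fall below 0.05 before correction and only one does afterwards, which is the kind of change I would expect the authors to report if all models are retained.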

Regarding GM data acquisition and analyses, the following important points need to be amended to provide a better description of the methods used, clarify their usefulness, and allow their accuracy to be assessed:
- lines 210 – 214: I do not follow the logic of using two different approaches for the different structures studied. If you consider that data redundancy on symmetrical structures is an issue (which I don't really agree with), you should follow the same rule for all the structures. I don't really understand the argument in line 214, “due to the lack of possibilities for primary landmarks”. For the sake of scientific discussion, also note that by digitizing only a half-structure you may get an artificial signal of shape variation across the mid-plane, because this is not anchored by the second half of the structure, as it should be. In my view, the best approach would have been to digitize the entire structures in all cases and then remove asymmetric shape variation by mirroring the structures across their mid-plane to obtain symmetrical structures (with numerical redundancy, which is however not an issue for statistical analyses if using permutation-based inference such as that implemented in geomorph). At any rate, this is not a strong opinion, as different researchers take different approaches, and mainly because re-digitizing the data would be quite time consuming at this stage. However, I strongly believe that the authors should use the same approach for all the structures they are studying. Comparing, e.g., the rates of evolution of a half-structure (paramere and pygidium) to those of a whole structure (hypopygium) is extremely questionable, and this is not limited to rates.

- lines 219 – 222: please mention the specific numbers of landmarks and semilandmarks used on each structure and point to the figure where the reader can see the distribution of these landmarks (i.e. Fig. 3, which should certainly appear earlier on, before the results on the phylogeny and PCAs presented in Fig. 2).
- lines 225 – 228: If I understand correctly, the landmarks on each structure (hypopygium, parameres and pygidium) were first superimposed and then combined. Please modify the wording to state this explicitly; the current expression is confusing.
- lines 228 – 237: I am sorry, but I can follow neither the reasoning nor the exact procedure used to combine the datasets mentioned here using ShapeRotator, especially because ShapeRotator does nothing to influence the relative size of the structures combined (see below for a caveat on a different approach that does). First, I do not understand the argument of lines 230 – 232: tempo and mode of evolution, as well as disparity, could perfectly well have been examined on the original structures. In fact, evolutionary rates ARE phylogeny-standardized variances, so any data procedure that influences disparity also influences rates and vice versa. Second, I do not understand at what angle and how (procedurally) the structures were combined. This needs at the very least some clarification and certainly a better justification, as I do not really see that the combination of the structures is necessary for any of the downstream analyses applied. Also, the authors might want to have a look at the function combine.subsets and the references therein, most importantly Collyer, Davis, and Adams. 2020. Making heads or tails of combined landmark configurations in geometric morphometric data. Evolutionary Biology 47:193-205, with respect to the methods and caveats of combining landmark configurations and the possible effects of size standardization when doing so. However, as I said before, this is not the case for the procedures of ShapeRotator, which just fix a specific articulation angle. While I could in theory envision using this approach for “putting together” two different structures, and that would be an interesting solution when one wants to look at variation in two shapes together, the authors would need to do a much better job of describing the procedures implemented and justifying them.
- line 271 (and again in line 285): “all 23 principal components from all five shape datasets” – first, what components are these? Reordering is needed, as the PCA is only explained later on. Second, I cannot imagine how all the landmarks digitized yield 23 PCs. Third, it is not clear which PCA this refers to, as the authors later mention several PCAs (lines 292 – 297). This needs clarification (and note that since the different structures and datasets have different numbers of landmarks it is impossible that they all yield 23 PCs).
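As a concrete illustration of the mirroring approach suggested under lines 210 – 214 above, the following sketch averages a configuration with its relabelled reflection. It assumes the configuration is already aligned so that the mid-plane is x = 0; in practice the original and reflected copies would be superimposed by a Procrustes fit (for instance with the object-symmetry option of bilat.symmetry in geomorph). The function name, landmark pairing and coordinates are hypothetical.

```python
def symmetrize(config, pairs, axis=0):
    """Average a landmark configuration with its relabelled mirror image.

    config: list of [x, y, z] landmarks, ASSUMED already aligned so that the
            mid-plane is the plane where coordinate `axis` equals 0.
    pairs:  list of (left_index, right_index) for bilaterally paired landmarks;
            landmarks not listed are treated as midline landmarks.
    """
    # Reflect every landmark across the mid-plane.
    mirrored = [[-c if d == axis else c for d, c in enumerate(pt)] for pt in config]
    # Swap the labels of paired landmarks so that left corresponds to left again.
    relabelled = [pt[:] for pt in mirrored]
    for left, right in pairs:
        relabelled[left], relabelled[right] = mirrored[right], mirrored[left]
    # The mean of the original and the relabelled reflection is symmetric.
    return [[(a + b) / 2 for a, b in zip(p, q)] for p, q in zip(config, relabelled)]


# Hypothetical configuration: landmarks 0 and 1 are paired, 2 and 3 are midline.
config = [[1.0, 0.2, 0.0], [-0.8, 0.0, 0.1], [0.1, 1.0, 0.0], [0.0, -1.0, 0.3]]
sym = symmetrize(config, pairs=[(0, 1)])
print(sym)
```

The symmetric component keeps both halves of the structure, so the mid-plane landmarks remain anchored by both sides, which is the property that a digitized half-structure lacks.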

RESULTS: As is the case for the presentation of the methods, several points in the results need clarification before the manuscript can be properly assessed. These include the following:
- Paragraph starting in line 386 (on Evolutionary Allometry): I strongly question the authors' use of alpha = 0.1 for assessing significance exclusively in the allometry analyses. Throughout, they use the usual cut-off of p<0.05, but for the allometric analyses this is raised to 0.1 with no obvious justification (in my view there is none, other than making some analyses significant). This needs to be corrected and all relevant statements modified. Under the commonly accepted threshold of p<0.05, only the first allometric test in Supp. Table 6 is significant (although I am not sure what “ALL_shape” corresponds to here).

While we are at it, I feel that the models in Supp. Table 6 are written the other way around: allometry is represented by the model shape ~ size. If the analyses were also run with the model reversed, then all of them need to be rerun with the model correctly specified.
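To make the direction of the model explicit, here is a minimal sketch in which shape variables are the response and log size is the predictor. The data are hypothetical, and the sketch uses ordinary least squares and ignores phylogeny; a comparative analysis would use a phylogenetic regression (e.g. procD.pgls in geomorph).

```python
import math


def allometric_slopes(log_size, shape):
    """Least-squares slope of each shape variable on log size (shape ~ size)."""
    n = len(log_size)
    mean_x = sum(log_size) / n
    ss_x = sum((x - mean_x) ** 2 for x in log_size)
    slopes = []
    for j in range(len(shape[0])):
        column = [row[j] for row in shape]
        mean_y = sum(column) / n
        cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_size, column))
        slopes.append(cross / ss_x)
    return slopes


# Hypothetical centroid sizes and two shape variables for five species,
# constructed so that the true slopes are 0.05 and -0.02.
centroid_size = [10.0, 12.0, 15.0, 20.0, 26.0]
log_size = [math.log(cs) for cs in centroid_size]
shape = [[0.10 + 0.05 * x, 0.30 - 0.02 * x] for x in log_size]
slopes = allometric_slopes(log_size, shape)
print(slopes)
```

The slopes describe how shape changes per unit of log size; regressing size on shape instead answers a different question and should not be reported as allometry.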

DISCUSSION: As I said, I'd rather not comment too much on the discussion right now, as I feel that the analytical part of the manuscript needs to be clarified and maybe corrected first, which may change the results altogether. However, the following somewhat incongruent points caught my eye, and they might be important for providing a more robust version in the future, for this or another journal:
- lines 433 – 435: This claimed congruence is false. If your data fit the BM model, you would see K values close to 1. In fact, as you comment in the previous paragraph, values of K lower than 1 point to some selective mechanism, and as such the OU model would be expected to fit the data better. However, please do take into account that model comparison methods such as those implemented in mvMORPH are known to have very poor statistical performance (see Adams, D.C., and M.L. Collyer. 2018. Multivariate phylogenetic comparative methods: Evaluations, comparisons, and recommendations. Systematic Biology 67: 14-31; Adams, D.C., and M.L. Collyer. 2019. Phylogenetic Comparative Methods and the Evolution of Multivariate Phenotypes. Annual Review of Ecology, Evolution, and Systematics 50: 405-425). While better methods are not available at present, and these analyses are therefore the only ones in our toolkit, I believe the authors need to at least take this methodological limitation into account and comment on it. Note, also, that a deviation from the classical BM model as captured by low K values might also be due to other modes of evolution, e.g. a BM model with evolutionary rates that vary across time, which seems possible for this dataset based on the dtt results. However, this model is not available at present for multivariate data, so this hypothesis cannot currently be tested.

- paragraph on allometry (starting in line 482): here the authors treat some very interesting possible sources of shape-size variation across structures. This is a very powerful evolutionary setup for these analyses, which should have been presented earlier on in the manuscript, i.e. when presenting the hypotheses to be tested and the analyses conducted. I find this section to be one of the most robust and interesting evolutionary results of the study, and at present it is lost because it is not accurately and clearly presented in the introduction and methods. Not being a specialist on thynnines, only when I got to this part did I realize the importance of these analyses!
- paragraph on integration (starting in line 511): I wonder to what extent the observed allometric patterns influence integration. That is, are allometry and size covariation between structures the main mechanism through which these are integrated, or is shape reinforcing this integration? This could be easily tested by also applying integration analyses to size-corrected shape data (rather than to the different ShapeRotator and individual datasets, which, as I said before, do not convince me as alternatives).
- It may be a matter of taste, but I found the section on “future directions” to be a bit too lengthy for a journal manuscript. This looks like what one might include in a thesis, not a scientific publication. I would recommend reducing this part.
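The size-correction test suggested for the integration paragraph could look roughly like the sketch below: regress each structure's shape variable on log size, then compare the correlation between structures before and after taking residuals. The data are hypothetical and deliberately constructed so that all covariation is due to size; a single shape variable per structure stands in for the full multivariate (e.g. two-block PLS) analysis, and phylogeny is ignored.

```python
def residuals(x, y):
    """Residuals of the least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]


def correlation(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5


# Hypothetical log sizes and one shape variable per structure (male, female).
log_size = [1.0, 2.0, 3.0, 4.0, 5.0]
male_shape = [0.6, 0.8, 1.5, 2.2, 2.4]    # 0.5 * size plus size-independent deviations
female_shape = [0.5, 0.8, 1.0, 1.6, 2.1]  # 0.4 * size plus unrelated deviations

raw_r = correlation(male_shape, female_shape)
corrected_r = correlation(residuals(log_size, male_shape),
                          residuals(log_size, female_shape))
print(raw_r, corrected_r)
```

In this constructed case the raw correlation is high while the size-corrected one is essentially zero, i.e. the apparent integration is entirely allometric. Whether that holds for the thynnine data is exactly what the suggested analysis would reveal.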

MINOR POINTS
Throughout the manuscript, please review and amend the use of parentheses around in-text citations, these are very poorly used, which made the reading of the manuscript terribly cumbersome!

line 36: I think the term “morphological integration” merits a definition at its first mention, to make the term more easily comprehensible to the general evolutionary reader.

lines 226 – 228 vs 296 – 297: it would be useful that you always mention the datasets (if maintained) in the same order

line 254: please specify which algorithm was used to obtain the ultrametric tree (nnls or extend)

lines 281 – 284: please specify which measure was used to quantify morphological disparity here.

lines 351 – 357: The tests for phylogenetic signal do not test the null hypothesis of evolution under BM. A closer reading of the associated articles is necessary to help the authors correct their wording here. Also, since CS is a univariate trait it is odd to use the notation Kmult (note that for univariate traits Kmult of Adams 2014a converges back to K of Blomberg et al. 2003, accurately calculated in geomorph; I'm just referring to the notation used here).

lines 359 – 361: It is the models fitting (better or worse) the data, not the other way around. Please reword.

line 367: Again, odd wording, please amend, the structures do not reject models or hypotheses.

lines 370 – 371: Idem, plots do not explain variance, PCs do.

lines 384 – 385: I assume this refers to rates, not the shapes themselves. Please reword.


Content of review 2, reviewed on December 30, 2020

This is a revised version of a manuscript I previously reviewed for JEB.

The authors have provided apparently detailed answers to the comments previously provided by myself and by another reviewer. I applaud the inclusion of clear, testable hypotheses. Most minor comments also seem to have been incorporated. However, I believe that the authors dismissed several of the comments previously provided by both reviewers. Due to that, the issues listed below remain. As such, I cannot recommend the publication of this manuscript.

Despite substantially restructuring the introduction, this section is still very largely focused on the particular system under examination (thynnine wasps). Despite the very useful comments provided by the second reviewer for connecting this system to specific concepts and mechanisms of sexual and natural selection, such connections are still missing, failing to provide a wide evolutionary frame for this study (which is definitely possible).
With respect to linear measurements, I understand they might be of interest, as the authors explain in their response, but I see no hypotheses or explanations or setting up of this in the introduction. The comparison of different body parts only appears in the Methods section (lines 305-306), not being at all mentioned in the introduction, which rather focuses on the mating and binding structures. Also note that allometry analyses are mentioned (out of context) for the first time in line 300.
The answer to my comment about using a bilateral configuration for the hypopygium (point 4 in the authors' numbering) is not satisfactory, and the issue remains. First, since they are also using curve semilandmarks, there is no such limitation in the number of landmarks. Second, I am not sure what the line numbering in their answer refers to (lines 179 – 189 are in the introduction in the version I got from the editorial manager). Finally, and most importantly, obtaining a half-structure from these data is really easy, so I do not see why the authors are so reluctant to try it. The main point here is not technical, but rather analytical: comparing evolutionary patterns and rates of a whole structure to those of a half-structure can be dubious, and I really see no justification for doing so.
The authors' answer to my comment about combining the structures (point 7) is also not satisfactory, as the main technical issue of HOW THE ANGLE FOR COMBINING THE STRUCTURES WAS DEFINED has not been addressed. In fact, I never said this approach was not correct, and I think it is one of the most creative data treatments in this study. But it is not accurately explained in the corresponding section (lines 341 – 344 of the version I have of the manuscript), and it has to be! Also, I would like to see some support for the authors' statement that “otherwise the rates are not comparable”. I am not aware of any scientific publication supporting this statement. On the contrary, given that these are GM-derived shape variables, they are all standardized Procrustes residuals; they are therefore equally unitless and dimensionless, and they can be compared. In fact, the original publication that put forward the method for performing such rate comparisons (Denton & Adams 2015, Evolution 69: 2425-2440) mentions such a possibility and provides the adjustment needed to take into account differences in the number of landmarks between separately superimposed structures (see page 2431 of that article).
The new explanation of K as a measure of phylogenetic signal is still not accurate, as the comparisons conducted have nothing to do with random pairs of species on the phylogeny (lines 370-372).
Is it by coincidence that 12 PCs are always retrieved as explaining 95% of the variance across all five datasets? That is curious, to say the least (line 379).
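To show why an identical count across datasets is unexpected, here is a small sketch of how the number of PCs needed to reach 95% of the variance depends on the eigenvalue spectrum. Both spectra are hypothetical and the function name is illustrative.

```python
def n_components_for(eigenvalues, threshold=0.95):
    """Smallest number of leading PCs whose cumulative variance reaches threshold."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for count, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return count
    return len(eigenvalues)


# Hypothetical eigenvalue spectra for two datasets with the same total variance.
steep = [50.0, 25.0, 12.0, 6.0, 3.0, 2.0, 1.0, 1.0]
flat = [20.0, 18.0, 16.0, 14.0, 12.0, 10.0, 6.0, 4.0]
n_steep = n_components_for(steep)
n_flat = n_components_for(flat)
print(n_steep, n_flat)
```

Datasets with different numbers of landmarks and different variance structures would normally give different counts, so the authors should explain how the same number arises in all five cases.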


References

L., S. T., Marta, V., J., T. N., Rod, P. 2021. Evolution of reproductive structures for in-flight mating in thynnine wasps (Hymenoptera: Thynnidae: Thynninae). Journal of Evolutionary Biology.

