Review of Meta-analysis of Antarctic phylogeography reveals strong sampling bias and critical knowledge gaps

Content of review 1, reviewed on March 21, 2022

This paper is an assessment of Antarctic biodiversity through genetic data. Using several thousand ITS and COI sequences, the authors identified broad spatial patterns in genetic diversity, important environmental drivers, and key sample-rich areas and major sampling gaps. I really enjoyed this paper and I think it is a great initiative. The paper provides a much-needed assessment of Antarctic biodiversity that will certainly help target future research.

I was happy to see a macrogenetics paper where ITS and COI were primarily used to infer phylogenetic patterns, which plays well to their strengths as markers. I do still have some concerns with the analyses that should be addressed. However, because the objective of the paper is to be descriptive, I think most of them can be resolved without affecting the impact/novelty of the paper.

I was concerned that there was a clear division of marker type with taxonomy. I know from the breakdown of markers by taxa in the supplement that this is impossible to avoid, but I wondered if it could contribute to the differences in patterns seen in plants and animals. For example, could the increased structure observed in animals (line 320 page 21) be a product of COI being on the mitochondrion, which has a smaller effective population size, a different mutation rate, etc., compared with nuclear ITS markers? Previous studies have shown that very different patterns can be detected even within a single species depending on the marker used (mtDNA vs. Y markers, Nietlisbach et al., 2012, Mol. Ecol.; microsatellites vs. genome-wide markers in many seabirds). Consequently, I would ask that the authors discuss these limitations when contrasting patterns across the major taxonomic groups, so that the reader is aware that marker choice might drive some of the patterns observed. This limitation does help strengthen the authors’ argument for genome-wide SNP studies in the Antarctic.
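The size of the marker effect raised above can be illustrated with a minimal sketch. It assumes Wright's island model at migration-drift equilibrium, an equal sex ratio, and sex-unbiased dispersal, and it ignores mutation-rate differences and the multi-copy nature of ITS. The function name and the parameter values are hypothetical and are not taken from the manuscript.

```python
def expected_fst(ne: float, m: float, marker: str) -> float:
    """Approximate equilibrium F_ST under Wright's island model.

    ne     -- effective population size per deme (hypothetical value)
    m      -- migration rate per generation (hypothetical value)
    marker -- "nuclear" (diploid, biparental) or "mtdna" (haploid, maternal)
    """
    if marker == "nuclear":
        scale = 4  # F_ST ~ 1 / (1 + 4 * Ne * m)
    elif marker == "mtdna":
        # Assumption: equal sex ratio and equal dispersal of the sexes, so the
        # mitochondrial effective size is one quarter of the nuclear one.
        scale = 1  # F_ST ~ 1 / (1 + Ne * m)
    else:
        raise ValueError("marker must be 'nuclear' or 'mtdna'")
    return 1.0 / (1.0 + scale * ne * m)


# Same demography, different marker: one migrant per generation per deme.
fst_nuclear = expected_fst(ne=1000, m=0.001, marker="nuclear")
fst_mtdna = expected_fst(ne=1000, m=0.001, marker="mtdna")
print(f"nuclear: {fst_nuclear:.2f}, mtDNA: {fst_mtdna:.2f}")
```

Under these simplifying assumptions, identical demographic histories give noticeably stronger expected structure for a mitochondrial marker than for a nuclear one, which is why a marker-by-taxon split can mimic a biological difference between animals and plants.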

Taxonomic merging is uneven across animals, plants, and lichens. This means that the pattern of differentiation and structure in animals could be caused by analysing genera, which have a longer time to MRCA than species. It likely also explains why haplotypes are more often unique to a 'population' in animals (merged to genera) than in plants (species-level). It is important to note that the uneven merging may affect metrics based on uniqueness, because you are much more likely to find unique haplotypes when comparing “populations” that are actually different species than when looking within an individual species. Much more time will have passed between species, allowing new haplotypes to arise through mutation, and gene flow is also likely to be lower. I would suggest correcting for this by merging all species to genera and comparing the resulting patterns with those from the unmerged/partially merged analyses. If it is impossible to analyse animals at the species level, this limitation needs to be made clear when comparing across the groups, and the term “species” should be used only when species-level analyses are done.

Sample size effects and effort bias – despite some discussion of this in the manuscript, I noticed that some metrics that are very sensitive to sample size have been used to estimate genetic diversity. Diversity should be examined using rarefied metrics, or otherwise corrected for sample size, to ensure you are not only picking up a sampling-effort effect. If that is not possible, I would suggest testing for a sample size effect.
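One way to rarefy is sketched below, using the standard Hurlbert rarefaction formula for the expected number of distinct haplotypes in a random subsample of n sequences. This is an illustration only and not the authors' pipeline; the function name and the two example sites are hypothetical.

```python
from collections import Counter
from math import comb


def rarefied_richness(haplotypes, n):
    """Expected number of distinct haplotypes in a random subsample of size n
    drawn without replacement (Hurlbert rarefaction)."""
    counts = Counter(haplotypes)
    total = sum(counts.values())
    if not 0 < n <= total:
        raise ValueError("n must be between 1 and the number of sequences")
    denom = comb(total, n)
    # For each haplotype, 1 - P(it is absent from the subsample).
    return sum(1 - comb(total - c, n) / denom for c in counts.values())


# Hypothetical sites: one heavily sampled, one sparsely sampled.
well_sampled = ["h1"] * 20 + ["h2"] * 10 + ["h3"] * 5 + ["h4"] * 3 + ["h5", "h6"]
sparse = ["h1"] * 4 + ["h2"] * 2 + ["h3", "h4"]

# Compare both sites at the size of the smaller sample.
n = min(len(well_sampled), len(sparse))
for name, site in [("well sampled", well_sampled), ("sparse", sparse)]:
    print(name, len(set(site)), round(rarefied_richness(site, n), 2))
```

Comparing sites at a common subsample size removes the part of the richness difference that is due only to the number of sequences available, which is the effect the raw counts cannot separate from real diversity.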

Lines 129-130: Here the authors state that the sequencing data used in this study were publicly available. In the past, some studies have limited public archiving to novel haplotypes only. Did the authors check for this to ensure that all of the haplotypes from each study were archived? I ask because incomplete archiving means that “diversity” is not fully captured, and subtle biases may appear in the patterns you see due to the accumulation of diversity over space (and effort). See: Paz-Vinas et al., 2021, Ecology Letters.

With regard to the refugia hypothesis at the end of the manuscript, I don’t think you can draw these conclusions from this data set. The refugia align completely with the two regions with the highest sampling density in Figure 2. So, I think that for most species the higher diversity you are capturing is the increase in diversity expected with an increased sample size/sampling effort. I would ask that you tone this line of discussion down.

Within the Discussion, I think some key references are missing. There have been a few studies in the Arctic that have also shown strong sampling bias toward research stations (Metcalfe et al., 2018, NEE). This really should be cited and the similarities between the polar environments made clear. There are also macrogenetics studies on non-Antarctic species that find similar environmental drivers (e.g. Kling and Ackerly, 2021, PNAS – wind currents; Manel et al., 2020, Nat. Comms – slope). It would be good to cite these and put your results in a wider macrogenetics context.

Figure comments:
Figure 1 – I think this figure is a little underutilized. Could you explain what the different shapes in each panel are? I think they are subpopulations, but what is a subpopulation: an ESU, an MU, or a species? The figure also says this is only terrestrial data. My understanding is that terrestrial data are the focus of the manuscript, but at some point you mention marine data. I think you need to make it clearer that marine data were not analyzed here. Would it also be possible to use a mix of colours and shading to show the populations rather than colour alone? I am not sure this colour key will be visible to everyone, and some of the shades are really hard to distinguish.

Figure 2 – These panels should be divided by major group in some way (animals, plants, lichens) because you analyze these groups separately. Could you also mark the locations of the research stations on the map in Figure 2 and denote the regions that are likely to be uninhabitable or glaciated, to show the limited area where species can survive?

Figure 3 – Red and green are not colour-blind friendly; please swap to an accessible palette. The differentiation represented by branch length is inconsistent across the species/genera, sometimes by an order of magnitude. The phylogenies should be standardized so that the same branch length equates to the same differentiation.

Figure 4 – I am questioning the impact of some of the black pictures used here (mostly for the plants). Perhaps these could be changed to actual pictures of the species? Same branch length issue for the phylogenies as in figure 3.

Figure 5 – see comments above about refugia and data bias

I think a map of major wind currents/environmental variables and diversity would be helpful to interpret the results of the plant GLM.

Line 561 – Yes, metadata sharing is an issue. I would recommend citing GEOME, a database specifically made to improve metadata sharing: https://geome-db.org/

Data availability – This was a bit unclear to me. Will the authors be providing a link to NCBI or to supplementary table 1? I think the best open data practice here would be to make the sequence database you created accessible as a stand-alone platform like MacroPopGen (https://figshare.com/articles/dataset/MacroPopGen_Database_Geo-referenced_population-specific_microsatellite_data_across_the_American_continents/7207514/1) or as a downloadable FASTA file.

Supplementary data – I am missing a clear table of contents for the supplementary material, and some of these tables are hard to interpret when printed out (as many will read them). I think a little more information (i.e. clear legends and headers) is needed to guide the reader.

Overall, I really enjoyed the paper and think it will make a meaningful contribution to Antarctic biodiversity monitoring.


Content of review 2, reviewed on July 19, 2022

I am reviewing this manuscript for the second time and find it much improved. The revised conclusions are well supported by the data and I remain confident that this work highlights important sampling biases and interesting biological trends. I have only one outstanding issue:

Supplement 2 contains some of the analyses that I initially requested, e.g. comparing species vs. genus level divisions for analyses. It was initially missing from the files sent to me to review the article. While minor, this is the second time supplementary data have been missing from this manuscript, as there were also files missing from the original submission. I do not know if this is an issue with the upload portal or the authors' oversight. Nevertheless, the authors need to ensure that all the supplements are included in any final or accepted version, otherwise the manuscript will be incomplete. I also think Supplement 2 needs a key to make it readable. I could not follow the column names (I think the wording is partially inconsistent with the new terminology in the manuscript) and it took some time for me to find the analyses comparing genus vs. species level groupings. Collectively, this meant that I could not confidently confirm whether I agree with the authors' statement on line 360, "genus-level analysis provided similar results with species-level analyses (Supp. 2)". I would ask that the authors add some details to the main text about the similarities and differences between the species and genus level analyses, and add a key to make the supplement readable to others.


References

P., L. X. P., A., D. G. A., S., P. W. S., R., P. L. R., I., F. C. I. 2022. Meta-analysis of Antarctic phylogeography reveals strong sampling bias and critical knowledge gaps. Ecography.
