Content of review 1, reviewed on September 22, 2015

This manuscript describes the current status of mitochondrial metagenomics (MMG) and proposes the use of the technique to better understand complex communities from both ecological and evolutionary perspectives. The manuscript is well written and well founded in previous research in the area (much of which the authors have been involved in themselves). It collects data from studies of, e.g., insect biodiversity using MMG and presents them in a broader context. I have a large number of minor comments, but I am confident that the authors can respond to them without too much effort. Apart from a few vague sentences and a few dubious claims that I think should be adjusted, I think the manuscript could be published in close to its current form, provided that my questions are appropriately answered (or that the authors show my comments to be invalid in cases where I am asking about vague passages).

Minor comments (note that line numbers did not align perfectly to the actual written lines in my PDF):


Abstract:
- I think the term "mitogenomics" for MMG is rather badly chosen, since it would apply equally well to the study of single mitochondrial genomes, and I would advise against using it in the abstract.
- "Mitochondrial genomes therefore show great promise as a common marker …": A marker common to what? Please be more specific.


Background
- Page 3, line ~29, "Thus, there is a great need for unifying these various subfields, …": The observations described in the paragraph preceding this sentence do not constitute reasons to unify these subdisciplines. Since this is the central argument for performing MMG, I would like the authors to better specify WHY it is important to merge these subfields. As it stands now I have, as a reader, no reason to believe that these fields would not get along fine by themselves, without any unification. Remember, the fact that people are not currently solving problems together is not in itself a reason to unify their approaches.
- Page 3, line ~37: I suggest changing "temporal change" to "temporal changes".
- Page 3, line ~54: I suggest changing "error" to "errors".
- Page 4, line ~8: As mentioned in my comments on the abstract, I think that "mitogenomics" is a slightly misleading term, and although it has been used before to describe MMG, I think that the authors should point out that it could also refer to the study of individual mitochondrial genomes.


A general framework for PCR-free biodiversity studies using mitogenomes
- Page 4, line ~47: I suggest changing "of all parts" to "from all parts".
- Page 5, the "voucher specimens" paragraph, lines ~26-47: This entire paragraph is rather messy and hard to follow. My first thought upon reading it was: Why are you not simply barcoding your mitogenome samples before sequencing them, using the regular Illumina sample tagging protocol? That would, as far as I understand it, take away the need for baiting altogether. In addition, since this requires the specimens from different species to be extracted separately, I can't see why it would be a problem to add the Illumina barcodes. Then assembly could be carried out on the separate samples, without the need for multi-species assembly followed by bait identification. I realize that this might 1) be what you are describing, or 2) be infeasible for some other reason. In the former case, I think that the paragraph needs heavy editing, since I did not understand it this way. In the latter case, this infeasibility needs to be better described.
- Page 5-6, line ~60 - line ~8: This sentence is very long, and because of this hard to follow. Please consider dividing it into two or three shorter sentences.
- Page 6, line ~10: I think the word "powerful" is a bit imprecise. Could you better describe specifically what makes the mitogenome placement better than cox1?
- Page 6, line ~12-15: "This phylogenetic assignment method has successfully obtained family-level identifications and contributed novel sequences to a growing mitochondrial tree-of-life": I doubt that the assignment method per se has contributed novel sequences - that must have been done using e.g. sequencing efforts in relation to the assignment. Please rephrase.
- Page 6, line ~16: Consider inserting a "the" here: "… in particular with THE denser taxon sampling …"
- Page 6, line ~23: "species lists are incomplete". But this will be the case also for the voucher approach described earlier, maybe even more so. It might be worth mentioning, so that readers don't get the impression that the voucher approach leads to more complete taxon sampling, which often would not be the case.
- Page 6, line ~25-28: "…to assume that even overall rare species will be abundant in at least some samples, thus bulk sample assemblies from multiple sites will end up generating a mostly complete reference dataset": To me, this sounds rather speculative, unless you have an extensive diversity of sampled sites. At least in the microbial world, some species seem to be more or less ubiquitously low-abundant. Although one can't assume that this knowledge is directly translatable to e.g. insect communities, hoping that rare species are common elsewhere seems a somewhat naïve approach, and I suggest a rewrite of this sentence.
- Page 6, line ~36: I suggest changing "have this reference" to "have a reference".
- Page 6, line ~50: "…high sensitivity for species presence/absence…": High sensitivity for presence yes, but deducing absence requires rather extreme sequencing depths. I would be more careful with statements like this, since sequencing depth is so tightly entwined with detection of rare species.
- Page 6, line ~57-58: "…requires much less sequencing depth than the original assembly of reference mitogenomes; the coverage of mitogenomes required for secure detection of species…": See my comment above, detecting is easy - but what does a non-detect mean? Not necessarily absence!
- Page 7, line ~37-38: "mapping to mitogenomes potentially provides a large enough target to quantify biomass with good accuracy even at low sequencing depth." Is this really true? Doesn't the number of mitochondria vary quite substantially between species and even individuals? Is it actually feasible to infer biomass in this way? This is discussed briefly on page 14, but I think the authors promise more than they can deliver in terms of biomass estimation, particularly with the wording "good accuracy".
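
The comments on page 6, line ~50 and line ~57-58 (that a non-detect does not necessarily mean absence) can be illustrated with a back-of-the-envelope calculation. The sketch below is not taken from the manuscript. It assumes, purely for illustration, that reads are sampled independently and that the number of reads from a given species follows a Poisson distribution with mean equal to the total number of reads times that species' share of all reads. The function names and the example values are hypothetical.

```python
import math


def detection_probability(total_reads, read_fraction, min_reads=1):
    """Probability of observing at least `min_reads` reads from a species.

    Assumes (illustrative simplification) that the read count for the species
    is Poisson distributed with mean total_reads * read_fraction.
    """
    mean = total_reads * read_fraction
    # Sum the Poisson probabilities of seeing fewer than `min_reads` reads.
    p_below = sum(
        math.exp(-mean) * mean**i / math.factorial(i) for i in range(min_reads)
    )
    return 1.0 - p_below


def reads_needed(read_fraction, target_probability=0.95):
    """Total reads needed to see a species at least once with the given probability.

    Solves 1 - exp(-N * f) >= p for N under the same Poisson assumption.
    """
    return math.ceil(-math.log(1.0 - target_probability) / read_fraction)
```

Under these assumptions, a species contributing one read per million is missed entirely in roughly a third of runs of one million reads, so a non-detect at that depth says little about absence. Requiring several reads for a secure detection (`min_reads` greater than 1) raises the required depth further.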


Methodological issues
- Throughout this entire section, it is unclear if the authors refer to "high-coverage contigs" stemming only from mitochondria, or high-coverage contigs coming from anywhere in the genomes (or if they refer to both but in different places in the text). This needs to be clarified.
- The entire assembly passage is rather long, and has many short "side-tracks". I think that it is written in an appropriate way, but I would ask the authors to consider making a figure outlining the assembly process, somewhat like this:
DNA Extraction -> BLAST-based filtering -> Assembly using XXX -> Careful curation (maybe with reassembly) -> Reference mapping/Gene calling -> Chimera detection -> Mapping of reads back to contigs
This would make it easier to pick up the thread throughout the text. This is just a suggestion, but I think it would improve this part of the manuscript substantially.
- Page 7, line ~49-50: I suggest replacing "with the contigs" with "cut-off".
- Page 7, line ~54-59: "In addition, there were differences in the taxa studied, which may be affected by taxon-specific differences in the proportion of mitochondrial DNA relative to the nuclear (including symbiont) fraction.": I did not understand this sentence. In what terms were there differences between the taxa? Could you please clarify it?
- Page 8, line ~18: "careful curation": What kind of curation? Please be specific.
- Page 8, line ~53: "sequencing volume": I think the word "volume" here is a bad choice, since it could refer to the volume of DNA used for the sequencing. I suggest using e.g. "effort" or "depth" instead.
- Page 9, line ~23-29: "The filtering can be run via low-stringency (e.g. 1e-5) BLAST searches against a growing database of mitogenomes. Applying such a loose filter retained approximately 11% of all reads [14] on which to perform the assembly, which greatly speeds up assembly, compared to a full dataset.": To me, using BLAST for this sounds terribly slow. Does this really speed up assembly if you also include the time taken for the BLAST-filtering? Wouldn't some other k-mer filtering do this step more efficiently?
- Page 9, line ~33: "…most successfully used." Most successful by what criteria? Please specify.
- Page 9, line ~40: "TGICL": What does this abbreviation stand for? I think it has not been explained previously, and can't be said to be common knowledge. Or is it the name of a software tool? If so, it should be properly referenced.
- Page 9, line ~47-48: "whilst the most adequate assembler for MMG among existing software packages have been established…": OK, so if the most adequate assembler has been established, could you please indicate which one it is?
- Page 9, line ~55-59: "Adjustments to existing procedures in particular need to allow the extension of the assembly path in two directions and permit a connection of the ends to reflect the circular nature of the mitogenome.": My experience of assembly from metagenomes is that this is implemented pretty well in modern assemblers. However, they seldom indicate which contigs could be circular in an easy-to-obtain manner, requiring extra steps to identify those (using e.g. paired-end information). But maybe this is what the authors mean?
- Page 10, line ~26-30: "…by automated recognition of tRNA genes and the extraction of intervening regions, which are then sorted into genes by mapping against a known reference…": What's the benefit of this compared to just using reference sequences or gene calling? Could you elaborate on that for just one sentence?
- Page 10, line ~45-46: I suggest a division of this sentence "These chimeras can be detected against known full or partial mitogenomes, where these are available, and by confirming that taxonomic assignments are consistent across the different genes in the assembly [15], although this latter method is still limited by highly uneven taxonomic coverage in public databases across different mitochondrial genes [46]." -----> "These chimeras can be detected against known full or partial mitogenomes, where these are available, and by confirming that taxonomic assignments are consistent across the different genes in the assembly [15]. The latter method is however still limited by highly uneven taxonomic coverage in public databases across different mitochondrial genes [46]."
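
The k-mer filtering alternative raised in the comment on page 9, line ~23-29 could look roughly like the sketch below. It is an illustration only, not a method from the manuscript and not a benchmarked tool. All function names, the k-mer length and the hit threshold are hypothetical choices. Note also that exact k-mer matching is considerably stricter than a low-stringency BLAST search, so a filter like this would likely miss reads from mitogenomes that are divergent from everything in the reference database; whether that trade-off is acceptable would have to be tested.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (non-ACGT characters are kept)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def build_kmer_index(references, k=21):
    """Collect every k-mer, from both strands, of the reference mitogenomes."""
    index = set()
    for ref in references:
        ref = ref.upper()
        for strand in (ref, reverse_complement(ref)):
            for i in range(len(strand) - k + 1):
                index.add(strand[i:i + k])
    return index


def keep_read(read, index, k=21, min_hits=1):
    """Keep a read if at least `min_hits` of its k-mers occur in the reference index."""
    read = read.upper()
    hits = 0
    for i in range(len(read) - k + 1):
        if read[i:i + k] in index:
            hits += 1
            if hits >= min_hits:
                return True
    return False
```

Each read is then checked with set lookups only, so the cost per read grows with read length rather than with database size. Raising `min_hits` or `k` makes the filter stricter, while lowering `k` makes it more tolerant of divergence at the cost of more spurious matches.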


The use of mitochondrial metagenomics in biodiversity studies
- In several places in this section the authors make the rather naïve claim that we can study "taxa without the need to involve taxonomic experts". While this is technically true, this line is almost certain to lead the field in the wrong direction in the long run, causing thousands of dubious mitochondrial sequence entries with improper taxonomic information to be deposited in the reference databases. Although taxonomic expertise is increasingly moving towards being DNA-based, the claim that taxonomic experts are not needed is indicative of an approach to DNA-based taxonomy that inevitably will lead to problems in the future. Compare to the situation with fungi today, for example in:
Nilsson RH, et al. 2006. Taxonomic reliability of DNA sequences in public sequence databases: a fungal perspective. PLoS ONE 1: e59.
Nilsson RH, et al. 2012. Five simple guidelines for establishing basic authenticity and reliability of newly generated fungal ITS sequences. MycoKeys 4: 37-63
Bidartondo M, et al. 2008. Preserving accuracy in GenBank. Science 319: 1616
All in all, I think that the authors should think twice before claiming that we do not need taxonomic experts, and make sure to be precise: we might need fewer morphological experts, but knowledge of your sampled organisms is still crucial to understand the underlying biology. (And honestly, I think what the authors refer to is the morphological expertise, but they could be clearer about that.)
Examples of where this less careful wording is used in this section are:
Page 11, line ~22-24, 49 and 52. Page 12, line ~12-13.

- Page 13, line ~43-45: "…processing the large numbers of bees that are expected to be generated in national pollinator monitoring programs": Will these programs actually generate bees? Otherwise, I suggest rephrasing this.


Future prospects and next steps
- Page 13, line ~57: "[44]": is this really the correct reference for this statement?
- Page 14, line ~16: The Liu et al. paper would be very useful to be able to better identify, even if it is not yet accepted. Maybe you could provide a title for future reference?
- Page 14, line ~43: Please insert the word "DNA": "…nuclear vs. mitochondrial DNA (due…"
- Page 14, line ~57: Replace the word "molecule" with "sequence", as I assume this happens in silico.
- Page 15, line ~6: I suggest removing the "reference" to "(Gómez-Rodríguez et al., in preparation)" since it describes future work and does not help the reader to better assess the status of this problem.


Conclusions
- Page 15, line ~20: I suggest replacing "is establishing" with "has established".
- Page 15, line ~26-29: I suggest the following restructuring of this sentence: "Although mitochondrial genomes make up only a small proportion of the total sequence reads, they are the most useful marker to be extracted from these mixtures."
- Page 15, line ~44-48: "it will be straightforward to identify species in any specimen sample by shotgun sequencing and simple similarity searches against this growing database". I do not fully agree that this is "straightforward". See my comment on the "mitochondrial metagenomics in biodiversity studies" section above. Larger databases mean more noise, which means that you need to be a taxonomic expert to appropriately interpret the results. Once again, this problem is very evident in fungi, and will likely be evident in MMG as well:
Nilsson RH, et al. 2012. Five simple guidelines for establishing basic authenticity and reliability of newly generated fungal ITS sequences. MycoKeys 4: 37-63

Figure 1

- Unfortunately, I find this figure quite messy, and it does not explain or complement the text very well. Would it be possible to not combine the "read-based" box and "bulk MMG" box? Also, it is very unclear to me what the arrows refer to. Please revise this figure for clarity.

Figure 2

- It is unclear in this figure text (as for the underlying text in the manuscript) if this data refers to only mitochondrial contigs or all contigs from the metagenomic assembly.
- Although not super-important because of their similarity, which data in 2d come from which assembler?
- I suggest removing the "s" from "samples" in "mixed samples of rainforest beetles".

Level of interest
Please indicate how interesting you found the manuscript:
An article whose findings are important to those with closely related research interests

Quality of written English
Please indicate the quality of language in the manuscript:
Acceptable

Declaration of competing interests

Please complete a declaration of competing interests, considering the following questions:

1. Have you in the past five years received reimbursements, fees, funding, or salary from an
organisation that may in any way gain or lose financially from the publication of this
manuscript, either now or in the future?

2. Do you hold any stocks or shares in an organisation that may in any way gain or lose
financially from the publication of this manuscript, either now or in the future?

3. Do you hold or are you currently applying for any patents relating to the content of the
manuscript?

4. Have you received reimbursements, fees, funding, or salary from an organization that
holds or has applied for patents relating to the content of the manuscript?

5. Do you have any other financial competing interests?

6. Do you have any non-financial competing interests in relation to this paper?

If you can answer no to all of the above, write 'I declare that I have no competing interests'
below.

If your reply is yes to any, please give details below.
I declare that I have no competing interests.

I agree to the open peer review policy of the journal. I understand that my name will be included
on my report to the authors and, if the manuscript is accepted for publication, my named report
including any attachments I upload will be posted on the website along with the authors'
responses. I agree for my report to be made available under an Open Access Creative Commons
CC-BY license (http://creativecommons.org/licenses/by/4.0/). I understand that any comments
which I do not wish to be included in my named report can be included as confidential comments
to the editors, which will not be published.

I agree to the open peer review policy of the journal.

 


Authors' response to reviews: (http://www.gigasciencejournal.com/imedia/1126582142200970_comment.pdf)

Source

    © 2015 the Reviewer (CC BY 4.0).

Content of review 2, reviewed on January 09, 2016

I have reviewed a previous version of this manuscript, and the revised version has improved on a manuscript already of high standard. The authors have dealt with most of my concerns by clarifications or minor changes to the text. I still think that the authors handle the issue of database quality somewhat lightheartedly, but I concede that this is a matter of taste rather than scientific quality. Thus, I don't think that the authors need to address this further; it is obvious from their responses that they have at least given the issues some thought.

I have two passages where the text still does not fully convince me:

1) The locally rare species (lines 155-159) are still unlikely to be identified using bulk MMG, even if sampled at multiple sites. I doubt that you will get a complete picture of the studied community unless you have A LOT of samples.

2) The new example given at lines 227-229 seems a bit out-of-context to me, particularly as the number of mitogenomes per Gb is NOT improved compared to the number given at lines 223-224. Probably this last example could be skipped.

Neither of these two comments really needs any action, in my opinion. But the authors might want to consider them if they choose to revise the manuscript further before publication.

Level of interest
Please indicate how interesting you found the manuscript:
An article of importance in its field

Quality of written English
Please indicate the quality of language in the manuscript:
Acceptable


Declaration of competing interests

I declare that I have no competing interests.

I agree to the open peer review policy of the journal.

Authors' response to review: (http://www.gigasciencejournal.com/imedia/2017947769200970_comment.pdf)

 


Source

    © 2016 the Reviewer (CC BY 4.0).

References

    Crampton-Platt, A., Yu, D. W., Zhou, X., Vogler, A. P. 2016. Mitochondrial metagenomics: letting the genes out of the bottle. GigaScience.