Content of review 1, reviewed on January 14, 2013

In this paper, the authors present an interesting and timely experiment. They demonstrate the exciting potential for direct (without PCR) sequencing on the Illumina platform to reveal species composition and potentially even relative abundance from bulk (environmental, mixed species) samples. While the authors acknowledge that further work is needed to develop and test the approach, the study is encouraging and should be of broad interest to ecologists and biodiversity scientists. While the economics of the method are not discussed, given the trajectory of sequencing costs, the authors’ optimism that such techniques might have broad applications seems reasonable.

Minor Essential Revisions

The “Availability of Supporting Data” section should be stronger. It is not clear where exactly the data and metadata are deposited beyond a general reference to the raw sequences being “available at Gigascience”. Furthermore, did the study follow GSC MIxS standards? Also, are there biomaterials (DNA extracts, tissues, or specimens) available via a natural history museum or bioarchive?

Discretionary Revisions

In general, the paper is well written and the results clearly presented. A few grammatical corrections and suggestions are given in the attached file.

I would suggest a clearer distinction between the two samples presented in the study. Most of the results and all of the important conclusions refer to the analysis of “Sample 2” alone. It appears that “Sample 1” was largely a test run, and although the results of both samples should be recorded, the text would read better if attention were firmly focused on “Sample 2”. Reference to Sample 1 is rather distracting and could confuse readers. Perhaps rename Sample 2 simply “the Sample” and refer to Sample 1 (if necessary at all in the main text) as a “Preliminary Study”. Furthermore, it might be best to avoid comparing the samples: the differences might be intriguing, but that is not the point of this paper. (These observations could be the basis for a future ecological study with appropriate experimental design, but should not distract from the main message of this paper.)

It would help readers if the authors introduced and explained DNA barcoding earlier on and were explicit from the beginning about how they generated the reference dataset (i.e., DNA barcoding of all individuals through voucher-based Sanger sequencing).

Level of interest: An article of outstanding merit and interest in its field

Quality of written English: Needs some language corrections before being published

Statistical review: No, the manuscript does not need to be seen by a statistician.

Declaration of competing interests: I declare that I have no competing interests

Source

    © 2013 the Reviewer (CC-BY 4.0).