Content of review 1, reviewed on April 18, 2013

Basic reporting

The article appears to adhere to the PeerJ policies. The introduction is currently rather under-referenced; there is only a single reference on the first page! I would appreciate it if the authors would provide the necessary references for key concepts in the introduction: relevant metagenomics review articles, examples of case studies in the various fields mentioned, and so on. Given that the manuscript describes a pipeline for metagenomic sequence analysis, I would like to see more space devoted in the introduction to outlining the major methods for analysing such data (similarity-based, phylogenetic, and composition-based methods) and the software implementations of these methods. This should perhaps be prioritised over some of the earlier introductory material, which could be edited for concision. The manuscript sometimes slips into colloquial use of language, e.g. 'more vexing'.

Minor items

Figure 1 is a little confusing in that it looks like the labels "rRNA" and "protein" relate to the input, whereas my impression is that the input is all metagenomic DNA, which is then translated.

I think it is not strictly true to say "First, metagenomic samples reflect entire communities of organisms, unlike “traditional” genome sequencing which reflects a single individual of a population.", as whole-genome shotgun sequencing often sequences a population that may or may not be clonal, depending on how the sample was prepared (e.g. single cell, single colony, multiple colonies, multiple cells).

"Loss of linkage information occur in two ways: during sample extraction and size selection of fragments for sequencing." would be more accurate to say during fragmentation, as size selection is an optional step.

In Figure 2 the PCA plots could use a legend delineating the meaning of the colours.

Experimental design

This is my first review for PeerJ and I am not sure how the guidelines should be interpreted for journal articles describing software. However, given that the paper is structured in the standard Introduction/Methods/Results/Discussion format, I think the manuscript would benefit from some restructuring to ensure that methods and results are correctly placed. It may be helpful, as per the editorial guidelines, to explicitly state a question or set of questions that are addressed in the manuscript. For example, one approach might be to compare against existing methods, with the aim of providing a rationale for why Phylosift is superior in some respects to other solutions, whether in terms of accuracy of phylogenetic placement, running time, additional functionality, etc. A table with some comparisons to other commonly used software would be really helpful. Personally, I am interested in comparisons with approaches such as MEGAN (with LCA), MetaPhlAn, PhyloPythia, ribosomal MLST, and simpler approaches such as BLAST best hit.

Validity of the findings

I downloaded the software and successfully ran it on both Mac and PC, with a few minor problems (for example, the shell script to launch Phylosift didn't work on the Mac, and I had to call the executable from the bin/ directory; see the example invocation below). I liked the selection of default reports, including the taxonomic breakdowns and the Krona images, which were intuitive to use.

The results on my test samples were promising, although I found that species and subspecies assignments were not always accurate. I appreciated the statistical information, which I could use to interpret the confidence in these results. The software was rather slow to run on 1,000 and 10,000 reads, and I wonder how well this approach will scale. I appreciate the suggestion of using contigs to speed the process up, although I worry about the loss of abundance information, and I also worry about the formation of chimeric 'consensus' contigs from mixtures that obscure rather than improve phylogenetic signal.

I did find one clearly erroneous assignment, of a read from Bacteroides hitting Vibrio cholerae instead with strong statistical support, which I have reported to the authors, who are looking into it. On the basis of my findings, I would have really liked to have seen a comparison of taxon-assignment accuracy against other pipelines, perhaps trialled on mock community data. I was impressed by the level of documentation, the GitHub repository for the software, and the evidence of strong support from the developers.
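For anyone hitting the same launcher problem, calling the executable directly was a workable fix for me. This is only a sketch from my own session: the install path and the read file name are placeholders, and the exact command-line syntax may differ between releases.

    cd phylosift/bin                  # or wherever the Phylosift archive was unpacked
    ./phylosift all my_reads.fastq    # 'all' runs the full search/align/place/summarize pipeline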

Comments for the author

My major recommendation is that the manuscript be improved for readability, by restructuring it around a specific question or set of questions, and by ensuring the correct separation of methods and results.

Source

    © 2013 the Reviewer (CC BY 3.0 - source).

Content of review 2, reviewed on December 17, 2013

Basic reporting

I am grateful for the authors' considered responses to my questions and comments and am happy with the revised manuscript.

Experimental design

No comments

Validity of the findings

No comments

Source

    © 2013 the Reviewer (CC BY 3.0 - source).