Content of review 1, reviewed on January 20, 2015

Maudhoo and Norgren report the generation, analysis and deposition of deep sequencing data derived from mRNA from four primary chimpanzee cell lines. This represents a significant addition to the amount of currently available chimpanzee transcriptome data and may prove valuable for completing annotation of the genome. I do, however, have some concerns about the initial manuscript:

Major Compulsory Revisions

  • The claim that the new data allowed identification of "1,906 genes that were not previously identified in the Ensembl annotation of the chimpanzee genome" is not clearly justified. It appears to be based on comparing a) the set of gene symbols found by aligning assembled transcripts to human RefSeq mRNAs with b) the set of gene symbols present in the Ensembl gene annotation. Unfortunately, "gene symbol" is not a stable identifier that can be used to unambiguously map genes between arbitrary versions of RefSeq and Ensembl. A more rigorous method for finding novel genes would be to align the assembled chimpanzee transcripts to the chimpanzee genome sequence and then check for the absence of an overlapping Ensembl gene annotation. Care should be taken to avoid artifacts due to ambiguous alignments, and some representative examples of novel genes should be provided.

  • The method used to filter reads prior to de novo assembly (Transcriptome assembly - first paragraph) does not appear to be a standard approach used in the field. It is not clear to me how the reported filtering can "eliminate genomic contamination". The wording also suggests that reads were NOT filtered by quality scores, which is unusual. Unless these approaches can be clearly justified, it would be preferable to see the result of processing the data with a standard open source pipeline.
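To illustrate the overlap check suggested in the first point above, here is a minimal sketch in Python. It is not the authors' pipeline and not a substitute for established interval tools. It assumes the transcript alignments and the Ensembl gene models have already been reduced to simple tuples with half-open, BED-style coordinates. The function names, the tuple layouts, the example records and the mapping-quality cutoff of 30 are all hypothetical choices made for the illustration, and strand is ignored.

```python
from collections import defaultdict


def build_index(genes):
    """Group annotated gene intervals by chromosome.

    genes: iterable of (chrom, start, end, gene_id), half-open coordinates.
    """
    index = defaultdict(list)
    for chrom, start, end, gene_id in genes:
        index[chrom].append((start, end, gene_id))
    return index


def overlapping_genes(index, chrom, start, end):
    """Return IDs of annotated genes that overlap [start, end) on chrom."""
    return [
        gene_id
        for g_start, g_end, gene_id in index.get(chrom, [])
        if g_start < end and start < g_end
    ]


def novel_candidates(alignments, genes, min_mapq=30):
    """Return names of confidently aligned transcripts with no overlapping gene.

    alignments: iterable of (name, chrom, start, end, mapq).
    min_mapq is an assumed cutoff used to discard ambiguous alignments.
    """
    index = build_index(genes)
    novel = []
    for name, chrom, start, end, mapq in alignments:
        if mapq < min_mapq:
            continue  # ambiguous placement: do not call it novel
        if not overlapping_genes(index, chrom, start, end):
            novel.append(name)
    return novel


# Hypothetical example records, for illustration only.
genes = [("chr1", 100, 500, "geneA"), ("chr1", 1000, 2000, "geneB")]
alignments = [
    ("tx1", "chr1", 150, 400, 60),    # overlaps geneA
    ("tx2", "chr1", 600, 900, 60),    # between annotated genes
    ("tx3", "chr1", 2500, 3000, 0),   # ambiguous alignment, skipped
    ("tx4", "chr2", 10, 200, 60),     # chromosome with no annotated genes
]
print(novel_candidates(alignments, genes))
```

A linear scan per transcript is adequate for a sketch; for genome-scale data a sorted index or an interval tree would be the usual choice. Transcripts reported this way are only candidates, and each would still need manual inspection before being described as a novel gene.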

Minor Essential Revisions

  • Details of the RNA-seq library construction method should be provided.

  • The manuscript states that there are "17,030 unique genes" in the Ensembl CHIMP 2.1.4 annotation, but according to the Ensembl website there are currently 18,759 "coding genes" in this annotation version. This apparent discrepancy should be resolved.

  • The Figure Legend refers to "marmoset" genes instead of chimpanzee.

Discretionary Revisions

  1. In the second paragraph of the Background section, it would be appropriate to also acknowledge the work that WUGSC and their collaborators have done to update the genome assembly since the publication of the initial draft.

Level of interest: An article of importance in its field

Quality of written English: Acceptable

Statistical review: No, the manuscript does not need to be seen by a statistician.

Declaration of competing interests: I declare that I have no competing interests.

Source

    © 2015 the Reviewer (CC BY 4.0 - source).

Content of review 2, reviewed on March 21, 2015

The authors have adequately addressed all of my previous concerns.

The data will clearly be a valuable resource for the scientific community. To maximize its impact, I would strongly encourage the authors to work with UCSC Genome Browser, RefSeq and/or Ensembl maintainers to make the data readily available on the public browsers (but this should obviously not be a condition of publication).

Level of interest: An article of importance in its field

Quality of written English: Acceptable

Statistical review: No, the manuscript does not need to be seen by a statistician.

Declaration of competing interests: I declare that I have no competing interests.

Source

    © 2015 the Reviewer (CC BY 4.0 - source).

References

    Maudhoo, M. D., Madison, J. D., Norgren, R. B., Jr. 2015. De novo assembly of the chimpanzee transcriptome from NextGen mRNA sequences. GigaScience.