Content of review 1, reviewed on February 13, 2015

To provide a resource for filling gaps in the current chimpanzee genome and for experimental validation of candidate gene models, the authors generated ~31.2 gigabases of Illumina RNA sequencing data from 4 different cell lines of one chimpanzee and conducted de novo transcriptome assembly. In total, they obtained 147,664 transcripts, including 44,269 with full-length coding sequences. Compared to the Ensembl gene annotation, 1,906 genes were specific to the de novo assembly method, while a much larger portion of genes was not covered.

In general, the data analysis and presentation of results have major problems that must be addressed before the manuscript can be considered for publication.

Major:

  1. For the de novo assembly, the authors assembled each of the four datasets separately. Since sequencing depth is the most crucial factor for successful de novo assembly, combining all of the RNA-seq data would be expected to produce a better assembly.

  2. Is it justified to remove DNA contamination by removing sequencing reads that do not match human RefSeq mRNA data? How many reads were retained after applying this method?

  3. Full-length CDS annotation alone is not enough to provide better gene annotation; comparison of UTR sequences should also be considered.

  4. The authors did not provide sufficiently informative results in the Ensembl gene annotation comparison section. Reporting only the number of overlapping genes falls far short of a decent gene annotation comparison. Besides, since only 8,155 of the 14,864 Ensembl-annotated genes (about 55%) were covered by genes obtained with the assembly method, the de novo assembly-based gene annotation does not improve on the current chimpanzee gene annotation.

Minor:

  1. Why does “marmoset” appear in the figure legend? Shouldn't it be chimpanzee?

  2. Version information should be provided for all annotation files (such as the human RefSeq gene annotation).

  3. Provide the number of full-length CDS transcripts in Table 2.

  4. Are the sequencing data paired-end? This information should be described in the Methods.

Level of interest: An article of limited interest

Quality of written English: Acceptable

Statistical review: No, the manuscript does not need to be seen by a statistician.

Declaration of competing interests: None

Source

    © 2015 the Reviewer (CC BY 4.0).

References

    D., M. M., D., M. J., Jr., N. R. B. 2015. De novo assembly of the chimpanzee transcriptome from NextGen mRNA sequences. GigaScience.