Content of review 1, reviewed on November 07, 2014

Yoshizaki, Okuda and collaborators identified 178 phosphomotifs in a previous work. Here, Yoshizaki and Okuda identified sequences matching these phosphomotifs in 9 model organisms. By mapping phosphorylation data from 2 databases (PhosphoSitePlus and PhosphoELM), they defined sets of known and potential phosphosites. They then compared the evolutionary conservation of the two sets and found that, for most phosphomotifs, known phosphorylation sites are more conserved than potential phosphosites. Using the kinase-substrate relationships available in PhosphoSitePlus, the authors determined the kinase preferences for the different phosphomotifs and placed these preferences in the context of phosphomotif conservation. Finally, the authors explored the correlations between phosphorylation motifs and cellular functions through a GO enrichment analysis and determined the protein interaction networks for proteins sharing a specific phosphomotif. All of these analyses were conducted with the aim of providing a framework for finding associations between phosphorylation sites and cellular functions.
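To make the distinction between known and potential phosphosites concrete, here is a minimal sketch of how I understand the classification step. It is not the authors' pipeline: the regular expression used for the S/T-P motif, the function name `classify_motif_sites`, the toy sequence and the annotated positions are all hypothetical and chosen only for illustration.

```python
import re


def classify_motif_sites(sequence, motif_regex, annotated_positions):
    """Split motif matches into known and potential phosphosites.

    Positions are 1-based indices of the phosphoacceptor residue, which is
    assumed here to be the first residue of each motif match.
    """
    known, potential = [], []
    # A zero-width lookahead reports overlapping motif occurrences as well.
    for match in re.finditer(f"(?={motif_regex})", sequence):
        position = match.start() + 1
        if position in annotated_positions:
            known.append(position)      # site is annotated in a database
        else:
            potential.append(position)  # motif match without annotation
    return known, potential


# Hypothetical toy data, not taken from the manuscript.
toy_sequence = "MSPKTPAASPLT"
toy_annotations = {2, 9}  # positions assumed to be annotated phosphosites
known_sites, potential_sites = classify_motif_sites(
    toy_sequence, "[ST]P", toy_annotations
)
print(known_sites, potential_sites)
```

In this toy case the motif matches at positions 2, 5 and 9; the two annotated positions are reported as known sites and the remaining match as a potential site. The conservation comparison would then be run separately on the two lists.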

I think the data set provided by Yoshizaki and Okuda represents a useful resource for the field of phosphoproteomics. However, I feel that the quality of their data set can be improved by adding more phosphorylation data.

(1) Major Compulsory Revisions

1.1 -- There are other databases and studies that, in my opinion, would increase the number of known phosphorylation sites considered in the study. I suggest that the authors include the phosphosites contained in these databases/studies in their dataset:

  • Beltrao P, Albanese V, Kenner LR, Swaney DL, Burlingame A, et al. (2012) Systematic functional prioritization of protein posttranslational modifications. Cell 150: 413–425. doi: 10.1016/j.cell.2012.05.036

  • Gnad F, Gunawardena J, Mann M (2011) PHOSIDA 2011: the posttranslational modification database. Nucleic Acids Research 39: D253–260. doi: 10.1093/nar/gkq1159

  • Minguez P, Parca L, Diella F, Mende DR, Kumar R, et al. (2012) Deciphering a global network of functionally associated post-translational modifications. Molecular Systems Biology 8: 599. doi: 10.1038/msb.2012.31

  • Keshava Prasad TS, Goel R, Kandasamy K, Keerthikumar S, Kumar S, et al. (2009) Human Protein Reference Database–2009 update. Nucleic Acids Research 37: D767–772. doi: 10.1093/nar/gkn892

(2) Minor Essential Revisions

2.1 -- Figure 2: the x axis of the graph should be labelled.

2.2 -- Figure 2: I suggest changing the colors of the bars in the upper part, since I printed the article in black and white and could not distinguish the two colors.

##############################

Minor issues not for publication

##############################

2.3 -- Findings, third sentence: Please change "Comparative genomics analysis used genomes of nine species;" to "Comparative genomics analysis used genomes of nine species:"

2.4 -- Findings, third sentence: Please change "Pan troglodyte" to "Pan troglodytes".

2.5 -- Utility of the dataset, second sentence: "Large-scale mass spectrometry (MS)...". The abbreviation "(MS)" is used only once. I suggest removing it.

2.6 -- Utility of the dataset, fifth sentence: Please change "Because many kinases has" to "Because many kinases have". I would also reformulate the sentence. For instance: "Since different kinases target a different and specific sequence motif in the surrounding region of the phosphosite, such "

2.7 -- Assignment of kinase-substrate information, fourth sentence: "Most S/T-P (86%) was...". I would reformulate this sentence. For example: "Most S/T-P sites (86%) were..."

2.8 -- List of abbreviations: please change "has: Homo sapiens" to "hsa: Homo sapiens"

(3) Discretionary Revisions

##############################

Minor issues not for publication

##############################

3.1 -- Background, first sentence: I suggest reformulating this sentence, since it is too vague.

3.2 -- Utility of the dataset, second sentence: "Large-scale mass spectrometry (MS) has allowed the identification of phosphosites, of which >100,000 phosphosites...". I suggest reformulating the sentence. For instance: "Large-scale mass spectrometry (MS) has allowed the identification of phosphosites and >100,000 phosphosites..."

3.3 -- Utility of the dataset, fourth sentence: "..., 518 protein kinases for phosphorylation modification have been reported". I would just say: "..., 518 protein kinases have been reported".

3.4 -- Acknowledgements, second sentence: "This work was supported by a Grant-in-Aid for Young Scientists (B) from the Ministry...". I would add the number of the grant.

Level of interest: An article whose findings are important to those with closely related research interests

Quality of written English: Acceptable

Statistical review: No, the manuscript does not need to be seen by a statistician.

Declaration of competing interests: I declare that I have no competing interests.

Authors' response: (http://www.gigasciencejournal.com/imedia/7905334061612342_comment.pdf)

Source

    © 2014 the Reviewer (CC BY 4.0 - source).

Content of review 2, reviewed on January 23, 2015

Yoshizaki and Okuda answered all the points I raised in the first review step, and I feel that their work has improved significantly compared to the first submission. However, I still have a few questions that need to be addressed before I can fully support the publication of this manuscript.

(1) Major Compulsory Revisions

1.1 -- I would ask the authors to carefully review the text one more time to make it clearer (this is a general comment, but the most crucial point is the "Utility of the dataset" section). Although this point was raised by reviewer #1, I feel it has not been fully addressed in the current revision. To me, this is the main remaining limitation of this work, which otherwise provides valuable data that complement those of the previous paper the authors published on this subject.

In this section ("Utility of the dataset"), I suggest that the authors point out:

  • the novelty of this data set compared to the previous studies on this subject and why this data set is important (for instance, the authors could explain that their dataset can help elucidate the function of phosphomotifs and their role in cellular signaling by showing how they evolved, by providing information about the kinases that phosphorylate them, by describing the interaction networks of proteins with the same motif and by finding associations between the motifs and biological functions. All the information is already there, but it is compressed into a few words.)

  • how this data set lets us learn something new about the evolution of phosphorylation signaling (the authors' results about the kinase expansion in vertebrates show that these data are valuable. They could mention this.)

Other parts that I think may be improved:

  • Abstract: Findings -- e.g. "The present 178 identified phosphomotifs...". This is the first time the authors mention phosphomotifs. I feel it would be more appropriate to say "We identified 178..." or, more correctly, "In a previous study we identified..."

  • Abstract: Findings -- the last two sentences are somewhat disconnected from the previous ones. The authors could reiterate that they are talking about phosphomotifs to make the text easier to read.

(2) Minor Essential Revisions

2.1 -- I do not feel that the word "demonstrate" is used correctly in the text. To demonstrate, in a scientific context, means to "clearly show the existence or truth of (something) by giving proof or evidence" [1]. I think that in at least two places in the manuscript the use of the word "demonstrate" is confusing:

  • "we demonstrate the use of screening methods for the assessment of functional phosphorylation signaling" -- this sentence describes the authors' previous work on this subject (BMC Genomics, 2014) more than the present one

  • "we demonstrated interaction networks of proteins, identified kinase substrates associated with phosphoproteins..."

[1] http://www.oxforddictionaries.com/definition/english/demonstrate

2.2 -- Description of additional data file 2: "the color of cells indicates that annotations were present, and white coloring indicates the absence of annotation". When I open the file, I see that the cells are black and pink, which does not match the description.

(3) Discretionary Revisions

3.1 -- "including yeast and human genome". I would rather say "that span from yeast to humans".

Level of interest: An article whose findings are important to those with closely related research interests

Quality of written English: Needs some language corrections before being published

Statistical review: No, the manuscript does not need to be seen by a statistician.

Declaration of competing interests: I declare that I have no competing interests.

Authors' response: (http://www.gigasciencejournal.com/imedia/7905334061612342_comment.pdf)

Source

    © 2015 the Reviewer (CC BY 4.0 - source).

Content of review 3, reviewed on March 11, 2015

The authors have addressed all my points and I feel the manuscript is clearer now.

Level of interest: An article whose findings are important to those with closely related research interests

Quality of written English: Needs some language corrections before being published

Statistical review: No, the manuscript does not need to be seen by a statistician.

Declaration of competing interests: I declare that I have no competing interests.

Source

    © 2015 the Reviewer (CC BY 4.0 - source).

References

    Yoshizaki, H., Okuda, S. 2015. Large-scale analysis of the evolutionary histories of phosphorylation motifs in the human genome. GigaScience.