Content of review 1, reviewed on July 11, 2014

Basic reporting

No comments

Experimental design

The datasets used in this study should be extended and diversified. As mentioned by the authors, the number of clusters grows linearly with the number of genomes considered. It is therefore difficult to assess the real benefit of the approach in the treatment of large datasets. Moreover, the fungi dataset mainly consists of Ascomycota. The approach should be tested on a larger range of eukaryotic sequences, including more divergent sequences and, in particular, sequences from Metazoa and plants. Proteins from higher eukaryotes often show a mosaic domain composition and must be taken into account to evaluate the robustness of the sub-sequence homology approach.

Validity of the findings

It could be valuable to see the effect of the different parameters on the number of clusters, the fraction of proteins involved in several clusters and the fraction of “overlapping” clusters.
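
As an illustration only, the three quantities above could be tabulated for each parameter setting along the following lines. This is a minimal sketch, not the authors' implementation: the function name `cluster_stats`, the representation of clusters as sets of protein identifiers, and the definition of an "overlapping" cluster as one that shares at least one protein with another cluster are all assumptions.

```python
from collections import Counter


def cluster_stats(clusters):
    """Summarise one clustering given as a list of sets of protein IDs.

    Assumption: a cluster counts as "overlapping" if it contains at least
    one protein that also belongs to another cluster.
    """
    # Count, for every protein, how many clusters it appears in.
    membership = Counter(p for c in clusters for p in set(c))
    n_proteins = len(membership)

    # Proteins assigned to more than one cluster.
    shared = {p for p, k in membership.items() if k > 1}

    # Clusters containing at least one shared protein.
    overlapping = sum(1 for c in clusters if shared & set(c))

    return {
        "n_clusters": len(clusters),
        "frac_multi_cluster_proteins": len(shared) / n_proteins if n_proteins else 0.0,
        "frac_overlapping_clusters": overlapping / len(clusters) if clusters else 0.0,
    }


# Hypothetical toy clustering: protein "B" belongs to two clusters.
example = [{"A", "B"}, {"B", "C"}, {"D"}]
stats = cluster_stats(example)
print(stats)
```

Running such a function once per parameter combination would give a table from which the sensitivity of the clustering to each parameter can be read directly.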

Comments for the author

The proposed method addresses a crucial problem in the field of orthology inference and more generally in comparative genomics. The strategy based on the transitivity of homology takes into account the complexity of protein evolution by considering protein subsequences and by allowing proteins to be included in several clusters. Results obtained on small subsets are promising but have to be confirmed on a larger set including highly modular proteins from large genomes of plants and animals.

Source

    © 2014 the Reviewer (CC-BY 4.0 - source).