Content of review 1, reviewed on April 04, 2018

Zhao et al reported a much improved genome assembly of moso bamboo, and characterized its alternative splicing (AS) atlas. While I find the assembly result very impressive, I have several questions on the methodology.

  1. According to the method text in the "Additional File", the RNA reads were mapped onto the genome by "BLAT" and refined by HISAT. This is a rather unconventional way to map RNA-seq data to a reference genome. Why not just use HISAT2? BLAT was designed to align transcripts, not individual RNA-seq reads. Also I'm not aware of the adjustment function in HISAT. Please make sure the read mapping was done correctly because this is fundamental to the AS analysis.

  2. One important conclusion the authors made is that the conserved genes tend to have more AS events. It is however unclear to me how the authors measured the degrees of conservation. The authors did a gene family classification, and "obtained 8 datasets of orthologous genes representing different levels of conservation (Fig. 3a) designated dataset8 (more conserved genes) to dataset1 (bamboo-specific genes) based on a phylogenetic relationship of 8 selected species." But there was no further explanation. What does the "most conserved genes" in dataset8 entail? Based on presence or absence? And what is the difference between, say dataset8 and dataset7? How gene families were clustered was also unexplained (at least I couldn't find it). These are critical details that are missing.

  3. A species phylogeny was reconstructed from 8 genomes, but no information is provided about how this was done. What are the "single-copy orthologous genes"? What methods and programs did you use for phylogenetic reconstruction?

  4. Species divergence time was estimated, but again, the authors provided no methodological detail. How was the molecular clock estimated? Did you test the validity of assuming a molecular clock (e.g. relative rate test)? Further, the time calibrations listed in the Additional File need citations; to me, they also look like secondary calibrations rather than "fossil time".

  5. Though I appreciate the artistic value of Fig. 3A, it is scientifically incorrect (or at least very confusing). The x-axis is apparently in units of substitutions/site, which is a branch length measurement. It does not make sense to have a terminal tip linked (by vertical dashed line) to a branch length value. There was also no "divergence times" information in this figure and the legend should be revised.

  6. The expansion of lignin biosynthesis genes could be due to whole genome duplication (WGD), but WGD was not discussed. Are the two decoupled?

Some other comments are listed below. The manuscript has no page numbers, and the line numbers do not match the actual lines and restart on each page, which makes the review difficult. Anyway, I tried my best to point out where in the text I was referring to.

Abstract
Line 18 - what is "additional abundance data"? You meant sequencing data?
Line 31 - "dramatic evolutionary characteristics" is too dramatic and unclear. Please be specific or take out this sentence.
Line 39 - what does "bamboo's specificity in being a woody plant" mean? Please clarify "specificity".

Background
Line 20 - change "investigated" to "been carried out".
Line 46 - change "is responsible" to "is partly responsible".
Line 46 - take out "our colorful dynamic world full of".
Line 6 "between conservation and AS" - What conservation? Sequence conservation? Gene functional conservation? Amino acid conservation? Protein structural conservation?
Line 6 - change "between evolution and the AS status of genes …" to "examine the evolution of AS status of genes …"

Data description
Line 21 - change "different strategies" to "different sequencing strategies".

Analyses
Line 53 - "assembly" statistics.
Line 34 - change "was higher than" to "more complete than"
Line 58 - what is "post-regulation level"? You meant post-translational level?
Line 25 - "RNA from a mixture of … " this sentence is unclear.
Line 48 - "…uniform AS events…" You meant "unique AS events"?
Line 4 - what are the four AS types? You need to introduce them first.
Line 6 - "A higher accuracy is a strong indicator of …" A higher accuracy of what?
Line 38 - change "were detected to TE-insertion" to "have TE insertion"
Line 1 - Define D1-D8
Line 6 - which statistical test did you use to derive this p value?
Line 8 - what do the "original dataset", "overlapping genes", and "duplicated genes" mean here?
Line 15 - change "abundance" to "percentage"

Discussion
Line 33 - please rephrase this sentence.
Line 13 - "in addition to the protein-coding genes AS generates diverse transcripts of non-coding genes" Citation is needed here
Line 35 - I do not follow the logic here. You found "no noticeable relationship between TE genes and AS genes", but you suggested that "TE might be a driving force during the formation process of AS in bamboo"?
Line 28 - what do you mean by "redundancy" here?
Line 34 - take out "As a necessary substrate for the evolution of AS".
Line 47 - what are the "many other AS types"?
Line 49 - there is no way you could infer the "intermediate evolutionary stage". Plus I couldn't figure out what are the other AS types.
Line 2 - line 36 please revise this paragraph. I could not follow the logic nor find the main point.
Line 10 - why having more AS events would have "functional priority"?
Line 41 - "uniform" you meant "unique"?

Are the methods appropriate to the aims of the study, are they well described, and are necessary controls included? If not, please specify what is required in your comments to the authors.
No

Are the conclusions adequately supported by the data shown? If not, please explain in your comments to the authors.
No

Does the manuscript adhere to the journal’s guidelines on minimum standards of reporting? If not, please specify what is required in your comments to the author
No

Are you able to assess all statistics in the manuscript, including the appropriateness of statistical tests used? (If an additional statistical review is recommended, please specify what aspects require further assessment in your comments to the editors.)
Yes, and I have assessed the statistics in my report.

Quality of written English Please indicate the quality of language in the manuscript:
Not suitable for publication unless extensively edited

Declaration of competing interests Please complete a declaration of competing interests, consider the following questions: Have you in the past five years received reimbursements, fees, funding, or salary from an organization that may in any way gain or lose financially from the publication of this manuscript, either now or in the future? Do you hold any stocks or shares in an organization that may in any way gain or lose financially from the publication of this manuscript, either now or in the future? Do you hold or are you currently applying for any patents relating to the content of the manuscript? Have you received reimbursements, fees, funding, or salary from an organization that holds or has applied for patents relating to the content of the manuscript? Do you have any other financial competing interests? Do you have any non-financial competing interests in relation to this manuscript? If you can answer no to all of the above, write ‘I declare that I have no competing interests’ below. If your reply is yes to any, please give details below.
I declare that I have no competing interests.

I agree to the open peer review policy of the journal. I understand that my name will be included on my report to the authors and, if the manuscript is accepted for publication, my named report including any attachments I upload will be posted on the website along with the authors' responses. I agree for my report to be made available under an Open Access Creative Commons CC-BY license (http://creativecommons.org/licenses/by/4.0/). I understand that any comments which I do not wish to be included in my named report can be included as confidential comments to the editors, which will not be published.
I agree to the open peer review policy of the journal.

Authors' response to reviews:

Responses to the comments of Reviewer #1

The authors provide a high-quality genome assembly and gene annotation of moso bamboo in order to improve on the first version published in 2013. Transcriptomic analysis was performed using several tissues to identify alternative splicing events and polymorphism within gene transcription, providing a repertoire of alternative transcription that could support tissue specialization. Additionally, the manuscript offers evolutionary insight, especially regarding the genes involved in lignin biosynthesis. The genomic resource described in this manuscript will facilitate future studies on the evolution and functional genomics of moso bamboo and other grasses by providing valuable information to researchers interested in this area. I recommend the manuscript for publication, following some minor revisions, which I have listed below by manuscript page (p.) and line (L) numbers.

  1. p. 3 - L20: Instead of "…have investigated in bamboo…", it should be '…have been investigated in bamboo...'.

Response: Thank you very much for pointing out this error. We have revised the sentence, as follows: “Only a limited number of genome-wide studies have been investigated in bamboo”

  2. p. 4 - L27: Please add suppl. table reference for transcriptomic data.

Response: Thank you for this excellent suggestion. We have added the additional table reference, as follows: “Additionally, for the transcriptomic analysis, approximately 379 Gb and 5 Gb of raw data were produced from the Illumina and PacBio platforms, respectively (Additional Tables S2-7)”

  3. p. 4 - L30: A conflicting value was found: "…We identified 266,711 uniform AS…". In the abstract section, the number of transcripts mentioned is 266,771. Please insert the correct value.

Response: Thank you very much for pointing out this error. The correct number is 266,711. We are sorry for the typographical error and we have revised the sentence in the Abstract section, as follows: “Moreover, we provide a comprehensive AS profile based on the identification of 266,711 uniform AS events in 25,225 AS genes by large-scale transcriptomic sequencing of 26 representative bamboo tissues using both the Illumina and PacBio sequencing platforms.”

  4. p. 4 L49: In the sentence "…we performed the genome assembly using different strategies to obtain a better genome assembly.", I suggest indicating the additional reference for detailed steps of the genome assembly.

Response: Thank you for this excellent suggestion. We have added related descriptions, as follows: “Subsequently, we performed the genome assembly using different strategies to obtain a better genome assembly (see the Additional File for details)”

  5. p. 5 L8: Instead of "…rice genome to find a mean coverage…", it should be '…rice genome and we obtained a mean coverage…'.

Response: Thank you for this excellent suggestion. We have revised the sentence, as follows: “Then we aligned the moso chromosomes to the rice genome and we obtained a mean coverage of ~59.77%”

  6. p. 5 L13: About bamboo BAC sequences, I suggest mentioning that these sequences are derived from another bamboo species (Ph. heterocycla) in this section or in the additional table S6.

Response: We appreciate this observation. In fact, the old Latin name, Ph. heterocycla, is a synonym of Ph. edulis, and both Latin names refer to the same bamboo (moso bamboo). Therefore, the BAC sequences mentioned in our manuscript are also derived from moso bamboo.

  7. p. 5 L24: Provide correct reference - "…we predicted 51,074 high-quality protein-coding loci… (Additional Table S10)". Instead of Table S10, it should be Table S11.

Response: Thank you very much for pointing out this error. We have re-organized and re-numbered the Additional Table, as follows: “we predicted 51,074 high-quality protein-coding loci with intact structures in moso bamboo (Additional Table S17)”

  8. p. 5 L30: Provide correct reference - "… ~17% of the gene models were precisely refined (Additional Table S11)". Instead of Table S11, it should be Table S12.

Response: Thank you very much for pointing out this error. We have revised the table and re-organized and re-numbered the Additional Table, as follows: “According to our results, ~17% of the gene models were precisely refined by the UTR addition and internal structural adjustment (Additional Table S19).”

  9. p. 5 L36: Provide correct reference - Regarding the annotation using BUSCO, the reference should be Additional Table S13 instead of S12 in "(Fig 1d and Additional Table S12)".

Response: Thank you very much for pointing out this error. We have added the reference of BUSCO, and re-organized and re-numbered the Additional Table, as follows: “According to the completeness assessment of the annotation using BUSCO [1], moso bamboo (95.2%) was more complete than Z. mays (92.2%) but close to O. sativa (95.6%) (Fig. 1d and Additional Table S20)”

  10. p. 7 L27: The additional table reference for the enrichment analysis should be S25 instead of S26.

Response: Thank you very much for pointing out this error. We have re-organized and re-numbered the Additional Table, as follows: “As the functional implication of AS genes, the enrichment analysis result showed 885 genes, which alternatively spliced in all samples, significantly enriched in RNA metabolic processing, mRNA processing, RNA processing and RNA splicing in the processes (Additional Table S25).”

  11. p. 7 L32: "…which account for one-third of the AS events (termed as among-tissue)." According to additional Fig S18, the AS events classified as among-tissue correspond to two-thirds of the AS events.

Response: Thank you very much for pointing out this error. We are sorry for the typographical error and we have revised the sentence, as follows: “Since AS possess strong specificity to different tissues or developmental stages, we identified 181,105 tissue-specific AS events (67.57%), which account for two-thirds of the AS events (termed as among-tissue).”

  12. p. 8 L43-45: The sentence "the distribution of the TE genes in the 8 datasets was examined. A substantially negative correlation" could be '…was examined and a substantially negative…'

Response: Thank you for this excellent suggestion. We have revised this sentence, as follows: “Moreover, the distribution of the TE genes in the 8 datasets was examined and a substantially negative correlation was observed, indicating that the more conserved genes had more TE insertions.”

  13. p. 10 L28: "…a higher percentage of IR (38.22%) and other AS types (total 28.18%) were observed in bamboo." In order to avoid misunderstanding, I suggest clarifying that 'other AS types' represent a set of AS events except the main AS types already mentioned in the manuscript.

Response: Thank you for this excellent suggestion. We have added the related descriptions in the Analysis part, as follows: “In subsequent analyses, we defined the four main AS types represented intron retention (IR), alternative 3’ splice site donor (A3SS), alternative 5’ splice site acceptor (A5SS), and exon skipping (ES), and we also defined the other AS types represented some AS types except the above four main AS types.”

  14. p. 16 L50: Please provide the release number of the pfam-A.hmm database.

Response: Thank you for this excellent suggestion. We have added the related information, as follows: “The filtered sequences were subsequently analyzed by hmmsearch using the Pfam-A.hmm database (released 2017/03/31).”

Figures and Tables

  15. Figure 2: The figure legend should explain the meaning of the acronyms IR, A3SS, A5SS, and ES.

Response: We appreciate this observation. We have added the related description in the figure legend of Figure 2, as follows: “IR, A3SS, A5SS, and ES represents intron retention, alternative 3’ splice site donor, alternative 5’ splice site acceptor, and exon skipping, respectively.”

  16. Figure S3: In the figure legend "…The while boxes…" should be '…The white boxes…'

Response: Thank you very much for pointing out this error. We are sorry for the typographical error and we have revised the figure legend of Figure S3, as follows: “The white boxes in the BAC represent ambiguous bases (Ns) and the yellow line represent well aligned sequences between the BAC and the sequences.”

  17. Fig. S17: Is the x-axis data label named 'AS' correct?

Response: Thank you very much for pointing out this error. We are sorry for the typographical error. The second bar on the x-axis should be labeled ‘IR’ instead of ‘AS’. We have therefore revised the figure.

  18. Table S1: Please provide the correct number of libraries in the 'Total' description.

Response: Thank you very much for pointing out this error. The total number should be 61 and we have revised the table.

  19. Table S3: Asterisk with description in the legend is not shown in the table.

Response: Thank you very much for pointing out this error. We have added the asterisks to Table S3.

  20. Table S28: I recommend excluding the words 'totally' and 'were' in the table legend.

Response: Thank you for this excellent suggestion. We have revised the table legend of Table S28, as follows: “Additional Table S28. One hundred and forty genes of lignin biosynthesis pathway experimentally validated collected from public studies”

  21. Some figure and table citations are missing in the manuscript, such as Fig. 1a, 1b, and 3d; and additional table S23.

Response: Thank you for this excellent suggestion. We have added the figure and table citations in the manuscript and reorganized the table citations in the Additional Files, as follows: “Then, the Hi-C assembly was generated with total length reached 1.91 Gb as well as contig and scaffold N50 length with 53.29 Kb and 79.90 Mb based on the Hi-C data and the improved WGS assembly (Fig. 1a and 1b).” “Additionally, compared with the AS events among the genes expressed in samples with different specificities (maxTs) (for details, see Methods), the maxTs obviously increased from D8 to D1, representing an enhancement in the sample specificity from a highly conserved gene dataset to a poorly conserved dataset (Fig. 3d).” “Additionally, for the transcriptomic analysis, approximately 379 Gb and 5 Gb of raw data were produced from the Illumina and PacBio platforms, respectively (Additional Tables S2-7)”

Dataset

  22. The PacBio reads (IsoSeq) must also be submitted to GigaDB or SRA and their accession number provided in the manuscript.

Response: We appreciate this observation. We have provided the SRA accession number (SRR7032261-69) for Iso-Seq data in the manuscript, as follows: “RNA-Seq raw sequence data for the 26 samples and Iso-Seq raw sequence data for a mixture sample were deposited in NCBI Short Read Archive database under the accession numbers: SRX2408703-28 and SRR7032261-69, respectively.”

Responses to the comments of Reviewer #2

Reviewer #2: Zhao et al reported a much improved genome assembly of moso bamboo, and characterized its alternative splicing (AS) atlas. While I find the assembly result very impressive, I have several questions on the methodology.

  1. According to the method text in the "Additional File", the RNA reads were mapped onto the genome by "BLAT" and refined by HISAT. This is a rather unconventional way to map RNA-seq data to a reference genome. Why not just use HISAT2? BLAT was designed to align transcripts, not individual RNA-seq reads. Also, I'm not aware of the adjustment function in HISAT. Please make sure the read mapping was done correctly because this is fundamental to the AS analysis.

Response: Thank you very much for pointing out this error. We are sorry for the unclear and confusing description of the RNA-Seq analyses in our Additional File. Indeed, as you mentioned, correct mapping is fundamental for AS analyses. We double-checked our shell scripts and found that the RNA-Seq reads were aligned using only HISAT2 (release 2.0.4), not HISAT and BLAT. We have revised the sentence in the Additional File, as follows: “Similarly, RNA-Seq data, a kind of high-throughput expressed data, were mapped to the genome to identify exon-intron splicing junctions and refine the alignment of RNA-Seq reads to the genome, using HISAT2 (version 2.0.4)[2].”

  2. One important conclusion the authors made is that the conserved genes tend to have more AS events. It is however unclear to me how the authors measured the degrees of conservation. The authors did a gene family classification, and "obtained 8 datasets of orthologous genes representing different levels of conservation (Fig. 3a) designated dataset8 (more conserved genes) to dataset1 (bamboo-specific genes) based on a phylogenetic relationship of 8 selected species." But there was no further explanation. What does the "most conserved genes" in dataset8 entail? Based on presence or absence? And what is the difference between, say dataset8 and dataset7? How gene families were clustered was also unexplained (at least I couldn't find it). These are critical details that are missing.

Response: Thank you for this excellent suggestion. Based on the genome-wide identification of orthologous genes in the selected 8 plants (Amborella trichopoda, A. thaliana, Elaeis guineensis, B. distachyon, O. sativa, Spirodela polyrhiza, S. bicolor and Ph. edulis) and the species divergence time in a phylogeny tree (Fig. 3a), we identified eight orthologous gene datasets. For instance, dataset8 (D8) represents orthologous genes common to all 8 selected plants, which correspond to an early divergence time in the phylogenetic tree. D7 represents orthologous genes common to the 7 selected plants other than A. trichopoda (the species with the earliest divergence time), and D7 does not contain the orthologous genes in D8, and so on. Thus, D1 represents bamboo-specific orthologous genes, which correspond to a later divergence time. Following a previous study [3], we obtained the divergence times of genes based on the presence and absence of orthologs in the phylogeny. In our subsequent analyses, we therefore considered the bamboo-specific genes (D1) as a poorly conserved gene dataset and the common genes in all selected plants (D8) as a highly conserved gene dataset, with the degree of conservation decreasing monotonically from D8 to D1. Lastly, we have revised the related descriptions and Fig. 3a, as follows: “Evolutionary analysis of AS in moso bamboo Based on the genome-wide identification of orthologous genes in the selected 8 plants (Amborella trichopoda, A. thaliana, Elaeis guineensis, B. distachyon, O. sativa, Spirodela polyrhiza, S. bicolor and Ph. edulis) and the species divergence time in a phylogeny tree (Fig. 3a), we identified eight orthologous gene datasets. For instance, dataset8 (D8) represented common orthologous genes in the selected 8 plants, which were located in an early divergence time in our constructed phylogeny. D7 represented common orthologous genes in the selected 7 plants except A. trichopoda (the specie with the earliest divergence time) and D7 doesn’t contain orthologues genes in D8. And so on. Thus, D1 represented bamboo-specific orthologous genes, which were located in later divergence time. According to a previous study[3], we obtained the divergence times of genes based on the presence and absence of orthologs in the phylogeny. In our subsequent study, thus, we considered the bamboo-specific genes (D1) as a poorly conserved gene dataset and the common genes in all selected plants (D8) as a highly conserved gene dataset, and the degree of conservation decreased monotonically from D8 to D1.”
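
The D1-D8 definition in this response can be read as a nested presence/absence rule, sketched below for illustration only. The species order (closest to most distant relative of moso bamboo), the function name `assign_dataset`, and the strict nesting requirement are assumptions made for this sketch; the response itself only states explicitly that D8 covers all 8 species, D7 excludes A. trichopoda, and D1 is bamboo-specific.

```python
# Hypothetical ordering from moso bamboo outward to its most distant relative.
# ASSUMPTION: this order is illustrative and is not taken from the manuscript.
SPECIES_BY_DISTANCE = [
    "Ph. edulis", "B. distachyon", "O. sativa", "S. bicolor",
    "E. guineensis", "S. polyrhiza", "A. thaliana", "A. trichopoda",
]


def assign_dataset(present_species):
    """Assign a gene family to D1..D8 from the species in which it has orthologs.

    Returns None when the family lacks a bamboo member or does not follow
    the strictly nested pattern assumed in this sketch.
    """
    present = set(present_species)
    if "Ph. edulis" not in present:
        return None
    # Count consecutive species, starting at bamboo, that carry an ortholog.
    depth = 0
    for species in SPECIES_BY_DISTANCE:
        if species not in present:
            break
        depth += 1
    # Strict reading: orthologs occur in exactly the `depth` closest species.
    if len(present) != depth:
        return None
    return f"D{depth}"


if __name__ == "__main__":
    print(assign_dataset(SPECIES_BY_DISTANCE))        # family shared by all 8 species
    print(assign_dataset(SPECIES_BY_DISTANCE[:7]))    # absent only from A. trichopoda
    print(assign_dataset(["Ph. edulis"]))             # bamboo-specific family
```

Under this reading, each family lands in exactly one dataset, which matches the statement that D7 does not contain the genes in D8. Families with a patchy distribution are left unassigned here; how the authors treated such families is not stated in the response.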

  3. A species phylogeny was reconstructed from 8 genomes, but no information is provided about how this was done. What are the "single-copy orthologous genes"? What methods and programs did you use for phylogenetic reconstruction?

Response: Thank you for this excellent suggestion. We have revised the related methods in the Additional File, as follows: “S3.1 Orthologous Gene and Phylogenetic The identification of orthologous gene clusters was considered as a fundamental aspect of genome evolution. Single-copy gene families and multi-gene families were identified by orthMCL (version 2.0.9) [4] among Ph. edulis and other 7 plant species, including Amborella trichopoda (version 1.0) from Amborella Genome Database (amborella.huck.psu.edu), Elaeis guineensis (GCF_000442705.1) from NCBI database, Arabidopsis thaliana (TAIR10), Brachypodium distachyon (version 3.1), Oryza sativa (version 7.0), Spirodela polyrhiza (version 2) and Sorghum bicolor (version 3.1) from the ENSEMBL database. The statistic of the gene family clustering in the 8 species was showed in Additional Table S24. The comparison of gene family clustering was provided in Additional Fig. S7. Afterwards, all single-copy genes were used to construct the phylogenetic tree by PhyML (version 3.0) [5] specifying a HKY85 substitution model with a gamma distribution across sites (Additional Fig. S8).”

  4. Species divergence time was estimated, but again, the authors provided no methodological detail. How was the molecular clock estimated? Did you test the validity of assuming a molecular clock (e.g. relative rate test)? Further, the time calibrations listed in the Additional File need citations; to me, they also look like secondary calibrations rather than "fossil time".

Response: Thank you for this excellent suggestion. We are sorry for the unclear and confusing description in the analysis of the species divergence time. Indeed, we estimated the species divergence time using calibration time rather than fossil time and we have revised the related Method in the Additional File, as follows: “S3.3 Estimation of Divergence Time In order to estimate the divergence time between Ph. edulis and the other 7 sequenced plant genomes, a Bayesian relaxed molecular clock approach was used to estimate the divergence time using MCMCTREE in PAML (version 4)[6]. Calibration times were gained from a previous study [7] (O. sativa vs. B. distachyon: 40-54 Mya; O. sativa vs. S. bicolor: 45-60 Mya; A. trichopoda vs. S. bicolor: 119.7-199.3 Mya).”

  5. Though I appreciate the artistic value of Fig. 3A, it is scientifically incorrect (or at least very confusing). The x-axis is apparently in units of substitutions/site, which is a branch length measurement. It does not make sense to have a terminal tip linked (by vertical dashed line) to a branch length value. There was also no "divergence times" information in this figure and the legend should be revised.

Response: Thank you for this excellent suggestion. We have remade Fig. 3a.

  6. The expansion of lignin biosynthesis genes could be due to whole genome duplication (WGD), but WGD was not discussed. Are the two decoupled?

Response: Thank you for this excellent suggestion. According to the additional analysis of the divergence time of lignin biosynthesis genes, we have added an explanation about the expansion of lignin biosynthesis genes in the aspect of WGD in the Analysis and Discussion, respectively, as follows: In the section of Analysis “Additionally, we calculated the synonymous substitution rate analysis for 13 gene families evolved in the lignin biosynthesis using the yn00, which was a package in PAML to estimate synonymous and nonsynonymous substitution rates. Then, the Ks rate was translated to the divergence time by the formula T=Ks/2r (r=6.5×10⁻⁹). As shown in Additional Fig. S22, the result indicated that the divergence time of the lignin biosynthesis genes occurred at the 5~16 million year ago (Mya), which correspond to the whole genome duplication (WGD) time 7~12 Mya in the moso bamboo genome [8].” In the section of Discussion “Combined with the results of the divergence time of the lignin biosynthesis genes and our previous study [8], we estimated the occurrence of a putative WGD event at 7~12 Mya in the moso bamboo genome, suggesting that there might have been a tetraploidization event during bamboo history [8]. Then, the ancient tetraploid moso bamboo evolved into a current diploid moso bamboo. Additionally, WGD could provide more gene copies, which facilitated evolving the genes with new functions [9]. Therefore, the expansion of the lignin biosynthesis genes in moso bamboo could be due to the occurrence of WGD event.”
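
The conversion quoted in this response, T = Ks/2r with r = 6.5×10⁻⁹, can be sketched as a few lines of Python. This is only an illustration of the arithmetic: the function name and the example Ks values are hypothetical, r is taken as substitutions per synonymous site per year, and the sketch says nothing about how Ks itself was estimated with yn00.

```python
def ks_to_divergence_mya(ks, rate=6.5e-9):
    """Convert a synonymous substitution rate (Ks) to divergence time in Mya.

    Uses T = Ks / (2r), where r is the assumed substitution rate per
    synonymous site per year (6.5e-9 is the value quoted in the response).
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    years = ks / (2.0 * rate)
    return years / 1e6  # years -> million years


if __name__ == "__main__":
    # Hypothetical Ks values chosen only to illustrate the conversion.
    for ks in (0.065, 0.13, 0.208):
        print(f"Ks = {ks:.3f} -> {ks_to_divergence_mya(ks):.1f} Mya")
```

With this rate, a Ks of 0.13 corresponds to 10 Mya, so the reported 5~16 Mya window for the lignin biosynthesis genes would correspond to Ks values of roughly 0.065 to 0.208.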

Some other comments are listed below. The manuscript has no page numbers, and the line numbers do not match the actual lines and restart on each page, which makes the review difficult. Anyway, I tried my best to point out where in the text I was referring to.

Abstract

  7. Line 18 - what is "additional abundance data"? You meant sequencing data?

Response: We appreciate this observation. Indeed, the data means the sequencing data and we have revised the sentence in the Abstract, as follows: “Here, we provide a chromosome-level de novo genome assembly of the moso bamboo (Phyllostachys edulis) using additional abundance sequencing data and hybrid-combined de novo assembly strategies.”

  8. Line 31 - "dramatic evolutionary characteristics" is too dramatic and unclear. Please be specific or take out this sentence.

Response: Thank you for this excellent suggestion. We have removed the sentence in the Abstract, as follows: “Via comparison with orthologous genes in related plants, we observed that the AS genes are concentrated in more conserved genes that tend to accumulate higher expressed transcripts and share less specificity.”

  9. Line 39 - what does "bamboo's specificity in being a woody plant" mean? Please clarify "specificity".

Response: Thank you for this excellent suggestion. Our result indicated moso bamboo has the features of woody bamboo in the grass family based on the analysis of the lignin biosynthesis pathway. To properly express the meaning, we have revised the sentence in the Abstract, as follows: “Furthermore, gene family expansion, abundant AS and positive selection were identified in crucial genes involved in lignin biosynthesis, indicating that moso bamboo is a woody plant in the grass family.”

Background

  10. Line 20 - change "investigated" to "been carried out".

Response: Thank you very much for pointing out this error. We have revised the sentence, as follows: “Only a limited number of genome-wide studies have been investigated in bamboo.”

  11. Line 46 - change "is responsible" to "is partly responsible".

Response: Thank you for this excellent suggestion. We have revised the sentence, as follows: “Species-specific AS is partly responsible for a wide variety of biodiversity with limited repertoires of protein coding genes”

  12. Line 46 - take out "our colorful dynamic world full of".

Response: Thank you for this excellent suggestion. We have removed the part, as follows: “Species-specific AS is partly responsible for a wide variety of biodiversity with limited repertoires of protein coding genes.”

  13. Line 6 "between conservation and AS" - What conservation? Sequence conservation? Gene functional conservation? Amino acid conservation? Protein structural conservation?

Response: Thank you for this excellent suggestion. We have revised the sentence, as follows: “We performed a genome-wide investigation to determine the relationship between amino acid conservation and AS and examine the evolution of AS status of genes that are involved in the lignin biosynthesis.”

  14. Line 6 - change "between evolution and the AS status of genes …" to "examine the evolution of AS status of genes …"

Response: Thank you for this excellent suggestion. We have revised the sentence, as follows: “We performed a genome-wide investigation to determine the relationship between amino acid conservation and AS and examine the evolution of AS status of genes that are involved in the lignin biosynthesis.”

Data description

  15. Line 21 - change "different strategies" to "different sequencing strategies".

Response: Thank you for this excellent suggestion. We have revised the sentence, as follows: “For the assembly of the moso bamboo genome, approximately 603.3 Gb genome data with different sequencing strategies were generated.”

Analyses

16. Line 53 - "assembly" statistics.

Response: Thank you for this excellent suggestion. We have revised the sentence, as follows: “Compared with those of our previous version[8], the assembly statistics and quality of the new WGS assembly were obviously improved (Additional Tables S9-10).”

  1. Line 34 - change "was higher than" to "more complete than"

Response: Thank you for this excellent suggestion. We have revised the sentence, as follows: “According to the completeness assessment of the annotation using BUSCO, moso bamboo (95.2%) was more complete than Z. mays (92.2%) but close to O. sativa (95.6%).”

  1. Line 58 - what is "post-regulation level"? you meant post-translational level?

Response: Thank you for this excellent suggestion. We have revised the sentence, as follows: “To facilitate the genome-wide investigation of the AS landscape in moso bamboo and comprehensively identify the factors that influence AS at the post-translational level, we performed high-throughput RNA sequencing (RNA-Seq) using the Illumina HiSeq-4000 platform.”

  1. Line 25 - "RNA from a mixture of … " this sentence is unclear.

Response: Thank you for this excellent suggestion. We have revised the sentence, as follows: “The full-length cDNA sequencing of alternatively spliced isoforms (Iso-Seq) used RNA from a mixture of 26 samples.”

  1. Line 48 - "…uniform AS events…" You meant "unique AS events"?

Response: We appreciate this observation. The total number of AS events identified in our study was counted after removing AS events that were repeated across the 26 samples. Therefore, the word “unique” properly expresses the meaning, and we have revised the sentence, as follows: “In total, 266,711 unique AS events were identified in 25,225 AS genes, accounting for ca. 49.39% of all annotated genes.”
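The counting described in this response, where an AS event seen in several samples is counted once, can be sketched as a set union. This is a minimal illustration and not the authors' pipeline: the event key (gene ID, AS type, start, end), the sample names, and all identifiers and coordinates below are assumptions made up for the example.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical event key: (gene_id, as_type, start, end).
Event = Tuple[str, str, int, int]


def count_unique_events(per_sample: Dict[str, Iterable[Event]]) -> Tuple[int, int]:
    """Return (number of unique AS events, number of genes with at least one event)."""
    unique: Set[Event] = set()
    for events in per_sample.values():
        # Identical events reported in different samples collapse into one entry.
        unique.update(events)
    genes = {gene_id for gene_id, _, _, _ in unique}
    return len(unique), len(genes)


# Toy input with made-up identifiers and coordinates.
samples = {
    "leaf": [("gene1", "IR", 100, 200), ("gene2", "ES", 50, 80)],
    "root": [("gene1", "IR", 100, 200), ("gene1", "A3SS", 300, 320)],
}
n_events, n_genes = count_unique_events(samples)
```

How two events are judged "the same" (exact coordinates, as assumed here, or some tolerance) is a design choice that the response does not specify.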

  1. Line 4 - what are the four AS types? You need to introduce them first.

Response: Thank you for this excellent suggestion. We have added the introduction in Page 7, as follows: “In subsequent analyses, we defined the four main AS types represented intron retention (IR), alternative 3’ splice site donor (A3SS), alternative 5’ splice site acceptor (A5SS), and exon skipping (ES) [10], and we also defined the other AS types represented some AS types except the above four main AS types.”

  1. Line 6 - "A higher accuracy is a strong indicator of …" A higher accuracy of what?

Response: Thank you for this excellent suggestion. We have revised the sentence, as follows: “Thus, a higher proportion of the PacBio-Illumina overlapping AS genes is a strong indicator of the validity of the computationally predicted AS”

  1. Line 38 - change "were detected to TE-insertion" to "have TE insertion"

Response: Thank you for this excellent suggestion. We have revised the sentence, as follows: “The transposable element (TE) analysis showed 26,366 genes have TE insertion, accounted for 51.62% of all genes, and the total length of TE-insertion in genes was ~46 Mb.”

  1. Line 1 - Define D1-D8

Response: Thank you for this excellent suggestion. Based on the genome-wide identification of orthologous genes in the 8 selected plants (Amborella trichopoda, A. thaliana, Elaeis guineensis, B. distachyon, O. sativa, Spirodela polyrhiza, S. bicolor and Ph. edulis) and the species divergence times in a phylogenetic tree (Fig. 3a), we identified eight orthologous gene datasets. For instance, dataset8 (D8) represents orthologous genes common to all 8 selected plants, which correspond to an early divergence time in the phylogenetic tree. D7 represents orthologous genes common to the 7 selected plants other than A. trichopoda (the species with the earliest divergence time), and D7 does not contain the orthologous genes in D8, and so on. Thus, D1 represents bamboo-specific orthologous genes, which correspond to the latest divergence time. Following a previous study [3], we obtained the divergence times of genes based on the presence and absence of orthologs in the phylogeny. In our subsequent analyses, we therefore considered the bamboo-specific genes (D1) as a poorly conserved gene dataset and the genes common to all selected plants (D8) as a highly conserved gene dataset, with the degree of conservation decreasing monotonically from D8 to D1.
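One way to read the D8 to D1 definition above is as a presence/absence rule over species ordered from the earliest-diverging lineage to the focal species. The sketch below is an interpretation under stated assumptions, not the authors' method. The response only states that A. trichopoda diverged earliest, so the rest of the species order is a placeholder, and the function name `assign_dataset` and the strict "present in exactly the k innermost species" rule are hypothetical.

```python
from typing import Optional, Sequence, Set

# Assumed ordering for illustration only: earliest-diverging lineage first,
# focal species (moso bamboo) last. Only the position of A. trichopoda is
# stated in the response; the remaining order is a placeholder.
SPECIES_ORDER = [
    "A. trichopoda", "A. thaliana", "S. polyrhiza", "E. guineensis",
    "S. bicolor", "O. sativa", "B. distachyon", "Ph. edulis",
]


def assign_dataset(present: Set[str],
                   order: Sequence[str] = SPECIES_ORDER) -> Optional[int]:
    """Return k if a gene family is present in exactly the k innermost species.

    k = len(order) corresponds to D8 (all species), k = 1 to D1 (focal species
    only). Patchy presence patterns return None, i.e. they stay unclassified.
    """
    n = len(order)
    for k in range(n, 0, -1):
        if present == set(order[n - k:]):
            return k
    return None


d8 = assign_dataset(set(SPECIES_ORDER))        # present in all eight species
d7 = assign_dataset(set(SPECIES_ORDER[1:]))    # absent only from A. trichopoda
d1 = assign_dataset({"Ph. edulis"})            # bamboo-specific
patchy = assign_dataset({"Ph. edulis", "A. thaliana"})
```

Because each presence pattern matches at most one k, the datasets are disjoint by construction, which is consistent with the statement that D7 does not contain genes in D8. A looser rule (for example, dating a family by its most distant species regardless of gaps) would classify more families but is a different definition.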

  1. Line 6 - which statistic test you used to derive this p value?

Response: Thank you for this excellent suggestion. We used the Mann-Whitney U test to obtain the P value, and we have revised the sentence, as follows: “AS was detected in all datasets, but the proportion of AS genes in each dataset gradually decreased from D8 to D1 (Mann-Whitney U test with p<0.05).”
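For readers unfamiliar with the test named here, the following is a minimal standard-library sketch of a two-sided Mann-Whitney U test. The response does not say which values were compared, so the two samples below are made-up numbers standing in for some per-gene measure in two datasets. The p-value uses the normal approximation without tie or continuity correction, which is only a rough guide for small samples; an established statistics package would normally be used instead.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Mann-Whitney U test via the normal approximation.

    Returns (U statistic for x, approximate p-value). Tied values receive
    average ranks; no tie or continuity correction is applied.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg_rank
        i = j
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u_x - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail area
    return u_x, p_value


# Made-up values for illustration only.
group_a = [5.0, 6.0, 7.0, 8.0]
group_b = [1.0, 2.0, 3.0, 4.0]
u_stat, p_value = mann_whitney_u(group_a, group_b)
```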

  1. Line 8 - what do the "original dataset", "overlapping genes", and "duplicated genes" mean here?

Response: Thank you for this excellent suggestion. We have revised the part, as follows: “This trend was also observed in the two other datasets, i.e., removing common genes in more than two gene datasets in eight original datasets and using single-copy genes in eight original datasets. The eight-original dataset was derived from the genome-wide identification of orthologous genes in the selected 8 plants.”

  1. Line 15 - change "abundance" to "percentage"

Response: Thank you for this excellent suggestion. We have revised the sentence, as follows: “A high percentage (>75%) of AS events was observed in the 4-coumarate: CoA ligase (4CL), hydroxycinnamoyl transferase (HCT) and cinnamyl alcohol dehydrogenase (CAD) gene families.”

Discussion

28. Line 33 - please rephrase this sentence.

Response: Thank you for this excellent suggestion. We have revised the sentence, as follows: “High-throughput genome sequencing and assembly strategy were broadly applied in current plant genomic studies with the development of new technologies and more useful data.”

  1. Line 13 - "in addition to the protein-coding genes AS generates diverse transcripts of non-coding genes" Citation is needed here

Response: Thank you for this excellent suggestion. We have removed the sentence.

  1. Line 35 - I do not follow the logic here. You found "no noticeable relationship between TE genes and AS genes", but you suggested that "TE might be a driving force during the formation process of AS in bamboo"?

Response: Thank you for this excellent suggestion. A previous study [11] showed that TEs constitute crucial gene regulatory elements and influence gene transcription and gene expression. However, we found no noticeable relationship between TE genes and AS genes in our study. Therefore, combining our result with the previous study, we implied that TEs might not be a main cause of alternative splicing but might still be a driving force during the formation process of AS in bamboo, although the mechanism of AS formation is still unknown.

  1. Line 28 - what do you mean by "redundancy" here?

Response: Thank you for this excellent suggestion. Redundancy means that some genes appeared in more than one gene dataset, and we have revised the sentence, as follows: “This finding was robust based on we analyzed using the orthologous genes only in one dataset and using single-copy genes in selected species, respectively.”

  1. Line 34 - take out "As a necessary substrate for the evolution of AS".

Response: Thank you for this excellent suggestion. We have removed the part, as follows: “New genes might first generate a single-functional gene without an AS event and then gradually form multifunctional and conserved genes with many AS events [12]”

  1. Line 47 - what are the "many other AS types"?

Response: We appreciate this observation. The “many other AS types” are all AS types other than the four main AS types, and we have added this introduction on Page 7, as follows: “In subsequent analyses, we defined the four main AS types represented intron retention (IR), alternative 3’ splice site donor (A3SS), alternative 5’ splice site acceptor (A5SS), and exon skipping (ES) [10], and we also defined the other AS types represented some AS types except the above four main AS types.”

  1. Line 49 - there is no way you could infer the "intermediate evolutionary stage". Plus I couldn't figure out what are the other AS types.

Response: We appreciate this observation. We have added the introduction to the other AS types in Page 7 (see the above answer for details) and revised the sentence, as follows: “Thus, the four main AS types were conserved, and other types might represent an intermediate stage”

  1. Line 2 - line 36 please revise this paragraph. I could not follow the logic nor find the main point.

Response: Thank you for this excellent suggestion. We have greatly revised the paragraph, as follows: “According to our results, the highly conserved gene datasets had more AS genes and events, which either produce functional alternative protein-coding transcripts with distinct functions in biological processes or modulate the functional spliced transcript level by producing certain non-coding transcripts [12]. We hypothesize that the highly conserved genes with more AS events might be critical for evolution and function in generating gene functional diversity and the generation process of the highly conserved genes might undergo rigorous regulation during long-term evolution since the poorly conserved genes had less AS events than the highly conserved genes. Additionally, compared with the poorly conserved gene datasets, the highly conserved AS gene datasets had a low tissue-specific expression profile, indicating these genes might be core genes in fundamental functions, such as serving as hubs in gene-gene networks. Therefore, we proposed that functionally important genes are generated by more frequent AS events. As an essential biological process, AS plays a crucial role in acquiring more functions, which might explain why the highly conserved AS possesses more AS events. We also hypothesize that this phenomenon likely applies not only to bamboo but also to other plants or even animals.”

  1. Line 10 - why having more AS events would have "functional priority"?

Response: We appreciate this observation. We have removed the confusing description and revised the related description, as follows: “In bamboo, the HCT family has more members and AS events than the CHS family, which indicate that the HCT family might be in a dominant position in the competition to bind p-coumaroyl CoA.”

  2. Line 41 - "uniform" you meant "unique"?

Response: Thank you for this excellent suggestion. We have revised the sentence, as follows: “Based on the chromosome-level genome sequence and the abundant transcriptomic data from multiple tissues from six main bamboo producing areas in China, we provide a comprehensive AS perspective of moso bamboo by identifying 266,711 unique AS events in 25,225 AS genes using both the Illumina and PacBio sequencing technology platforms.”

References:

1. Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31:3210–2.
2. Kim D, Langmead B, Salzberg SL. HISAT: a fast spliced aligner with low memory requirements. Nature Methods. 2015;12:357–60.
3. Zhang YE, Vibranovski MD, Landback P, Marais GAB, Long M. Chromosomal redistribution of male-biased genes in mammalian evolution with two bursts of gene gain on the X chromosome. PLoS Biology. 2010;8:e1000494.
4. Chen F, Mackey AJ, Stoeckert CJ, Roos DS. OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups. Nucleic Acids Research. 2006;34:D363–8.
5. Guindon S, Dufayard J-F, Lefort V, Anisimova M, Hordijk W, Gascuel O. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Systematic Biology. 2010;59:307–21.
6. Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Molecular Biology and Evolution. 2007;24:1586–91.
7. International Brachypodium Initiative. Genome sequencing and analysis of the model grass Brachypodium distachyon. Nature. 2010;463:763–8.
8. Peng Z, Lu Y, Li L, Zhao Q, Feng Q, Gao Z, et al. The draft genome of the fast-growing non-timber forest species moso bamboo (Phyllostachys heterocycla). Nature Genetics. 2013;45:456–61.
9. Taylor JS, Raes J. Duplication and divergence: the evolution of new genes and old ideas. Annual Review of Genetics. 2004;38:615–43.
10. Barbosa-Morais NL, Irimia M, Pan Q, Xiong HY, Gueroussov S, Lee LJ, et al. The evolutionary landscape of alternative splicing in vertebrate species. Science. 2012;338:1587–93.
11. Slotkin RK, Martienssen R. Transposable elements and the epigenetic regulation of the genome. Nature Reviews Genetics. 2007;8:272–85.
12. Roy SW, Irimia M. Splicing in the eukaryotic ancestor: form, function and dysfunction. Trends in Ecology and Evolution. 2009;24:447–55.

Source

    © 2018 the Reviewer (CC BY 4.0).

Content of review 2, reviewed on May 01, 2018

The authors have clarified some of the issues I raised, but I still have a few comments/suggestions/edits. Please also see the marked word document attached.

Note that the page and line numbers below are based on the attached word file.

Page 2, line 12. What does "uniform" mean here. You meant "unique"?

Page 2, line 16. Please be specific about "specificity" here. You meant "tissue specificity"?

Page 2, line 17. This sentence does not make sense. AS and positive selection on lignin biosynthesis do not indicate moso bamboo is a woody plant.

Page 4, line 6. What is "evolutionary landscape"?

Page 7, line 2. "the four main AS types" appears too suddenly here. You'd need to explain what these four types are first.

Page 7, line 5-6. This correlation was found among the four AS types? If so, you'd need to describe the four AS types first, and then say you found correlation of event number and gene number among the four AS types.

Page 7, line 18. You have two "two-thirds" here, which does not make sense mathematically.

Page 8, line 4. Why do species divergence estimates have anything to do with ortholog classification you described in the following?

Page 8, line 6. "which were located in an early divergence time in our constructed phylogeny" this statement does not make sense. You cannot "locate" in a "time" in a "phylogeny".

Page 8, line 8. "According to a previous study [27], we obtained the divergence times of genes based on the presence and absence of orthologs in the phylogeny" please clarify this sentence. Presence/absence of orthologs cannot tell you the divergence of genes.

Page 8, line 12. Please clarify "removing common genes in more than two gene datasets in eight original datasets and using single-copy genes in eight original datasets". I could not follow. What does "genes in more than two gene datasets" mean?

Page 8, line 16. Did you do statistical test on all these four datasets?

Page 8, line 21-25. Again, what is the statistical result?

Page 8, line 25. "specificity" you meant "tissue specificity"?

Page 9, line 11-16. A lot of this can go into the method.

Page 9 Discussion. Discussion in the current form is poorly organized. There are redundant points appearing in multiple paragraphs. Adding subheadings would help.

Page 9, line 28. "High-throughput" is not appropriate to describe "assembly strategy". Also, this sentence is written as if you developed new technologies, but I don't think so?

Page 10, line 1-2. I still do not understand why TE could be "a driving force during the formation process of AS in bamboo". Where is the evidence?

Page 11, "More AS events were identified in the sample with vigorous growth, which is consistent with the previous studies" - this was not mentioned in the Results.

Page 11, "Obvious differences were observed in the AS event numbers in the final three shoot developmental stages, likely contributing to the fast growth during shoot development" - this was not mentioned in the Results. Also please provide statistical results to support "obvious".

Page 11, line 23-24. I couldn't follow this sentence: "This finding was robust because we analyzed using the orthologous genes only in one dataset and using single-copy genes in selected species, respectively."

Page 11 line 27. I found the discussion on "new genes" is not well thought out. For example, the "hub genes" or "conserved genes" should have less functional diversity, not higher as the authors asserted here. I suggest drop this section.

Page 12, line 3-5. "Additionally, the four main AS types were abundant in the highly conserved gene datasets, and many other AS types appeared in the poorly conserved datasets. Thus, the four main AS types were conserved, and other types might represent an intermediate stage." I do not follow the logic here. Why AS of other types in the poorly conserved datasets would suggest they are "intermediate". I suggest remove this section.

Page 12, line 14-18. "We hypothesize that the highly conserved genes with more AS events might be critical for evolution and function in generating gene functional diversity and the generation process of the highly conserved genes might undergo rigorous regulation during long-term evolution since the poorly conserved genes had less AS events than the highly conserved genes." This sentence is too long and complicated. Please rephrase.

Page 13, line 2-3. "During the evolutionary process, a new gene might be generated by duplication, which then forms less AS under strict constraints." Again this argument is flawed. A newly duplicated gene should have a "relaxed" functional constraint because a redundant copy is created.

Page 13, line 19-20. What is the rationale that more AS would indicate "a dominant position in the competition to bind p-coumaroyl CoA"?

Page 14, line 12. What does "bamboo evolutionary landscape" mean?

Page 14, line 19. You meant "HuNan", not "HuHan" right?

Page 17, line 8. "identity > 95%"? Could you double check this threshold? Nucleotide identity > 95% is extremely stringent, and I cannot imagine you could get anything out.

Figure 2. Change "PacBio" to "Iso-Seq".

Figure 3C. It is unclear to me what this panel is showing. The figure legend also did not help much.

Figure 3D. Explain the y-axis: what does "number" and "rate" refer to? Also the x-axis "Species" should be replaced by something like "Datasets" right?

Declaration of competing interests
I declare that I have no competing interests.

I agree to the open peer review policy of the journal. I understand that my name will be included on my report to the authors and, if the manuscript is accepted for publication, my named report including any attachments I upload will be posted on the website along with the authors' responses. I agree for my report to be made available under an Open Access Creative Commons CC-BY license (http://creativecommons.org/licenses/by/4.0/). I understand that any comments which I do not wish to be included in my named report can be included as confidential comments to the editors, which will not be published.
I agree to the open peer review policy of the journal.

Authors' response to reviews: Responses to comments of Reviewer #2

The authors have clarified some of the issues I raised, but I still have a few comments/suggestions/edits. Please also see the marked word document attached. Note the page and line numbers below are based on the attached word file.

1. Page 2, line 12. What does "uniform" mean here. You meant "unique"?

Response: Thank you for this excellent suggestion. According to your suggestion, we have revised the sentence, as follows: “we provide a comprehensive AS profile based on the identification of 266,711 unique AS events in 25,225 AS genes by large-scale transcriptomic sequencing of 26 representative bamboo tissues using both the Illumina and PacBio sequencing platforms.”

  1. Page 2, line 16. Please be specific about "specificity" here. You meant "tissue specificity"?

Response: Thank you for this excellent suggestion. Indeed, the description of “tissue specificity” was more proper and we have revised the sentence, as follows: “Via comparison with orthologous genes in related plant species, we observed that the AS genes are concentrated in more conserved genes that tend to accumulate higher expressed transcripts and share less tissue specificity.”

  1. Page 2, line 17. This sentence does not make sense. AS and positive selection on lignin biosynthesis do not indicate moso bamboo is a woody plant.

Response: Thank you for this excellent suggestion. We have removed the confusing description, as follows: “Furthermore, gene family expansion, abundant AS and positive selection were identified in crucial genes involved in the lignin biosynthesis pathway of moso bamboo.”

  1. Page 4, line 6. What is "evolutionary landscape"?

Response: Thank you for this excellent suggestion. In the latest submission, we have used the description of “evolutionary aspect”, instead of “evolutionary landscape”, as follows: “In conclusion, our analysis not only provides a global profile of AS in bamboo for further experimental studies investigating the functions of genes and regulatory networks but also reveals the roles of AS from the evolutionary aspect.”

  1. Page 7, line 2. "the four main AS types" appears too suddenly here. You'd need to explain what these four types are first.

Response: Thank you for this excellent suggestion. We have transferred the related explanation to the first mentioned place of the four main AS types, as follows: “In subsequent analyses, we defined the four main AS types as: intron retention (IR), alternative 3’ splice site donor (A3SS), alternative 5’ splice site acceptor (A5SS), and exon skipping (ES), and we also defined the other AS types represented some AS types except for the above four main AS types. Then, we found that on average, 80.37% of the AS events and 95.59% of the AS genes overlapped among the four main AS types (Additional Fig. S17).”

  1. Page 7, line 5-6. This correlation was found among the four AS types? If so, you'd need to describe the four AS types first, and then say you found correlation of event number and gene number among the four AS types.

Response: Thank you for this excellent suggestion. According to your suggestion, we have transferred the related explanation to the first mentioned place of the four main AS types and provided the correlation among the four main AS types, as follows: “In subsequent analyses, we defined the four main AS types as: intron retention (IR), alternative 3’ splice site donor (A3SS), alternative 5’ splice site acceptor (A5SS), and exon skipping (ES), and we also defined the other AS types represented some AS types except for the above four main AS types.” “The AS event number was strongly and positively correlated with the AS gene number and those among the four main AS types (R2>0.91, Mann-Whitney U test with p value <0.05) (Fig. 2c).”

  1. Page 7, line 18. You have two "two-thirds" here, which does not make sense mathematically.

Response: Thank you very much for pointing out this error. We have revised the sentence, as follows: “Since AS possess strong specificity to different tissues or developmental stages, we identified 181,105 tissue-specific AS events (67.57%), which account for two-thirds of the AS events (termed as among-tissue). Then, the remaining one-third of the AS events were detected based on comparisons of the transcript isoforms within individual tissues (termed as within-tissue) (Additional Fig. S18).”

  1. Page 8, line 4. Why do species divergence estimates have anything to do with ortholog classification you described in the following?

Response: Thank you for this excellent suggestion. We performed a genome-wide classification of orthologous genes into 8 groups. These groups were identified based on the species tree (Fig. 3a). The species divergence times were used to indicate the origination times of genes in the different datasets, and Fig. 3a illustrates this relationship. Additionally, we have revised the related description and added the description of origination time, as follows: “Based on the genome-wide identification of orthologous genes in the selected 8 plant species (Amborella trichopoda, A. thaliana, Elaeis guineensis, B. distachyon, O. sativa, Spirodela polyrhiza, S. bicolor and Ph. edulis) and the constructed phylogeny (Fig. 3a), we identified eight unique orthologous gene datasets (D8-D1) based on the origination times of genes in each dataset. For instance, unique orthologous gene dataset 7 (D7) only contained orthologous genes which originated between 164.9 million year ago (Mya) and 213.6 Mya. In addition, we also extracted single-copy genes respectively from above datasets and termed as D8s-D1s.”

  1. Page 8, line 6. "which were located in an early divergence time in our constructed phylogeny" this statement does not make sense. You cannot "locate" in a "time" in a "phylogeny".

Response: Thank you for this excellent suggestion. We have revised the description, as follows: “For instance, unique orthologous gene dataset 7 (D7) only contained orthologous genes which originated between 164.9 million year ago (Mya) and 213.6 Mya.”

  1. Page 8, line 8. "According to a previous study [27], we obtained the divergence times of genes based on the presence and absence of orthologs in the phylogeny" please clarify this sentence. Presence/absence of orthologs cannot tell you the divergence of genes.

Response: Thank you for this excellent suggestion. According to your suggestion, we have removed the description.

  1. Page 8, line 12. Please clarify "removing common genes in more than two gene datasets in eight original datasets and using single-copy genes in eight original datasets". I could not follow. What does "genes in more than two gene datasets" mean?

Response: Thank you for this excellent suggestion. We have revised the description, as follows: “This trend was also observed in the single-copy datasets (D8s-D1s).”

  1. Page 8, line 16. Did you do statistical test on all these four datasets?

Response: Thank you for this excellent suggestion. We have conducted a Chi square test on each corresponding orthologous group between D8-D1 and D8s-D1s (for example: D7 vs D7s), with p-value ranging from 0.86 to 0.98. Therefore, we concluded that the identical trends were detected in the two datasets. Additionally, we have revised the description, as follows: “We investigated the distribution pattern of the four focal AS types in each dataset and found the identical trends (Fig. 3b), but the proportion of the AS types differed (IR>A3SS>A5SS>ES, Chi square test with p-value >0.86).”

  1. Page 8, line 21-25. Again, what is the statistical result?

Response: Thank you for this excellent suggestion. We calculated the Pearson correlation between the median maxTs and the origination time of each group in D8-D1 (R2=0.863 and p value <0.01). Additionally, we have revised the description, as follows: “Additionally, we compared with the AS events among the genes expressed in samples with different tissue specificities (maxTs) (for details, see Methods). The maxTs=1 and maxTs=0 represented constitutive expression and tissue specific expression, respectively. We found that the maxTs was negatively correlated with the origination time of the genes in D8-D1 (R2 > 0.86 and p value <0.01), representing an enhancement in the tissue specificity from a highly conserved gene dataset to a poorly conserved dataset (Fig. 3d).”
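The R2 value reported here is the square of a Pearson correlation coefficient computed over one point per dataset. A minimal sketch of that calculation follows. The function name and the input values are placeholders invented for illustration; they are not the authors' origination times or median maxTs values, and the significance test is not reproduced.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Placeholder values only: one (x, y) pair per dataset.
per_dataset_x = [1.0, 2.0, 3.0, 4.0]
per_dataset_y = [0.9, 0.7, 0.6, 0.2]
r = pearson_r(per_dataset_x, per_dataset_y)
r_squared = r ** 2  # the sign of r, not R2, gives the direction of the trend
```

With only eight points, one per dataset, a high R2 is easy to obtain, so the direction and size of the effect are worth reporting alongside it.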

  1. Page 8, line 25. "specificity" you meant "tissue specificity"?

Response: Thank you for this excellent suggestion. We have revised the sentence, as follows: “Additionally, compared with the AS events among the genes expressed in samples with different specificities (maxTs) (for details, see Methods), the maxTs obviously increased from D8 to D1, representing an enhancement in the tissue specificity from a highly conserved gene dataset to a poorly conserved dataset (Fig. 3d).”

  1. Page 9, line 11-16. A lot of this can go into the method.

Response: Thank you for this excellent suggestion. We have revised the part and transferred the related description to the Method, as follows:

Analysis: “Additionally, the divergence time of the gene involved in the lignin biosynthesis pathway (Additional Fig. S22) occurred at the 5~16 Mya, which correspond to the whole genome duplication (WGD) time 7~12 Mya in the moso bamboo genome.”

Method: “We calculated the synonymous substitution rate analysis for 13 gene families evolved in the lignin biosynthesis using the yn00, which was a package in PAML to estimate synonymous and nonsynonymous substitution rates. Then, the Ks rate was translated to the divergence time by the formula T=Ks/2r (r=6.5×10⁻⁹).”
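The conversion quoted in the Method text can be checked with a short calculation. The sketch below reads the formula as T = Ks / (2r) and assumes r is expressed in substitutions per synonymous site per year, which the quoted text does not state explicitly. The function names are hypothetical.

```python
def ks_to_mya(ks: float, rate: float = 6.5e-9) -> float:
    """Convert a Ks value to divergence time in million years ago, T = Ks / (2r)."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return ks / (2.0 * rate) / 1e6


def mya_to_ks(t_mya: float, rate: float = 6.5e-9) -> float:
    """Inverse conversion: the Ks value expected for a divergence time in Mya."""
    return 2.0 * rate * t_mya * 1e6


# Ks values corresponding to the 5 to 16 Mya range mentioned in the response,
# under the assumed units for r.
ks_low = mya_to_ks(5.0)    # about 0.065
ks_high = mya_to_ks(16.0)  # about 0.208
```

Under these assumptions, the 5~16 Mya range reported for the lignin biosynthesis genes corresponds to Ks values of roughly 0.065 to 0.208.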

  1. Page 9 Discussion. Discussion in the current form is poorly organized. There are redundant points appearing in multiple paragraphs. Adding subheadings would help.

Response: Thank you for this excellent suggestion. Following your suggestion and the GigaScience author instructions, we have removed the redundant descriptions and added subheadings. Because the modifications are extensive, please see the new revision for details.

  1. Page 9, line 28. "High-throughput" is not appropriate to describe "assembly strategy". Also, this sentence is written as if you developed new technologies, but I don't think so?

Response: Thank you for this excellent suggestion. We have revised the sentence, as follows: “High-throughput genome sequencing and improved assembly strategy were broadly applied in current plant genomic studies with the development of new technologies and more useful data.”

  1. Page 10, line 1-2. I still do not understand why TE could be "a driving force during the formation process of AS in bamboo". Where is the evidence?

Response: Thank you for this excellent suggestion. We have removed the description.

  1. Page 11, "More AS events were identified in the sample with vigorous growth, which is consistent with the previous studies" - this was not mentioned in the Results.

Response: Thank you for this excellent suggestion. We have removed the description in discussion.

  1. Page 11, "Obvious differences were observed in the AS event numbers in the final three shoot developmental stages, likely contributing to the fast growth during shoot development" - this was not mentioned in the Results. Also please provide statistical results to support "obvious".

Response: Thank you for this excellent suggestion. We have removed the description in discussion.

  1. Page 11, line 23-24. I couldn't follow this sentence: "This finding was robust because we analyzed using the orthologous genes only in one dataset and using single-copy genes in selected species, respectively."

Response: Thank you for this excellent suggestion. We have revised the description, as follows: “This finding was robust because we found the identical trends in the two types of eight gene datasets (D8-D1 and D8s-D1s).”

  1. Page 11 line 27. I found the discussion on "new genes" is not well thought out. For example, the "hub genes" or "conserved genes" should have less functional diversity, not higher as the authors asserted here. I suggest drop this section.

Response: Thank you for this excellent suggestion. According to the references focusing on new genes, we have tried to revise the description, as follows: “Previous reports have demonstrated that duplication is a major source of functional diversity and the generation of new genes [35], and conserved genes tend to have higher connectivity in gene-gene interaction networks, indicating their functional importance, while new genes were firstly added into gene-gene interaction networks with low connectivity and then gradually increased their connectivity and acquire pleiotropic roles [22,36]. In our study, highly conserved genes tended to have more AS events than poorly ones, which was consistent with the trend that conserved genes were apt to have higher connectivity in gene-gene interaction networks. Thus, we proposed that the AS may be associated with the increase of gene connectivity during evolution.”

  1. Page 12, line 3-5. "Additionally, the four main AS types were abundant in the highly conserved gene datasets, and many other AS types appeared in the poorly conserved datasets. Thus, the four main AS types were conserved, and other types might represent an intermediate stage." I do not follow the logic here. Why AS of other types in the poorly conserved datasets would suggest they are "intermediate". I suggest remove this section.

Response: Thank you for this excellent suggestion. Following your suggestion, we have removed this section from the Discussion.

  1. Page 12, line 14-18. "We hypothesize that the highly conserved genes with more AS events might be critical for evolution and function in generating gene functional diversity and the generation process of the highly conserved genes might undergo rigorous regulation during long-term evolution since the poorly conserved genes had less AS events than the highly conserved genes." This sentence is too long and complicated. Please rephrase.

Response: Thank you for this excellent suggestion. We have rewritten the section and removed the redundant point in discussion, as follows: “Previous reports have demonstrated that duplication is a major source of functional diversity and the generation of new genes [35], and conserved genes tend to have higher connectivity in gene-gene interaction networks, indicating their functional importance, while new genes were firstly added into gene-gene interaction networks with low connectivity and then gradually increased their connectivity and acquire pleiotropic roles [22,36]. In our study, highly conserved genes tended to have more AS events than poorly ones, which was consistent with the trend that conserved genes were apt to have higher connectivity in gene-gene interaction networks. Thus, we proposed that the AS may be associated with the increases of gene connectivity during evolution.”

  1. Page 13, line 2-3. "During the evolutionary process, a new gene might be generated by duplication, which then forms less AS under strict constraints." Again this argument is flawed. A newly duplicated gene should have a "relaxed" functional constraint because a redundant copy is created.

Response: Thank you for this excellent suggestion. Indeed, the generation of new genes is likely driven by either relaxation of functional constraint or positive Darwinian selection [1,2]. We have reorganized the section and removed the redundant point, as follows: “Previous reports have demonstrated that duplication is a major source of functional diversity and the generation of new genes [35], and conserved genes tend to have higher connectivity in gene-gene interaction networks, indicating their functional importance, while new genes were firstly added into gene-gene interaction networks with low connectivity and then gradually increased their connectivity and acquire pleiotropic roles [22,36]. In our study, highly conserved genes tended to have more AS events than poorly ones, which was consistent with the trend that conserved genes were apt to have higher connectivity in gene-gene interaction networks. Thus, we proposed that the AS may be associated with the increases of gene connectivity during evolution.”

  1. Page 13, line 19-20. What is the rationale that more AS would indicate "a dominant position in the competition to bind p-coumaroyl CoA"?

Response: Thank you for this excellent suggestion. HCT uses p-coumaroyl CoA as a substrate in lignin biosynthesis, and CHS uses the same substrate to generate flavonoids. Thus, HCT and CHS compete with each other to bind p-coumaroyl CoA. In bamboo, the HCT family has more members and more AS events than the CHS family, and positive selection was detected in the HCT family. Together, these observations likely indicate that the HCT family, compared to the CHS family, might be in a dominant position in the competition to bind p-coumaroyl CoA. Additionally, we have revised the description, as follows: “In bamboo, the HCT family has more members and AS events than the CHS family, which likely indicate that the HCT family, compared to the CHS family, might be in a dominant position in the competition to bind p-coumaroyl CoA.”

  1. Page 14, line 12. What does "bamboo evolutionary landscape" mean?

Response: Thank you for this excellent suggestion. We have used the description of “evolutionary aspect”, instead of “evolutionary landscape”. Additionally, we have revised the description, as follows: “In summary, these results will likely provide important resources for studies investigating bamboo’s unique woodiness in the Grass family (Poaceae) and exploring AS from the bamboo evolutionary aspect.”

  1. Page 14, line 19. You meant "HuNan", not "HuHan" right?

Response: Thank you very much for pointing out this error. We have revised the sentence, as follows: “(4) TaoJiang, HuNan Province (N:28°28′39.74″, E:112°11′18.62″, 320 M),”

  1. Page 17, line 8. "identity > 95%"? Could you double check this threshold? Nucleotide identity > 95% is extremely stringent, and I cannot imagine you could get anything out.

Response: Thank you for this excellent suggestion. Indeed, on double-checking our script we found that the threshold had been stated incorrectly, and we have revised the description, as follows: “Briefly, we performed standard protein BLAST searches (version 2.2.26) against the six genome sequences including moso bamboo using the coding sequence of the known genes with the following cut-off values: E-value <1e-10; identity > 40%; and coverage rate > 95% of query sequence.”
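The cut-offs quoted above can be illustrated with a minimal Python sketch. It assumes the BLAST hits are available in the standard 12-column tabular layout (query ID, subject ID, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score) and that query lengths are supplied separately. The function names, the `query_lengths` mapping, and the coverage definition (aligned query span divided by query length) are assumptions made for illustration and are not taken from the authors' script.

```python
def passes_cutoffs(fields, query_len,
                   max_evalue=1e-10, min_identity=40.0, min_coverage=0.95):
    """Return True if one tabular BLAST hit meets all three cut-offs.

    `fields` is one hit split into the 12 standard tabular columns.
    Coverage is assumed to be the aligned query span / query length.
    """
    pident = float(fields[2])                  # percent identity
    qstart, qend = int(fields[6]), int(fields[7])
    evalue = float(fields[10])
    coverage = (abs(qend - qstart) + 1) / query_len
    return (evalue < max_evalue
            and pident > min_identity
            and coverage > min_coverage)


def filter_hits(lines, query_lengths):
    """Keep only the hits that pass the cut-offs.

    `query_lengths` (hypothetical input) maps query ID -> query length.
    """
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip comments or malformed lines
        if passes_cutoffs(fields, query_lengths[fields[0]]):
            kept.append(fields)
    return kept


# Toy, made-up hits used only to exercise the filter.
example_hits = [
    "q1\ts1\t85.0\t98\t10\t0\t1\t98\t5\t102\t1e-50\t200",   # passes
    "q1\ts2\t35.0\t98\t60\t0\t1\t98\t5\t102\t1e-20\t90",    # identity too low
    "q1\ts3\t90.0\t50\t5\t0\t1\t50\t1\t50\t1e-30\t120",     # coverage too low
    "q1\ts4\t80.0\t98\t15\t0\t1\t98\t1\t98\t1e-5\t60",      # E-value too high
]
kept_hits = filter_hits(example_hits, {"q1": 100})
```

In this toy input only the first hit survives, because each of the other three fails exactly one of the stated thresholds.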

  1. Figure 2. Change "PacBio" to "Iso-Seq".

Response: Thank you for this excellent suggestion. According to your suggestion, we have revised the Figure 2.

  1. Figure 3C. It is unclear to me what this panel is showing. The figure legend also did not help much.

Response: Thank you for this excellent suggestion. According to your suggestion, we have revised the Figure 3C.

  1. Figure 3D. Explain the y-axis: what does "number" and "rate" refer to? Also the x-axis "Species" should be replaced by something like "Datasets" right?

Response: Thank you for this excellent suggestion. Number and Rate refer to AS number and maxTs. According to your suggestion, we have revised the Figure 3D.

Responses to the comments of Reviewer #3

Reviewer #3: The manuscript presents a comprehensive assembly of the bamboo genome which provides assembled chromosomes; an improvement from the current more fragmented assembly for the species. It also identifies alternative splicing events from transcriptome data corresponding to 26 different tissues. I believe that the data presented will be of great use to the scientific community, in particular those working on genomics and those interested in transcriptomics and alternative splicing.

Main comments: 1. FOCUS AND JUSTIFICATION. However, I think that study could be better justified and given a focus. While the relevance of having a more complete genome for an important plant is well justified, it is not clear how this relates to alternative splicing. There is also no justification as to why examining alternative splicing is important. For example, in the abstract it is stated that the paper assembles the genome and identifies alternative splicing events but does not explain WHY has alternative splicing was an important aspect to explore in a paper presenting a more complete assembly of the bamboo genome. Thus, one has the impression that this paper contains two separate stories running side by side. Perhaps one solution is to explain that gene duplication and alternative splicing are important drivers of functional evolution in genomes. That incomplete and fragmented scaffolds of genomes makes it difficult to assess patterns of gene duplication and that low coverage transcriptomes of only a handful of tissues does not allow to fully understand the extent of alternative splicing. Thus, having a fully assembled genome as well as an extensive RNA sequencing for recovering RNA isoforms is required. It is important to explain in detail WHY is alternative splicing important.

Response: Thank you for this excellent suggestion. According to your suggestion, we have added the information in the Background, as follows:

“The incomplete and scattered scaffolds of moso bamboo genome and the low coverage transcriptomes of a handful of tissues make it difficult to fully dissect the AS profiles. Therefore, a high-quality assembled genome and an extensive RNA sequencing are critical for the comprehensive AS identification.”

  1. JUSTIFICATION AND ORDER OF SPECIFIC ANALYSES. It is unclear to me why the description of the transposable element content should be in the section discussing alternative splicing events rather than on the section describing the genome sequence obtained.

Response: Thank you for this excellent suggestion. As part of the major revision, the description of transposable elements was removed from the Discussion.

  1. DISCUSSION. Justification and relevance of many of the tests done is more evident in the discussion some of this information would be better placed in the introduction or as brief sentences in the results so that the analyses make sense as they are presented.

Response: Thank you for this excellent suggestion. Following your suggestion, we have substantially revised the Discussion. Please see the latest revision for details.

More specific points 4. On the evolution of AS section specify how many orthologs were found when comparing between species and what percentage of the total number of genes in the bamboo this represents. Even if these numbers are shown in the figure/tables it is always helpful to get an idea of the patterns from the text alone.

Response: Thank you for this excellent suggestion. According to your suggestion, we have added the orthologs number in the main text, as follows: “We considered the bamboo-specific genes (4,023 orthologous genes; termed as D1) are poorly conserved, whereas the genes present in all selected plant species (18,997 orthologous genes; termed as D8) are highly conserved.”

  1. On the evolution of AS section it is not clear what analysis was done to compare the ortholog genes in other species. If I understand correctly, the analyses tries to assess whether having an older ortholog is associated with higher or lower rates of alternative splicing in the bamboo genome? This needs to be better worded. It is also important to explain WHY would this pattern be interesting important to understand evolution of alternative splicing. There is also a reference to a "robust pattern", does this refer to past literature in other species? If so then this needs to be better explained. If actual conservation or overall patterns of AS in other plants, rather than presence absence of ortholog genes, were compared to the bamboo then this needs to be better explained as at the moment it is very unclear.

Response: Thank you for this excellent suggestion. Your understanding of this part is correct. Indeed, we tried to assess whether having an older ortholog is associated with higher or lower rates of alternative splicing in the bamboo genome. We found that the more conserved gene datasets contained a higher proportion of AS genes in bamboo. We apologize for the confusion. Following your suggestion, we have greatly revised this part, as follows: “Based on the genome-wide identification of orthologous genes in the selected 8 plant species (Amborella trichopoda, A. thaliana, Elaeis guineensis, B. distachyon, O. sativa, Spirodela polyrhiza, S. bicolor and Ph. edulis) and the constructed phylogeny (Fig. 3a), we identified eight unique orthologous gene datasets (D8-D1) based on the origination times of genes in each dataset. For instance, unique orthologous gene dataset 7 (D7) only contained orthologous genes which originated between 164.9 million year ago (Mya) and 213.6 Mya. In addition, we also extracted single-copy genes respectively from above datasets and termed as D8s-D1s. We considered the bamboo-specific genes (4,023 orthologous genes; termed as D1) are poorly conserved, whereas the genes present in all selected plant species (18,997 orthologous genes; termed as D8) are highly conserved. The degree of conservation decreased monotonically from D8 to D1. AS was detected in all the datasets, but the proportion of AS genes in each dataset gradually decreased from D8 to D1 (Mann-Whitney U test with p value <0.05). This trend was also observed in the single-copy datasets (D8s-D1s). Therefore, the result was robust that more conserved genes had more AS genes in bamboo.”
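The comparison described above (the proportion of AS genes per dataset, ordered from D8 to D1) can be sketched in a few lines of Python. The gene identifiers, the dataset assignments, and the set of AS genes below are toy, made-up inputs, and the function names are hypothetical; the sketch only shows how such proportions and their ordering could be checked, and it does not reproduce the statistical test reported in the manuscript.

```python
from collections import Counter


def as_gene_proportions(gene_to_dataset, as_genes):
    """Proportion of genes with at least one AS event in each dataset."""
    totals = Counter(gene_to_dataset.values())
    with_as = Counter(
        dataset for gene, dataset in gene_to_dataset.items() if gene in as_genes
    )
    return {dataset: with_as[dataset] / totals[dataset] for dataset in totals}


def is_non_increasing(proportions, order):
    """True if the proportions never rise when datasets are read in `order`."""
    values = [proportions[dataset] for dataset in order]
    return all(a >= b for a, b in zip(values, values[1:]))


# Toy, made-up assignment of genes to three of the datasets.
gene_to_dataset = {
    "g1": "D8", "g2": "D8", "g3": "D8", "g4": "D8",
    "g5": "D4", "g6": "D4",
    "g7": "D1", "g8": "D1", "g9": "D1", "g10": "D1",
}
as_genes = {"g1", "g2", "g3", "g5", "g7"}  # genes with detected AS (made up)

proportions = as_gene_proportions(gene_to_dataset, as_genes)
trend_holds = is_non_increasing(proportions, ["D8", "D4", "D1"])
```

With real data, the same dictionary of proportions could be computed separately for the D8-D1 and D8s-D1s datasets to see whether both show the same ordering.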

  1. WHY the authors examine proportions of AS events by type is not entirely clear. There is an extensive literature on the patterns of prevalence of AS types as well as the differences in their potential contribution to functional adaptation.

Response: Thank you for this excellent suggestion. The difference in the frequencies or proportions of the AS types may reflect differences in their pre-mRNA splicing and this analysis is common in most genome-wide identification of AS. Thus, we examined the proportion of AS types and provided the details of bamboo for comparative analysis. Additionally, we have revised the description, as follows: “The difference in the frequencies or proportions of the AS types may reflect differences in their pre-mRNA splicing and this analysis is common in most genome-wide identification of AS. The distribution of the AS types depicted that IR occupied the dominant position, indicating that the importance of IR could be inferred from inspecting its prevalence throughout evolution in plants. Nevertheless, a higher percentage of IR (38.22%) and other AS types (total 28.18%) were observed in bamboo.”

  1. It is unclear what the correlations for CDS length and intron number, etc. involved. Is this to correlate these parameters among the bamboo with its ortholog in the other plants? It is also not explained WHY was this done.

Response: Thank you for this excellent suggestion. To obtain an overview of the landscape of AS and its relationships with gene features, and to evaluate the factors that influence AS, we examined the correlations between AS distribution and several gene features (e.g., gene length, CDS length, intron length, exon number, exon cassette length, and intron cassette length) in the datasets. Additionally, we have revised the description, as follows: “To obtain an overview of the landscape of AS and its relationships with gene features and to evaluate the factors that influence AS, we also examined the correlations between AS distribution and gene features in the datasets (Additional Fig. S21).”

  1. In the last section of the results, the title implies that the EVOLUTION of gene families was assessed. However the text below does not give any details of how was this done or whether there has actually been an expansion and if so, with respect to WHAT species… It is also not explained WHY only 13 gene families were assessed.

Response: Thank you for this excellent suggestion. To clarify how the genes involved in the lignin biosynthetic pathway were identified, we have revised the related Methods text as shown below. Additionally, according to the pathway described previously [3], 13 gene families belong to the lignin biosynthesis pathway.

In the method “Genome-wide identification of genes involved in the lignin biosynthetic pathway

The five genome sequences of A. thaliana (TAIR10), B. distachyon (v3.1), O. sativa (v7.0), Populus trichocarpa (JGI2.0.31), and S. bicolor (v3.1) were downloaded from the ENSEMBL database [4]. According to our literature-based investigations, 140 genes from the lignin biosynthetic pathway were experimentally validated in previous studies (Additional Table S28), and these known genes were collected and used as the query sequences for further identification. We identified lignin biosynthetic genes using a BLAST search and domain analysis as described in a previous article [5]. Briefly, we performed standard protein BLAST searches (version 2.2.26) against the six genome sequences including moso bamboo using the coding sequence of the known genes with the following cut-off values: E-value <1e-10; identity >40%; and coverage rate >95% of the query sequence. The filtered sequences were subsequently analyzed by hmmsearch (version 3.1b2) using the Pfam-A.hmm database (released 2017/03/31). Consequently, unclear sequences with incomplete domains were discarded after manual correction. Phylogenetic analyses were carried out following. We also calculated the synonymous substitution rate (Ks) for the 13 gene families involved in lignin biosynthesis using yn00, a program in the PAML package that estimates synonymous and nonsynonymous substitution rates. Then, Ks was converted to divergence time by the formula T = Ks/2r (r = 6.5×10⁻⁹).”
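As a worked illustration of the conversion stated at the end of the revised Methods, T = Ks/2r with r = 6.5×10⁻⁹, the following Python sketch computes a divergence time from a Ks value. It assumes that r is expressed in substitutions per synonymous site per year, which the response does not state explicitly; the function names and the example Ks value are hypothetical.

```python
SUBSTITUTION_RATE = 6.5e-9  # r from the response; unit assumed to be per site per year


def divergence_time_years(ks, rate=SUBSTITUTION_RATE):
    """Convert a synonymous substitution rate Ks to time via T = Ks / (2 * r)."""
    if ks < 0:
        raise ValueError("Ks must be non-negative")
    if rate <= 0:
        raise ValueError("rate must be positive")
    return ks / (2.0 * rate)


def divergence_time_mya(ks, rate=SUBSTITUTION_RATE):
    """Same conversion, expressed in millions of years ago (Mya)."""
    return divergence_time_years(ks, rate) / 1e6


# Made-up Ks value for illustration only: 0.13 / (2 * 6.5e-9) is about 10 Mya.
example_mya = divergence_time_mya(0.13)
```

The factor of 2 reflects that substitutions accumulate along both lineages after the duplication or speciation event, so the pairwise Ks is divided by twice the per-lineage rate.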

  1. The discussion states that the paper presents evidence consistent with the role of TE in driving AS. The results only present a description of the rates of AS and the presence of TEs. If one drives the other then some analysis to link the two should be presented.

Response: Thank you for this excellent suggestion. Following your suggestion and that of another reviewer, we have removed this description from the Discussion.

  1. I could not find in the results section a reference to the sample with more vigorous growth to have higher AS as it is stated in the introduction. Could this be made more prominent so that when reading the discussion this result can be easily found.

Response: Thank you for this excellent suggestion. Because the Discussion was substantially revised, this description has been removed.

  1. I think a better explanation of what is a poorly conserved dataset and a highly conserved dataset means.

Response: Thank you for this excellent suggestion. According to your suggestion, we have greatly revised the part of “Evolutionary analysis of AS in moso bamboo” and provided an explanation about the poorly/highly conserved dataset, as follows: “We considered the bamboo-specific genes (4,023 orthologous genes; termed as D1) are poorly conserved, whereas the genes present in all selected plant species (18,997 orthologous genes; termed as D8) are highly conserved. The degree of conservation decreased monotonically from D8 to D1.”

References:

  1. Chen S, Zhang YE, Long M. New genes in Drosophila quickly become essential. Science. 2010;330:1682–5.
  2. Long M, Betrán E, Thornton K, Wang W. The origin of new genes: glimpses from the young and old. Nature Reviews Genetics. 2003;4:865–75.
  3. Vanholme R, Demedts B, Morreel K, Ralph J, Boerjan W. Lignin biosynthesis and structure. Plant Physiology. 2010;153:895–905.
  4. Kersey PJ, Allen JE, Allot A, Barba M, Boddu S, Bolt BJ, et al. Ensembl Genomes 2018: an integrated omics infrastructure for non-vertebrate species. Nucleic Acids Res. 2018;46:D802–8.
  5. Fischer S, Brunk BP, Chen F, Gao X, Harb OS, Iodice JB, et al. Using OrthoMCL to assign proteins to OrthoMCL-DB groups or to cluster proteomes into new ortholog groups. Curr Protoc Bioinformatics. 2011;Chapter 6:Unit 6.12.1–19.
  6. Zhang YE, Vibranovski MD, Landback P, Marais GAB, Long M. Chromosomal redistribution of male-biased genes in mammalian evolution with two bursts of gene gain on the X chromosome. PLoS Biol. 2010;8:e1000494.

Source

    © 2018 the Reviewer (CC BY 4.0).

Content of review 3, reviewed on August 10, 2018

I believe the authors have nicely addressed all my previous concerns!!

Declaration of competing interests Please complete a declaration of competing interests, considering the following questions: Have you in the past five years received reimbursements, fees, funding, or salary from an organisation that may in any way gain or lose financially from the publication of this manuscript, either now or in the future? Do you hold any stocks or shares in an organisation that may in any way gain or lose financially from the publication of this manuscript, either now or in the future? Do you hold or are you currently applying for any patents relating to the content of the manuscript? Have you received reimbursements, fees, funding, or salary from an organization that holds or has applied for patents relating to the content of the manuscript? Do you have any other financial competing interests? Do you have any non-financial competing interests in relation to this paper? If you can answer no to all of the above, write 'I declare that I have no competing interests' below. If your reply is yes to any, please give details below.
I declare that I have no competing interests.

I agree to the open peer review policy of the journal. I understand that my name will be included on my report to the authors and, if the manuscript is accepted for publication, my named report including any attachments I upload will be posted on the website along with the authors' responses. I agree for my report to be made available under an Open Access Creative Commons CC-BY license (http://creativecommons.org/licenses/by/4.0/). I understand that any comments which I do not wish to be included in my named report can be included as confidential comments to the editors, which will not be published.
I agree to the open peer review policy of the journal

Source

    © 2018 the Reviewer (CC BY 4.0).

References

    Hansheng, Z., Zhimin, G., Le, W., Jiongliang, W., Songbo, W., Benhua, F., Chunhai, C., Chengcheng, S., Xiaochuan, L., Hailin, Z., Yongfeng, L., LianFu, C., Huayu, S., Xianqiang, Z., Sining, W., Chi, Z., Hao, X., Lichao, L., Yihong, Y., Yanli, W., Wei, Y., Qiang, G., Huanming, Y., Shancen, Z., Zehui, J. 2018. Chromosome-level reference genome and alternative splicing atlas of moso bamboo (Phyllostachys edulis). GigaScience.