Content of review 1, reviewed on October 06, 2015

In the current manuscript, the authors present a de novo transcriptome assembly for the Western
tarnished plant bug, Lygus hesperus. The manuscript is well written and the new sequencing data
should be useful for other researchers studying this pest species and related Hemiptera.

In addition to the raw data from their previous study, the authors have generated a large amount
of new Illumina reads for the transcriptome assembly. This is certainly a highlight of the study
and should be mentioned prominently in the abstract. In its current form, however, the abstract
reads as if the authors merely generated a new assembly from already published data. The
manuscript should also make clearer how much new data was produced specifically for this
study and how much derives from the authors' previous sequencing effort. From Table 1, I
conclude that 11 new Illumina runs were performed on 5 cDNA libraries from different tissues,
which presumably resulted in ~293,000,000 bp of new raw sequencing data. This information
should be provided in the text.

Table 2 compares the new transcriptome assembly to previous Lygus transcriptomes. The current
assembly is based on more than three times as many read pairs as the previous
Illumina transcriptome (~438,000,000 vs. ~145,000,000). However, despite this significant
increase in raw data, the total number of assembled bases has decreased by more
than 50% (~46,000,000 bp vs. ~102,000,000 bp), as has the number of transcripts
(22,022 vs. 45,706), and the average transcript length is also slightly lower (2075 bp vs. 2237 bp). I
find these numbers a bit troubling considering that the raw data of the old transcriptome was also
included in the new assembly. If the differences are due to changes in methodology (e.g.
assembly parameters), the authors should discuss their choices in the manuscript.
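
The proportions quoted above can be checked directly from the approximate figures cited in this report. The following is a minimal sketch in Python that uses only those rounded values; the dictionary keys and function names are illustrative and do not come from the manuscript.

```python
# Approximate figures as quoted in this report (taken from Table 2 of the
# manuscript). Keys and function names are illustrative assumptions.
previous = {
    "read_pairs": 145_000_000,
    "assembled_bp": 102_000_000,
    "transcripts": 45_706,
    "mean_length_bp": 2237,
}
current = {
    "read_pairs": 438_000_000,
    "assembled_bp": 46_000_000,
    "transcripts": 22_022,
    "mean_length_bp": 2075,
}


def fold_change(metric):
    """Ratio of the current assembly's value to the previous one."""
    return current[metric] / previous[metric]


def percent_decrease(metric):
    """Percentage by which the current value is lower than the previous one."""
    return 100.0 * (previous[metric] - current[metric]) / previous[metric]


if __name__ == "__main__":
    print(f"read pairs: {fold_change('read_pairs'):.2f}x the previous input")
    for metric in ("assembled_bp", "transcripts", "mean_length_bp"):
        print(f"{metric}: {percent_decrease(metric):.1f}% lower")
```

Because the inputs are rounded, the results are only indicative of the size of each change.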

It is not clear why the authors have chosen to use CEGMA for the comparison of the new
assembly with previous Lygus transcriptomes, but employed BUSCO for the comparison with
other insect transcriptomes and genomes. Considering that CEGMA has been discontinued, I
suggest making all comparisons with BUSCO. This would certainly improve the readability of
Table 3.

A tblastx analysis against selected gene sets comprising genes encoding neuropeptides, G
protein-coupled receptors and chemosensory receptors found higher numbers of homologs in the
current assembly than in previous Lygus transcriptomes. This result is not surprising, considering
that the newly added sequence data derives mainly from sample pools of antennae and heads. In
my opinion, the approach is not suited to "evaluate the overall depth of the respective
assemblies". It does, however, show that the authors' sampling strategy was successful in
improving the coverage of genes involved in chemosensation.

Overall, I do think that the sequencing data presented in the current manuscript are a valuable
resource and warrant publication. However, the assembly metrics do not compare favorably with
the previously published transcriptome of Lygus hesperus. The authors should either explain the
differences and show in which ways the new assembly is an improvement or redo the assembly.
It might prove to be useful to also include the 454 sequencing data in a hybrid assembly.
Alternatively, maybe the approach of pooling all sequencing runs for a single assembly should
be dismissed in favor of tissue-specific transcriptomes?

Level of interest
Please indicate how interesting you found the manuscript:
An article of importance in its field

Quality of written English
Please indicate the quality of language in the manuscript:
Acceptable


Declaration of competing interests
Please complete a declaration of competing interests, considering the following questions:
1. Have you in the past five years received reimbursements, fees, funding, or salary from an
organisation that may in any way gain or lose financially from the publication of this
manuscript, either now or in the future?

2. Do you hold any stocks or shares in an organisation that may in any way gain or lose
financially from the publication of this manuscript, either now or in the future?

3. Do you hold or are you currently applying for any patents relating to the content of the
manuscript?

4. Have you received reimbursements, fees, funding, or salary from an organization that holds
or has applied for patents relating to the content of the manuscript?

5. Do you have any other financial competing interests?

6. Do you have any non-financial competing interests in relation to this paper?

If you can answer no to all of the above, write 'I declare that I have no competing interests'
below. If your reply is yes to any, please give details below.

I declare that I have no competing interests.

I agree to the open peer review policy of the journal. I understand that my name will be included
on my report to the authors and, if the manuscript is accepted for publication, my named report
including any attachments I upload will be posted on the website along with the authors'
responses. I agree for my report to be made available under an Open Access Creative Commons
CC-BY license (http://creativecommons.org/licenses/by/4.0/). I understand that any comments
which I do not wish to be included in my named report can be included as confidential comments
to the editors, which will not be published.

I agree to the open peer review policy of the journal.

Authors' response to reviews:

Reviewer #1: The manuscript describes the development of a new de novo assembly of the Lygus hesperus transcriptome, an insect of agricultural importance, based on new sequencing of organ-specific libraries, together with previous data from whole individuals under distinct biological conditions. The work clearly indicates the quality improvement of the newly assembled transcriptome. The only remark necessary refers to the NCBI BioProject ID number, which appears to include only one of the libraries. The additional accession IDs for the antennae (PRJNA210219) and the accessory glands (PRJNA210220) should be included.

The filtered and annotated transcriptome assembly was deposited with GenBank as a TSA using the BioProject ID number PRJNA284294. To address any confusion regarding availability of the associated data (e.g., antennae, accessory glands, and heads), we have added the following to pg 5, lines 127-129: “NCBI accession identifiers for all of the associated SRA, Biosample, and Bioproject data repositories are listed in Table 1.”



Reviewer #2: In the current manuscript, the authors present a de novo transcriptome assembly for the Western tarnished plant bug, Lygus hesperus. The manuscript is well written and the new sequencing data should be useful for other researchers studying this pest species and related Hemiptera.

In addition to the raw data from their previous study, the authors have generated a large amount of new Illumina reads for the transcriptome assembly. This is certainly a highlight of the study and should be mentioned prominently in the abstract. In its current form, however, the abstract reads as if the authors merely generated a new assembly from already published data. The manuscript should also make clearer how much new data was produced specifically for this study and how much derives from the authors' previous sequencing effort. From Table 1, I conclude that 11 new Illumina runs were performed on 5 cDNA libraries from different tissues, which presumably resulted in ~293,000,000 bp of new raw sequencing data. This information should be provided in the text.

Thank you for the excellent suggestion. We have rewritten the abstract to better illustrate that the assembly includes 293 million bp of new data representing five tissue-specific cDNA libraries comprising 11 Illumina sequencing runs.

Table 2 compares the new transcriptome assembly to previous Lygus transcriptomes. The current assembly is based on more than three times as many read pairs as the previous Illumina transcriptome (~438,000,000 vs. ~145,000,000). However, despite this significant increase in raw data, the total number of assembled bases has decreased by more than 50% (~46,000,000 bp vs. ~102,000,000 bp), as has the number of transcripts (22,022 vs. 45,706), and the average transcript length is also slightly lower (2075 bp vs. 2237 bp). I find these numbers a bit troubling considering that the raw data of the old transcriptome was also included in the new assembly. If the differences are due to changes in methodology (e.g. assembly parameters), the authors should discuss their choices in the manuscript.

The differences between the previous assembly and the current assembly are largely the result of changes in the Trinity pipeline and the inclusion of a modified normalization process that specifically removed low-abundance isoforms. We have added new text (lines 140-149 on pgs 5-6) to the manuscript clarifying the observed variation. In addition, unigene-specific metrics have been added to Table 2. Furthermore, BUSCO analyses (revised Table 3 and lines 155-157 pg 6) suggest that a certain percentage of the overall read space of the previous assembly was likely inflated by sequence duplications and fragmentation.


It is not clear why the authors have chosen to use CEGMA for the comparison of the new assembly with previous Lygus transcriptomes, but employed BUSCO for the comparison with other insect transcriptomes and genomes. Considering that CEGMA has been discontinued, I suggest making all comparisons with BUSCO. This would certainly improve the readability of Table 3.

We have removed the CEGMA analyses and re-analyzed the Lygus hesperus transcriptomes using BUSCO. The results of these analyses have been included in the revised Table 3 and discussed in lines 150-158.

A tblastx analysis against selected gene sets comprising genes encoding neuropeptides, G protein-coupled receptors and chemosensory receptors found higher numbers of homologs in the current assembly than in previous Lygus transcriptomes. This result is not surprising, considering that the newly added sequence data derives mainly from sample pools of antennae and heads. In my opinion, the approach is not suited to "evaluate the overall depth of the respective assemblies". It does, however, show that the authors' sampling strategy was successful in improving the coverage of genes involved in chemosensation.

We agree that these results are not surprising. However, our purpose was to demonstrate the applicability of the new assembly for specific gene sets, such as neuropeptide and chemosensory-related transcripts, which were not as well represented in the previous assembly. As the reviewer can well appreciate, the generation of a comprehensive molecular resource for non-model organisms that lack sequenced genomes is an iterative process. To better clarify our purpose, this section of the manuscript (lines 159-234 of pgs 6-7) has been extensively rewritten.

Overall, I do think that the sequencing data presented in the current manuscript are a valuable resource and warrant publication. However, the assembly metrics do not compare favorably with the previously published transcriptome of Lygus hesperus. The authors should either explain the differences and show in which ways the new assembly is an improvement or redo the assembly. It might prove to be useful to also include the 454 sequencing data in a hybrid assembly. Alternatively, maybe the approach of pooling all sequencing runs for a single assembly should be dismissed in favor of tissue-specific transcriptomes?

The differences in the metrics of the L. hesperus assemblies have been addressed in previous comments (see above). Initial efforts to fold the 454 data into the Illumina assemblies had little to no effect on the robustness or quality of the assembly. We agree that there is a certain utility to having tissue-specific assemblies; however, the availability of a single comprehensive reference transcriptome significantly simplifies the annotation process and makes it less cumbersome to identify potential genes of interest.


Source

    © 2015 the Reviewer (CC BY 4.0 - source).

Content of review 2, reviewed on December 03, 2015

The authors have addressed all concerns raised by the referees. While I agree with the authors' assessment that the reduced number of transcripts may (for the most part) be due to the removal of duplicates and low-abundance isoforms, the data in the revised Table 3 also indicate a slight reduction in the number of hits against single-copy orthologs. However, the authors have shifted the focus of the manuscript from a general comparison of the overall depth of the assemblies to the improvements in coverage of specific sets of genes that were previously poorly represented. The authors clearly demonstrate these improvements and I recommend the manuscript for publication in GigaScience.

Level of interest
Please indicate how interesting you found the manuscript:
An article of importance in its field

Quality of written English
Please indicate the quality of language in the manuscript:
Acceptable


Declaration of competing interests

I declare that I have no competing interests.

I agree to the open peer review policy of the journal.


References

    Tassone, E. E., Geib, S. M., Hall, B., Fabrick, J. A., Brent, C. S., Hull, J. J. 2016. De novo construction of an expanded transcriptome assembly for the western tarnished plant bug, Lygus hesperus. GigaScience.