Content of review 1, reviewed on October 08, 2015

The introduction reads well and makes a nice argument for why they conducted the research.

Overall the paper is potentially interesting, although the specific research is well outside my field
of expertise. However, from a data analysis point of view, the paper in its current state is far from
reproducible and lacks some important supporting data in usable formats.
I have recommended that it be accepted with only minor revisions because, in my opinion, the
manuscript itself doesn't require much work, but the supporting data and information do.

Minor points:
1 - The word "unigenes" appears to be used in various places to mean "unique genes". To
avoid confusion, I would prefer this to be changed to "unique genes" or "unique
transcripts", because "UniGene" is the name of a well-known database that has no relevance
to what the authors are referring to here.

2 - There are several minor grammatical mistakes that copy-editing will hopefully pick up, e.g.
line 163 " a clustering step were performed ", line 222 "The same situation also esits
between", line 384 "As expectation, most of the identified conopeptides (183) were
novel, ". There are more like these, and the grammar and spelling should be carefully
checked by an expert.

3 - While it is mentioned in the discussion, I feel it would be appropriate to include the
rationale for selecting organisms of different sizes earlier, perhaps at the first mention
of it (around line 150). Also, is there evidence that the size of a cone snail alone is
indicative of its developmental stage? I would have thought the stages of development
are well documented (egg, planktonic larvae, sub-adult, adult). Perhaps the relationship between
the toxins produced and the size of the cone shell has more to do with the sorts of prey the
animal can realistically hunt? (Again, sorry for showing my ignorance of the topic here!)

4 - The last two sentences (lines 390-395) of the conclusion need to be reworded as they do not
read well.

5 - The local reference database they used is provided as "additional file 1", which is an Excel
file. It should be reformatted as FASTA and included in the GigaDB dataset.

6 - There is no discussion of sequencing depth/coverage. Have the authors found
all/most/some/a few of the possible genes of interest in each sample? A measure of the
sequencing depth of genes with a better understood expression level may provide some
insight into this. (As I do not know anything about this organism, I don't know if that is
even possible.)

7 - Line 434: why was one dataset treated differently from the rest, i.e. assembled with
SOAPdenovo-Trans 1.02 instead of Trinity?

8 - The methods mention the use of an HMM method for identifying conopeptides. This section
is too brief, would be impossible to reproduce from the limited information provided,
and I think it is technically incorrect. The authors state that the 6 datasets were grouped
into superfamilies in order to create the HMM profiles, which they then used to identify
the conotoxins within the same 6 datasets. I think they probably mean that the reference database
was grouped and used to generate profile HMMs (pHMMs)? To make this useful to others,
it would be good to provide the alignment file used to generate the pHMM as well as the
actual pHMM file created.

9 - The results of the pHMM scans for each dataset should also be provided in GigaDB.
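
To illustrate the reformatting requested in point 5, the following is a minimal Python sketch, not a
prescribed method. It assumes the Excel sheet has first been exported to CSV and that it has one
column of sequence names and one of sequences. The column names "name" and "sequence", the function
name and the example rows are all hypothetical and would need to match the real file.

```python
import csv
import io


def table_to_fasta(rows, name_field="name", seq_field="sequence", width=60):
    """Turn table rows (dicts) into FASTA text, wrapping sequences at `width`."""
    records = []
    for row in rows:
        name = row[name_field].strip()
        # Spreadsheet cells often carry stray spaces or line breaks; drop them.
        seq = "".join(row[seq_field].split()).upper()
        if not name or not seq:
            continue  # skip incomplete rows rather than write a broken record
        lines = [seq[i:i + width] for i in range(0, len(seq), width)]
        records.append(">" + name + "\n" + "\n".join(lines))
    return "\n".join(records) + "\n" if records else ""


# Made-up rows standing in for a CSV export of the spreadsheet (assumption).
example_csv = (
    "name,sequence\n"
    "example_peptide_1,MKLTCVLIVA\n"
    "example_peptide_2,mklt cmmi\n"
)
fasta_text = table_to_fasta(csv.DictReader(io.StringIO(example_csv)))
print(fasta_text, end="")
```

For the real file, the rows would come from csv.DictReader opened on the exported CSV, and the
resulting text would be written to a .fasta file for upload to GigaDB.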

Major issue:
The major concern I have with this manuscript is the description of the samples and the matching
of available data to those samples. Below is a series of related points that need to be addressed:

1 - In the methods section, line 399 states "Eight specimens were collected...". The authors used
venom ducts from 6 middle-sized specimens to create an EST library and 1 middle-sized
specimen to construct a normalised Illumina cDNA library. They then constructed four
non-normalised Illumina cDNA libraries, using mRNAs from the venom ducts of three snails
as well as the venom bulb of one of those 3 snails.

By my maths, that makes 10 specimens, not 8, unless some specimens were used for multiple
sample preps. This information is very important for the interpretation of the results and
should be included in this section as well as in the database(s) hosting the data (SRA and
GigaDB).

2 - The data are presented as 6 sets: 5 were sequenced by NGS and deposited in SRA; the 6th is
the cDNA library sequenced on an ABI3730, which has not been deposited in SRA or
INSDC. The EST data from the Sanger sequencing need to be in the INSDC databases
(GenBank/ENA).


3 - The deposited sequences in the SRA are quoted by SRX accessions. Personally I would
prefer the use of the sample accessions (SRS1009725 - SRS1009729), but the major issue
I have with the deposition is that NO sample attributes are included, so there is no
way of telling which sample accession corresponds to which sample as named in the paper.

4 - In addition, there are only 5 samples' worth of analysis results in the GigaDB staging area,
and currently no adequate description of the files, so again it is not possible to associate
these data with the sample names used in the manuscript.

5 - The total conopeptide dataset (the 215 novel genes they say they have identified) also
needs to be available in FASTA format rather than Excel. In addition, these annotated gene
sequences should be submitted to the INSDC (GenBank/ENA) as annotated gene
sequences, thereby allowing future NCBI-BLAST users to benefit from this work.
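
The specimen arithmetic behind major point 1 can be written out as a short Python sketch. The counts
are those I read from the methods section. The dictionary labels and variable names are my own, and
the calculation assumes that no specimen is counted twice within a single preparation.

```python
# Specimen counts per preparation, as read from the methods section.
specimens_per_prep = {
    "EST library (ABI3730)": 6,
    "normalised Illumina library": 1,
    "non-normalised Illumina libraries (venom ducts)": 3,
    # The venom bulb came from one of the three snails above, so it adds none.
    "non-normalised Illumina library (venom bulb)": 0,
}
reported_total = 8  # "Eight specimens were collected..."

# Number of specimens needed if no animal was used in more than one prep.
total_if_disjoint = sum(specimens_per_prep.values())

# Uses that must have been of a specimen already used in another prep.
repeated_uses = max(0, total_if_disjoint - reported_total)

print(f"Specimens needed if none were reused: {total_if_disjoint}")
print(f"Specimens reported as collected: {reported_total}")
print(f"Uses that must be repeats of a specimen: at least {repeated_uses}")
```

If only 8 animals were collected, at least 2 of the 10 uses must involve a specimen that also appears
in another preparation, which is exactly the information missing from the methods and the databases.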

Level of interest
Please indicate how interesting you found the manuscript:
An article of importance in its field

Quality of written English
Please indicate the quality of language in the manuscript:
Needs some language corrections before being published

Declaration of competing interests

Please complete a declaration of competing interests, considering the following questions:
1. Have you in the past five years received reimbursements, fees, funding, or salary from an
organisation that may in any way gain or lose financially from the publication of this
manuscript, either now or in the future?

2. Do you hold any stocks or shares in an organisation that may in any way gain or lose
financially from the publication of this manuscript, either now or in the future?

3. Do you hold or are you currently applying for any patents relating to the content of the
manuscript?

4. Have you received reimbursements, fees, funding, or salary from an organization that
holds or has applied for patents relating to the content of the manuscript?

5. Do you have any other financial competing interests?

6. Do you have any non-financial competing interests in relation to this paper?

If you can answer no to all of the above, write 'I declare that I have no competing interests'
below. If your reply is yes to any, please give details below.

I am the Biocurator for GigaScience Database, hosting the data associated with this manuscript.
To the best of my knowledge I have reviewed this manuscript fairly and there is no personal (or
otherwise) gain from the outcome of this review process.

I agree to the open peer review policy of the journal. I understand that my name will be included
on my report to the authors and, if the manuscript is accepted for publication, my named report
including any attachments I upload will be posted on the website along with the authors'
responses. I agree for my report to be made available under an Open Access Creative Commons
CC-BY license (http://creativecommons.org/licenses/by/4.0/). I understand that any comments
which I do not wish to be included in my named report can be included as confidential comments
to the editors, which will not be published.

I agree to the open peer review policy of the journal.

Authors' response to reviews: (http://www.gigasciencejournal.com/imedia/3630358422010547_comment.pdf)


Source

    © 2015 the Reviewer (CC BY 4.0 - source).

Content of review 2, reviewed on January 28, 2016

In the replies to my comments you mentioned that you have uploaded/updated several files on the GigaDB server, but I can see no changes on the GigaDB server. I think perhaps you have uploaded additional files to the GigaScience journal instead? If so, those files will need to be uploaded to the GigaDB FTP server.

In particular the files for:

1 - the fasta file of the reference database of conopeptides

2 - alignment files uploaded as additional files 2 & 3

3 - the pHMM scan results

4 - the readme file with details of the file names to sample names

5 - fasta file of novel identified genes (Additional file 4)

I notice that the ABI3730 sequence data (i.e. the 11,000 reads) have not been deposited anywhere.
These data should be submitted to the NCBI or ENA as EST data, and linked to the same
BioProject accession (PRJNA290540) as the SRA sequence data. (It is noted that the assembled
gene sequences from those reads have been deposited, but I believe making the reads available as
EST sequences will enable users to reproduce/confirm your assemblies as well as potentially
make use of those reads as indicative of relative expression levels.)

Also, with regard to the deposited data, you should update the SRA sample information with some
metadata, including the name used for each sample in the manuscript (alias), tissue type,
collection date, geographic location etc.
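
As a hedged illustration of the kind of sample table I have in mind, the Python sketch below writes an
empty tab-separated template with one row per sample accession. The accession range is the one quoted
in my first report (SRS1009725 - SRS1009729). The column names are illustrative only and are not the
official SRA/BioSample attribute names, and the values are deliberately left blank for the authors to
fill in.

```python
import csv
import io

# Sample accessions quoted in the first review (SRS1009725 to SRS1009729).
FIRST, LAST = 1009725, 1009729

# Illustrative column names (assumption), one per metadata item requested.
COLUMNS = [
    "sample_accession",
    "manuscript_sample_name",
    "tissue",
    "collection_date",
    "geographic_location",
]


def metadata_template(first=FIRST, last=LAST, columns=COLUMNS):
    """Return a tab-separated template with one blank row per accession."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for number in range(first, last + 1):
        # Only the accession is known here; the rest is for the authors.
        writer.writerow([f"SRS{number}"] + [""] * (len(columns) - 1))
    return buffer.getvalue()


template = metadata_template()
print(template, end="")
```

The same completed table could double as the readme that maps file names to sample names in GigaDB.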

Are the methods appropriate to the aims of the study, are they well described, and are
necessary controls included?
If not, please specify what is required in your comments to the authors.

Yes.


Are the conclusions adequately supported by the data shown?
If not, please explain in your comments to the authors.

Yes.

Does the manuscript adhere to the journal's guidelines on minimum standards of reporting
(http://www.gigasciencejournal.com/authors/instructions/minimum_standards_reporting)?
If not, please specify what is required in your comments to the authors.

Yes.

Are you able to assess all statistics in the manuscript, including the appropriateness of
statistical tests used?
(If an additional statistical review is recommended, please specify what aspects require further
assessment in your comments to the editors.)

There are no statistics in the manuscript.

Quality of written English
Please indicate the quality of language in the manuscript:

Acceptable.

Declaration of competing interests
Please complete a declaration of competing interests, considering the following questions:

1. Have you in the past five years received reimbursements, fees, funding, or salary from an
organisation that may in any way gain or lose financially from the publication of this
manuscript, either now or in the future?

2. Do you hold any stocks or shares in an organisation that may in any way gain or lose
financially from the publication of this manuscript, either now or in the future?

3. Do you hold or are you currently applying for any patents relating to the content of the
manuscript?

4. Have you received reimbursements, fees, funding, or salary from an organization that
holds or has applied for patents relating to the content of the manuscript?

5. Do you have any other financial competing interests?

6. Do you have any non-financial competing interests in relation to this paper?

If you can answer no to all of the above, write 'I declare that I have no competing interests'
below. If your reply is yes to any, please give details below.


I am a current employee of GigaScience. I have reviewed this manuscript with a particular
emphasis on the availability of supporting data. The subject matter review is already well
covered by the other referees, and is beyond my field of expertise.

I agree to the open peer review policy of the journal. I understand that my name will be included
on my report to the authors and, if the manuscript is accepted for publication, my named report
including any attachments I upload will be posted on the website along with the authors'
responses. I agree for my report to be made available under an Open Access Creative Commons
CC-BY license (http://creativecommons.org/licenses/by/4.0/). I understand that any comments
which I do not wish to be included in my named report can be included as confidential comments
to the editors, which will not be published.

I agree to the open peer review policy of the journal.

Authors' response to reviews: (http://www.gigasciencejournal.com/imedia/1426776608201054_comment.pdf)


Source

    © 2016 the Reviewer (CC BY 4.0 - source).

References

    Chao, P., Ge, Y., Bing-Miao, G., Chong-Xu, F., Chao, B., Jintu, W., Ying, C., Bo, W., Yabing, Z., Zhiqiang, R., Xiaofei, Z., Xinxin, Y., Jie, B., Jia, L., Zhilong, L., Shijie, Z., Xinhui, Z., Ying, Q., Jieming, C., L., C. S., Jiaan, Y., Ji-Sheng, C., Qiong, S. 2016. High-throughput identification of novel conotoxins from the Chinese tubular cone snail (Conus betulinus) by multi-transcriptome sequencing. GigaScience.