GigaScience

About
GigaScience aims to revolutionize data dissemination, organization, understanding, and use. An online open-access, open-data journal, we publish 'big-data' studies from the entire spectrum of life and biomedical sciences. To achieve our goals, the journal has a novel publication format: one that links standard manuscript publication with an extensive database (GigaDB) that hosts all associated data and provides data analysis tools through our GigaGalaxy server. To further promote transparency in the review process, we have open review as standard for all our peer-reviewed papers.

Our scope covers not just 'omic' type data and the fields of high-throughput biology currently serviced by large public repositories, but also the growing range of more difficult-to-access data, such as imaging, neuroscience, ecology, cohort data, systems biology and other new types of large-scale shareable data.

Review policy on Publons
  • Allows reviews to be published
  • Allows reviewers to display the title of the article they reviewed

Reviews: 240

Reviews

  • I'll take this in two sections: comments about the web app and comments about the manuscript.

Web App

    This is a nice web app that attempts a difficult job. It is great that it is presented in multiple forms, including a standalone version and VMs.

    Often people do not like the idea that their email will be taken, even if it does mean that they can retrieve results later. A less intrusive mechanism is to provide a job id on submission which can be used to get at results, either through a form or a URL.
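    A minimal, stdlib-only sketch of that job-id mechanism (the function names and the in-memory store are illustrative, not part of the tool under review; a real service would persist results on disk or in a database):

```python
import uuid

# In-memory job store; illustrative only -- a real web app would persist jobs
_jobs = {}

def submit_job(payload):
    """Accept a submission and return an opaque job id for later retrieval."""
    job_id = uuid.uuid4().hex
    _jobs[job_id] = {"status": "queued", "payload": payload, "result": None}
    return job_id

def fetch_result(job_id):
    """Look up a job by id, e.g. from a results form or a /results/<id> URL."""
    return _jobs.get(job_id)  # None means an unknown or expired id

job_id = submit_job({"query": "example.fasta"})
print(f"Retrieve your results at /results/{job_id}")
```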

The help mouse-overs on the main page do not work.

Manuscript

    The language describing what a reciprocal BLAST can achieve is inconsistent. Although it is clear later in the manuscript that you are aware an RB can only find putative orthologues, this should be clearly stated throughout. At some point you might want to clarify that rigorous orthologue detection requires a strong phylogeny.

The manuscript as written sets a trap for itself. It claims that RBH and this tool will help large-scale analysis where only small-scale analysis could be done before, but in doing so it invites us to consider RBH in a new(ish) mode: that of a big-data tool. And if we are to use it there, we need to know how well it does in that domain. In that case this manuscript must carry out that large-scale comparison, and sadly, it is a bit lacking.

Considering RBH as a method for big data, then, I don't think the utility of the method and the tool is sufficiently well tested. A glaring omission is that there is no attempt to describe the error of the method, that is, the number of false putative orthologues, or any attempt to develop a metric for the believability of the whole set of data. For a tool whose main selling point is that it can find broad patterns in large datasets, it really needs some measure of how many of the putative orthologues it classifies are right or wrong. The experiment carried out is simply a run of the tool. As a minimum I'd hope that you'd compile a carefully and manually curated list of known orthologues (there are lots of databases with these: orthologene.org, orthoMCL, Compara at EMBL), then run your tool and assess how many of these you'd found. Until a true benchmark experiment is done, the manuscript lacks a sufficient demonstration of the tool's utility.
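    The suggested benchmark could be scored with something as simple as the following sketch (the gene IDs are made up for illustration; a real run would load curated pairs from one of the databases mentioned above):

```python
def benchmark_orthologs(predicted, curated):
    """Score predicted orthologue pairs against a curated reference set.

    Both arguments are sets of frozenset({gene_a, gene_b}) pairs, so the
    direction of the reciprocal BLAST hit does not matter.
    """
    tp = len(predicted & curated)  # curated pairs the tool recovered
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(curated) if curated else 0.0
    return precision, recall

# Toy example with made-up gene IDs
pair = lambda a, b: frozenset({a, b})
curated = {pair("hsa:TP53", "mmu:Trp53"), pair("hsa:BRCA1", "mmu:Brca1")}
predicted = {pair("hsa:TP53", "mmu:Trp53"), pair("hsa:MYC", "mmu:Myc")}
precision, recall = benchmark_orthologs(predicted, curated)
print(f"precision={precision:.2f} recall={recall:.2f}")  # 0.50 each here
```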

    If we aren't supposed to consider this as a new big data tool and just a useful implementation of RBH, then the language and claims about being able to compare gross patterns across large phylogenetic distances need to be toned down in the manuscript.

    Published in
    Reviewed by
    Ongoing discussion
• This manuscript analyses the fungal community at different stages of spontaneous fermentation through two approaches, ITS phylotyping and shotgun sequencing. I found the manuscript very interesting. It is the first study to my knowledge that uses a shotgun approach in the wine environment, so I do believe it is very innovative. However, my main criticism is that after all the work done (and money invested), the "application" section and the "abstract" just highlight that shotgun was able to uncover an amplicon bias towards Metschnikowia spp. So I do believe the discussion/conclusion (and abstract) need to be expanded, and several interesting results that could be taken out from the study should be included. I would have expected to see an extended discussion/conclusion about how this methodology was able to reach a higher taxonomic depth for certain taxa, even able to detect some strains, and some bacteria too. The fact that around 85% of the reads mapped to a wine-related microorganism reference genome is something to be definitely highlighted as a conclusion. No deep discussion is made about the community dissimilarities found among samples with the two approaches (e.g. comparing the sample clusterings obtained in figure 1c and figure 2c). Another drawback is that the work does not include any metadata at all regarding the must/vineyard characteristics, and thus it lacks a discussion about why the first stages of must and ferments coming from different wineries are that different.

I am a bit confused about certain sections of the paper that I think should be further explained. Also, certain sections from the discussion should go into the methods section. For example, I think the authors should explain how the "control populations" were made; this should be included in the methods section and well indicated with a header (e.g. at line 323).

The analysis and statistical approaches are overall sound. However, I found the conclusions of some figures difficult to follow (in particular, the conclusions drawn from figure 2c are misleading to me, lines 278-283).

    Comments/Corrections

- Line 30 and line 49: "To the map" should be "To map".
    - Line 91: what is "with several amplicon based methods" referring to? That they use different primers, different sequencers…? I would say "several amplicon based studies have been conducted".
    - Line 118: "D0 (at inoculation)"?? But this is spontaneous fermentation!
    - Line 119: "As seen with the selective plating…" my understanding is that you are talking about previous culture-dependent work? This sentence needs a reference.
    - Lines 119-122: which data are you referring to for this statement? Is it figure 1? You need to add it.
    - Suggestion: move lines 122-124 to line 118 (as they refer to the fermentation characteristics; then talk about the microbes).
    - Lines 129-132: I would suggest this be part of the methods, not this section.
    - Line 139: "all the of otus" should be "all the OTUs".
    - Line 139: why 78 samples? There are 66.
    - Line 171: "figure (1B)" should be "figure (1C)".
    - Line 192: I would move these lines above, after line 174, to link them with the PCoA.
    - Line 194: there is no vintage information in table 1.
    - Line 211: I am curious how many of the reads aligned to Vitis and were discarded? You could include the values in table 1.
    - Line 216: having only 15% of reads unable to align is a really good result; it seems like a "too" good result. Does it mean that the wine-environment microbes are very well characterized (we have the genome available for most of them), or that when you mapped your reads to the reference genomes you were not very restrictive on the alignment (only q10)? I would say this is something that might be interesting to discuss in the paper.
    - Line 236: the authors have identified several prokaryotes. I suggest you compare the taxa identified by shotgun with what is already known from other studies that have used 16S rRNA phylotyping. Does it fit with the most abundant bacterial community found in other ferment samples?
    - Line 248: why not take these reads and, for example, BLAST them to see whether they are from mitochondria?
    - Lines 253-260: this paragraph seems to refer to Fig 2A, but in line 260 the authors talk about microorganisms other than the ones in Fig 2A. And the results for the microbes specified in Fig 2A are not discussed.
    - Lines 278-280: Figure 2C. "D1 samples (T1, T2 and Y3) largely differentiated…" I can see T1 and T2 as a differentiated group, but Y3 does not seem that different from Y1D1? I guess I am not interpreting the ordination plot correctly. (In any case, it would have been nice to do a PCoA with ITS data and another PCoA with shotgun data using just the samples that are in common, to see whether the clustering of the groupings is similar with both methods (as it seems to be), maybe as supplementary material?)
    - Line 287: 23 taxonomic identifiers. Are the authors comparing the shotgun data that were mapped to the reference genomes (or also the ones from MetaPhlAn)? Are there only 23 taxa in common between the two methods?
    - Line 323: I understand this refers to the control samples; I am confused, as this is included under the header "ferment samples"?
    - Line 393: how much of the total abundance do these 30 OTUs make up? Why use only these 30?
    - Line 396: change "and two winery samples" to "from two winery samples".
    - Line 437: the methods say nothing about how you compared the ITS and shotgun sequencing results.

    Figures and tables

- Fig 1C: despite the colors, I would suggest the authors write the whole name (e.g. T1D0, T1D1…) to refer to the samples, rather than referring to them by color and fermentation stage (same for figure 2C).
    - Fig 1B: include species names in the figure.
    - Table S3: some of the species do not have accession numbers; they have a blank cell.
    - Table 3: in the legend the authors need to explain that "alignment" means aligning to a reference genome.
    - Table 4: I am confused about the spacer. What is the dot referring to? Any nucleotide (in that case better to specify it as "N"), or the lack of a nucleotide?

• RecBlast provides a tool to find orthologs across genomes. I think that, in general, the manuscript misses information on benchmarking and application. How does this method compare in time and accuracy to other methods that attempt to find orthologs? I tried to install the standalone software and noticed that the "mygene" package should be added as an additional dependency. "seaborn" is also needed for the script to run and is not a standard Python package. It doesn't seem like the script is able to take in a list of genes in FASTA format and find orthologs. This seems to be a major limitation of the script. I tried to download the results from the web page to see what the output looks like, but I was denied access, so I'm unclear how useful the results may be. One way to improve the utility of the script would be to include output that contains concatenated proteins from orthologs that could be used in phylogenetics. In that way, trees could be created from orthologs from distantly related taxa.
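    A lightweight dependency guard along these lines would surface the missing packages up front (a sketch, not code from the tool; the module names are the ones flagged in this review):

```python
import importlib.util

# Packages this review found to be undeclared requirements of the script
REQUIRED = ["mygene", "seaborn"]

def missing_dependencies(names):
    """Return the subset of module names that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

missing = missing_dependencies(REQUIRED)
if missing:
    # Print an install hint instead of crashing mid-analysis
    print("Missing packages; try: pip install " + " ".join(missing))
```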

    Specific comments:

- L39: define what you mean by "the evolutionary tree".
    - L4-L29: the description of the algorithm is confusing to me. Sometimes the authors use "organism", "target organism", "target species", "original organism". I would change the text to clarify what you are comparing.
    - L14: clarify what you mean by "matches a protein". What criteria is this based on?
    - L19: I would change "cater for" to "cater to".
    - L34: I'm not clear what you mean by "any computational background".
    - L45: I would list the GitHub address here. I see that it's in the supplemental info as well.

- Figure 2a: does this mean the number of human orthologs between these different species? Do these results make sense?
    - Figure 2b: does this clustering make sense from a biological perspective?

• In the article "fastBMA: Scalable Network Inference and Transitive Reduction", the authors present an improved tool, fastBMA, as an extension of their prior work on the inference of genetic networks. Most comments below are related to the article text and writing style, rather than any major concerns related to the scientific results. However, significant adjustment to the text is needed to improve the scientific clarity of the findings (i.e., not all figures in the article are referenced in the text, the article does not follow the style guidelines, etc.).

    1. For technical notes, the article sections should include only "Findings" and "Methods", which can then be broken down into subsections. While this has been done for a portion of the article, the article flow could be improved significantly to increase the clarity of the scientific content. Conclusions should also be moved into a subheading of "Findings", instead of falling after "Methods". Results should also be integrated into the "Findings" section.
    2. Would recommend placing "Related Work" in the background and integrating "Our Contributions", rather than including this as a separate section.
3. Several references to the speed of fastBMA are made in the Background/Contributions/Related Work sections, without any supporting evidence or figures in those sections.
      • Second paragraph of "Our Contributions" in 2 locations
      • In "Estimating model posterior probabilities" and others, should indicate/explain what is meant by "faster C++ code" for fastBMA -- do the other applications use a different language? Less performant algorithms?
    4. The implementation methods of fastBMA are also described in the "Our Contributions" section, prior to "Related Work"
5. Methods are written more like results (i.e., "Algorithmic outline..." discusses the performance enhancements rather than just the approach) and discussion sections, instead of being used as an explanation of implementation details and data sets
      • "Replacing the hash table" has similar issues, and also discusses "crashing a 56 GB machine" with minimal explanation (possibly out of memory? unclear how large of a dataset for this to occur).
      • Most of the "Replacing the hash table" section appears to reference ScanBMA rather than fastBMA -- would focus on methods of fastBMA and how this improves on the prior work in the findings, instead of going into in-depth explanations in the methods
      • The end of this section states that fastBMA is much faster than using a full hash table, but no supporting data are provided (only a description of the approach)
    6. Figure 3 is never referenced in the text
    7. The text in the section "Transitive reduction to eliminate redundant edges" is not entirely clear. While the purpose is in the title, the text does not necessarily support the title, nor offer any evidence (figures, data) to support the conclusions in the section
8. While the fastBMA results in Fig 4B cannot all be compared to ScanBMA, since runs with equivalent data were not possible, the statement that all fastBMA lines are to the left of ScanBMA should be better explained in the text, as the larger fastBMA data (without priors) takes as long as or longer than ScanBMA (I agree these cannot be compared, but the text does not explain this as currently written). This may be clarified by splitting references to Fig 4A and 4B in the text, rather than only referencing "Figure 4". You may also want to explain why running with priors takes substantially less time than running without priors on fastBMA.
    9. More background on what informative priors were used from external data sets may be of benefit
    10. For the 32 core cluster, was this multiple machines totaling 32 cores? Or a single 32 core node?
    11. Some discussion as to why the AUC is better in Fig 4A for fastBMA 8 core compared to fastBMA 1 core would be warranted
    12. The OR parameter used for fastBMA in Figure 5 should be stated, to better compare results from the AUC and Precision-Recall curves
13. Can reduce the number of times the software links are referenced in the article (i.e., the Docker images are noted in the abstract, contributions, and conclusion)
    14. For DREAM4 data set, both 10-gene and 100-gene data are referenced in the "Datasets" section, but not indicated which was used in the results/figures
    15. A prior ScanBMA article appears to have used all 3556 variables in the Yeast data set (http://bmcsystbiol.biomedcentral.com/articles/10.1186/1752-0509-8-47) -- any reason that ScanBMA was only run with 100 variables+prior here, instead of including the 3556 without prior?
16. Explanation of the software environment setup and its impact on performance/run time should be included -- were all tools installed on a single virtual machine? Running the same OS? Were they run within Docker containers? Any potential performance changes due to the use of shared/virtual hardware? Were the applications run a single time, or multiple times to determine whether there was any variability between runs based on potential storage/network capacity within the shared environment? Were data sets stored locally within the instance?
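    The run-to-run variability raised in the last point could be checked with a simple repeated-timing harness along these lines (a sketch; the no-op command "true" stands in for the actual tool invocation):

```python
import statistics
import subprocess
import time

def time_repeated(cmd, n_runs=5):
    """Run a shell command n_runs times; return per-run wall-clock seconds."""
    times = []
    for _ in range(n_runs):
        start = time.perf_counter()
        subprocess.run(cmd, shell=True, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        times.append(time.perf_counter() - start)
    return times

# Hypothetical usage: replace "true" with the real fastBMA/ScanBMA command
times = time_repeated("true", n_runs=3)
print(f"mean {statistics.mean(times):.3f}s, stdev {statistics.stdev(times):.3f}s")
```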

• Some comments below:

    In the "Classes of multiplicity for analyses of track suites" section, the authors mention figures 3a, 3b… instead of 4a, 4b… Moreover, Figure 4 has no a), b), c) or d) panel labels.

    Testing https://hyperbrowser.uio.no/hb/#!mode=basic (warning: as I have had a very low internet connection, the issues mentioned here may be due to this technical limitation and not to a problem with the Galaxy instance):
    - In the basic mode section, "click here to load a sample track with Multiple Sclerosis-associated regions, expanded 10kb in both directions (you will be automatically redirected back to this page if you choose this option)": I was not automatically redirected to the mentioned page, but to the home page.
    - The same happens with step 2b, "click here to load a sample GSuite of DNaseI accessibility for different cell types (you will be automatically redirected back to this page if you choose this option)".
    - As you are using "customhtml" output to let the user export the raw data table, it seems the user can't easily create a workflow using this output table. It might be of interest to offer, directly on the tool form, an option for a "classical" text export.

• The authors of the manuscript titled "fastBMA: Scalable Network Inference and Transitive Reduction" have developed a fast and scalable gene regulatory network reconstruction algorithm, a faster and more accurate version of their previous algorithm, scanBMA. It also features a network post-processing method based on transitive reduction of graphs. Below are my comments on this manuscript.

    In general the manuscript is relevant to current research, especially in the field of systems biology and biostatistics. It is well written and clearly understandable. It is a welcome addition to the arsenal of scalable algorithms for gene regulatory network inference. However, I think the paper can be improved significantly by addressing the following comments.

    1) The authors claim that the transitive reduction based network post-processing method is a novel and important feature of their algorithm. Firstly, very similar techniques were previously used in many papers, some of which were cited by the authors in their manuscript. Therefore, I do not think it is appropriate to call it novel. Secondly, in the benchmarking studies, the transitive reduction method did not seem to improve the accuracy of the networks inferred by the fastBMA algorithm. If it does not improve the performance of fastBMA then why is it being packaged together with fastBMA and being presented as an important feature of the fastBMA algorithm?
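    For context, transitive reduction removes an edge u->v whenever v remains reachable from u through other edges; a toy sketch on a small unweighted graph (this is not the authors' implementation, which operates on inferred weighted networks):

```python
def transitive_reduction(edges):
    """Remove edge (u, v) if v is reachable from u via another path.

    edges: set of directed (u, v) pairs. Returns the reduced edge set.
    Simple O(V*E) sketch, suitable only for small graphs.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)

    def reachable(src, dst, skip_edge):
        # DFS from src to dst, ignoring the edge currently being tested
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            for nxt in adj.get(node, ()):
                if (node, nxt) == skip_edge or nxt in seen:
                    continue
                if nxt == dst:
                    return True
                seen.add(nxt)
                stack.append(nxt)
        return False

    return {(u, v) for u, v in edges if not reachable(u, v, (u, v))}

# A -> B -> C plus the redundant shortcut A -> C
print(transitive_reduction({("A", "B"), ("B", "C"), ("A", "C")}))
# keeps A->B and B->C; drops the redundant A->C
```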

2) In the "Background" section (under "Findings") the authors cite many relevant research papers. However, in the regression-based methods category the authors mostly cite their own work. I think the authors should cite other similar works in the same category, e.g. doi:10.1038/srep37140, http://dx.doi.org/10.1039/C4MB00053F, https://doi.org/10.1093/bioinformatics/bti487.

3) It seems that the underlying principles of the fastBMA algorithm are described under the heading "Related work". This is confusing, since "related work" typically refers to similar work by other researchers.

4) The authors claim that their algorithm can incorporate prior knowledge of the network topology in the inference process. In the benchmarking studies they have shown how prior knowledge improves the performance of their algorithm. However, I did not find a description of how prior knowledge is incorporated in the core algorithm. A brief description of this process would help readers understand the algorithm in its entirety.

5) The benchmarking studies performed in this manuscript are not convincing. The authors did not compare the performance of their algorithm with some of the most well-known methods, such as GENIE3 (http://dx.doi.org/10.1371/journal.pone.0012776) and JUMP3 (doi:10.1093/bioinformatics/btu863), which were shown to be significantly superior to algorithms such as ARACNE, MRNET, CLR, etc., the methods originally used to benchmark scanBMA, whose performance was in turn compared with fastBMA in this manuscript. To gain a better understanding of where their algorithm stands in terms of accuracy compared to the current state of the art, they should compare its performance with the current top performers.

6) The authors did not properly discuss the weaknesses of their algorithm; for instance, in which scenarios is their algorithm not expected to perform well?

  • Dear authors,

I am now OK with the latest version of your paper as a note.

I am still sorry that the genome is not easily accessible. We can of course download the data, but biologists will need a database to browse, search for annotated genes, and so on and so forth. And again, i5K@NAL provides this service very easily. It's up to you whether to follow this advice, but integrating into the i5k database will give your genome more visibility.

Table 1 indicates differences between the two genomes that are, in my opinion, mainly due to the different approaches used for sequencing, assembling and annotating the genomes.

Now that these two strains have been sequenced, this opens new avenues for further studies on population genomics and phylogeography, which is great.

    All the best

• The authors have addressed all my previous points. It is a good contribution to the biomedical field and to the use of new and trending platforms for medical imaging.

    Level of interest
    Please indicate how interesting you found the manuscript:
    An article of importance in its field

    Quality of written English
    Please indicate the quality of language in the manuscript:
    Acceptable

    Declaration of competing interests
    Please complete a declaration of competing interests, considering the following questions:
    Have you in the past five years received reimbursements, fees, funding, or salary from an organisation that may in any way gain or lose financially from the publication of this manuscript, either now or in the future?
    Do you hold any stocks or shares in an organisation that may in any way gain or lose financially from the publication of this manuscript, either now or in the future?
    Do you hold or are you currently applying for any patents relating to the content of the manuscript?
    Have you received reimbursements, fees, funding, or salary from an organization that holds or has applied for patents relating to the content of the manuscript?
    Do you have any other financial competing interests?
    Do you have any non-financial competing interests in relation to this paper?
    If you can answer no to all of the above, write 'I declare that I have no competing interests' below. If your reply is yes to any, please give details below.

    I declare that I have no competing interests.

    I agree to the open peer review policy of the journal. I understand that my name will be included on my report to the authors and, if the manuscript is accepted for publication, my named report including any attachments I upload will be posted on the website along with the authors' responses. I agree for my report to be made available under an Open Access Creative Commons CC-BY license (http://creativecommons.org/licenses/by/4.0/). I understand that any comments which I do not wish to be included in my named report can be included as confidential comments to the editors, which will not be published.
    I agree to the open peer review policy of the journal.

    To further support our reviewers, we have joined with Publons, where you can gain additional credit to further highlight your hard work (see: https://publons.com/journal/530/gigascience). On publication of this paper, your review will be automatically added to Publons, you can then choose whether or not to claim your Publons credit. I understand this statement.
    Yes

