Content of review 1, reviewed on April 01, 2015

Major Compulsory Revisions

1. The abstract states - “To further aid accessibility, the tools, Galaxy and data are all provided pre-installed in a virtual machine that can be downloaded from the GigaDB repository.” If "tools" here refers to “Galaxy tools” - as in the XML files that describe the Galaxy interface to these applications and how to build a command line for them - then those are indeed available in the virtual machine. But the Matlab binary that is distributed with the VM is not properly licensed and will not run. This is not unexpected, but it means the VM cannot be used to immediately rerun analyses, explore parameters, analyze new data, etc. I believe this needs to be made more explicit in the paper. I think the VM does indeed provide a great deal of provenance and makes the data very accessible, but I think important aspects of reproducibility are lost when building on closed-source and non-free applications. I feel that not addressing these points will leave many readers expecting the VM to be more useful, and the whole pipeline more portable, than they actually are. To be frank, if this were a genomics article I would recommend not publishing a purely computational methods paper when large parts of the pipeline are non-free and closed source, since this limits both the reproducibility and the transparency of the pipeline. Realistically, though, my understanding is that this is quite common in metabolomics, and this paper does in fact advocate computational methods that are an advancement for the field. Galaxy, and workflow platforms more generally, are not the only way to do open, reproducible ‘omics analyses - there are lots of great emerging techniques, such as IPython/Jupyter, RStudio, the methodologies taught in Software and Data Carpentry, etc. My understanding is that none of these are common in metabolomics and that in fact data analysis is generally done in Windows GUIs using closed-source commercial and/or vendor toolkits. I would have indicated that the paper was of broader interest if there were at least one complete open source pipeline for data analysis (perhaps one leaning more heavily on XCMS).

2. There are real drawbacks to handling collections of data files in the way that you are doing here in the early steps of these workflows. Dealing with paths directly breaks Galaxy abstractions for data security, quota tracking, metadata collection, advanced object stores (e.g. S3), remote computation, etc. I think all of that is certainly fine if your computation is restricted to a virtual machine or a small server for a single lab. But I think this limitation should be spelled out, as should the limitations of Galaxy that pushed you in this direction. The paper makes the point that Galaxy is a potentially exciting platform for enabling multi-omics analyses. My entirely biased opinion is that this is in fact the case, but breaking Galaxy abstractions this way limits the number of potential deployments that can use these tools and puts that promise a little further out of reach. I think simply spelling out these limitations would address my concern regarding potential publication, but in the longer term the authors should consider exploring newer Galaxy features - in particular dataset collections [bit.ly/gcc2014workflows] - as a way to accomplish many of the same things in a more portable way.

Minor Essential Revisions

3. I had several suggestions for the actual implementation and documentation on GitHub. Rather than spelling them out here, I opened a pull request [https://github.com/Viant-Metabolomics/Galaxy-M/pull/1] addressing them. That pull request was merged by the authors, so this point has already been addressed.

4. This - “we have implemented Wine software [24] (i.e. free and open source compatibility layer software application) and Python [25] to read metadata from the .RAW files” - feels awkwardly worded to me. I would suggest perhaps - “we implemented software using Wine [24] (i.e. free and open source compatibility layer software application) and Python [25] to read metadata from the .RAW files”.

Discretionary Revisions

5. In response to “We anticipate in the near future a Galaxy Toolshed that will include a wide range of tools and workflows for processing and analysing multiple types of metabolomics data” - I would strongly urge the authors to include their tools in the central Galaxy Tool Shed maintained at Penn State rather than starting a new Tool Shed. Maintaining a Galaxy Tool Shed is difficult, and having tools scattered across many different Tool Sheds dilutes the benefits of publishing tools in a Galaxy Tool Shed. Having all the tools in the central Tool Shed will make it easier for deployers to install these metabolomics tools, will allow other developers to easily leverage the package definitions, and will allow the publication of workflows that span multiple ‘omics types.

6. Also, when discussing the Tool Shed, the authors should consider citing doi:10.1186/gb4161.

Level of interest:

An article of importance in its field

Quality of written English:

Acceptable

Statistical review:

No, the manuscript does not need to be seen by a statistician.

Declaration of competing interests:

I am paid to develop Galaxy - a free and open source platform used by the authors of this paper.

Authors' response to reviews: (http://www.gigasciencejournal.com/imedia/7048735118495094_comment.pdf)


The reviewed version of the manuscript can be seen here:
http://www.gigasciencejournal.com/imedia/1623648222162780_manuscript.pdf
All revised versions are also available:
Draft - http://www.gigasciencejournal.com/imedia/1623648222162780_manuscript.pdf

Source

    © 2015 the Reviewer (CC BY 4.0).

References

    L., D. R., M., W. R. J., Haoyu, L., Archana, S., R., V. M. 2016. Galaxy-M: a Galaxy workflow for processing and analyzing direct infusion and liquid chromatography mass spectrometry-based metabolomics data. GigaScience.