Content of review 1, reviewed on February 25, 2015

Major Compulsory Revisions

None

Minor Essential Revisions

Under the section 'Availability of supporting data', there are three problems that need fixing:

1) The article states that the only data included in the AMI and test website is the lab strain; however, the clinical and fluorescent strains are also included. The article should be changed to reflect the true availability of the data.

2) The article states that two published histories are included in the AMI and test website. This is true; however, attempting to access the data or outputs of those histories produces errors of the form "File Not Found (/mnt/galaxy/galaxy-dist/database/files/000/dataset_184.dat)". This happens in both the AMI and Viramp.com. If possible, this should be fixed; if not, the situation should be described more clearly in the article.

3) Viramp.com has an extra dataset available that is very small and aimed at allowing users to test the system on very limited resources. It would be good if this were added to the AMI and perhaps mentioned briefly in the article, e.g. in a couple of sentences in the 'Availability' section.

4) In section 9 of the pipeline description, it is stated that "Galaxy also offers a downloadable version that allows the user to alter the existing tools and plugins, and to run this on their local server". It is not clear whether this simply refers to Galaxy's general availability or implies that the VirAmp pipeline is in a Galaxy Toolshed somewhere. This should be clarified, and if the latter is true, the location of that toolshed should be specified (I cannot find the tools in the main https://toolshed.g2.bx.psu.edu toolshed).

Using the virtual machine outside of Amazon AWS: there are two problems with doing this.

5) Amazon only allows download of a machine image if it was originally uploaded to AWS; images that originate purely within AWS cannot be downloaded. The shared machine image was created within AWS and therefore cannot be exported from AWS.

6) The AMI comes with two volumes attached: one that contains the OS etc. and one that contains the Galaxy files and the associated Python environment. This makes it hard to export the image from AWS, because export can only be done on single-volume instances. It would be possible for the user to work around this, but it would be better if the AMI were altered to use only one volume.

7) GigaScience's policy is to have a snapshot of the virtual machine stored within GigaDB. This could be achieved by addressing points 5 and 6 and exporting an image from AWS, but an alternative would be to recreate the machine image in a non-cloud-based system, e.g. VirtualBox, and then transfer it directly to the GigaScience repository. If this is done, there is less need for the user to be able to export from AWS, and points 5 and 6 become discretionary.
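The single-volume issue in point 6 can be spotted before an export is attempted. The Python sketch below assumes an instance description in the shape returned by boto3's `describe_instances` (a `BlockDeviceMappings` list whose EBS-backed entries carry an `Ebs` key) and flags instances that do not have exactly one EBS volume. The function name `single_volume_problems` is hypothetical, and the one-volume rule is taken from point 6 rather than checked against AWS documentation; point 5 (where the image originated) is not visible in this data and is not checked.

```python
from typing import Any, List, Mapping


def single_volume_problems(instance: Mapping[str, Any]) -> List[str]:
    """Return reasons why `instance` fails a single-volume check.

    Hypothetical helper. `instance` is assumed to be one entry from
    describe_instances()["Reservations"][i]["Instances"]. The one-volume
    rule mirrors point 6 of this review and is an assumption here, not a
    restatement of AWS documentation. Point 5 (origin of the image) is
    not checked because it is not visible in this data.
    """
    problems = []
    # Keep only block-device entries that are backed by EBS volumes.
    ebs_volumes = [
        m for m in instance.get("BlockDeviceMappings", []) if "Ebs" in m
    ]
    if len(ebs_volumes) != 1:
        problems.append(
            f"expected exactly one EBS volume, found {len(ebs_volumes)}"
        )
    return problems
```

With boto3, `instance` could be taken from `boto3.client("ec2").describe_instances(InstanceIds=[...])["Reservations"][0]["Instances"][0]`. For an instance launched from the AMI as described in point 6, the check would be expected to report two volumes.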

Discretionary Revisions

As mentioned above, if point 7 is fulfilled, then points 5 and 6 become discretionary.

Level of interest: An article of importance in its field
Quality of written English: Acceptable
Statistical review: No, the manuscript does not need to be seen by a statistician.
Declaration of competing interests: I declare that I have no competing interests.

Authors' response: (http://www.gigasciencejournal.com/imedia/1872046444164715_comment.pdf)

Source

    © 2015 the Reviewer (CC BY 4.0 - source).

Content of review 2, reviewed on March 25, 2015

The authors have answered all my earlier concerns and I will be recommending that this paper be accepted.

That said, I have two minor suggestions for making the link between the publication, GitHub and Galaxy more obvious, and one broad comment regarding the situation with the VM only being available in the cloud. This latter point needs no action (the authors have made their work highly accessible, reusable and reproducible), but I would like to take advantage of the Open Review system to make public my views on VMs, especially as they augment the authors' views.

Thus, one minor revision that can be considered essential for the publication and discretionary for GitHub:

1) GitHub allows users to download specific revisions, or 'commits'. It would be helpful if the identifying commit SHA were placed in the availability section of the paper AND in the GitHub README file, e.g.

Paper: VirAmp project is available via GitHub at https://github.com/ywan/viramp-project The specific commit SHA at the time of publication is 73a443970baa0a3d2cace953c06e4d2d628aa116

GitHub README: This repository has been published in GigaScience (link). To access the repository as at the time of publication, use commit SHA 73a443970baa0a3d2cace953c06e4d2d628aa116
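A reader could retrieve that exact snapshot either by cloning and running `git checkout` on the SHA, or by downloading GitHub's source archive of the commit. The Python sketch below builds that archive URL; the helper name `commit_archive_url` is hypothetical, and it assumes GitHub's `<repository>/archive/<sha>.zip` URL pattern.

```python
import re

# A full commit SHA-1 is 40 lowercase hexadecimal characters.
FULL_SHA = re.compile(r"[0-9a-f]{40}")


def commit_archive_url(repo_url: str, sha: str) -> str:
    """Return a zip-download URL for the repository as it was at `sha`.

    Hypothetical helper; assumes GitHub's /archive/<sha>.zip pattern.
    """
    if not FULL_SHA.fullmatch(sha):
        raise ValueError(f"expected a full 40-character hex SHA, got {sha!r}")
    base = repo_url.rstrip("/")
    # Accept clone URLs such as https://github.com/owner/repo.git as well.
    if base.endswith(".git"):
        base = base[: -len(".git")]
    return f"{base}/archive/{sha}.zip"


if __name__ == "__main__":
    print(commit_archive_url(
        "https://github.com/ywan/viramp-project",
        "73a443970baa0a3d2cace953c06e4d2d628aa116",
    ))
```

Requiring the full 40-character SHA, rather than a short prefix, keeps the reference unambiguous even as the repository grows.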

One discretionary revision relating to Galaxy workflows and GitHub:

2) The Galaxy workflows that have been created and stored in the AMI should be exported and added to the GitHub repository. In recommending this, I'm echoing Galaxy's own John Chilton who has recommended this in past reviews of other projects.
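The export recommended in point 2 could be scripted. The Python sketch below writes each workflow's exported JSON to a `<name>.ga` file in a repository folder. The function names `export_workflows` and `safe_filename` and the `workflows/` folder are hypothetical; the `export` argument is passed in so that the sketch does not depend on a particular Galaxy client library.

```python
import json
import re
from pathlib import Path
from typing import Callable, Iterable, List, Mapping


def safe_filename(name: str) -> str:
    """Turn a workflow name into a filesystem-friendly file stem."""
    stem = re.sub(r"[^A-Za-z0-9._-]+", "_", name).strip("_")
    return stem or "workflow"


def export_workflows(
    workflows: Iterable[Mapping],
    export: Callable[[str], dict],
    out_dir: Path,
) -> List[Path]:
    """Write each exported workflow to <out_dir>/<name>.ga.

    `workflows` holds summaries with "id" and "name" keys; `export`
    maps a workflow id to its exported dictionary. Workflows sharing
    a name overwrite one another in this sketch.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for wf in workflows:
        path = out_dir / f"{safe_filename(wf['name'])}.ga"
        with path.open("w", encoding="utf-8") as fh:
            # Sorted keys and indentation keep diffs readable in Git.
            json.dump(export(wf["id"]), fh, indent=2, sort_keys=True)
        written.append(path)
    return written
```

With BioBlend installed, one option would be `gi = GalaxyInstance(url=..., key=...)` (from `bioblend.galaxy`), pointed at the user's own AMI instance, followed by `export_workflows(gi.workflows.get_workflows(), gi.workflows.export_workflow_dict, Path("workflows"))`; the resulting files could then be committed to the GitHub repository.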

Below is my general comment on use cases for standalone VMs, which needs no action.

3) The authors have argued that creating a standalone VM from scratch would be time consuming and significantly delay publication. As they provide a publicly available test website, VM in the cloud (AMI) and GitHub repository, I feel that they are adequately doing their bit for reproducibility and reusability and the paper should be published without delay. However, I would like to use this Open Peer Review forum to air and record the following considerations:

a) If the authors are put off from using their GitHub repository to build a new standalone VM, this suggests that other people might be too.

b) The authors cannot see an occasion where someone would prefer a standalone VM to a cloud-based AMI. There are several such cases:

i) Some people may not have access to Amazon AWS (countries such as China and India are consistently in the news regarding their national firewalls; systems like Amazon AWS can become unreachable overnight).
ii) Some people may not have access to the credit facilities needed to use Amazon AWS (the cost is low, but it does require sophisticated payment systems).
iii) Some people may be morally opposed to a specific cloud service provider; e.g. Amazon has been getting a lot of negative press due to perceived issues with employment conditions, and there is a strong boycott movement in some spheres.
iv) Some people may not have a stable internet connection for interacting with a server in the cloud or using its web interface. Such people would probably also have difficulty downloading a standalone VM of several GB, but USB drives are easy to ship.

As I say, the authors have gone out of their way to make this work open and reproducible, and on such terms this is an exemplary work.

Level of interest: An article of importance in its field
Quality of written English: Acceptable
Statistical review: No, the manuscript does not need to be seen by a statistician.
Declaration of competing interests: I declare that I have no competing interests.


References

    Wan, Y., Renner, D. W., Albert, I., Szpara, M. L. 2015. VirAmp: a Galaxy-based viral genome assembly pipeline. GigaScience.