Content of review 1, reviewed on December 22, 2014

The authors present a tool suite focused on working with whole-genome NGS data in an assembly-free fashion. This tool suite uses de Bruijn graphs and Bloom filters to keep time and memory requirements low. These tools have been integrated into Galaxy, using the GenOuest ToolShed as a distribution mechanism. This is a nice set of tools and I would like to see this article published, but I have provided some notes, comments, and suggestions below.


Testing was performed within an Ubuntu 12.10 virtual machine using a fresh checkout of the Galaxy platform (revision 15835:815d38c48a56).
Install Notes:
• Additional system packages installed using apt-get: build-essential, cmake, boost, doxygen
• Additional packages installed manually:
o gatb-core
• Some packages (e.g. TakeABreak) bundle the required libraries as part of their own packages; others (e.g. LoRDEC) do not.
• The gatb-core downloads include several versions, up to 1.0.5 as the latest; however, this version is not compatible with e.g. LoRDEC, and I had to install 1.0.4 instead.
• I had to manually change the default system shell (sh) from dash to bash so that Mapsembler and TakeABreak would install. The final steps of these installation processes call shell scripts that use bash syntax but are incorrectly declared as POSIX sh compatible (e.g. see the non-bash shebangs and the direct use of ‘sh ./scriptname’ in these tools' packages). Instead, provide a bash shebang and call ‘./scriptname’ directly.
• All of the tools were installed as a group using the ‘colibread_package’ tool suite.


I was not able to use or access the listed GenOuest Galaxy server mentioned in the manuscript (see later comments), but did have access to the GenOuest (GUGGO) Toolshed.


• Major Compulsory Revisions (which the author must respond to before a decision on publication can be reached)
1.) gatb-core:
This is not a standard package that can be expected to exist on anyone’s machine.
The tools do not currently install via the ToolShed without a fair deal of command-line/sysadmin effort. Each tool relies upon the gatb-core package, but this is not provided as, e.g., a ToolShed dependency for the tools. A gatb-core ToolShed package should be made available so that these tools can be installed without having to, e.g., manually add libs/includes/bins to /usr/local/. At the very least, exactly what is required to get this all working should be laid out step by step. Please make sure to mention the versions of this package that are compatible with these tools, as the latest (1.0.5) did not work.

2.) Additional installation issues:
The dependencies of two tools (Mapsembler and TakeABreak) fail to install unless the system shell is Bash.

3.) I would really like to see this tool suite install using only the Galaxy admin interface, taking into account e.g. https://wiki.galaxyproject.org/PackageRecipes, but at a minimum please provide step-by-step instructions for those who will not be able to work out the manual installation of libraries, etc.


4.) I could not locate the “Graph of Sequence Viewer” (GSV) visualization tool. Where is it, and how can it be installed to a local Galaxy instance?

5.) The manuscript states “Corresponding tools are installed on our production Galaxy instance [34, 35] allowing scientists to use Colib’read tools freely after registration on the GenOuest core facility [36].”. However, gaining access to this resource is problematic, requiring several rounds of registration (http://genoweb1.irisa.fr/AppliESIB/access/access-en.php), acceptance of terms that are only available in French, and finally: “Warning : a printed copy of the Terms and Conditions (pdf file here), signed by your director, must be sent by postal mail to the GenOuest core facility, in order to finish the account creation . The postal address of the platform is: Plate-forme GenOuest, INRIA Rennes Bretagne Atlantique, Campus de Beaulieu, 35042 Rennes Cedex ).”
It does not seem that most scientists can realistically gain access to the GenOuest Galaxy instance, and they therefore have no practical way of running these tools.


• Minor Essential Revisions (such as missing labels on figures, or the wrong use of a term, which the author can be trusted to correct)



6.) In the tool.xml files, be sure to put quotes around any inputs that could contain spaces, such as filenames, when generating the command line. Also make sure that quotes exist around any input entered by the user (e.g. read set name).
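To illustrate why quoting matters, here is a minimal Python sketch of how a POSIX shell tokenizes a generated command line with and without quotes. The tool name ‘mytool’, its ‘-i’/‘-n’ flags, and the helper ‘build_command’ are hypothetical; in the tool.xml Cheetah templates themselves, the corresponding fix is simply to wrap each parameter in quotes (e.g. '$input').

```python
import shlex


def build_command(binary, reads_path, read_set_name):
    """Build a shell command line in which every argument is quoted
    independently (hypothetical tool and flags, for illustration only)."""
    args = [binary, "-i", reads_path, "-n", read_set_name]
    return " ".join(shlex.quote(a) for a in args)


# A filename and a user-entered read set name, both containing spaces.
unquoted = "mytool -i my reads.fa -n sample 1"
quoted = build_command("mytool", "my reads.fa", "sample 1")

print(shlex.split(unquoted))  # 7 tokens: the values are split apart
print(shlex.split(quoted))    # 5 tokens: each value stays intact
```

Without quotes, the underlying binary would receive ‘my’ and ‘reads.fa’ as two separate arguments, which is exactly the failure mode described above.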

7.) The filetype/datatype “fastq.gz” is not a standard Galaxy datatype and is not installed with the Colib’read packages, so it resolves to the base ‘data’ datatype, which allows any history item to be selected instead of only the desired fasta/fastq files. Either this datatype needs to be defined in the package, or it should not be specified as an allowed input.




• Discretionary Revisions (which are recommendations for improvement but which the author can choose to ignore)

8.) Several of the tools (e.g. LoRDEC) have Python scripts that do nothing other than read the options passed in and then call the underlying binary. These wrapper scripts could be removed and the commands passed directly to the underlying binaries.

9.) Within the wrapper scripts, it would be advisable to pass a list of command arguments to subprocess.Popen instead of a string, in order to handle e.g. filenames with spaces. See also: https://docs.python.org/2/library/subprocess.html#subprocess.Popen
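A minimal Python 3 sketch of the list-based call follows; the helper name ‘run_tool’ is hypothetical, and the Python interpreter stands in for a real Colib’read binary so that the example runs anywhere.

```python
import subprocess
import sys


def run_tool(args):
    """Run an external program given as a list of arguments.

    With a list and the default shell=False, no shell parses the command,
    so each element (e.g. a filename containing spaces) reaches the program
    as a single argument without any quoting.
    """
    proc = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()
    return proc.returncode, out, err


if __name__ == "__main__":
    # Stand-in for a real binary: echo back the first argument received.
    code, out, _ = run_tool(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", "my reads.fa"]
    )
    print(code, out.decode().strip())  # prints: 0 my reads.fa
```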

10.) Some of the wrapper scripts (e.g. discoSnp.py) mix spaces and tabs for indentation. Use spaces only; in any case, the two must not be mixed.
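One way to locate the offending lines is sketched below in Python; the function name ‘indentation_report’ is hypothetical, and the standard library's tabnanny module (python -m tabnanny script.py) offers a more thorough check of ambiguous indentation.

```python
def indentation_report(source):
    """Inspect the leading whitespace of each line of Python source text.

    Returns (mixed_lines, file_mixes_styles): the 1-based numbers of lines
    whose indentation itself contains both tabs and spaces, and whether the
    file as a whole indents some lines with tabs and others with spaces.
    """
    mixed_lines = []
    uses_tabs = uses_spaces = False
    for lineno, line in enumerate(source.splitlines(), start=1):
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        has_tab, has_space = "\t" in indent, " " in indent
        uses_tabs = uses_tabs or has_tab
        uses_spaces = uses_spaces or has_space
        if has_tab and has_space:
            mixed_lines.append(lineno)
    return mixed_lines, uses_tabs and uses_spaces


sample = "def f():\n\tx = 1\n    y = 2\n \tz = 3\n"
print(indentation_report(sample))  # ([4], True)
```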

11.) Instead of using Popen to call the system ‘cp’ command, use e.g. shutil.copy.
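A minimal Python 3 sketch of the replacement is shown below; ‘copy_output’ and the file names are hypothetical placeholders for a wrapper's result file and its Galaxy output path.

```python
import os
import shutil
import tempfile


def copy_output(src_path, dest_path):
    """Copy a result file to the Galaxy output path without spawning 'cp'.

    Replaces something like: subprocess.Popen("cp %s %s" % (src, dest), shell=True)
    shutil.copy copies the file contents and permission bits, and raises an
    OSError subclass on failure (e.g. a missing source file) rather than
    returning a status code that the caller might forget to check.
    """
    shutil.copy(src_path, dest_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "result.txt")
        dest = os.path.join(tmp, "galaxy_output.dat")
        with open(src, "w") as handle:
            handle.write("ACGT\n")
        copy_output(src, dest)
        with open(dest) as handle:
            print(handle.read().strip())  # prints: ACGT
```

Besides avoiding an extra process, this removes another place where unquoted filenames could break the command line.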

12.) Some tools allow the user to specify the number of threads within the tool form; it might be worthwhile to look into using $GALAXY_SLOTS for this instead.
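In a Python wrapper, the thread count could be read from the job environment rather than the tool form, roughly as sketched below. This assumes the job runs under Galaxy, which exports GALAXY_SLOTS; the function name and the default of 1 are arbitrary choices for illustration. Directly in a tool XML command block, the usual idiom is \${GALAXY_SLOTS:-4}.

```python
import os


def galaxy_threads(default=1):
    """Return the thread count Galaxy allocated to this job.

    Falls back to `default` when GALAXY_SLOTS is unset or is not a positive
    integer (e.g. when the wrapper is run by hand outside Galaxy).
    """
    try:
        slots = int(os.environ.get("GALAXY_SLOTS", ""))
    except ValueError:
        return default
    return slots if slots > 0 else default
```

The result would then be passed as str(galaxy_threads()) to whichever threads option the underlying binary provides, so the tool always matches the resources the Galaxy administrator configured for the job.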


13.) The input to Commet / output from Prepare commet tools should have its own datatype that subclasses Text (e.g. commet_read_set) to simplify selection for the end user; currently, any file in the user’s history can be selected.


14.) Several examples of runtime and maximum RAM usage are reported, but the specifications of the hardware used for benchmarking are not given.

15.) The Mapsembler tool only takes fasta files for input as reads, should fastq also be allowed?


16.) The ‘colibread_package’ should probably have been called ‘suite_colibread’ or similar, since by convention the Galaxy ToolShed uses the package_ prefix to refer to underlying dependency packages. See also: https://wiki.galaxyproject.org/Tools/BestPractices

17.) An example Page/History/set of workflows showing these tools in action would be very helpful to many people; however, it would be much less helpful if locked behind the GenOuest server, which most people will not have access to.


Daniel Blankenberg


Level of interest An article whose findings are important to those with closely related research interests
Quality of written English Acceptable
Statistical review No, the manuscript does not need to be seen by a statistician.
Declaration of competing interests I declare that I have no competing interests.

Authors' response to reviews: (http://www.gigasciencejournal.com/imedia/8525833381635405_comment.pdf)


The reviewed version of the manuscript can be seen here:
http://www.gigasciencejournal.com/imedia/3563096631515222_manuscript.pdf
All revised versions are also available:
Draft - http://www.gigasciencejournal.com/imedia/3563096631515222_manuscript.pdf


    © 2014 the Reviewer (CC BY 4.0 - source).

Content of review 2, reviewed on March 25, 2015

Starting with a fresh Ubuntu 14.04 VM, I created a new Galaxy instance, added myself as an admin user, configured the tool dependency directory, and then added the GenOuest ToolShed.


I then installed these packages via apt-get:
build-essential
cmake
libboost-dev
zlib1g-dev


Then the ‘colibread_tool_suite’ was installed into this Galaxy instance. Things were much smoother this time, but there is still an issue with the dependencies.


Major Compulsory Revisions

1.) The GATB package attempts to install system-wide (which fails due to permissions in this test case). GATB can instead be installed to its managed install directory by changing the cmake line in the tool_dependencies.xml file to e.g.:
<action type="shell_command">cmake -DCMAKE_INSTALL_PREFIX:PATH="${INSTALL_DIR}" .</action>

Additional paths should then likely be exported via the set_environment tags (the include and lib dirs, in addition to the PATH that is currently being set).

Unfortunately I was not able to find a github/bitbucket/similar repo with these Galaxy wrappers to submit pull requests with any changes.


2.) Subsequently, the LoRDEC package fails to compile (even with a fixed GATB package). For the LoRDEC package, you will want to use set_environment_for_install to make the GATB lib/include dirs available during LoRDEC compilation.


3.) Also note that the main ToolShed (toolshed.g2.bx.psu.edu) has dependency packages for zlib and boost, so it may be possible to avoid requiring these to be installed manually. (See also e.g. https://wiki.galaxyproject.org/Admin/Config/ToolDependenciesList)


If a user already has a copy of GATB installed locally, everything seems to work correctly, but this is not a likely scenario. So for testing, be sure to start with a clean environment.

4.) I would really like to see this suite install completely via the UI, as stated in the manuscript, and it looks close to doing so once the LoRDEC/GATB dependencies are fixed.


5.) Ensure that all tool failures are properly conveyed to Galaxy. Some tools (e.g. discoSnp++, Mapsembler) may fail at a particular step because of missing binaries or a segmentation fault, and this failure is not currently passed on to Galaxy.
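One way for a Python wrapper to make sure such failures reach Galaxy is to check each step's return code and exit non-zero, as in the minimal sketch below (the helper name ‘run_step’ is hypothetical). Galaxy can then mark the job as failed, e.g. when the tool XML declares <stdio><exit_code range="1:" level="fatal" /></stdio>. A missing binary already raises an uncaught exception here, which likewise terminates the wrapper with a non-zero status.

```python
import subprocess
import sys


def run_step(args):
    """Run one pipeline step; terminate the wrapper if the step fails.

    On POSIX, a negative return code means the child was killed by a signal
    (e.g. -11 for SIGSEGV), so segmentation faults are reported as failures.
    """
    returncode = subprocess.call(args)
    if returncode < 0:
        sys.stderr.write("%s killed by signal %d\n" % (args[0], -returncode))
        sys.exit(1)
    if returncode > 0:
        sys.stderr.write("%s exited with status %d\n" % (args[0], returncode))
        sys.exit(returncode)
```

Writing a short message to stderr as well gives the user something to read in the failed (red) dataset instead of an empty output.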



Discretionary Revisions

6.) It would be easier for a novice Galaxy user to install these tools if they were located in the main public ToolShed, as they would not need to add a new ToolShed to an XML file. This would hold especially true for someone using e.g. Galaxy in the Cloud via CloudMan.

7.) I was able to register an account at the GenOuest Galaxy instance. Account creation appears to require manual approval, and there is no anonymous access. This likely limits the users who will be able to take advantage of this resource, and raises sustainability questions.

8.) It might be nice to have e.g. a GitHub repository for the Galaxy-specific tool definitions, in addition to the ToolShed, to facilitate fixes/enhancements to the tools by the community. The Galaxy Team and IUC have been doing this for several months now, with great success (https://github.com/galaxyproject/tools-devteam and https://github.com/galaxyproject/tools-iuc).

9.) Right now, the webpage listed under ‘Project home page’ (http://colibread.inria.fr/colibread-on-galaxy/) is very short and just has links to the GenOuest Galaxy server and the GenOuest ToolShed. More information on configuration and usage, and e.g. on setting up Colib’read in Galaxy (including the GSV viewer), would be helpful to users.


10.) A Galaxy Page at the GenOuest Galaxy instance containing Histories with examples of these tools being used would be helpful. There was a library available containing some sample data, but a Page with Histories and workflows would really help to show these tools in action.


Daniel Blankenberg

Level of interest An article whose findings are important to those with closely related research interests
Quality of written English Acceptable
Statistical review No, the manuscript does not need to be seen by a statistician.
Declaration of competing interests I declare that I have no competing interests.

Authors' response to reviews: (http://www.gigasciencejournal.com/imedia/1179499628175166_comment.pdf)


The reviewed version of the manuscript can be seen here:
http://www.gigasciencejournal.com/imedia/2040455091163541_manuscript.pdf
All revised versions are also available:
First revision - http://www.gigasciencejournal.com/imedia/2040455091163541_manuscript.pdf


    © 2015 the Reviewer (CC BY 4.0 - source).

Content of review 3, reviewed on August 25, 2015

Using the same test VM as before, I was able to install the tool set from the main Galaxy ToolShed via the colibread_tool_suite repository with only a few clicks. The tools and their dependencies installed, compiled, and mostly ran successfully. The example Page, datasets, histories, and workflows are a great addition, as are the GitHub repository and Docker files.

The authors have addressed all of my previously raised concerns.




Discretionary Revisions

1.) The manuscript references the GUGGO ToolShed (which is fine); however, the tools are now also available from the main Galaxy ToolShed (which I tested from, and which is available by default to all users). This results in two different copies of the same tools that are not treated as equivalent (Galaxy does not currently support aliasing them), so, e.g., workflows exported from instances using tools from the GUGGO ToolShed will not work in Galaxy instances with tools installed from the main ToolShed (and vice versa). It might be a good idea to recommend one of the ToolSheds, to allow greater interoperability between Galaxy instances.
2.) discoSnp is called discoSnp++ in the version installed from the main Galaxy toolshed, but the version installed in http://colibread.genouest.org/galaxy/ is just ‘discoSNP’, with a lower Galaxy wrapper version number.
3.) When running Mapsembler2, the ‘out.txt’ dataset ends with e.g. “accepted extensions are .fa[.gz], .fasta[.gz], .fna[.gz] for fasta format and .fq[.gz], .fastq[.gz], .txt[.gz] for fastq format, wrong /home/dan/galaxy/galaxy-blankenberg/database/files/000/dataset_2.dat, exit”. Please confirm that this step in the pipeline works correctly with a file that has a ‘.dat’ extension, and if not, use e.g. symlinks to create a temporary file with a correctly suffixed filename.
4.) The ‘Prepare commet’ and ‘commet’ tools could be combined into one ‘commet’ tool pretty easily if the commet tool allowed multiple input files to be selected within a repeat (‘prepare commet’ interface) and the current output of ‘prepare commet’ was created as a configfile in the ‘commet’ tool.
5.) Tools that accept ‘fastq.gz’ in their format list will allow all ‘text’ files to be selected, since this format is not a standard Galaxy format and is not provided as a custom datatype from the ToolShed.
6.) Commet appears to require R, but R is not specified as a dependency (it is available in the main ToolShed), so the tool will always fail because the .png files are not created.
7.) The LoRDEC tools (e.g. lordeccorrect) exited with exit code 1 and produced empty green (i.e. marked successful) outputs, using inputs copied from the supplied examples. Nothing was reported to stderr/stdout, but there is likely an issue with the dependency as installed (perhaps a quietly missing sub-dependency).
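The symlink workaround suggested in item 3 could look roughly like the following Python sketch. The function name ‘link_with_extension’ and the ‘reads’ basename are hypothetical; the extension passed in should be one that Mapsembler2 lists as accepted (e.g. ‘.fasta’, or ‘.fasta.gz’ for gzipped input).

```python
import os
import tempfile


def link_with_extension(dataset_path, extension, workdir=None):
    """Expose a Galaxy dataset (e.g. dataset_2.dat) under a filename carrying
    an extension the tool accepts (e.g. '.fasta'), via a symlink.

    The symlink lives in `workdir` (a fresh temporary directory by default),
    so the original dataset is neither copied nor renamed.
    """
    if workdir is None:
        workdir = tempfile.mkdtemp()
    link_path = os.path.join(workdir, "reads" + extension)
    os.symlink(os.path.abspath(dataset_path), link_path)
    return link_path
```

The returned path would then be passed to the tool in place of the original ‘.dat’ path; a symlink avoids duplicating potentially large read files.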

Daniel Blankenberg

Level of interest An article whose findings are important to those with closely related research interests
Quality of written English Acceptable
Statistical review No, the manuscript does not need to be seen by a statistician.
Declaration of competing interests I declare that I have no competing interests.

Authors' response to reviews: (http://www.gigasciencejournal.com/imedia/1179499628175166_comment.pdf)


The reviewed version of the manuscript can be seen here:
http://www.gigasciencejournal.com/imedia/1870102164175168_manuscript.pdf
All revised versions are also available:
Second revision - http://www.gigasciencejournal.com/imedia/1870102164175168_manuscript.pdf


    © 2015 the Reviewer (CC BY 4.0 - source).

References

    Yvan, L. B., Olivier, C., Cyril, M., Vincent, L., Eric, R., Claire, L., Vincent, M., Gustavo, S., Camille, M., Bastien, C., Zine, E. A. A., Leena, S., Susete, A., Alexan, A., Raluca, U., Pierre, P. 2016. Colib'read on galaxy: a tools suite dedicated to biological information extraction from raw NGS reads. GigaScience.