Content of review 1, reviewed on December 18, 2018

This manuscript proposes cwl-metrics, a way to collect metrics from Common Workflow Language executions of bioinformatics tools using Docker containers.

The manuscript is submitted as a Technical Note, but the research is of such a high quality that this could even be a Research article had it also provided a broader Background and a Discussion with comparison of workflow metrics systems beyond CWL.

The authors have been diligent in reproducibility and been good practitioners of Open Science, recording rich details of their evaluations and providing installation scripts for not just the software but also the evaluation setup. Some small issues remain to make this truly reproducible.

The English language of the manuscript is however not of a good quality for publication, to a degree where this can be confusing. Knowing the authors' work from the CWL community, I have provided feedback within the annotated manuscript (attached PDF) using ISO 5776 text proof notation.

See the detailed review PDF for further comments according to GigaScience review guidelines, in particular the reproducibility section.

My detailed review is also web-accessible at the (secret) URL https://gist.github.com/stain/30e49363238d5a35e26f9fb1a31ebf8e (and below on Publons)

Minor Revisions required:

  • English Language needs to be revised
  • Missing DOIs for citations
  • Missing DOIs for code/workflows, not (just) GitHub links
  • License on CWL workflows/tools
  • Fix Reproducibility issue in Notebook (missing files)
  • License and source/upstream attribution for quay.io docker images

Other suggestions in this review are recommended, but not required.

Detailed review

Review: GigaScience submission GIGA-D-18-00427

This review is licensed under a Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).

Summary and review comments

Hi, I am Stian Soiland-Reyes (http://orcid.org/0000-0001-9842-9718) and have pledged the Open Peer Review Oath (https://doi.org/10.12688/f1000research.5686.2):

  • Principle 1: I will sign my name to my review
  • Principle 2: I will review with integrity
  • Principle 3: I will treat the review as a discourse with you; in particular, I will provide constructive criticism
  • Principle 4: I will be an ambassador for the practice of open science

This review is licensed under a Creative Commons Attribution 4.0 International License http://creativecommons.org/licenses/by/4.0/ and is also available at the secret URL https://gist.github.com/stain/30e49363238d5a35e26f9fb1a31ebf8e

This manuscript proposes cwl-metrics, a way to collect metrics from Common Workflow Language executions of bioinformatics tools using Docker containers.

The manuscript is submitted as a Technical Note, but the research is of such a high quality that this could even be a Research article had it also provided a broader Background and a Discussion with comparison of workflow metrics systems beyond CWL.

The authors have been diligent in reproducibility and been good practitioners of Open Science, recording rich details of their evaluations and providing installation scripts for not just the software but also the evaluation setup.

The English language of the manuscript is however not of a good quality for publication, to a degree where this can be confusing. Knowing the authors' work from the CWL community, I have provided feedback within the attached annotated manuscript using ISO 5776 text proof notation.

See reproducibility notes at the end of this review.

Evaluation

1. Is the rationale for collecting and analyzing the data well defined?

Is the work carried out on a dataset that can be described as "large-scale" within the context of its field? Does it clearly describe the dataset and provide sufficient context for the reader to understand its potential uses? Does it properly describe previous work?

The workflows use the GRCh38 reference sequence as stated in the manuscript, but the manuscript provides no detailed RRID or download links, only generic citations of the UCSC Genome Browser and GENCODE.

2. Is it clear how data was collected and curated?

Credit should be given for transparency and provision of all supporting information.

The authors describe in detail how sequences were selected and why they made those selections for performance evaluations.

While the authors have tested a wide range of tools for equivalent workflows, it is not clear how those tools were chosen. A claim is made that Bowtie is "popular" yet "outdated"; while this may pragmatically be true amongst practitioners, it is an unfounded claim that either needs evidence/citations or should be removed from the manuscript.

3. Is it clear - and was a statement provided - on how data and analyses tools used in the study can be accessed?

While we make every effort to make sure this information is available, we appreciate reviewers providing an extra eye to make absolutely certain that this information is clearly stated and properly available. Data availability and access to tools are essential for reproducibility and provide the best means for reuse.

The authors have been diligent and provided data, software, workflows and documentation using GitHub and FigShare. I have inspected these and verified that they execute as intended.

Persistent versions/commits have however not been provided for the new software/workflows created; the URLs go to the "master" branch and so are subject to change. I would recommend GitHub Releases with corresponding Zenodo DOIs (see https://guides.github.com/activities/citable-code/), making one DOI for each repository.

The corresponding Zenodo citation would then also give proper credit to the other contributors to those Git repositories as shown on GitHub (as far as I can tell: Tazro Inutano Ohta, Tomoya Tanjo, yyabaki, Michael Crusoe).

The Jupyter Notebook, responsible for the figures in the manuscript, has some issues with reproducibility due to missing files; see comments further down.

4. Are accession numbers given or links provided for data that, as a standard, should be submitted to a community approved public repository?

Following community standards for data sharing is a requirement of the journal. Additionally, data sharing in the broadest possible manner expands the ways in which data and tools can be accessed and used.

No accession numbers are needed as no new genetic data are provided. Result data (metrics) are shared using Figshare and GitHub.

Workflow execution results are not currently shared. These are not essential for this manuscript, but as the workflows perform genuine and comparable bioinformatics analyses, it would be useful to preserve/share them, for instance as a Research Object using "--provenance", or simply the result folders as generated by the authors' "run-cwl" tool.

The test scripts download reference data and indexes such as https://s3.amazonaws.com/nig-reference/GRCh38/bowtie2_index/bowtie2_GRCh38.tar.gz; these are however not attributed in the manuscript nor in the README of the workflow, only from within the test script.

5. Is the data and software available in the public domain under a Creative Commons license?

Note, that unless otherwise stated, data hosted in our database (GigaDB) is available under a CC0 waiver. Additionally, did the authors indicate where the software tools and relevant source code are available, under an appropriate Open Source Initiative compliant license? If the source code is currently not in a hosted repository, we can help authors copy it over to a GigaScience GitHub repository .

Runtime metrics were provided as a TSV file (https://doi.org/10.6084/m9.figshare.7222775.v1) under CC-BY, but as commented below, several files needed by the Jupyter Notebook are missing here.

Docker images are provided in https://quay.io/user/inutano for those that did not already exist in BioContainers, and the workflows diligently use tagged versions of the images, which improves reproducibility. Although these images seem to be built from GitHub, no repository documentation is provided (e.g. https://quay.io/repository/inutano/run-dmc) as to what the software is, its (hopefully open source?) license, or its origin.

Workflows are provided in https://github.com/pitagora-galaxy/cwl but do NOT declare a LICENSE, neither in the repository nor in the CWL files. It is unclear whether all of the tools/ CWL files were written by the GitHub contributors or sourced from elsewhere, as they also lack license and attribution.

See https://www.commonwl.org/user_guide/rec-practices/ for how to add license and attribution to cwl files.

General practice in the CWL community is to use the Apache License 2.0 for workflows and tools; unless files are re-used from elsewhere under other licenses, my recommendation would be to stick with that license.
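
For illustration, a minimal sketch of what such metadata could look like, using the schema.org fields suggested in the CWL recommended practices (the target file path and author name below are placeholders, not files I have verified in the repository):

    # Append license and attribution metadata to a CWL description;
    # the path and author are hypothetical, the field names follow the
    # schema.org pattern from the CWL user guide recommended practices.
    cat >> tools/hisat2/hisat2.cwl <<'EOF'

    $namespaces:
      s: https://schema.org/
    s:license: https://spdx.org/licenses/Apache-2.0
    s:author:
      - class: s:Person
        s:name: Example Author
    EOF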

The repository https://github.com/pitagora-galaxy/cwl/ should have a better name than "cwl", as this name is too generic when checking out or cloning the repository. Similarly confusing here is perhaps the link to "Galaxy" - are these workflows based on original Galaxy workflows that should be attributed?

The "cwl" workflow repository is well organized, but the README files assume that workflows will be run with the test/bin/run-cwl script. However this script performs a second download of the repository, which is unecessary and prevents reuse/repurpose.

It would be beneficial for the README files to declare how the workflow files can be executed manually without the help of the run-cwl script.
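
As a sketch of what such a manual invocation could look like (the workflow and job file paths below are hypothetical and would need to match the repository layout):

    # Run one of the RNA-seq workflows directly with the CWL reference runner,
    # bypassing test/bin/run-cwl; adjust the paths to the actual files.
    cwltool --outdir results/ \
      workflows/hisat2-cufflinks/paired_end/hisat2-cufflinks_wf_pe.cwl \
      hisat2-cufflinks_wf_pe.job.yaml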

6. Are the data sound and well controlled?

If you feel that inappropriate controls have been used please say so, indicating the reasons for your concerns, and suggesting alternative controls where appropriate. If you feel that further experimental/clinical evidence is required for obtaining solid biological conclusions and substantiating the results, please provide details.

Yes, the authors have used repeated runs on multiple cloud instance types.

7. Is the interpretation (Analysis and Discussion) well balanced and supported by the data?

The interpretation should discuss the relevance of all the results in an unbiased manner. Are the interpretations overly positive or negative? Note that the authors may include opinions and speculations in an optional 'Potential Implications' section of the manuscript; thus, if there is material in other parts of the manuscript that you feel would be better suited in such a section, please state that. Conclusions drawn from the study should be valid and result directly from the data shown, with reference to other relevant work as applicable. Have the authors provided references wherever necessary?

The interpretations are sound given the experiments performed. I would like to see some discussion on how cwl-metrics could (or should not) be used by the more production-ready and large-scale CWL implementations like Toil or Arvados, in particular when executed over several worker cloud instances. Executions outside Docker (e.g. with BioConda or Singularity) would similarly not be captured.

One valid explanation might be that cwl-metrics focuses on metrics of the individual tool execution rather than the workflow system overhead, and as such it makes sense for the tools to execute in isolation (which is the default in cwltool); such findings will give "best case" metrics for tool comparison, rather than the noisier and more variable metrics obtained when a cloud instance executes multiple tools in parallel.

8. Are the methods appropriate, well described, and include sufficient details and supporting information to allow others to evaluate and replicate the work?

Please remark on the suitability of the methods for the study.

If statistical analyses have been carried out, please indicate if you feel they need to be assessed specifically by an additional reviewer with statistical expertise.

Yes, the authors have been truly diligent, in examining multiple tool combinations over multiple cloud instance types with multiple sequence lengths.

Rather than a deep statistical analysis for comparing tools, the results are shown mainly graphically with error bars. As explained in the paper, in many cases a clear correlation with instance types cannot be found, yet different tools show different characteristics in how well they can take advantage of the instance types.

All the workflows are provided, and good attention to detail has been paid, such as listing the hardware characteristics of the current cloud instance machine types.

9. What are the strengths and weaknesses of the methods?

Please comment on any improvements that could be made to the study design to enhance the quality of the results. If any additional experiments are required, please give details. If novel experimental techniques were used please pay special attention to their reliability and validity.

The proposed cwl-metrics system is a strong mechanism to evaluate and compare the individual tools of workflows. However, the authors do not expand on how other metrics could also be important. For instance, data transfer is not measured, yet the different tools have reference data of varying size (1.2 GB vs 6.9 GB), and for cloud computation data transfer can also come with a cost if not carefully managed.

The overall efficiency of the workflow system is not measured; it would be interesting to hear the authors' thoughts on how that could be fitted alongside the cwl-metrics method. This could be particularly important when comparing CWL implementations, or different deployment setups (e.g. Docker vs Singularity vs uDocker vs BioConda).

For instance, even if just measuring the tool executions, a large delay between a step's finish time and subsequent step start as measured by cwl-metrics could indicate workflow system inefficiency. Currently in-between step activities are not measured.

cwl-metrics relies heavily on the cwltool reference implementation, e.g. by parsing its log files and having strict requirements for its execution options. This is a weakness of the method as it limits the applicability (although it helps to ensure accurate metrics), and I would like this to be better acknowledged in the manuscript along with thoughts on future directions.

One point here: as cwl-metrics relies on the cwltool log file, which is unstructured, a strict version dependency on cwltool should be enforced or at least documented, as otherwise cwl-metrics is likely to break whenever cwltool changes parts of its logging output.
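
One simple way to document such a dependency (a suggestion, not something cwl-metrics currently does) would be to pin the exact cwltool release the log parser was developed against, e.g.:

    # Pin the cwltool release whose log format cwl-metrics understands;
    # the version shown is the one I used for this review.
    pip3 install cwltool==1.0.20181201184214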

10. Have the authors followed best-practices in reporting standards?

This is an essential component as ease of reproducibility and usability are key criteria for manuscript publication. Please note, the methodology sections should never contain “protocol available upon request” or “e-mail author for detailed protocol”. Have the authors followed and used reporting checklists recommended in our page on the Biosharing network and if the methods are amenable, have the authors used workflow management systems such as Galaxy, Taverna or one of the many related systems listed on MyExperiment? We can also host these in our GigaGalaxy server if they currently do not have a home. We also encourage use of virtual machines and containers such as Docker. And the use and deposition of both wet-lab and computational protocols in a protocols repository like protocols.io, and code in the cloud-based computational reproducibility platform CodeOcean

The authors have used Docker and Common Workflow Language, along with open source code in GitHub. Reproducibility measures are of a very high quality; except as noted for the final Jupyter Notebook.

Once the file availability for the Jupyter Notebook is sorted out, along with a GitHub release version, it should be possible to use CodeOcean or https://mybinder.org/ to have an actually executable notebook.

11. Can the writing, organization, tables and figures be improved?

Although the editorial team may also assess the quality of the written English, please do comment if you consider the standard is below that expected for a scientific publication.

If the manuscript is organized in such a manner that it is illogical or not easily accessible to the reader please suggest improvements. Please provide feedback on whether the data are presented in the most appropriate manner; for example, is a table being used where a graph would give increased clarity? Do the figures appear to be genuine, i.e. without evidence of manipulation, and of a high enough quality to be published in their present form?

The English language is unfortunately not of a high enough quality for publication, in several cases confusing the scientific message.

I have suggested grammatical fixes in the annotated manuscript.

The organization of the manuscript is good, and the figures are helpful.

12. When revisions are requested.

Reviewers may recommend revisions for any or all of the following reasons: the data require additional testing to ensure their quality, additional data are required to support the authors' conclusions; better justification is needed for the arguments based on existing data; or the clarity and/or coherence of the paper needs to be improved.

Minor Revisions required:

  • English Language needs to be revised
  • Missing DOIs for citations
  • Missing DOIs for code/workflows, not (just) GitHub links
  • License on CWL workflows/tools
  • Fix Reproducibility issue in Notebook (missing files)
  • License and source/upstream attribution for quay.io docker images

Other suggestions in this review are recommended, but not required.

13. Are there any ethical or competing interests issues you would like to raise?

The study should adhere to ethical standards of scientific/medical research and the authors should declare that they have received ethics approval and/or patient consent for the study, where appropriate.

Whilst we do not expect reviewers to delve into authors' competing interests, if you are aware of any issues that you do not think have been adequately addressed, please inform the Editorial office.

No ethical concerns or competing interests have been identified.

Reproducibility notes

I installed https://github.com/inutano/cwl-metrics according to its installation instructions. I tested on a Linux server with Ubuntu 16.04.5, 100 GB of free disk space, 24 GB of RAM, and an 8-core Xeon X5672. I installed cwltool 1.0.20181201184214 using pip3 and Python 3.5.2. Tested with Docker 17.05.0-ce (build 89658be) and docker-compose 1.22.0 (build f46880fe).

Note: Instead of "curl | bash" constructs, which tell people to run things untrusted straight from the internet, the instructions should rather say what needs to be downloaded, and then what needs to run, so that the users are allowed to read through the code/sources first.

The installation correctly set up cwl-metrics and its constituent Docker containers, and started monitoring.

I tested cwl-metrics by downloading the workflows from https://github.com/pitagora-galaxy/cwl.

Instead of running the curl|bash construct, I ran test/bin/run-cwl, which nicely showed the possible workflows to run.

I tested:

  • hisat2-cufflinks_wf_se (single-end)
  • hisat2-cufflinks_wf_pe (paired-end)
  • kallisto_wf_se
  • kallisto_wf_pe
  • tophat2-cufflinks_wf_pe
  • tophat2-cufflinks_wf_se

All the above workflows executed successfully using the default inputs provided by run-cwl. I have not evaluated their workflow outputs or parameter settings with respect to bioinformatics correctness, but verified that they contain outputs of the expected format and quantity.

The preparation script downloaded reference data required by multiple workflows, totalling some 30 GB. It was not clear during execution that this data would be stored in $HOME rather than the current directory, so I had to increase the size of my /home. I can understand the reason though, as it means reference data are only downloaded once. Disk space requirements should be noted in the README.

The URLs for these reference data are currently hidden inside the download scripts. These downloads appear to have no checksums, so I am unsure whether I have tested against the same reference data as the authors.

A quick "grep http" show the URLs as:

  • https://s3.amazonaws.com/nig-reference/GRCh38/bowtie2_index/bowtie2_GRCh38.tar.gz
  • http://data.dbcls.jp/~inutano/reference/hisat2_index/refMrna.tar.gz
  • https://s3.amazonaws.com/nig-reference/GRCh38/kallisto_index/GRCh38Gencode.gz
  • https://s3.amazonaws.com/nig-reference/GRCh38/rsem_index/GRCh38.tar.gz
  • https://s3.amazonaws.com/nig-reference/GRCh38/sailfish_index/sailfish_GRCh38.tar.gz
  • https://s3.amazonaws.com/nig-reference/GRCh38/salmon_index/salmon_GRCh38.tar.gz
  • https://s3.amazonaws.com/nig-reference/GRCh38/star_index/star_GRCh38.tar.gz
  • http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/refMrna.fa.gz

It is unclear who provides these S3 downloads, who made them, from what source, and for how long they will remain available.

I would lift the download URIs (together with checksums) into the README of the whole repository, as it is likely that some of these reference data files might disappear or get different URLs after some years.
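
A minimal sketch of how this could work (the checksum manifest is my suggestion, not something the repository currently provides):

    # Once, on the maintainers' side: record checksums for the reference archives.
    sha256sum bowtie2_GRCh38.tar.gz GRCh38Gencode.gz > reference_checksums.sha256
    # On a fresh machine, after downloading the same archives: verify them.
    sha256sum -c reference_checksums.sha256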

Perhaps snapshots of the required data could be deposited in an archive like Zenodo or similar for longevity?

Due to the helpful preparation scripts hiding the execution details, it was initially a bit unclear to me what the actual workflows and input data were, but the script also generates tidy results directories like test.kallisto_wf_se/result/SRR405 that include copies of the workflow, so it was easy to keep the different runs separate and inspect them manually.

I subsequently used "cwl-metrics fetch" (after modifying the PATH according to https://inutano.github.io/cwl-metrics/#launch-cwl-metrics-system) to successfully inspect the TSV output. I was unable to quickly make sense of the more complete JSON output, but inspected it partially in the running Kibana instance.

It was quite unclear to me how to use the Kibana interface to find useful information, although the cwl-metrics README guided me part of the way. Perhaps this guidance could be extended, e.g. to show how to visualize one of the metrics. I can see this being powerful, but also having a steeper learning curve than the summarized TSV output.

I downloaded the https://github.com/inutano/cwl-metrics-manuscript repository to try the Jupyter Notebook.

The initial part of the notebook works well in that it downloads the TSV file from the cited Figshare. I needed to manually create the "data" directory for the download to work.

The subsequent steps from "Load RNA-seq sample metadata" onwards however did FAIL, because files like "ec2_instance_types.tsv" and "sample_metadata.tsv" are missing from the Figshare download. A hint about "merge_multiple_metrics.rb" was given in the notebook, but it was unclear to me how to use it.

I tried a naive invocation:

system("../lib/merge_multiple_metrics.rb", intern=TRUE) Warning message: “running command '../lib/merge_multiple_metrics.rb' had status 126”

Obviously some JSON files would be needed for this to work. It would be good to have two ways to do this: first, include the JSON files from the complete experiment in the FigShare upload; second, provide instructions on how to replicate this from new workflow runs performed with cwl-metrics locally.
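
For what it is worth, the exit status 126 above usually indicates that the script exists but is not executable; a naive workaround sketch (not documented usage, and it would still need those JSON files) is to call the Ruby interpreter explicitly:

    # Bypass the execute-permission problem by invoking ruby directly.
    ruby ../lib/merge_multiple_metrics.rb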

The rest of the notebook fails with cascading errors from the missing files.

As the authors have been very diligent in making the research reproducible and the workflows runnable out of the box, I kindly request that this last mile also be completed, so that the Jupyter Notebook (which, as far as I understand, produces all the figures of the paper) is also reproducible on a clean setup, even if it uses CWL-metrics snapshots from Figshare.

Confidential remarks for editor

n/a

Declaration of competing interests

Please complete a declaration of competing interests, considering the following questions: Have you in the past five years received reimbursements, fees, funding, or salary from an organisation that may in any way gain or lose financially from the publication of this manuscript, either now or in the future? Do you hold any stocks or shares in an organisation that may in any way gain or lose financially from the publication of this manuscript, either now or in the future? Do you hold or are you currently applying for any patents relating to the content of the manuscript? Have you received reimbursements, fees, funding, or salary from an organization that holds or has applied for patents relating to the content of the manuscript? Do you have any other financial competing interests? Do you have any non-financial competing interests in relation to this paper? If you can answer no to all of the above, write 'I declare that I have no competing interests' below. If your reply is yes to any, please give details below.
I declare that I have no competing interests

I agree to the open peer review policy of the journal. I understand that my name will be included on my report to the authors and, if the manuscript is accepted for publication, my named report including any attachments I upload will be posted on the website along with the authors' responses. I agree for my report to be made available under an Open Access Creative Commons CC-BY license (http://creativecommons.org/licenses/by/4.0/). I understand that any comments which I do not wish to be included in my named report can be included as confidential comments to the editors, which will not be published.
I agree to the open peer review policy of the journal.

Authors' response to reviews

Reviewer #1:

Overall the authors provide a useful utility to cwltool that allows for an easy-to-use collection of runtime metrics. These metrics can be used to make informed decisions on estimating cost and which VM flavors are most efficient. My biggest concern is that the language throughout the manuscript, except for a small part of the discussion, conveys the idea that this tool is CWL-specific. CWL is simply the specification for defining a workflow and does not define any way to report logs or runtime metrics, which are defined by specific CWL engines. Thus, CWL-metrics is actually dependent on the cwltool reference implementation and the way it currently outputs logs. To me, CWL-metrics is more like an enhancement utility to the cwltool package than a part of CWL itself. I think this is an important distinction to make, especially since cwltool currently only handles serial operation of highly parallelizable workflows. So, if you were really concerned about cost, you likely wouldn't use cwltool as an engine, but look into something like Cromwell, Cavatica (Seven Bridges), or use a GA4GH TES implementation like Funnel. Again, this tool is indeed useful, as I have personally used it; I just think that it is important to better clarify this in the manuscript.

Response: We agree with the reviewer's assessment. We started this project to develop a method to provide runtime metrics of CWL workflows so users can share workflows with their resource requirements. The current implementation of CWL-metrics, however, as the reviewer indicates, is a utility tool for cwltool. This is simply because cwltool is the reference implementation of CWL and will be maintained along with updates of the specification. Yet we think we need to increase the coverage of workflow runners so users can get more practical information. We agree that it is important to make the current status clear. To emphasize that CWL-metrics depends on cwltool, we changed the sentence in the abstract to "We developed CWL-metrics, a utility tool for cwltool, the reference implementation of CWL, to collect runtime metrics of Docker containers and workflow metadata to analyze resource requirement of workflows" and added "CWL-metrics works with cwltool, the reference implementation of CWL" in the last paragraph of the background section.

Related to this, the authors mention Nextflow and Galaxy among others; however, they don't mention WDL or Cromwell which I believe is much more closely related to CWL and I think merit mentioning, especially since Cromwell can run CWL workflows at scale and in parallel.

Response: In the original manuscript, we mentioned Galaxy, Toil, and Nextflow because they are the runners that can collect runtime information like CWL-metrics. We added a new paragraph in the discussion section to mention Cromwell and the limitation of cwltool in parallel job execution.

The following are minor things that I believe need to be addressed. The authors use the term "alignment-like" when referring to tools like Kallisto and Salmon; however, I believe the appropriate term is "pseudo-alignment".

Response: Kallisto uses the term "pseudo-alignment" in its paper and documentation, but Salmon does not. The authors of Salmon call its algorithm "quasi-mapping" and do not use "pseudo-alignment" (https://doi.org/10.1038/nmeth.4197). Therefore we used the term "alignment-like", but it is still confusing. We changed the sentence to say that these tools use "different alignment approaches".

In figures 4 and 5, I assume the duration y-axis is in seconds; however, this doesn't seem to be mentioned in the description or axis labels and is important especially for people unfamiliar with the workflows.

Response: We added the corresponding units to the Y-axis labels of the plots. Thank you for pointing out the issue.

Finally, I don't feel like the authors provided any clear "future features" they would like to work on (or have the community contribute to). For example, they mention some limitations (e.g., scatter/gather across nodes) that could be overcome by having a centralized service (could be containerized) that all workers post metrics to.

Response: We added a new paragraph in the discussion section with future prospects, including the support of parallelized job execution and of different container runtimes such as Singularity.

Reviewer #2:

In this paper the authors describe a system to collect execution runtime metrics for computational workflows described using the Common Workflow Language notation. They also provide a benchmark of different tools executed against different datasets to show the benefits of their approach.

The topic is quite interesting because, given the exponential growth of genomics data, there is a pressing need to optimise bioinformatics tools and workflows for better resource allocation and usage, in order to optimise the overall costs of long-running in-silico data analyses.

The paper is easy to read, well structured and informative. The benchmark comparing the resource usage of different genome sequence aligners is particularly interesting.

The only point to make is that the manuscript would capture the interest of a broader audience if the authors provided a more balanced comparison with similar technologies such as Galaxy and Nextflow. For example, the authors mention that their system only works for a specific CWL implementation (cwl-runner) and requires the use of a Docker-compliant system along with the deployment of other third-party tools (e.g. Telegraf, Elasticsearch, etc.), whose installation could be challenging for the average workflow user. Nextflow implements a very similar feature to the one described in this manuscript to collect, visualise and export the execution metrics. However, it can be used irrespective of the execution platform supported by the tool (i.e. local execution, clusters and clouds), in a single-node or multi-node deployment, and does not require the installation of any third-party software components, either with or without containerised execution (disclaimer: the writer is the creator of the Nextflow tool).

Response: We added a paragraph in the discussion section for a clearer comparison of runners in terms of parallel job execution. The main difference between cwltool and the other runners that can collect runtime metrics is the ability to capture parallelized workflow jobs. We described the limitation of the current implementation of CWL-metrics, which depends on cwltool. We want to note that CWL-metrics users do not need to install Telegraf or Elasticsearch by themselves, because the system automatically fetches those components as Docker containers. The prerequisites of the system are git, curl, Perl, Docker, and Docker Compose. We added a sentence to mention this in "Implementation of CWL-metrics" in the results section.

Very minor note: the use of a human-friendly format and data units for time and memory values (e.g. seconds or mega/gigabytes) would make the charts more readable.

Response: We added the corresponding units to the Y-axis labels of the plots. Thank you for your suggestion.

  • Paolo

Response: Thank you very much, Paolo! Taz :)

Reviewer #3:

This manuscript proposes cwl-metrics, a way to collect metrics from Common Workflow Language executions of bioinformatics tools using Docker containers.

The manuscript is submitted as a Technical Note, but the research is of such a high quality that this could even be a Research article had it also provided a broader Background and a Discussion with comparison of workflow metrics systems beyond CWL.

Response: We submitted this manuscript via the direct submission system through bioRxiv, which did not give us an article type selection. We suppose that the GigaScience editorial office made a decision to make this a Technical Note, but we would be happy to change the category to Research article if possible.

The authors have been diligent in reproducibility and been good practitioners of Open Science, recording rich details of their evaluations and providing installation scripts for not just the software but also the evaluation setup. Some small issues remain to make this truly reproducible.

The English language of the manuscript is however not of a good quality for publication, to a degree where this can be confusing. Knowing the authors' work from the CWL community, I have provided feedback within the annotated manuscript (attached PDF) using ISO 5776 text proof notation.

See the detailed review PDF for further comments according to GigaScience review guidelines, in particular the reproducibility section.

My detailed review is also web-accessible at the (secret) URL https://gist.github.com/stain/30e49363238d5a35e26f9fb1a31ebf8e

Response: Thank you very much for the many practical and detailed suggestions and corrections. The corrections to the English writing were very helpful. We greatly appreciate your help.

Minor Revisions required:

  • English Language needs to be revised

Response: We modified the text following your suggestions and corrections.

  • Missing DOIs for citations

Response: We added the DOIs to the reference section.

  • Missing DOIs for code/workflows, not (just) GitHub links

Response: For our GitHub repositories, we assigned DOIs and used them in the reference section.

  • License on CWL workflows/tools

Response: We added the license to the repository and to the individual files (https://github.com/pitagora-network/pitagora-cwl). The license is Apache-2.0 following the best practice of the Common Workflow Language project.

  • Fix Reproducibility issue in Notebook (missing files)

Response: We uploaded the missing files to Figshare and fixed the Notebook code to resolve the issue. We also tested that the Notebook works correctly in a new machine environment.

  • License and source/upstream attribution for quay.io docker images

Response: We added the GPL-3.0 license for the Docker images we uploaded to Quay.io.

Other suggestions in this review are recommended, but not required.

Response: We also fixed the documentation on GitHub to provide more details for the workflows and test scripts we provide. The documentation for the reference data we used for the benchmark is also available in the new GitHub repository (https://github.com/pitagora-network/pitagora-cwl). We uploaded the reference data index to Zenodo (https://doi.org/10.5281/zenodo.2587201). We also uploaded the intermediate files and final outputs of the workflows executed for the benchmarking to Zenodo (https://doi.org/10.5281/zenodo.2586546). Thank you very much again for your suggestions; we are sure that the CWL-metrics project is now more reproducible.

Source

    © 2018 the Reviewer (CC BY 4.0).

References

    Ohta, T., Tanjo, T., Ogasawara, O. Accumulating computational resource usage of genomic data analysis workflow to optimize cloud computing instance selection. GigaScience.