Content of review 1, reviewed on December 18, 2012

Overall, the paper is a descriptive account of the establishment and operation of the UPPNEX infrastructure for next-generation sequencing-based bioinformatics in Sweden. Although the paper contains interesting information, much of it is anecdotal, and it is not clear what its conclusion is beyond some specific technical comments on the overall setup and some more general comments on the operational requirements for such a facility, as experienced by the authors. Some of the more interesting findings, for example the relationship with non-technical users, are perhaps treated a little superficially. I would have been very interested to learn, for instance, how the centre has trained biologists to use technically demanding tools, how scalable this model is, and how the biologists interact with the range of staff at UPPNEX.

The title of the paper is ‘Lessons learned...’, but after reading the paper I am not sure what these lessons were. A list of recommendations would perhaps have been useful, or an analysis of the types of user, or a breakdown of the types of applications and jobs the centre caters for, with some interpretation that would aid those setting up similar centres.

The paper also suffers from some expression problems; some examples are given below. These problems can be fixed, but I think the larger problem, that the contribution of the paper is unclear, makes it very hard to recommend this paper for publication at this stage.

  1. Does it address an important or timely issue?

Yes, there is a clear need for useful guidance on setting up and running centres supporting modern genomics.

  2. Is it well reasoned?

The paper really could do with some editing, perhaps by someone not involved as an author. It doesn't quite come across as a reasoned overview and seems to jump around a little and use anecdotes rather than referenced facts. That is not to say that the anecdotes are necessarily wrong, but it reads a little as opinion, e.g.:

"Scientists in these fields also have learned the skills needed to work with high-performance computers, which traditionally has meant programming and scripting in text-based command line environments."

"As new algorithms keep emerging, no one wants to delete raw data as it could be used later with new tools to reveal new information."

Both of these are probably true, but don't read well in a scientific paper.

  3. Is it relatively balanced, or does it make plain where the author's opinions might not represent the field as a whole?

It is balanced in that there are no ‘controversial’ views expressed, but there are a number of opinion-based statements in the paper.

  4. Is the standard of writing acceptable?

Unfortunately I would have to say no. There are frequent expression issues, for example:

"Bioinformatics has emerged as a key discipline, and it is envisioned to increase in importance as more and more biological experimentation utilize high-throughput instrumentation or is being outsourced [3]."

"With larger problems the computational power of desktop computers are insufficient,"

"and several mature software suites that are well integrated in multi-computer infrastructures have emerged."

"resulting in pipelines for NGS analyses being a hot topic with several proposed solutions [6–11]."

"To make matters worse, some available software requires NGS data to be in uncompressed and inefficient data formats that dramatically increases the footprint on storage media."

"In this paper we present a Swedish infrastructure (UPPNEX) providing these resources; a high-performance cluster and storage with a maintained bioinformatics software suite, as well as application experts that assist with expertise in bioinformatics analysis."

“These users are not seldom newly graduated Master students,”

Level of interest: An article whose findings are important to those with closely related research interests

Quality of written English: Not suitable for publication unless extensively edited

Statistical review: No, the manuscript does not need to be seen by a statistician.

Declaration of competing interests: I declare that I have no competing interests

Source

    © 2012 the Reviewer (CC-BY 4.0 - source).

Content of review 2, reviewed on April 02, 2013

I think the paper has improved from the previous version. The actual lessons listed are useful and add a lot to the paper.

I think the authors should spend more time editing and checking. The first thing that stood out was this paragraph under ‘Lessons learned’:

“Another advantage of sharing the cluster with other user groups at UPPMAX is that other scientific domains often have a different user pattern than UPPNEX users (data not shown) Do we need to state that the data is not shown? Or is that implicit, since we don't give a reference to it? ..”

Some of the paper still seems quite anecdotal:

‘Conclusions’ paragraph

“What distinguishes UPPNEX from most other sites providing resources for NGS analysis are:
i) UPPNEX has several platforms as clients with all major sequencing technologies;
ii) Three different sequencing platforms have three different approaches and implementations of analysis pipelines;
iii) UPPNEX provides a wide range of analysis tools for NGS analysis;
iv) the system is not limited to bioinformatics but is shared with other scientific domains;
v) as a national resource, serving most of the NGS community in Sweden, UPPNEX is challenged with strategies for managing data growth.”

This paragraph is a mixture of strange expression and anecdote. Are all these statements about why UPPNEX is different really true? In the preceding paragraphs, the authors say that two of the three centres they compare also offer many different analysis tools (point iii).

Under ‘UPPNEX Infrastructure’>Hardware

“After 2 years (in 2011) the parallel storage was expanded to approximately 900 PiB.”

Is this really true? The paragraphs below indicate that it went from 545 TiB to 900 TiB.

  1. Does it address an important or timely issue?

As before. Yes, there is a clear need for useful guidance on setting up and running centres supporting modern genomics.

  2. Is it well reasoned?

In my first review I said: “The paper really could do with some editing, perhaps by someone not involved as an author.”

I am going to say again: “The paper really could STILL do with some editing, perhaps by someone not involved as an author.”

It has a better flow, but still reads a bit like an anecdotal history of UPPNEX rather than a science-driven analysis of UPPNEX’s role and performance. I think this paper could be markedly improved if it contained case studies from two different high-throughput genomics centres like UPPNEX, rather than one, in order to generalise and contrast, but I recognise this is a lot of work.

And I still think this is true: “It doesn't quite come across as a reasoned overview and seems to jump around a little and use anecdotes rather than referenced facts. That is not to say that the anecdotes are necessarily wrong, but it reads a little as opinion.”

  3. Is it relatively balanced, or does it make plain where the author's opinions might not represent the field as a whole?

It is balanced in that there are no ‘controversial’ views expressed, but there are a number of opinion-based statements in the paper. As above, it is narrow in scope because of the case-study nature of the paper.

  4. Is the standard of writing acceptable?

Not really. There are STILL expression issues, for example:

“since many NGS software applications are not prepared to run on multi-user, multi-project HPC systems.”

“The Lonestar cluster has the most common NGS softwares installed.”

“but it turns out that the NGS community will require much more computational resources in the coming years.”

“The only main drawback is that if the file system goes down“ (‘goes down’ is a very informal term)

“The UPPNEX users work interactivly with the”

“the users was forced to reserve a full node (24 GiB) if needing more than 3 GiB, which has often been too small for many jobs, accordig to many users (personal communication).”

There are probably others, but I have already spent some time on this paper, and it is not really my job to find these types of errors.

Regarding acceptance/rejection:

I am going to leave it to the editor to decide! It has some useful stories to tell, but is not a rigorous analysis of the requirements of high throughput genomics; more a case study with some broader analysis. It also still needs a good edit for language and expression.

My inclination is that this is probably still some way from being publishable, both in presentation and in content. But it's really the editor's call.

I am sorry that I cannot cleanly split this into 'major, minor and discretionary' revisions.

Level of interest: An article whose findings are important to those with closely related research interests

Quality of written English: Needs some language corrections before being published

Statistical review: No, the manuscript does not need to be seen by a statistician.

Declaration of competing interests: I declare that I have no competing interests

Source

    © 2013 the Reviewer (CC-BY 4.0 - source).

References

    Lampa, S., Dahlö, M., Olason, P. I., Hagberg, J., Spjuth, O. 2013. Lessons learned from implementing a national infrastructure in Sweden for storage and analysis of next-generation sequencing data. GigaScience.