Some errors in research are near impossible to spot with the naked eye, and this can lead to fraudulent and careless research slipping through the gaps.
Publons speaks with cancer researcher Dr. Jennifer Byrne (University of Sydney), recently named one of Nature’s Top ten people who matter in science. By day she analyzes childhood and adult cancers at a molecular level; by night she weeds out problematic cancer research with Seek & Blastn, a tool she co-created with Dr. Cyril Labbé.
Dr. Jennifer Byrne is Professor of Molecular Oncology, Discipline of Child and Adolescent Health, The Children's Hospital at Westmead, and Associate Dean (Transition Strategy & Special Projects), Sydney Medical School.
Jo Wilkinson: Hey, I’m Jo Wilkinson and you’re listening to a Publons audiocast on trust and integrity in peer review.
In our last blog in this series we talked about fake peer review and what editors and publishers can do about it. We discussed it as stemming from today’s publish or perish culture, and from a lack of transparency and recognition in peer review. Unfortunately, it’s led to a limited number of peer reviewers who have the time to do a thorough and thoughtful job.
But even when reviewers do have the time, no matter how hard they analyse a paper, fraudulent and careless research can still slip through the gaps.
This can often be related to trust. Some errors are near impossible to spot with the naked eye -- or they require being there when the experiment is conducted. In these cases it’s less a matter of spotting the mistakes than of being able to rely on the researcher to conduct their work in a robust and honest way.
Here to talk with us about that today is Dr. Jennifer Byrne.
Jennifer Byrne: I think investigating these papers made me realise that trust is the forgotten essential in scientific research. Every time we read a publication really we have no idea how those experiments were done. We’re taking it on trust that the authors have described exactly what they did. If we can’t assume that then it really doesn't matter how many brains we have and how much money we have. If you can’t trust somebody else's work you can’t build upon that so it's very important that we have systems that allow that trust to be maintained.
JW: Recently named one of Nature’s Top ten people who mattered in science, Dr. Byrne analyzes childhood and adult cancers at a molecular level by day, and by night, weeds out problematic cancer research with a tool she co-created with Dr. Cyril Labbé, called Seek & Blastn.
But before we get into how it works, let’s chat about the story behind it.
Thanks so much for chatting with me today, Jennifer, really happy to have you here.
JB: Thanks Jo, it’s great to be here.
JW: So tell me, how did Seek & Blastn first come about?
JB: Yes well, this is a bit of an unexpected story, which I guess happens quite a lot in science. In the ‘90s I cloned a number of genes for the first time that formed a gene family, and one of those genes that I’ll just call L2 for short, meaning it’s like another gene, it’s kind of a brother gene. We cloned that gene but we never really studied it much, largely because we were busy studying the first gene, and because you just never have time to follow up everything you find. It’s a little bit like a relative. You still check in on this gene every so often and see if anyone else is studying it.
And so over about 20 years basically I realised no one was studying it. There was about one paper coming out a year about this gene. So it was a bit of a surprise I guess that in 2014/2015 five papers appeared. Which, you know, seemed a little bit of a blip on the radar but when I looked at these papers I was very confused by the fact they were just so similar to each other.
I certainly wasn’t suspecting anything untoward at the beginning but when I started to compare them they were too similar. They were just too similar: the organisation of the papers, the types of experiments that had been done and the way that those results had been organised in figures. There was this pattern of construction that I had not seen before, and that’s not normal when independent groups publish results that look as if they’re following some kind of template down to figure A, B, C being the same in the different papers. That’s just not how independent scientists work. Some of the text was quite poorly prepared and the papers also contained some similar and at times quite damning mistakes.
One of the things that I sort of latched onto early on was the description of nucleotide sequences within those papers. So, I started to investigate the sequences that these papers were reporting and I realised that in many cases they were incorrect.
Then I contacted who is now my colleague in France, Dr. Cyril Labbé, who is an informatician who has also had some interest in identifying suspect publications using a previous tool that he had written. And so he and I collaborated to try and understand what we were studying and to get some idea about the extent of the problem.
JW: And what did you suspect was going on?
JB: Yeah I really didn’t have any clue for a while. But it was through doing some reading that I realised that in China (all of these papers were authored by researchers in China) there have been reports of researchers purchasing places on publications, or even purchasing manuscripts that they then put their name on.
We suspected that this was some form of this kind of activity. Possibly people purchasing data, so figures which might explain the similarity of the figures, and then kind of writing the papers themselves or possibly getting some of the writing done for them as well. We don’t really know of course.
JW: So all these papers all got through the peer review process…
JB: Yes and I think from our analysis of these papers, what we’re realising is all of these errors are in the literature because they haven’t been detected by the authors first up, but then they also haven’t been detected by peer reviewers or editors, and largely probably haven’t been detected by readers. So they’re just sitting in papers without anyone knowing that they are there. I think that highlights that reviewers are generally not checking the tiny details of papers, such as whether or not this sequence is actually what the authors say it is. Peer reviewers reasonably assume that authors have got those details right. It has been argued that in fact we don’t want peer reviewers to be peering into these tiny details about papers because they may miss the broader importance of a paper to a field. So in that sense this kind of tool really allows editors and reviewers to do something that they just wouldn’t do themselves.
Image of nucleotide sequence by Gregory Podgorniak.
JW: Ok, so that’s when you and Dr. Cyril Labbé created Seek & Blastn. How exactly does it work?
JB: So Seek & Blastn was really very much my colleague Cyril’s idea. I think while we were both analysing these papers, at one point we realised that we were both using nucleotide sequences as queries for Google Scholar searches. So typically people will use text or author names to look for publications in Google Scholar but we started to try and fish into the literature and figure out whether these nucleotide sequences that we knew had been misused in publications were actually out there in other publications that we didn’t know about yet. So we were both starting to use sequences as queries and I think Cyril as an informatician took the step further, realising that we could perhaps do this in an automated fashion, that we could design a programme that would extract sequences from publications and use them as queries.
What Cyril’s programme does is it has a seek component, so it uploads a PDF and then massages the text a bit so it’s easier to analyse, and then looks for nucleotide sequences in the text. When it finds them it extracts them, and it extracts some information about them. It then conducts an independent database search using the programme BlastN, which is very widely used in biomedicine, and then it tries to match the predicted identity of the sequence, in terms of whether it’s a targeting sequence or non-targeting sequence, back to the claimed status in the paper. So it’s a tremendously complicated and ambitious thing that Seek & Blastn is trying to do.
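The steps described above (find sequences in text, note what the paper claims about them, compare that claim with an independent prediction) can be illustrated with a minimal Python sketch. This is not the Seek & Blastn implementation: the 15-letter threshold, the keyword rules, the function names and the `predicted` dictionary (which stands in for the result of a BlastN search) are all assumptions made for illustration, and the sequences are made up.

```python
import re

# Assumed heuristic: 15 or more consecutive uppercase nucleotide letters.
SEQ_PATTERN = re.compile(r"\b[ACGTU]{15,}\b")


def extract_sequences(text):
    """Return (sequence, context) pairs; context is the part of the
    sentence that precedes the sequence."""
    found = []
    for match in SEQ_PATTERN.finditer(text):
        # rfind returns -1 when no earlier sentence exists, so start becomes 0.
        start = text.rfind(". ", 0, match.start()) + 1
        found.append((match.group(), text[start:match.start()]))
    return found


def claimed_status(context):
    """Guess what the paper claims about a sequence from nearby wording."""
    lowered = context.lower()
    # Check "non-targeting" first because it contains "targeting".
    if "non-targeting" in lowered or "scrambled" in lowered:
        return "non-targeting"
    if "targeting" in lowered:
        return "targeting"
    return "unknown"


def flag_mismatches(text, predicted):
    """List sequences whose claimed status disagrees with the prediction.

    `predicted` is a hypothetical stand-in for a database search result:
    a dict mapping each sequence to "targeting" or "non-targeting".
    """
    flagged = []
    for seq, context in extract_sequences(text):
        claim = claimed_status(context)
        prediction = predicted.get(seq, "unknown")
        if "unknown" not in (claim, prediction) and claim != prediction:
            flagged.append((seq, claim, prediction))
    return flagged


# Made-up example text and made-up predictions, for illustration only.
sample = (
    "The non-targeting control shRNA was 5'-ACGTACGTACGTACGTACGT-3'. "
    "The shRNA targeting GENE1 was 5'-GGATCCGGATCCGGATCCGG-3'."
)
predictions = {
    "ACGTACGTACGTACGTACGT": "targeting",
    "GGATCCGGATCCGGATCCGG": "targeting",
}
for seq, claim, prediction in flag_mismatches(sample, predictions):
    print(f"{seq}: paper says {claim}, search suggests {prediction}")
```

In this toy case only the first sequence is flagged, because the text presents it as a non-targeting control while the stand-in prediction says it targets a gene. A real tool has to cope with PDF extraction noise, varied sequence formatting and ambiguous wording, which is why its results still need interpretation by someone who understands how the sequences are used.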
JW: That’s right. And one that can lead to retractions...
JB: Yes, so quite early on I wrote to the editors of those publications and I guess I got very different responses from those different journals. There were four journals involved. Now out of those five publications I think four have been retracted and we also identified and analysed a number of other publications, so we originally published a cohort of 48 and we actually knew about a few more. So I wrote to the editors of all of those and wrote to the papers as well, after our publication appeared, and we’re still in contact with some of those editors, but I think to date a total of 10 publications have been retracted.
JW: Can everyday peer reviewers and editors use the tool?
JB: Yes, we think that they can. We certainly know that because of a variety of factors, because of the wide variety of uses for nucleotide sequences in the literature, because of variations of how they are actually formatted in publications, because of variations in gene identifiers that are used in publications, Seek & Blastn is unlikely to be 100% reliable and in fact we know it isn’t at the moment, but we think it will probably be difficult to design a tool that is going to be completely reliable on its own. So we think it will always need some degree of interpretation and support, but certainly it’s designed to be used by anybody that really understands the use of nucleotide sequences in a paper.
JW: Ok and what about plans for the future. Could a version of this tool be relevant in other fields one day, too?
JB: I think what we’re studying at the moment is probably concentrated within cancer research. We haven’t really had the resources to look more broadly. We have seen a couple of papers that have indicated that perhaps there are similar issues happening in other areas of biomedicine, for example, some animal studies, but we just don’t have the resources to analyse that ourselves.
I think cancer research is possibly attracting some attention because it’s easy for authors to make a claim that their research is relevant to a disease that unfortunately causes suffering and morbidity in the community and so that can be a bit of a handle I guess that can be exploited for honest efforts and perhaps for efforts that are less than honest.
JW: That’s exactly right. And such a sad truth. Before we wrap things up, just one last question: do you have any advice for new or experienced researchers in cancer research?
JB: Yes, well, I guess I’d encourage people who are reviewing papers that have nucleotide sequences in them, if they are interested they can try the tool. The test version is freely available on a website so they can test it out, but certainly from our perspective of doing a lot of manual follow up of the results of Seek & Blastn, the process of just analysing papers manually without any kind of support is really tedious - particularly when you’re doing large cohorts of papers. It’s not too bad to analyse single papers but it’s a tedious job and Seek & Blastn definitely makes it easier.
JW: Great thanks so much, Jennifer.
That’s all we have time for today but make sure you check out the next post in our series on fraud and integrity in peer review. You can find that and our previous posts at Publons.com/blog.