Content of review 1, reviewed on November 11, 2016

  • Knowledge Gap: Functional annotation, especially for non-coding regions, remains challenging. Current tools (e.g., CADD) do not use the tissue-specific information provided by projects like ENCODE/RME/GTEx.

  • Strategy:

  • Here, they propose to integrate RME data for more than 100 cell-types to derive 7 tissue-specific predictive models (brain, GI, lung, heart, blood, muscle and epithelium), meaning that one locus might be functional in one tissue but not in others.

  • Their model estimation relies on unsupervised learning and mixture modelling: they assume that the joint distribution of their 8 annotations (histone marks) is a mixture of functional and non-functional positions.
  • To ease the model fitting, they make another assumption (that I think is a bit strong): the annotation marks are conditionally independent given functional status. They then derive a posterior probability of functionality for each locus.
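
To make the modelling assumption concrete, here is a minimal sketch of a two-component mixture with conditionally independent marks. It assumes binary (Bernoulli) annotations and a plain EM fit; those choices, the starting values, and the names posterior_functional and fit_em are my own illustrative assumptions, not necessarily what the authors implemented.

```python
import numpy as np


def posterior_functional(x, pi, p_func, p_non):
    """Posterior P(functional | marks) for each locus.

    x      : (n_loci, n_marks) binary array of annotation marks
    pi     : prior probability that a locus is functional
    p_func : per-mark P(mark = 1 | functional)
    p_non  : per-mark P(mark = 1 | non-functional)
    """
    x = np.asarray(x, dtype=float)
    p_func = np.asarray(p_func, dtype=float)
    p_non = np.asarray(p_non, dtype=float)
    # Conditional independence: the joint likelihood of the marks is a
    # product over marks, computed here as a sum of logs.
    log_f = np.log(pi) + (x * np.log(p_func) + (1 - x) * np.log1p(-p_func)).sum(axis=1)
    log_n = np.log1p(-pi) + (x * np.log(p_non) + (1 - x) * np.log1p(-p_non)).sum(axis=1)
    # Bayes' rule for two components, written as a logistic function.
    return 1.0 / (1.0 + np.exp(log_n - log_f))


def fit_em(x, n_iter=200, eps=1e-6):
    """Fit the mixture by EM without labels (unsupervised)."""
    x = np.asarray(x, dtype=float)
    mean = x.mean(axis=0)
    # Hypothetical starting values: the "functional" component starts with
    # higher mark frequencies so the two labels do not swap.
    pi = 0.2
    p_func = np.clip(2.0 * mean, eps, 1 - eps)
    p_non = np.clip(0.5 * mean, eps, 1 - eps)
    for _ in range(n_iter):
        # E-step: posterior responsibility of the functional component.
        w = posterior_functional(x, pi, p_func, p_non)
        # M-step: weighted frequency of each mark in each component.
        pi = float(np.clip(w.mean(), eps, 1 - eps))
        p_func = np.clip((w[:, None] * x).sum(axis=0) / w.sum(), eps, 1 - eps)
        p_non = np.clip(((1 - w)[:, None] * x).sum(axis=0) / (1 - w).sum(), eps, 1 - eps)
    return pi, p_func, p_non


# Toy example with 2 marks: a locus carrying both marks vs. one carrying none.
toy = posterior_functional([[1, 1], [0, 0]], 0.5, [0.9, 0.9], [0.1, 0.1])
print(np.round(toy, 3))  # [0.988 0.012]
```

Under this sketch, one such model would be fitted per tissue, which is how a locus can receive a high posterior in one tissue and a low one in another. The product over marks is exactly where the independence assumption enters: correlated marks would be counted as independent evidence, which can push posteriors towards 0 or 1.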

  • Main findings:

  • they estimate that 22% of the genome is functional in at least one tissue, and ~2% is functional across all 7 tissues.

  • they validate GenoSkyline on known tissue-specific annotations (regions for blood and heart, VISTA enhancers for brain and heart)
  • On Psychiatric Genomics Consortium data (SCZ), they demonstrate better prioritization with the brain model than with the heart model; the opposite pattern was observed at 23 CARDIoGRAM loci.

  • Strengths:

  • UCSC track for whole-genome annotation

  • preprint of GenoSkyline-Plus that includes RNA-Seq information from RME too

  • Weaknesses:

  • strong assumption on the probabilistic model

  • only integrates one level of information (epigenetics, even though there are multiple marks)

Source

    © 2016 the Reviewer (CC BY 4.0).