Content of review 1, reviewed on November 11, 2016

  • Knowledge Gap: Genetic research studies have typically used purpose-built cohorts, which can be costly, especially for rare diseases. An emerging alternative is to mine electronic health records (EHR) to find candidates.

  • Strategy: EHR are designed primarily for clinical care, not research, and contain a lot of unstructured data, such as narrative free text. In this review, the authors go through the different types of data available in EHR, what they look like, and what could be efficient ways to extract knowledge from them, for instance natural language processing, which allows more complex searches than keywords alone.

  • Main findings: The authors mention some interesting findings, in particular the phenotype-driven discovery algorithm. It aggregates the different aspects of EHR data into a predictive model, usually a set of Boolean logic rules. They also mention efforts made by the eMERGE network to provide online algorithms for different diseases, including autism.

  • Strengths: Very well-documented introduction to EHR mining. It presents many potential applications, such as using billing data, which simply link a disease to a procedure (recorded for insurance purposes), to expand recruitment to patients who were tested but not diagnosed.

  • Weaknesses: The chapter does not go into detail on natural language processing methods, although these are covered in Chapter 16 on text mining. On the design of phenotype discovery algorithms, the authors note that a logistic regression model may be less portable than Boolean logic rules.
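
To make the idea of a Boolean-rule phenotype algorithm concrete, here is a minimal sketch of how billing codes, medications, and note text might be combined. It is not taken from the chapter or from eMERGE: the record layout, the code "CODE_A", the drug "drug_x", the phrase "condition x", and the crude negation pattern are all hypothetical placeholders. The negation check only hints at why NLP can do more than plain keyword matching.

```python
import re
from dataclasses import dataclass, field


@dataclass
class PatientRecord:
    """Hypothetical, simplified view of one patient's EHR data."""
    billing_codes: set = field(default_factory=set)
    medications: set = field(default_factory=set)
    note: str = ""


# Crude stand-in for NLP negation detection (assumption, not a real NLP method):
# a negation cue followed by the phrase within the same sentence.
NEGATED = re.compile(r"\b(no|denies|without)\b[^.]*\bcondition x\b", re.IGNORECASE)


def mentions_condition(note: str) -> bool:
    """True if the note mentions the phrase and the mention is not negated."""
    return "condition x" in note.lower() and NEGATED.search(note) is None


def is_case(rec: PatientRecord) -> bool:
    """Boolean rule: billing code AND (medication OR non-negated note mention)."""
    has_code = "CODE_A" in rec.billing_codes
    has_med = "drug_x" in rec.medications
    return has_code and (has_med or mentions_condition(rec.note))


if __name__ == "__main__":
    patient = PatientRecord({"CODE_A"}, set(), "Patient denies condition X.")
    print(is_case(patient))
```

A rule of this kind is easy to read and to re-implement at another site, which is one plausible reason Boolean rules are described as more portable than a fitted logistic regression.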

Source

    © 2016 the Reviewer (CC BY 4.0).

References

    C., D. J. 2012. Chapter 13: Mining Electronic Health Records in the Genomics Era. PLOS Computational Biology.