Content of review 1, reviewed on March 24, 2020

Overall statement or summary of the article

This article presents the results of a large cohort study of 30,961 patients with COPD using electronic health records (EHR) from primary care settings. This is the largest study of this kind to identify reproducible COPD subtypes. The authors combined multiple correspondence analysis (a data reduction method) with clustering analyses (k-means and hierarchical clustering) to discover five COPD phenotypes in 75% of the dataset (the training set), and then reproduced them in the remaining 25% (the test set). The derived COPD subtypes were further validated against clinically meaningful outcomes such as acute exacerbations of COPD (AECOPD) and mortality.
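
The train/test reproduction design summarised above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the patient data are synthetic binary indicators, the multiple correspondence analysis and hierarchical clustering steps are omitted, a plain k-means is run directly on the indicators with two clusters rather than five, and every function name is hypothetical.

```python
import random


def split_train_test(rows, train_fraction=0.75, seed=0):
    """Shuffle the rows and split them into training and test sets."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids, row):
    """Index of the centroid closest to the row."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(centroids[i], row))


def kmeans(rows, k, iterations=50, seed=0):
    """Plain k-means; initial centroids are k distinct observed rows."""
    rng = random.Random(seed)
    distinct = sorted(set(map(tuple, rows)))
    centroids = [list(r) for r in rng.sample(distinct, k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for row in rows:
            groups[nearest(centroids, row)].append(row)
        # Recompute each centroid as the column-wise mean of its group;
        # an empty group keeps its previous centroid.
        updated = [
            [sum(col) / len(g) for col in zip(*g)] if g else centroids[i]
            for i, g in enumerate(groups)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def cluster_shares(centroids, rows):
    """Fraction of rows assigned to each cluster."""
    counts = [0] * len(centroids)
    for row in rows:
        counts[nearest(centroids, row)] += 1
    return [c / len(rows) for c in counts]


def make_synthetic_patients(n=400, seed=1):
    """Hypothetical data: four binary indicators drawn from two profiles."""
    rng = random.Random(seed)
    profiles = [[0.9, 0.9, 0.1, 0.1], [0.1, 0.1, 0.9, 0.9]]
    patients = []
    for _ in range(n):
        profile = rng.choice(profiles)
        patients.append([1 if rng.random() < p else 0 for p in profile])
    return patients


patients = make_synthetic_patients()
train, test = split_train_test(patients)        # 75% / 25%
centroids = kmeans(train, k=2)                  # derive clusters on training data
train_shares = cluster_shares(centroids, train)
test_shares = cluster_shares(centroids, test)   # assign held-out patients
```

Comparing `train_shares` with `test_shares` gives a rough check of whether the cluster structure found in the training data carries over to the held-out data. It is only one of several possible reproducibility checks and says nothing about clinical validity.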

Overall strengths of the article and what impact it might have in your field

The strengths of this article are a) the large sample of patients with COPD, drawn from EHR data in primary care settings, where the majority of COPD is managed, and b) the reproducibility of the COPD subtypes, demonstrated by identifying them in the training dataset (75%), validating them in the test dataset (25%), and cross-validating them against clinical outcomes (AECOPD and mortality). The findings are likely to have considerable impact on research into COPD phenotypes, as this is the largest study of its kind conducted using machine learning methods.

Specific comments on weaknesses of the article and what could be done to improve it

The main weakness is the overlap between the derived clusters (COPD subtypes), which suggests temporal variability among patients within the same cluster. Many patients may therefore belong to multiple clusters with regard to disease severity and progression. This was not addressed in the current article and will be the subject of further investigation.

Source

    © 2020 the Reviewer.

References

    Maria, P., Kathleen, Q. J., Francis, N., Harry, H., Liam, S., Spiros, D. 2019. Identifying clinically important COPD sub-types using data-driven approaches in primary care population based electronic health records. BMC Medical Informatics and Decision Making.