HealthDay News – Modeling has identified the amount and type of data needed to detect prediagnostic heart failure, according to research published online in Circulation: Cardiovascular Quality and Outcomes.
Kenney Ng, PhD, from the T.J. Watson Research Center in Cambridge, Massachusetts, and colleagues used longitudinal electronic health records data to examine the performance of machine learning models trained to detect prediagnostic heart failure in primary care patients. The authors assessed model performance in relation to the prediction window length, observation window length, number of different data domains, number of patient records, and density of patient encounters. Data from 1684 incident heart failure cases and 13,525 sex-, age category-, and clinic-matched controls were used.
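The article does not describe how the two windows were constructed. One common framing in EHR onset-prediction studies, assumed here rather than taken from the paper, places the prediction window immediately before the index date (the diagnosis date for a case, or the matched case's diagnosis date for a control) and the observation window just before that. Only events in the observation window, from the selected data domains, feed the model. The `Event` class, domain labels, codes, and function names below are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Event:
    """One coded EHR event (hypothetical structure for illustration)."""
    day: date
    domain: str  # assumed labels, e.g. "diagnosis", "medication", "hospitalization"
    code: str


def observation_window(index_date, prediction_days, observation_days):
    """Return the half-open interval [start, end) used for feature extraction.

    Assumption: the window ends `prediction_days` before the index date and
    extends `observation_days` further back in time.
    """
    end = index_date - timedelta(days=prediction_days)
    start = end - timedelta(days=observation_days)
    return start, end


def select_events(events, index_date, prediction_days, observation_days, domains):
    """Keep only events inside the observation window and in the chosen data domains."""
    start, end = observation_window(index_date, prediction_days, observation_days)
    return [e for e in events if start <= e.day < end and e.domain in domains]


# Usage: 1-year prediction window, 2-year observation window.
events = [
    Event(date(2011, 6, 1), "diagnosis", "dx_a"),        # before the observation window
    Event(date(2013, 3, 1), "diagnosis", "dx_b"),        # inside the observation window
    Event(date(2013, 4, 1), "medication", "rx_a"),       # inside, allowed domain
    Event(date(2013, 5, 1), "lab", "lab_a"),             # inside, but domain not selected
    Event(date(2014, 6, 1), "hospitalization", "hosp"),  # inside the prediction window
]
kept = select_events(events, date(2015, 1, 1), 365, 730,
                     {"diagnosis", "medication", "hospitalization"})
print([e.code for e in kept])  # ['dx_b', 'rx_a']
```

Shortening the prediction window moves the observation window closer to the diagnosis date, which is one intuitive reason performance might improve as that window shrinks.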
The researchers found that model performance improved as the prediction window shortened, especially below 2 years, and as the observation window lengthened, although gains plateaued beyond 2 years. Prediction also improved with a larger training data set, leveling off beyond 4000 patients; with more diverse data types, of which the combination of diagnosis, medication order, and hospitalization data was most important; and when data were restricted to patients with 10 or more phone or face-to-face encounters in 2 years.
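The encounter-density criterion amounts to a simple cohort filter. The sketch below assumes encounters are counted over the 2 years preceding a reference date (the article does not say which date) and approximates 2 years as 730 days; the function name, parameters, and patient IDs are hypothetical.

```python
from datetime import date, timedelta


def meets_density_threshold(encounter_days, window_end, window_days=730, min_encounters=10):
    """Return True if at least `min_encounters` encounters fall in [window_end - window_days, window_end).

    `encounter_days` holds the dates of phone or face-to-face encounters.
    Assumption: "2 years" is approximated as 730 days.
    """
    start = window_end - timedelta(days=window_days)
    count = sum(1 for d in encounter_days if start <= d < window_end)
    return count >= min_encounters


# Usage: one encounter per month through 2013 (12 encounters) vs. a sparser record (9 encounters).
end = date(2014, 1, 1)
dense = [date(2013, m, 15) for m in range(1, 13)]
sparse = [date(2013, m, 15) for m in range(1, 10)]
patients = {"patient_a": dense, "patient_b": sparse}
eligible = [pid for pid, days in patients.items() if meets_density_threshold(days, end)]
print(eligible)  # ['patient_a']
```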
“These empirical findings suggest possible guidelines for the minimum amount and type of data needed to train effective disease onset predictive models using longitudinal electronic health records,” the authors wrote.
Ng K, Steinhubl SR, deFilippi C, Dey S, Stewart WF. Early detection of heart failure using electronic health records: practical implications for time before diagnosis, data diversity, data quantity, and data density. Circ Cardiovasc Qual Outcomes. 2016 Nov 8. doi:10.1161/CIRCOUTCOMES.116.002797.