Publications
72 Publications
Medical
Journal
Cardiovascular
Artificial Intelligence–Enabled ECG Screening for LVSD in LBBB: Evaluating Model Development and Transfer Learning Approaches
Left bundle branch block (LBBB) is a common electrocardiogram (ECG) abnormality associated with left ventricular systolic dysfunction (LVSD). Although artificial intelligence (AI)–driven ECG analysis shows promise for LVSD screening, it remains unclear whether a general AI-ECG model or one tailored to LBBB patients yields better performance. This study evaluates 4 AI-ECG models for detecting LVSD in LBBB patients and examines the impact of training cohort definitions. We developed 4 models using 364,845 ECGs from 4 hospitals: 1) a general AI-ECG model; 2) a model trained on automatically extracted LBBB cases; 3) a model trained on a well-curated single-center LBBB data set with expert review; and 4) a hybrid model employing transfer learning by fine-tuning the general model with single-center LBBB data. LVSD was defined as an ejection fraction ≤40%. All models were externally validated on 1,334 ECGs from another hospital, with performance assessed by area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, and predictive values. In external validation, the transfer learning model achieved the highest AUROC (0.903; 95% CI: 0.887-0.918), closely followed by the general model (0.899; 95% CI: 0.883-0.915); the difference was not significant. Models using automated or expert-based LBBB extraction had lower AUROCs (0.879 and 0.841, respectively). The general model demonstrated high sensitivity, whereas the transfer learning model exhibited superior specificity. Our findings indicate that a broad AI-ECG model reliably detects LVSD in LBBB patients, and transfer learning offers modest improvements without requiring curated LBBB data sets. Evaluating algorithms in representative clinical populations is essential.
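The evaluation metrics named above (AUROC, sensitivity, specificity, and predictive values) can all be computed from a model's output scores and the echocardiography-derived labels. A minimal sketch with scikit-learn on synthetic scores; the data, threshold, and variable names here are illustrative, not taken from the study:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
# Synthetic labels: 0 = no LVSD, 1 = LVSD (ejection fraction <= 40%)
y_true = rng.integers(0, 2, size=1000)
# Synthetic model scores, loosely correlated with the label
y_score = y_true * 0.5 + rng.random(1000) * 0.8

auroc = roc_auc_score(y_true, y_score)

# Dichotomize at a fixed operating point to get the threshold-dependent metrics
y_pred = (y_score >= 0.6).astype(int)
tp = int(((y_pred == 1) & (y_true == 1)).sum())
tn = int(((y_pred == 0) & (y_true == 0)).sum())
fp = int(((y_pred == 1) & (y_true == 0)).sum())
fn = int(((y_pred == 0) & (y_true == 1)).sum())

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)   # positive predictive value
npv = tn / (tn + fn)   # negative predictive value
```

Note that AUROC is threshold-free, while the sensitivity/specificity trade-off reported for the general vs. transfer learning models depends on the chosen operating point.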
Tech
Conference
Non-cardiovascular
ALFRED: Ask a Large-language model For Reliable ECG Diagnosis
Leveraging Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG) for analyzing medical data, particularly Electrocardiogram (ECG), offers high accuracy and convenience. However, generating reliable, evidence-based results in specialized fields like healthcare remains a challenge, as RAG alone may not suffice. We propose a zero-shot, RAG-based ECG diagnosis framework that incorporates expert-curated knowledge to enhance diagnostic accuracy and explainability. Evaluation on the PTB-XL dataset demonstrates the framework’s effectiveness, highlighting the value of structured domain expertise in automated ECG interpretation. Our framework is designed to support comprehensive ECG analysis, addressing diverse diagnostic needs with potential applications beyond the tested dataset.
Tech
Journal
Cardiovascular
Transparent and Robust Artificial Intelligence-Driven Electrocardiogram Model for Left Ventricular Systolic Dysfunction
Heart failure (HF) is a growing global health burden, yet early detection remains challenging due to the limitations of traditional diagnostic tools such as electrocardiograms (ECGs). Recent advances in deep learning offer new opportunities to identify left ventricular systolic dysfunction (LVSD), a key indicator of HF, from ECG data. This study validates AiTiALVSD, our previously developed artificial intelligence (AI)-enabled ECG Software as a Medical Device, for its accuracy, transparency, and robustness in detecting LVSD. Methods: This retrospective single-center cohort study involved patients suspected of LVSD. The AiTiALVSD model, based on a deep learning algorithm, was evaluated against echocardiographic ejection fraction values. To enhance model transparency, the study employed Testing with Concept Activation Vectors (TCAV), clustering analysis, and robustness testing against ECG noise and lead reversals. Results: The study involved 688 participants and found AiTiALVSD to have a high diagnostic performance, with an AUROC of 0.919. There was a significant correlation between AiTiALVSD scores and left ventricular ejection fraction values, confirming the model’s predictive accuracy. TCAV analysis showed the model’s alignment with medical knowledge, establishing its clinical plausibility. Despite its robustness to ECG artifacts, there was a noted decrease in specificity in the presence of ECG noise. Conclusions: AiTiALVSD’s high diagnostic accuracy, transparency, and resilience to common ECG discrepancies underscore its potential for early LVSD detection in clinical settings. This study highlights the importance of transparency and robustness in AI-ECG, setting a new benchmark in cardiac care.
Tech
Journal
Non-cardiovascular
A novel XAI framework for explainable AI-ECG using generative counterfactual XAI (GCX)
Generative Counterfactual Explainable Artificial Intelligence (XAI) offers a novel approach to understanding how AI models interpret electrocardiograms (ECGs). Traditional explanation methods focus on highlighting important ECG segments but often fail to clarify why these segments matter or how their alteration affects model predictions. In contrast, the proposed framework explores “what-if” scenarios, generating counterfactual ECGs that increase or decrease a model’s predictive values. This approach clarifies how specific changes, such as increased T wave amplitude or PR interval prolongation, influence the model’s decisions. Through a series of validation experiments, the framework demonstrates its ability to produce counterfactual ECGs that closely align with established clinical knowledge, including characteristic alterations associated with potassium imbalances and atrial fibrillation. By clearly visualizing how incremental modifications in ECG morphology and rhythm affect artificial intelligence-applied ECG (AI-ECG) predictions, this generative counterfactual method moves beyond static attribution maps and has the potential to increase clinicians’ trust in AI-ECG systems. As a result, this approach offers a promising path toward enhancing the explainability and clinical reliability of AI-based tools for cardiovascular diagnostics.
Medical
Abstract
Cardiovascular
Artificial Intelligence–Enabled Electrocardiography for Detecting Risk of Rehospitalization in Patients With Heart Failure
We hypothesized that AI-enabled ECG scores would show distinct temporal patterns after hospital discharge in patients with HF, and that these patterns would differ between patients who experienced rehospitalization and those who did not. This single-center retrospective study analyzed ECG data from patients hospitalized for HF between March 2017 and January 2025 in South Korea. Post-discharge, ECGs were processed using AI-ECG models for left ventricular systolic dysfunction (LVSD), diastolic dysfunction (LVDD), and myocardial infarction (MI). We compared AI-ECG patterns in patients readmitted within six months vs. those who were not (Figure 1). Temporal trends in AI-ECG scores were assessed using a mixed-effects linear regression model with group and time as fixed effects, and patient as a random effect. Among 1,007 patients, 1,539 hospitalization events were identified. A total of 1,674 ECGs from 269 rehospitalized and 4,066 ECGs from 917 non-rehospitalized patients were collected from 180 days before to 60 days after the index readmission or follow-up end. The mean age was 65.2 years, and 63.1% were male. Diabetes mellitus and chronic kidney disease were significantly more prevalent in the rehospitalization group, whereas other comorbidities were comparable. Significant differences in ECG intervals and axes were also observed, with no notable difference in heart rate. In the LVSD model, rehospitalized patients showed higher scores overall (β = 7.96, 95% CI: 3.18–12.75, p = 0.001) (Figure 2). Time since discharge was associated with decreasing scores (β = –0.096/day, 95% CI: –0.104 to –0.087, p<0.001), but this decline was attenuated in the rehospitalization group (interaction β = 0.092, 95% CI: 0.069–0.115, p<0.001). The LVDD model demonstrated a similar trend, while the MI model exhibited no statistically significant differences in scores (Figure 3).
AI-ECG models show potential as dynamic biomarkers for detecting early physiological deterioration and predicting readmission risk in HF patients. These findings support their use in future patient monitoring strategies.
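The mixed-effects structure described above (group and time as fixed effects with an interaction, patient as a random effect) can be fit with the statsmodels formula API. A minimal sketch on synthetic longitudinal scores; the cohort sizes, coefficients, and column names are invented for illustration and do not reproduce the study's data:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)

# Synthetic longitudinal AI-ECG scores: a per-patient random intercept,
# a group offset, and a time trend that is attenuated in the readmitted group
rows = []
for pid in range(120):
    group = pid % 2                      # 0 = not readmitted, 1 = readmitted
    intercept = 40 + rng.normal(0, 5)    # patient-level random effect
    for t in rng.choice(180, size=8, replace=False):
        score = (intercept + 8 * group - 0.10 * t
                 + 0.09 * group * t + rng.normal(0, 2))
        rows.append({"patient": pid, "group": group,
                     "time": float(t), "score": score})
df = pd.DataFrame(rows)

# Group and time as fixed effects (with interaction), patient as random effect
model = smf.mixedlm("score ~ group * time", df, groups=df["patient"])
result = model.fit()
params = result.params
```

The `group:time` interaction term is the quantity of interest here: a positive estimate corresponds to the attenuated post-discharge decline reported for the rehospitalization group.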
Medical
Abstract
Cardiovascular
EARLY ACUTE MYOCARDIAL INFARCTION RISK STRATIFICATION IN THE EMERGENCY DEPARTMENT: AI-ENHANCED ELECTROCARDIOGRAM AND THE 10-MINUTE RULE
Our team previously developed an AI-ECG method for diagnosing ST-segment elevation myocardial infarction (STEMI) and non-ST-segment elevation myocardial infarction (NSTEMI) using 12-lead electrocardiograms (ECGs), demonstrating superior performance compared to cardiologists (Sci Rep 10, 20495 [2020]). In 2023, this approach was approved as an innovative technology in South Korea (AiTAMI v1.00.00). External validation was conducted across 18 emergency centers (ROMIAE study). Building on these findings, we introduce the “10-minute rule” for early risk assessment of acute myocardial infarction (AMI). We trained AiTAMI v2.00.00 using a foundation model and ECG data from the ROMIAE cohort collected across 14 hospitals. The model was validated at four additional centers, comprising 1,480 patients (Non-AMI = 1,150; NSTEMI = 198; STEMI = 132). Model performance and risk stratification were evaluated using AUROC, clinical endpoints, and decision rule performance. The updated model improved AUROC from 0.887 to 0.906 and AUPRC from 0.760 to 0.795. The 10-minute rule-out strategy identified 23.2% of patients with a negative predictive value (NPV) of 99.7%, while the rule-in strategy identified 24.4% of patients with a positive predictive value (PPV) of 68.5%. AI-ECG utilizing the 10-minute rule can classify 47.6% of chest pain patients early in emergency settings, indicating a potential paradigm shift in the management of AMI.
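A rule-out/rule-in strategy like the one above amounts to picking two score cutoffs: the highest cutoff below which the negative predictive value stays above target, and the lowest cutoff above which the positive predictive value stays above target. A minimal sketch of that threshold scan; the function, its targets, and the toy data are illustrative, not the study's actual procedure:

```python
import numpy as np

def pick_rule_thresholds(scores, labels, npv_target=0.995, ppv_target=0.60):
    """Scan candidate cutoffs and return (rule_out, rule_in) thresholds:
    at or below rule_out the NPV meets npv_target; at or above rule_in the
    PPV meets ppv_target. Returns None for a side whose target is unmet."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = np.unique(scores)
    rule_out, rule_in = None, None
    for t in order:                       # ascending candidate cutoffs
        below = scores <= t
        if below.any() and (labels[below] == 0).mean() >= npv_target:
            rule_out = t                  # keep highest cutoff meeting NPV
    for t in order[::-1]:                 # descending candidate cutoffs
        above = scores >= t
        if above.any() and (labels[above] == 1).mean() >= ppv_target:
            rule_in = t                   # keep lowest cutoff meeting PPV
    return rule_out, rule_in
```

Patients scoring between the two thresholds remain in the indeterminate zone, which is why the reported rule-out (23.2%) and rule-in (24.4%) fractions sum to less than 100%.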
Tech
Conference
Non-cardiovascular
Benchmarking ECG Delineation using Deep Neural Network-based Semantic Segmentation Models
Accurate electrocardiogram (ECG) delineation is essential for automated cardiac diagnosis, enabling the precise identification of key waveforms such as the P wave, QRS complex, and T wave. This study presents the first comprehensive benchmarking of neural network-based semantic segmentation models for ECG delineation, evaluating their accuracy, resource efficiency, and robustness across both public and private datasets. Our results demonstrate that convolutional neural network (CNN)-based approaches consistently achieve superior accuracy compared to Transformer-based approaches. Additionally, we observed the presence of fragmented segments in the delineation results. To address this issue, we explored post-processing techniques to consolidate or eliminate fragmented segments using an optimal configuration, leading to performance improvements. Furthermore, by analyzing performance variations across different waveform labels, we provide critical insights into key considerations for ECG segmentation tasks. Notably, our findings also reveal that larger model sizes do not necessarily correlate with better performance. Based on our findings, we propose a set of practical guidelines for leveraging segmentation models in ECG delineation, offering valuable direction for future research and clinical applications.
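The fragment-elimination post-processing mentioned above can be sketched as a pass over the per-sample label sequence that relabels short runs as background. This is a minimal illustrative version; the paper's optimal configuration (minimum lengths, merge rules per waveform) is not reproduced here:

```python
def remove_short_fragments(labels, min_len=5, background=0):
    """Relabel non-background runs shorter than min_len as background.
    `labels` is a per-sample sequence, e.g. 0=none, 1=P, 2=QRS, 3=T.
    Illustrative post-processing, not the paper's exact configuration."""
    out = list(labels)
    n = len(out)
    i = 0
    while i < n:
        j = i
        while j < n and out[j] == out[i]:
            j += 1                        # j is one past the current run
        if out[i] != background and (j - i) < min_len:
            for k in range(i, j):
                out[k] = background       # drop the fragmented segment
        i = j
    return out
```

Run-length passes like this are cheap relative to the segmentation network itself, so they can be applied at inference time without affecting the resource-efficiency comparison.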
Proceedings of the Conference on Health, Inference, and Learning
June 25, 2025
Tech
Conference
Non-cardiovascular
Test-Time Calibration: A Framework for Personalized Test-Time Adaptation in Real-World Biosignals
Test-Time Adaptation (TTA) methods have been widely used to enhance model robustness by continuously updating pre-trained models with unlabeled target data. However, in real-world biosignal applications, where factors such as age, lifestyle, and comorbidities induce significant variability, traditional TTA often falls short in addressing personalization needs. To satisfy such needs, we introduce a novel Test-Time Calibration (TTC) framework that integrates continuous self-supervised adaptation on unlabeled samples with periodic supervised calibration using the sporadically available ground-truth labels. Our approach leverages a model equipped with dual heads for supervised learning (SL) and self-supervised learning (SSL), and further incorporates a dual buffer along with a weighted batch sampling strategy to effectively manage and utilize both data types during the test phase. We evaluate our framework on two distinct datasets: the publicly available PulseDB, a benchmark for cuff-less blood pressure estimation, and a private ICU dataset collected from critically ill patients. Experimental results demonstrate that our approach improves blood pressure prediction accuracy and robustness, highlighting its suitability for dynamic, personalized biosignal applications.
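The dual-buffer / weighted-batch-sampling idea above can be sketched as drawing each test-time batch partly from the scarce labeled buffer and partly from the abundant unlabeled one. This is a deliberately simplified version with plain lists and a fixed mixing fraction; the framework's actual buffers and weighting are more elaborate:

```python
import random

def sample_batch(labeled_buf, unlabeled_buf, batch_size=8, labeled_frac=0.25):
    """Draw a mixed batch: roughly labeled_frac of the items come from the
    (scarce) labeled buffer, the rest from the unlabeled buffer."""
    n_lab = min(len(labeled_buf), max(1, round(batch_size * labeled_frac)))
    n_unlab = batch_size - n_lab
    batch = (random.sample(labeled_buf, n_lab)        # without replacement
             + random.choices(unlabeled_buf, k=n_unlab))  # with replacement
    random.shuffle(batch)
    return batch
```

Keeping a guaranteed labeled share in every batch is what lets the supervised head calibrate periodically while the self-supervised head adapts continuously.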
Proceedings of the Conference on Health, Inference, and Learning
June 25, 2025
Tech
Journal
Non-cardiovascular
Unveiling the secrets of neural network scaling for ECG classification
We present a new perspective on scaling neural networks for electrocardiograms (ECG). Although ResNet-based models are widely used in ECG classification, the potential benefits of network scaling remain unexplored. Our research investigates the impact of changes in the depth of layers, the number of channels, and the dimensions of the convolution kernels on performance. Contrary to computer vision practices, we found that shallower networks, with more channels and smaller kernels, lead to better performance for ECG classifications. Based on these findings, we provide insights that can guide the efficient development of models in practice. Finally, we explore why scaling hyperparameters affects ECG and computer vision differently. Our findings suggest that the inherent periodicity of the ECG signals plays a crucial role in this difference.
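The scaling trade-off above (shallower stacks with more channels and smaller kernels) can be made concrete by counting Conv1d parameters per configuration. A small arithmetic sketch with hypothetical configurations for a 12-lead input; these layer counts and widths are illustrative, not the paper's searched settings:

```python
def conv1d_params(in_ch, out_ch, kernel):
    # weights + biases for one 1-D convolution layer
    return in_ch * out_ch * kernel + out_ch

def stack_params(channels, kernel):
    """Total parameters of a plain stack of Conv1d layers.
    `channels` lists the channel widths, e.g. [12, 64, 64] = two layers."""
    return sum(conv1d_params(channels[i], channels[i + 1], kernel)
               for i in range(len(channels) - 1))

# Hypothetical configurations for a 12-lead ECG input:
deep_narrow  = stack_params([12] + [32] * 8, kernel=15)   # 8 layers, k=15
shallow_wide = stack_params([12] + [128] * 2, kernel=7)   # 2 layers, k=7
```

The two budgets land in the same ballpark, which mirrors the finding that model size alone does not determine ECG classification performance; how the budget is spent across depth, width, and kernel size matters more.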
Tech
Conference
Non-cardiovascular
New Test-Time Scenario for Biosignal: Concept and Its Approach
Online Test-Time Adaptation (OTTA) enhances model robustness by updating pretrained models with unlabeled data during testing. In healthcare, OTTA is vital for real-time tasks like predicting blood pressure from biosignals, which demand continuous adaptation. We introduce a new test-time scenario with streams of unlabeled samples and occasional labeled samples. Our framework combines supervised and self-supervised learning, employing a dual-queue buffer and weighted batch sampling to balance data types. Experiments show improved accuracy and adaptability under real-world conditions.
Findings paper presented at Machine Learning for Health (ML4H)
November 26, 2024
