ECG Data Analysis

The healthcare industry is rapidly changing and will need to deliver better care at better prices in the future. Healthcare professionals are turning to tools like Big Data and analytics to drive innovation alongside new medicines. The industry has historically generated large amounts of data from record-keeping, compliance, regulatory requirements, and patient care. This digitized dataset includes structured and unstructured information, and it is growing at an exponential rate. Reports say that data from the US healthcare system alone reached 150 EB in 2011. At this pace, healthcare data will eventually be measured in Zettabytes, and later Yottabytes.

Healthcare systems and devices generate a massive volume of information. Health data comes in the form of electronic health records (EHRs), medical imaging, patient portals, genomic sequencing, medical device data, biometric data, payer records, public records, and wearable devices. It also includes clinical data from computerized physician order entry (CPOE) and clinical decision support systems, along with laboratory, pharmacy, insurance, and other administrative data.

The volume of healthcare data being created is projected to grow at a CAGR of 36% through 2025. Globally, the Big Data in healthcare market is expected to reach $34.27 billion by 2022, at a CAGR of 22.07%, and Big Data analytics is expected to be worth more than $68.03 billion by 2024. McKinsey estimates that Big Data analytics can help save the US healthcare system over $300 billion a year.

By digitizing documents and applying Big Data algorithms, healthcare organizations can detect diseases earlier and identify healthcare fraud more quickly. Certain outcomes can be estimated from vast amounts of historical data, using data points such as length of stay (LOS) information, elective surgery data, information on the best candidates for surgery, complications data, and patients at risk for sepsis, MRSA, and other hospital-acquired illnesses.
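As a toy illustration of estimating an outcome from historical data, the sketch below predicts expected length of stay as the mean over matching historical cases. The records, field names, and `expected_los` helper are hypothetical, not part of any real system described here:

```python
from statistics import mean

# Hypothetical historical records: (procedure, length_of_stay_days)
history = [
    ("hip replacement", 4), ("hip replacement", 5), ("hip replacement", 3),
    ("bypass", 7), ("bypass", 9),
]

def expected_los(procedure, history):
    """Estimate length of stay as the mean over matching historical cases."""
    stays = [los for proc, los in history if proc == procedure]
    return mean(stays) if stays else None

print(expected_los("hip replacement", history))  # → 4
```

A production system would of course condition on far more variables (demographics, comorbidities, admission type), but the principle of aggregating over similar historical patients is the same.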

Machine learning, deep learning, and cognitive computing can play a critical role in furthering cardiovascular (CV) medicine and enabling precision CV medicine. Several problems could be solved by Data Scientists and Doctors working together to create more accurate automated data analysis. These include low cost-effectiveness, overutilization, inadequate patient care, and high readmission and mortality rates in CV clinical care.

Cardiovascular diseases and echocardiography can benefit from AI-ML implementation. ML has been successfully used to differentiate hypertrophic cardiomyopathy from the physiological hypertrophy seen in athletes: in a study of a cohort of 139 males undergoing 2D echocardiography, the classifier achieved an overall sensitivity of 87% and specificity of 82%. Deep learning can also help classify echocardiography views. A convolutional neural network (CNN) was trained to recognize 15 standard echocardiographic views using a training and validation set of over 200,000 images and a test set of 20,000; it outperformed board-certified echocardiographers with an overall accuracy of 91.7%.
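For readers less familiar with how such classifiers are scored, sensitivity and specificity are computed from the confusion matrix. The sketch below uses illustrative counts (not the study's actual confusion matrix) chosen to sum to a cohort of 139 and to reproduce roughly the reported 87%/82%:

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = recall on positives; specificity = recall on negatives."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

# Illustrative counts only: 60 hypothetical HCM cases, 79 athlete controls.
sens, spec = sensitivity_specificity(tp=52, fn=8, tn=65, fp=14)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
# → sensitivity=0.87, specificity=0.82
```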

Historically, the majority of EHRs were stored in spreadsheets and databases. Semi-structured data included instrument readings and data generated by the ongoing conversion of paper records to electronic health and medical records. However, unstructured data is becoming more prevalent, generated from office medical records, handwritten nurse and doctor notes, hospital admission and discharge records, paper prescriptions, radiograph films, MRI, CT, and other images. Because it is stored in so many forms, healthcare data is rarely standardized, is often fragmented or generated in legacy IT systems with incompatible formats, and remains a challenge that needs to be addressed.

Very little of the data streaming from fitness devices, genetics, genomics, and social media can presently be captured, stored, and organized. In particular, healthcare applications need more efficient ways to combine and convert varieties of data, including automating the conversion of unstructured data into structured form. The lag between data collection and processing is another challenge for real-time Big Data analytics.

Using Big Data can allow researchers to overcome the sample-size limits observed in many clinical research trials. However, it may be prone to selection bias. Data may differ across geographic, insurance, and medical-history profiles, resulting in complexity and inadequate description of the dataset variables and associated metadata. Thus, patients receiving two different treatments may have different distributions of a variable associated with an outcome of interest. In fact, a large volume does not necessarily indicate a representative sample and may generate many false positives.
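One common way to check for the kind of covariate imbalance described above is the standardized mean difference (SMD) between treatment groups. A minimal stdlib sketch, using hypothetical patient ages (the data and the 0.1 rule of thumb are illustrative, not from the source):

```python
from statistics import mean, stdev

def standardized_mean_difference(group_a, group_b):
    """SMD: difference in means scaled by the pooled standard deviation.
    Values above ~0.1 are often taken to signal covariate imbalance."""
    sa, sb = stdev(group_a), stdev(group_b)
    pooled = ((sa**2 + sb**2) / 2) ** 0.5
    return (mean(group_a) - mean(group_b)) / pooled

# Hypothetical ages of patients receiving two different treatments
treated = [71, 68, 75, 80, 66, 77]
control = [58, 62, 55, 60, 65, 59]
print(round(standardized_mean_difference(treated, control), 2))  # → 2.87
```

An SMD this large would mean any difference in outcomes between the groups could simply reflect the age gap rather than the treatment.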

Also, the interpretation of analysis output may be biased by subjective assumptions and cognitive overload unless dedicated training and robust experience are available.

Solution Approach

Some major enablers of large-scale adoption of AI-ML practices and precision CV medicine, available through the xpresso platform, include the dynamic availability of numerous analytics algorithms, models, and methods in a pull-down menu; easy management of important issues like data ownership, governance, and standards; and continuous data acquisition and cleansing. The platform provides an out-of-the-box development framework. When the project was started, the relevant environments were created automatically. Development images configured from pre-defined templates were installed on-premises or in a development VM within the infrastructure. This enabled authentication using LDAP and seamless project setup using Bitbucket, Jenkins, and Docker (ensuring build and deployment without software compatibility issues).

The xpresso platform leverages the latest ML and DL tools for preparing models: Pachyderm-based data versioning; Kubernetes-, Kubeflow-, and Spark-based ML and DL; an Istio-based, service-mesh-enabled microservice architecture; and ELK-based monitoring, contributing to reduced latency.

The platform can be fed with input data (relational data for patient demographics, text notes, and image data in the DICOM standard). xpresso's MLOps platform establishes efficient, Alluxio- and Presto-based data connectivity and collects data from diverse sources. It was used to migrate data into an integrated, Hadoop-based data lake. The solution comprises Big Data components, including Apache Solr as the federated search platform, seamlessly integrated with this data source. The platform's image-processing components, which use ML-based algorithms, were employed to generate relevant metadata for each ECG image. The solution enabled quick and informative image retrieval from the integrated Hadoop data lake.
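Conceptually, metadata-driven retrieval of this kind boils down to indexing each image's generated metadata so that searches return matching image IDs. A toy stdlib sketch (the records, field names, and `build_index` helper are hypothetical, and a real deployment would use Solr rather than an in-memory dict):

```python
from collections import defaultdict

# Hypothetical metadata records, as image-processing components might
# emit for each ECG image (field names are illustrative).
records = [
    {"image_id": "ecg_001", "modality": "ECG", "finding": "atrial fibrillation"},
    {"image_id": "ecg_002", "modality": "ECG", "finding": "normal sinus rhythm"},
    {"image_id": "ecg_003", "modality": "ECG", "finding": "atrial flutter"},
]

def build_index(records):
    """Toy inverted index: token -> set of image ids (Solr-style lookup)."""
    index = defaultdict(set)
    for rec in records:
        for token in rec["finding"].lower().split():
            index[token].add(rec["image_id"])
    return index

index = build_index(records)
print(sorted(index["atrial"]))  # → ['ecg_001', 'ecg_003']
```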

The details collected were added as exploratory variables using libraries and analyzed. The attributes obtained were used for categorization (employing Pachyderm-based data versioning), followed by univariate, bi-variate, and Bag of Words analysis on both the structured and unstructured datasets through xpresso Exploratory Data Analysis (Data and Statistical Analysis). Different datasets and their versions were easily controlled and stored in an xpresso Data Model (XDM)-enabled data store, which allowed easy retrieval and storage of datasets and files in the internal XDM. This was achieved using two features of the platform:

  1. Data Connectivity Marketplace libraries
  2. Data Versioning
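For the unstructured datasets, the Bag of Words analysis mentioned above reduces free text to token counts. A minimal stdlib sketch, with a hypothetical clinical-note snippet (not taken from any real record):

```python
from collections import Counter
import re

def bag_of_words(note):
    """Bag of Words: token frequency counts from a free-text note."""
    tokens = re.findall(r"[a-z]+", note.lower())
    return Counter(tokens)

# Hypothetical snippet of an unstructured nurse's note
note = "Patient stable. ECG shows sinus rhythm; repeat ECG in the morning."
counts = bag_of_words(note)
print(counts["ecg"])  # → 2
```

These per-note count vectors are what downstream univariate and bi-variate analyses can then operate on, alongside the structured variables.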

Finally, we were able to process huge volumes of image metadata and scale up by almost 70%, enabling our clients and end-users to derive quicker insights from ECG images.

How xpresso can help Healthcare Organizations transform their journey to cognitive AI solutions

xpresso is an AI/ML Application Lifecycle Management Platform. It enables complete lifecycle management of AI/ML solutions, addressing the AI transformation journey of enterprises on any cloud platform of choice. xpresso offers functionality essential for building AI/ML solutions – primarily enabling data scientists to rapidly build predictive and prescriptive models. The platform provides a user-friendly interface to develop, deploy, and manage AI/ML solutions at scale. In addition, it supports the incorporation of these solutions into business processes, surrounding infrastructure, products, and applications.

Key benefits of xpresso include:

  • Empowers data scientists to transform AI/ML research into solutions
  • Improves the productivity of data scientists by enabling them to focus on the business problem, developing algorithms and rapid experimentation of models
  • Addresses the shortage of skilled data science resources with automated workflows, toolkits and frameworks
  • Manages AI transformation journey costs without wasting R&D effort
  • Provides an enterprise-ready and secure environment for complete lifecycle management of AI/ML applications
  • Enables at-scale deployment of enterprise AI/ML applications on-premises, in the cloud (AWS, GCP, Azure), or in hybrid environments

We can schedule a demo of the platform for anyone interested in learning more about xpresso.

Have Any Questions?

Need more information about the platform?