Insights From Medical Documents

About the Customer & Challenges Faced:

A healthcare payer based in Indiana wanted an AI-based solution to extract information from thousands of PDFs of hospital charts and medical reports. The solution helps them flag symptoms, diseases, and medical conditions, which are shared with physicians so that they can engage more closely with patients and catch any possible oversight.

Solution and Approach:

A Deep Learning-based solution built on the xpresso platform ensured ease of access and streamlined the analysis of almost 500 different hospital charts. The platform was used to augment the manual workflow with automation that surfaces only the relevant pages for manual reading instead of the whole document.

  • Relevant environments were created automatically using the platform's framework.
  • Development images, configured from pre-defined templates, were installed on cloud VMs within the infrastructure.
  • xpresso Data Connectivity was used to collect data from diverse sources.
  • Different datasets and their versions were controlled and stored in the xpresso Data Model.
  • The details collected were added as exploratory variables using libraries and were then analyzed.
  • The attributes obtained were used for categorization (employing xpresso Data Versioning), followed by univariate, bivariate, and Bag of Words analysis through xpresso Exploratory Data Analysis (Data and Statistical Analysis).
  • The platform's framework, based on Deep Learning models, allowed quick reproduction of the model development process, enabling model validators to monitor and review the model and its potential limitations more closely.
  • Deployment was carried out using the xpresso Deployment Module.
  • A Computer Vision, OCR (Optical Character Recognition)-based Decomposition Engine handled 90-degree page rotation, image-level layout analysis, connected component analysis, column/block finding, and deep-learning-based text extraction from the hospital charts.
  • Pipeline 1 – CV pipeline, used for PDF decomposition.
  • Pipeline 2 – NLP pipeline, used to categorize extracted text into disease, procedure, body organ, and drug.
  • Experiments were conducted for each pipeline to create challenger models using the Experimentation Module.
  • xpresso Inference Service: Outputs from the two pipelines were combined, duplicate entries (matching on both the medical word and its context) were removed, and the output was generated in CSV format.
  • The QuickUMLS (Unified Medical Language System) package was used to identify medical words (keywords) and their context. The detected medical words were stored in a user-defined dictionary so that the spelling autocorrect module would not alter them, minimizing the risk of data inaccuracy.
  • The context of each medical word was defined as the 6 words before and the 6 words after it.
  • xpresso Model Monitoring is being used to monitor model performance.
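
The context-window and de-duplication steps described above can be illustrated with a short Python sketch. This is not the project's actual code: a small in-memory set stands in for the QuickUMLS matcher, only single-word terms are handled, and every function name, the sample sentence, and the CSV column names are assumptions made for illustration.

```python
import csv
import io
import re


def context_window(tokens, index, width=6):
    """Return up to `width` tokens before and after tokens[index], joined by spaces."""
    before = tokens[max(0, index - width):index]
    after = tokens[index + 1:index + 1 + width]
    return " ".join(before + after)


def find_terms(text, vocabulary, width=6):
    """Return (medical_word, context) pairs for each vocabulary word found in text.

    `vocabulary` is a hypothetical stand-in for a real medical term matcher.
    """
    tokens = re.findall(r"[A-Za-z0-9']+", text.lower())
    return [
        (token, context_window(tokens, i, width))
        for i, token in enumerate(tokens)
        if token in vocabulary
    ]


def merge_unique(*result_lists):
    """Combine result lists, dropping rows whose (word, context) pair repeats."""
    seen = set()
    merged = []
    for results in result_lists:
        for row in results:
            if row not in seen:
                seen.add(row)
                merged.append(row)
    return merged


def to_csv(rows):
    """Serialize (medical_word, context) rows to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["medical_word", "context"])
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample data: the same page text arriving from two sources.
vocabulary = {"pneumonia", "amoxicillin"}
page_a = "Patient was admitted with suspected pneumonia and started on amoxicillin twice daily."
page_b = page_a

rows = merge_unique(find_terms(page_a, vocabulary), find_terms(page_b, vocabulary))
print(to_csv(rows))
```

Because duplicates are matched on the word and its context together, the same term appearing in two different sentences is kept twice, while an identical hit reported by both sources is kept once.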


  • By using the xpresso platform, one can leverage high-end data connectivity and efficient data versioning, perform exploratory data analysis, and generate inferences through an intuitive, industry-standard process.
  • The unique, containerized, platform-centric approach offered by xpresso can be used to set up the required infrastructure and deploy rapidly to multiple high-availability environments while aligning with best-in-class DevSecOps practices.
  • The platform also brings in-depth QA-QC testing and logging frameworks, synchronous and asynchronous monitoring, and performance tracking.
  • It also has SSO (single sign-on) for its various built-in tools and subsystems, which makes platform access seamless throughout.
  • In a nutshell, having all the above features under one hood makes for an unbeatable AI Ops framework.
