Anomaly Detection Module

Overview:

The healthcare industry stands to benefit greatly from machine learning, which can help make sense of the massive volumes of data generated daily in electronic health records. Machine learning algorithms can find patterns and insights in healthcare data that would be impractical to spot manually [1].

As machine learning becomes more widely used in healthcare, providers have the chance to adopt a more predictive strategy that builds a more cohesive system with improved patient-centered processes.

Automating medical billing, providing clinical decision support, and creating clinical practice standards within health systems are the three most prevalent uses of machine learning in healthcare. In science and medicine, there are also numerous high-level applications. For example, data scientists at MD Anderson developed what has been described as the first deep learning system to predict acute toxicity in patients undergoing radiation therapy for head and neck tumors. When used in clinical processes, deep learning can automatically recognize complicated patterns in data and provide primary care providers with clinical decision support at the point of care within the electronic health record.

Unstructured data represents almost 80% of the information held or “locked” in electronic health record systems. These are not discrete data elements but documents or text files containing patient information, which in the past could not be analyzed without a human reading through the medical records. Human language, or “natural language,” is very complex, lacks uniformity, and incorporates an enormous amount of ambiguity, jargon, and vagueness. To convert these documents into more valuable and analyzable data, machine learning in healthcare often relies on natural language processing (NLP) programs. Most deep learning applications in healthcare that use NLP require some form of healthcare data as input.

 

Anomaly Detection Module:

This module will provide efficient classification of event and threat data, based on specific rules related to cybersecurity requirements and the criticality level of cyber-threats. Novel machine learning (ML) models will be developed in this task. In particular, existing ML models used in anomaly detection and/or threat classification are expected to be adapted to match the requirements of health systems.

The machine learning module will take its input from the HEIR IoT logs and process the records to separate anomalies from non-anomalies. The ML component will then compile the results into a detailed report, which will be visualized in the AEGIS toolkit. Machine learning is part of the HEIR facilitators service shown in the figure below (see Figure 1).

 

Figure 1 Anomaly Detection Overview

 

Algorithm Used:

Random forest is a supervised learning algorithm that uses an ensemble learning method and can be applied to both classification and regression (see Figure 2). Ensemble learning combines the predictions of multiple models, in this case many decision trees, to make a more accurate prediction than a single model. The work entails the following steps:

1.  Implementing the Random Forest Algorithm

2.  Training the model using dummy data

3.  Evaluating the achieved model with other supervised algorithms

4.  Initializing the ML component to adapt to the logs:

  • Padding: the value used to fill holes (missing entries)
  • Label encoding: converting the labels into numeric form
  • Dependent/independent variables: what is being studied / what changes
  • Variable importance: quantifying the usefulness of each variable

5.  Training the model using real data
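The preparation steps listed above can be illustrated with a minimal sketch. It assumes scikit-learn and NumPy, which the text does not name as the implementation libraries, and it uses invented log records: the field names (event, bytes, failed_attempts), the padding value, and the number of trees are all hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder

# Hypothetical log records; field names and values are illustrative only.
records = [
    {"event": "login", "bytes": 120, "failed_attempts": 0, "label": "normal"},
    {"event": "read", "bytes": 300, "failed_attempts": 0, "label": "normal"},
    {"event": "login", "bytes": None, "failed_attempts": 9, "label": "anomaly"},
    {"event": "write", "bytes": 250, "failed_attempts": 1, "label": "normal"},
    {"event": "login", "bytes": 90, "failed_attempts": 12, "label": "anomaly"},
    {"event": "read", "bytes": 310, "failed_attempts": None, "label": "normal"},
    {"event": "write", "bytes": 9000, "failed_attempts": 7, "label": "anomaly"},
    {"event": "read", "bytes": 280, "failed_attempts": 0, "label": "normal"},
]
FEATURES = ["event", "bytes", "failed_attempts"]
NUMERIC = ["bytes", "failed_attempts"]
PAD_VALUE = 0  # assumed padding value for missing numeric entries

# Padding: fill holes in the numeric fields.
for record in records:
    for name in NUMERIC:
        if record[name] is None:
            record[name] = PAD_VALUE

# Label encoding: convert text categories and labels into integers.
event_encoder = LabelEncoder()
event_codes = event_encoder.fit_transform([r["event"] for r in records])
label_encoder = LabelEncoder()
# Classes are sorted alphabetically, so "anomaly" -> 0 and "normal" -> 1.
y = label_encoder.fit_transform([r["label"] for r in records])

# Independent variables (X) are the log features; the dependent variable (y)
# is the anomaly label.
X = np.column_stack(
    [event_codes] + [[r[name] for r in records] for name in NUMERIC]
)

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X, y)

# Variable importance: one score per feature, summing to 1.
importances = dict(zip(FEATURES, model.feature_importances_))
for name, score in importances.items():
    print(f"{name}: {score:.3f}")
```

With real logs the same sequence would apply, but the feature set and padding value would have to follow the actual log schema.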

 

Figure 2 Algorithm Structure
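
Step 3 (evaluating the model against other supervised algorithms) could be approached along the following lines. This is only a sketch under stated assumptions: it uses scikit-learn, synthetic dummy data from make_classification rather than HEIR logs, and an arbitrary choice of two comparison models and of the F1 score as the metric.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced dummy data: roughly 10% of samples in the positive
# (anomaly) class. All sizes here are illustrative assumptions.
X, y = make_classification(
    n_samples=300,
    n_features=8,
    n_informative=4,
    weights=[0.9, 0.1],
    random_state=0,
)

# Candidate supervised models; the comparison set is an assumption.
candidates = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
}

# 5-fold cross-validated F1 score for each candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="f1").mean()
    for name, model in candidates.items()
}

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: mean F1 = {score:.3f}")
```

F1 is used here because anomalies are typically rare, which makes plain accuracy misleading; another metric may suit the project's requirements better.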


General Flow:

The machine learning component must first be trained on archival data. Once trained, it will consume the real-time data shared through a REST API, classify each record captured by the monitoring system, and share the result. The ML component runs in a separate Docker container for each use case, so that the requirements of each one can be met independently.
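
As a rough illustration of the real-time step, the sketch below turns one JSON record, as it might arrive over the REST API, into a feature vector and attaches a verdict. The record schema, the classify_record helper, and the threshold_predict stand-in are all hypothetical; a trained model's prediction function would take the stand-in's place.

```python
import json
from typing import Callable, Sequence

# Hypothetical feature order; the real log schema is not specified here.
FEATURES = ("bytes", "failed_attempts")


def classify_record(
    payload: str,
    predict: Callable[[Sequence[float]], int],
    pad_value: float = 0.0,
) -> dict:
    """Parse one JSON log record and attach the model's verdict."""
    record = json.loads(payload)
    # Missing or null fields are padded so the vector has a fixed length.
    vector = [
        float(record[name]) if record.get(name) is not None else pad_value
        for name in FEATURES
    ]
    return {"record": record, "anomaly": bool(predict(vector))}


def threshold_predict(vector: Sequence[float]) -> int:
    """Placeholder for a trained model: flags many failed attempts."""
    return int(vector[1] >= 5)


result = classify_record('{"bytes": 300, "failed_attempts": 7}', threshold_predict)
print(result["anomaly"])  # prints True
```

Passing the predictor in as an argument keeps the parsing logic identical across use cases, which fits the one-container-per-use-case arrangement described above.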

Furthermore, the ML component will post its output as JSON to the Elasticsearch engine, which indexes the results. The GUI toolkit will then be refreshed at regular intervals and whenever new records are captured, presenting the anomalous and non-anomalous records to the auditor. The auditor can then drill into the details of any suspicious output.
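
The posting step might look like the following sketch, which builds a request for the Elasticsearch document-indexing endpoint (POST to an index's _doc path) using only the Python standard library. The host, port, index name, and document fields are placeholders, not values taken from the HEIR deployment, and the request is built but not sent.

```python
import json
import urllib.request

# Assumed values: host, port, and index name are placeholders.
ES_URL = "http://localhost:9200"
INDEX = "heir-anomalies"


def build_index_request(result: dict) -> urllib.request.Request:
    """Build (but do not send) a request that indexes one result document."""
    body = json.dumps(result).encode("utf-8")
    return urllib.request.Request(
        url=f"{ES_URL}/{INDEX}/_doc",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical result document produced by the ML component.
request = build_index_request({"anomaly": True, "source": "iot-log", "score": 0.93})

# Sending requires a reachable Elasticsearch instance:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

In a real deployment, authentication and TLS settings would also need to be configured, and an official Elasticsearch client could replace the hand-built request.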
 

Figure 3 High-level Idea



Conclusion

Machine learning can play a crucial role in improving privacy and security in the healthcare sector. The random forest model is robust and accurate, and it usually performs well on many problems, including those whose features have non-linear relationships. A basic version of the algorithm was implemented, and initial data was extracted to demonstrate its operation.

 

 

References:

[1] https://www.foreseemed.com/blog/machine-learning-in-healthcare