The power of multimodal data-driven medicine

Published on 12/27/2023

6 min read

Capturing the complexity of human health and disease through machine learning analysis of multimodal data has the potential to drive the future of healthcare.

What do we mean by multimodal data?

Multimodal healthcare datasets integrate diverse data modalities, such as genomic, clinical, radiomic, proteomic, and biological data, to provide comprehensive insights into human biology and medical conditions. Analyzed together, these modalities have the potential to predict outcomes more accurately and informatively than the same modalities analyzed separately (Fig. 1).

Figure 1. Multimodal healthcare data integrated and analyzed by artificial intelligence (AI)/machine learning can provide healthcare professionals with useful information to improve patient care.
Genomics data.
Radiomics data include x-rays, CT scans, MRI scans, ultrasound images, and mammograms.
Clinical and biological data from electronic health records include patient histories, demographics, notes, diagnosis codes, procedure codes, laboratory results, and vital signs. 
Proteomics data.
Digital pathology data.
Patient-reported data include questionnaires and health journals, as well as data from wearable devices monitoring heart rate, sleep patterns, and activity levels, and implantable devices such as pacemakers, insulin pumps, and continuous blood glucose monitors.
Environmental data include air quality and location data.

How are multimodal data and artificial intelligence (AI) advancing healthcare?

New data-driven technologies powered by novel ways of linking and analyzing patient data are set to transform the way that healthcare is delivered.1 Healthcare professionals routinely make use of multiple sources of data to arrive at a diagnosis and to decide on patient management.2 However, a significant level of expertise is required for an in-depth understanding of even a single data type (e.g. radiological images) such that it is unfeasible for individual healthcare professionals to master all areas. AI/machine learning technologies can be leveraged to bring together and analyze multimodal healthcare data, breaking data silos and creating robust and accurate predictive models.3 With the appropriate guidance around decision-making and communication, the valuable insights gained from these predictive models have the potential to support healthcare professionals to improve patient care. 

Machine learning technologies can integrate data from disparate multimodal sources to provide a holistic understanding of patients’ health and medical conditions. Data from multiple modalities are combined to extract complementary information, powering predictive models that can find relationships between variables/features that are not readily visible or known to healthcare professionals. Indeed, multimodal data fusion models have consistently been shown to provide increased accuracy (1.2-27.7% higher) and performance (AUC 0.02-0.16 higher) compared with models that use data from a single modality for the same task.4
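
To make the idea of data fusion more concrete, the minimal Python sketch below illustrates one simple strategy, late fusion, in which each modality produces its own risk score and the scores are averaged per patient. It is an illustration only, not the method used in the cited studies or in any product. The outcome labels, the `genomic_scores` and `imaging_scores` values, and the function names are all hypothetical, chosen so that each modality makes a different mistake.

```python
# Toy late-fusion sketch. All values below are made up for illustration;
# they are not patient data and not results from any study.

def pairwise_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


def late_fusion(*modality_scores):
    """Average the per-patient scores produced by each single-modality model."""
    return [sum(patient) / len(patient) for patient in zip(*modality_scores)]


# 1 = event occurred, 0 = no event (hypothetical outcomes for six patients)
labels = [1, 1, 1, 0, 0, 0]
genomic_scores = [0.9, 0.4, 0.8, 0.5, 0.2, 0.3]  # hypothetical model output
imaging_scores = [0.3, 0.9, 0.7, 0.2, 0.6, 0.1]  # hypothetical model output

fused_scores = late_fusion(genomic_scores, imaging_scores)

auc_genomic = pairwise_auc(labels, genomic_scores)
auc_imaging = pairwise_auc(labels, imaging_scores)
auc_fused = pairwise_auc(labels, fused_scores)

print(f"genomic only: {auc_genomic:.3f}")
print(f"imaging only: {auc_imaging:.3f}")
print(f"late fusion:  {auc_fused:.3f}")
```

In this contrived example, each single-modality score misranks one patient pair, while the averaged score ranks every pair correctly. Real-world gains depend on the data and the fusion strategy, and would need to be validated on patients the model has not seen.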

Oncology is one of the medical specialties that most commonly leverages multimodal methods for clinical decision support.5 Machine learning technologies have the potential to explore complex and diverse data to support healthcare professionals from screening to treatment (including relapse).6 Identification of risk factors can support non-invasive patient screening and preventive care.3 Detection of patterns in easily accessible data can help identify diagnostic or prognostic biomarkers to improve patient risk stratification or selection for clinical trials. Identification of predictive signatures of risk factors, adverse treatment reactions, treatment responses, or treatment benefit can guide decisions around patient management.

Figure 2. The number of PubMed articles published on multimodal oncology data has dramatically increased in recent years.
PubMed search for ((multimodal) AND (oncology)) OR ((multimodal) AND (cancer)).
*2023 analysis includes data available at time of writing (January-September).

With data privacy and security paramount, multimodal healthcare data can also be leveraged to accelerate advances in medical research, such as the discovery of novel biomarkers and therapeutic targets for drug development, as well as supporting population health management by providing a comprehensive view of health trends and outcomes. The rapid increase in peer-reviewed publications on the topic over the last 13 years demonstrates that the extraordinary value of multimodal oncology data is already recognized by the scientific and medical communities (Fig. 2). Leveraging machine learning to collate and analyze the vast diversity of multimodal data for data-driven precision medicine is on track to drive the next revolution in healthcare. 

Data-driven insights with SOPHiA DDM™ multimodal healthcare analytics

SOPHiA DDM™ multimodal healthcare analytics will have the potential to break data silos by streamlining the integration of longitudinal oncology data from multiple sources and modalities – including but not limited to genomic, radiomic, digital pathology, biological, and clinical data. The SOPHiA DDM™ Platform uses machine learning-powered analytics to assemble, standardize, and transform multimodal data into accessible data-driven insights, facilitating the identification of multimodal predictive signatures, as well as treatment response patterns and trends. To learn more and get in touch, visit the webpage.

Product in development – Technology and concepts in development. May not be available for sale.


  • Area under the ROC curve (AUC) – A ROC (receiver operating characteristic) curve is a graph that plots the true positive rate against the false positive rate across all classification thresholds to show the performance of a model. AUC measures the area underneath the ROC curve to provide an aggregate measure of performance. AUC values range between 0 and 1: a score of 1 means that the model ranks every positive case above every negative case, a score of 0.5 is equivalent to random guessing, and a score of 0 means that the ranking is entirely reversed. Essentially, AUC represents the probability that the model assigns a higher score to a randomly chosen positive case than to a randomly chosen negative case.

  • Digital pathology images – Scanned images of tissue samples on glass slides.
  • Omics data – Large-scale information related to the biology of organisms.
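
The AUC entry above can be illustrated with a short Python sketch that traces a ROC curve by sweeping the decision threshold over the model scores and then measures the area under the curve with the trapezoidal rule. The labels and scores are hypothetical toy values, and the function names `roc_points` and `auc_trapezoid` are invented for this illustration.

```python
# Toy ROC/AUC sketch. Labels and scores are made up for illustration only.

def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points, one per threshold."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    points = [(0.0, 0.0)]  # threshold above every score: nothing is called positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under the piecewise-linear curve through the given (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# 1 = positive case, 0 = negative case (hypothetical)
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]  # hypothetical model scores

curve = roc_points(labels, scores)
auc = auc_trapezoid(curve)
print(f"AUC = {auc:.3f}")
```

Here one of the nine positive/negative pairs is misranked (the positive case scored 0.4 falls below the negative case scored 0.7), so the area comes out at 8/9, matching the ranking interpretation of AUC.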


  1. Academy of Medical Sciences. 2018. Accessed Sept 2023.
  2. Rockenbach MABC. Accessed Sept 2023.
  3. Lipkova J, et al. Cancer Cell. 2022 Oct 10;40(10):1095-1110.
  4. Huang SC, et al. NPJ Digit Med. 2020 Oct 16;3:136.
  5. Kline A, et al. NPJ Digit Med. 2022 Nov 7;5(1):171.
  6. He X, et al. Semin Cancer Biol. 2023 Jan;88:187-200.

About the Author

Mallory Gough

Proposition and Content Marketing Senior Manager
