The Problem

The problem that CalibraMics addresses is recognized by multinational initiatives and regulatory bodies, supported by scientific evidence produced by several projects and their resulting publications. Selected headlines are listed below:

Medical AI performance degrades over time due to changes in input data!

The Advanced Research Projects Agency for Health (ARPA-H) Statement (August 2024)
“Artificial Intelligence (AI) is becoming an increasingly important tool used to help support clinical decision making. Since 2018, the number of available AI-enabled medical devices in the U.S. has increased by tenfold and will likely continue growing at similar rates in the future. However, research suggests that the accuracy of Machine Learning (ML) models may degrade over time due to changes in input data – such as changes in clinical operations, data acquisition, patient population, or even IT infrastructure.”

Lack of reproducibility in quantitative image analysis is a major challenge for Radiomics!

The image biomarker standardisation initiative (IBSI) Statement (2020)
“IBSI is an independent international collaboration which works towards standardising the extraction of image biomarkers from acquired imaging for the purpose of high-throughput quantitative image analysis (radiomics). Lack of reproducibility and validation of high-throughput quantitative image analysis studies is considered to be a major challenge for the field.”

Data drift is a risk!

The U.S. Food and Drug Administration (FDA), Health Canada, and the United Kingdom’s Medicines and Healthcare products Regulatory Agency (MHRA) Joint Statement (March 2025)
“Article 10. Deployed Models Are Monitored for Performance and Re-training Risks Are Managed: Deployed models have the capability to be monitored in “real world” use with a focus on maintained or improved safety and performance. Additionally, when models are periodically or continually trained after deployment, there are appropriate controls in place to manage risks of overfitting, unintended bias, or degradation of the model (for example, dataset drift) that may impact the safety and performance of the model as it is used by the Human-AI team.”

Computed Tomography (CT) is susceptible to data drift!

Swiss Personalized Health Network (SPHN) Statement (2023)
QA4IQI: Quality Assessment for Interoperable Quantitative CT-Imaging Project:
“Medical images form one of the cornerstones of diagnosis for various conditions. They provide not only non-invasive visual observations but also allow quantitative measurements of size and tissue texture. Similar to most measurement devices, such as a simple ruler, medical imaging devices also need to be calibrated. While calibration for geometric measurements and tissue density are commonly performed, features that quantify more complex characteristics are often neglected. For example, tissue heterogeneity is correlated with aggressiveness in various cancer types. Quantitative metrics that measure heterogeneity are not currently considered during calibration, which makes medical imaging difficult to use as a measurement tool.”

Measurements not reliable for use with unseen CT scanners!

Nature, Scientific Reports Article (2022)
“Advanced artificial intelligence techniques, such as deep learning, take the quantitative analysis approach one step further…
An important limitation of the quantitative analysis approach is its sensitivity to variations in scanning conditions…
Critically, image characteristics heavily rely on the acquisition details, e.g., resolution, radiation dose, noise, reconstruction algorithm. Depending on the properties of the algorithm and the measurement, the extracted quantities can be highly sensitive to variations in the image acquisition parameters.
This sensitivity inhibits the generalisation capabilities of such measurements.
If acquisition details are not perfectly matched, two different images, even of the same tissue, will yield different measurements. A number of studies have reported the impact on CT radiomics analysis caused by the variability of acquisition parameters and post-process variables. Any algorithm or analysis based on these measurements will therefore not be reliable for use with unseen scanners.”

The Data Addition Dilemma!

Stanford University and UC Berkeley joint publication (August 2024)
“In many machine learning for healthcare tasks, standard datasets are constructed by amassing data across many, often fundamentally dissimilar, sources. But when does adding more data help, and when does it hinder progress on desired model outcomes in real-world settings? We identify this situation as the Data Addition Dilemma, demonstrating that adding training data in this multi-source scaling context can at times result in reduced overall accuracy, uncertain fairness outcomes, and reduced worst-subgroup performance.”
