Recent Advances in Medical Image Processing


  • 1 Hangzhou Zhiwei Information and Technology Inc., Hangzhou, China.
  • 2 Hangzhou Zhiwei Information and Technology Inc., Hangzhou, China, [email protected].
  • PMID: 33176311
  • DOI: 10.1159/000510992

Background: The application and development of artificial intelligence technology have had a profound impact on the field of medical imaging, helping medical personnel make earlier and more accurate diagnoses. Recently, the deep convolutional neural network has emerged as a principal machine learning method in computer vision and has received significant attention in medical imaging. Key Message: In this paper, we review recent advances in artificial intelligence, machine learning, and deep convolutional neural networks, focusing on their applications in medical image processing. To illustrate with a concrete example, we discuss in detail the architecture of a convolutional neural network through visualization to help the reader understand its internal working mechanism.
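The convolution operation at the heart of such a network can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the paper; the vertical-edge kernel and the tiny input image are illustrative placeholders:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image ('valid' padding, stride 1) and
    sum elementwise products -- the core operation of a CNN layer."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh, ow = ih - kh + 1, iw - kw + 1
    out = np.zeros((oh, ow))
    for r in range(oh):
        for c in range(ow):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

# A toy image with a step edge between columns 1 and 2,
# and a simple vertical-edge-detecting kernel.
img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 0, 1, 1]], dtype=float)
edge_kernel = np.array([[1, -1],
                        [1, -1]], dtype=float)
response = conv2d_valid(img, edge_kernel)  # strongest response at the edge
```

In a real network, many such kernels are learned from data rather than hand-designed, and their responses are stacked, passed through nonlinearities, and pooled layer by layer.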

Summary: This review discusses several open questions, current trends, and critical challenges faced by medical image processing and artificial intelligence technology.

Keywords: Artificial intelligence; Convolutional neural network; Deep learning; Medical imaging.

© 2020 S. Karger AG, Basel.

MeSH terms

  • Deep Learning
  • Diagnosis, Computer-Assisted*
  • Image Interpretation, Computer-Assisted*
  • Neural Networks, Computer*
  • Predictive Value of Tests
  • Reproducibility of Results


Electrical Engineering and Systems Science > Image and Video Processing

Title: A Systematic Collection of Medical Image Datasets for Deep Learning

Abstract: The astounding success of artificial intelligence (AI) in healthcare and other fields proves that AI can achieve human-like performance. However, success always comes with challenges. Deep learning algorithms are data-dependent and require large datasets for training. The lack of data in the medical imaging field creates a bottleneck for the application of deep learning to medical image analysis. Medical image acquisition, annotation, and analysis are costly, and their usage is constrained by ethical restrictions. They also require many resources, such as human expertise and funding, which makes it difficult for non-medical researchers to access useful, large medical datasets. Thus, this paper provides as comprehensive a collection as possible of medical image datasets and their associated challenges for deep learning research. We have collected information on around three hundred datasets and challenges, mainly reported between 2013 and 2020, and categorized them into four categories: head & neck, chest & abdomen, pathology & blood, and "others". Our paper has three purposes: 1) to provide an up-to-date and complete list that can be used as a universal reference to easily find datasets for clinical image analysis, 2) to guide researchers on the methodology to test and evaluate their methods' performance and robustness on relevant datasets, and 3) to provide a "route" to relevant algorithms for the relevant medical topics, and to challenge leaderboards.
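The paper's four-way categorization lends itself to a small registry structure. The sketch below is illustrative only; the dataset names and years are hypothetical placeholders, not entries from the collection:

```python
# The four top-level categories named in the abstract.
DATASET_CATEGORIES = ("head & neck", "chest & abdomen", "pathology & blood", "others")

# Hypothetical records standing in for real dataset entries.
datasets = [
    {"name": "ExampleBrainMRI", "category": "head & neck", "year": 2019},
    {"name": "ExampleChestXray", "category": "chest & abdomen", "year": 2018},
    {"name": "ExampleBloodSmear", "category": "pathology & blood", "year": 2020},
]

def by_category(entries):
    """Group dataset records under the survey's four top-level categories."""
    index = {c: [] for c in DATASET_CATEGORIES}
    for e in entries:
        index[e["category"]].append(e["name"])
    return index

grouped = by_category(datasets)
```

A reference collection organized this way lets a researcher look up candidate datasets for a clinical topic in one step, which is the "universal reference" purpose the abstract describes.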


Medical images classification using deep learning: a survey

  • Published: 28 July 2023
  • Volume 83, pages 19683–19728 (2024)

  • Rakesh Kumar
  • Pooja Kumbharkar
  • Sandeep Vanam
  • Sanjeev Sharma


Deep learning has made significant advances in recent years. The technology is rapidly evolving and has been used in numerous automated applications with minimal loss. With these deep learning methods, medical image analysis for disease detection can be performed with minimal error. This paper presents a survey of deep learning-based medical image classification. As a result of their automatic feature representations, these methods achieve high accuracy and precision. The paper reviews various models, such as CNNs, transfer learning, long short-term memory, generative adversarial networks, and autoencoders, as well as their combinations, for various purposes in medical image classification; 158 papers are reviewed in total. We discuss the advantages and limitations of the methods, the various applications of medical imaging, the available datasets for medical imaging, and the evaluation metrics, along with future trends in medical imaging using artificial intelligence.
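Among the evaluation metrics such a survey compares are standard classification measures. A minimal sketch of two of them, accuracy and precision, with illustrative labels (not data from any reviewed study):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground-truth labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def precision(y_true, y_pred, positive=1):
    """Of all samples predicted positive, the fraction that truly are."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    predicted_pos = sum(p == positive for p in y_pred)
    return tp / predicted_pos if predicted_pos else 0.0

# Hypothetical binary labels (1 = disease present).
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]
acc = accuracy(y_true, y_pred)    # 4 of 6 correct
prec = precision(y_true, y_pred)  # 3 of 4 positive predictions correct
```

In medical image classification, precision matters because a false positive can trigger unnecessary follow-up procedures, which is why surveys typically report it alongside overall accuracy.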


Data Availability

Data sharing is not applicable to this article as no datasets were generated during the current study. References for the datasets used in this study are given in Table 7.


Download references

No funding was received to assist with the preparation of this manuscript.

Author information

Authors and Affiliations

Indian Institute of Information Technology Pune, Pune, Maharashtra, India

Rakesh Kumar, Pooja Kumbharkar, Sandeep Vanam & Sanjeev Sharma


Corresponding author

Correspondence to Sanjeev Sharma.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Kumar, R., Kumbharkar, P., Vanam, S. et al. Medical images classification using deep learning: a survey. Multimed Tools Appl 83, 19683–19728 (2024).


Received : 26 April 2022

Revised : 12 February 2023

Accepted : 19 April 2023

Published : 28 July 2023

Issue Date : February 2024



  • Image classification
  • Deep Learning models
  • Performance
  • Medical imaging

Medical Imaging 2020: Image Processing


Volume Details

Table of Contents

  • Front Matter: Volume 11313
  • Awards and Plenary Session
  • Image Synthesis, GANs, and Novel Architectures
  • Image Analysis in Ultrasound and OCT: Joint Session with Conferences 11313 and 11319
  • Lesions and Pathologies
  • Machine Learning and Deep Learning
  • Registration
  • fMRI and DTI
  • Keynote and Highlights
  • Labeling and Segmentation
  • Deep Learning: Segmentation
  • Segmentation: Anatomy
  • Deep Learning: Uncertainty and Quality
  • Nuclear and Molecular
  • Poster Session


  • Open access
  • Published: 10 November 2020

Medical image processing with contextual style transfer

  • Yan Li 1 &
  • Byeong-Seok Shin 1  

Human-centric Computing and Information Sciences, volume 10, Article number: 46 (2020)


With recent advances in deep learning research, generative models have achieved great success and play an increasingly important role in current industrial applications. At the same time, technologies derived from generative methods, such as style transfer and image synthesis, are under wide discussion among researchers. In this work, we treat generative methods as a possible solution to medical image augmentation. We propose a context-aware generative framework that can successfully change the gray scale of CT scans with almost no semantic loss. By producing target images with a specific style/distribution, we greatly increase the robustness of the segmentation model after adding the generations into the training set. Besides, we improve pixel segmentation accuracy by 2–4% over the original U-NET in terms of spine segmentation. Lastly, we compare the generations produced by networks using different feature extractors (VGG, ResNet, and DenseNet) and make a detailed analysis of their style-transfer performance.


Style transfer has received more and more attention in the area of image processing. Literally, it means using certain methods to change the style of original images. With stylized generations, we can build many interesting applications, such as colorization [ 1 ] and young-to-old face aging [ 2 ]. Looking back on its development history, we can conclude that most methods follow the pipeline below:

As demonstrated in Fig.  1 , assume we are expected to transfer the style of the original input X into that of Y:

First, X is fed into a feature extractor, which gives us access to its feature space (or latent space).

Then we also input the target Y into the same extractor. But unlike the features of X, we treat Y’s feature space as a set of its style attributes, controlling the “destination” of the style transformation.

Having absorbed the style information from Y (via a mathematical operation), the mixed feature set is input into a generative model to produce the generations.

figure 1

General structure of style-transfer model
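The three-step pipeline above can be sketched as follows. Here `encode`, `mix_style`, and `decode` are hypothetical NumPy stand-ins for the extractor, the style-mixing operation, and the generative model (not the paper's actual networks), shown only to make the data flow concrete:

```python
import numpy as np

def encode(img):
    # hypothetical feature extractor stand-in: flatten to a feature vector
    return img.reshape(img.shape[0], -1)

def mix_style(fx, fy):
    # hypothetical style-mixing operation: match fx's statistics to fy's
    return (fx - fx.mean()) / (fx.std() + 1e-8) * fy.std() + fy.mean()

def decode(feats, shape):
    # hypothetical generative model stand-in: map features back to image space
    return feats.reshape(shape)

rng = np.random.default_rng(0)
x = rng.random((1, 8, 8))   # context input X
y = rng.random((1, 8, 8))   # style target Y
out = decode(mix_style(encode(x), encode(y)), x.shape)
```

After mixing, the output keeps X's layout but carries Y's global statistics, which is the essence of the pipeline in Fig. 1.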

Despite the large amount of research that has been put into this area, three points should be noted: (1) The choice of feature extractor: whether the network used can fully preserve the contents of X; if not, the final output can be expected to differ somewhat from X in terms of semantics. (2) The way style attributes are extracted from Y. Mathematically speaking, all information extracted from images with a network is a set of vectors; the problem is how to quantify the “style” attributes from this complicated signal set. (3) The generative method. A good generative model can not only improve the quality of the generations but also bring better industrial prospects. In this paper, we discuss in detail possible solutions to these challenges facing style-transfer research.

Motivated by developments in deep learning, great progress has been achieved by applying AI-based techniques. With studies of network architectures, more and more encoder-like models have been used to help people access, understand, and even control the feature space of the input signal. In recent years, models like VGG, ResNet [ 3 ], and DenseNet [ 4 ] have been preferred by many studies [ 5 , 6 , 7 ]. On the face of it, encoder-like architectures may perform better in terms of feature extraction, and all of the research mentioned above originates from studies of object detection. We do not discuss their performance in practical style-transfer tasks here but note that this trend deserves attention. Unlike in detection tasks, transformation effects (quality of generations) and semantic preservation should be given top priority in current research.

Apart from the choice of feature extractor, the way style information is learned during transformation has also received great attention. Learning from an adapted loss function seems a potential solution: Gatys et al. [ 8 ] introduced a layer-wise style loss, providing a layer list that stores general style information and learning the “style” attribute through the loss function. On the other hand, Huang et al. [ 9 ] attempted to acquire “style” by manipulating batch normalization. Both lines of research achieved state-of-the-art transformation effects; considering that almost all studies learn “style” from the network, the only difference is their understanding of the feature space. It is worth noting, however, that the flexibility of a solution must also be considered: the speed of processing should be given great importance, and the case of multiple-style learning also deserves attention.

We mentioned that deep learning greatly pushed the development of generative models, especially with the appearance of the generative adversarial network (GAN). Research derived from this technique has been popular in recent years, e.g., CycleGAN [ 10 ] and UNIT [ 11 ]. As shown in Fig.  2 , we provide the general structure of a GAN-based model, which produces final outputs through the adversarial process between the generator and discriminator networks. It is true that the quality of generations can be improved by the adversarial process, but the problems of such frameworks are also obvious: a long training process and the extra computational resources consumed by the discriminator; the difficulty of multi-style transformation should also be taken into consideration.

figure 2

General style-transfer models with GAN

All of the above solutions to style transfer target general images. At present, applications of this kind of research are also popular in medical image analysis, such as modality visualization (CT to MRI), diagnosis classification, and so on. With better-quality generations, doctors can form a clearer understanding of a patient’s condition. But unlike with general images, preventing context loss must be put first: even a little semantic loss in a medical image may result in an incorrect diagnosis. Moreover, medical images contain many more low-resolution but important signals, whose preservation greatly increases the difficulty of the transformation.

In this paper, we focus on applications of style transfer to medical images. Related work on style transfer and medical image processing is introduced in section II. In section III, we propose a context-aware framework for medical image processing based on advances in style transfer. Unlike GAN-based research [ 12 , 13 , 14 , 15 , 16 ], we follow a traditional generative idea, which helps avoid unnecessary training costs. To maintain the semantics of the input as much as possible, we introduce a context-aware loss during training. Besides, to accelerate processing, the model is designed to learn style through a batch-normalization operation instead of loss learning, making the entire model applicable to massive production. In section IV, we design a two-part experiment: one part tests the quality of generations, while the other performs organ segmentation after adding the outputs to the training set. To evaluate feature-preservation performance, we experiment with VGG19, DenseNet, and ResNet respectively. Last but not least, we conclude that the proposed framework can produce high-quality medical images. In addition, compared with existing style-transfer methods, the proposed framework improves U-NET [ 17 ] segmentation accuracy by 2–4%, the highest among current research.

Contributions we made in this paper are as follows:

Following the traditional generative idea, we propose a style-transfer model for medical images. To prevent possible context loss, we design a context-aware loss [ 18 ] to enforce semantic preservation during the transformation process.

Our model learns target style attributes with the introduced AdaIN [ 9 ] (Adaptive Instance Normalization), which enables the model to absorb “style” from a single target input and to learn multiple styles at the same time.

Organ (spine) segmentation results [ 17 ] show that our framework largely improves pixel accuracy after adding the outputs to training.

We make a detailed analysis of current feature extractors in terms of context maintenance, and conclude that VGG19 [ 18 ] is the best choice for medical image style transfer.

Related work

In this section, we review advanced work on style transfer and introduce its applications in medical image analysis.

Style transfer

Style transfer means changing the style of given images into another with certain methods. Since Gatys et al. [ 8 ] introduced their transformation framework based on a convolutional network, using deep learning techniques has become the trend in this area. Similar to image synthesis, current research tends to treat style information as a factor that is independent of content. Under such circumstances, the entire transformation job becomes an attribute-learning task, aiming to learn the target “style” attributes.

Treating images as a combination of context and style codes, Gatys et al. learned style information by training. By pre-visualizing intermediate results, they selected the layers that best preserve context and style information. In this way, the whole training loss is divided into two parts: context loss and style loss. With the pre-recorded layer names, Gatys et al. [ 8 ] adapted the original one-time loss into a layer-wise one, and researchers can control the transformation effect by adjusting the weight of the style loss. This research greatly inspired other studies [ 9 , 19 ]. Many similar works have been introduced; at the same time, with the rapid development of network architectures and training techniques, people began to apply the idea of “transformation” to other tasks, such as colorization [ 1 ], data augmentation [ 15 ], and resolution improvement [ 20 ].

Facing the challenge of industrial application, the original transformation strategy has been criticized for its long training time and limitations, and many solutions have been introduced [ 9 , 19 ]. Actually, it is almost impossible to make “style” information a single unit independent of content. Researchers can only simulate “style” vectors and then quantify them with certain mathematical operations.

Inspired by the idea of batch normalization, Huang et al. introduced an adaptive instance normalization layer, which enables feature codes to absorb extra style information through normalization. They perform style transfer inside the original feature space, shifting and matching the target style codes in a channel-wise way [ 9 , 21 ]. Huang’s work achieved state-of-the-art transformation effects; we follow their work and compare it with the proposed model in terms of medical image generation.

On the other hand, with the scaled-up size of computation and increasingly complicated image inputs, the traditional pixel-wise context loss struggles to meet industrial demand. Especially when processing medical images, the countless tiny tissues and anomalous structures are extremely hard to quantify. To present context loss in a clearer way, Mechrez et al. [ 19 ] designed a context-aware loss that measures the similarity between images by feature-wise comparison. Experiments showed that their context loss is more applicable than mean squared error (MSE) when dealing with complicated signals. We adopt it as the contextual loss and apply it to balance training.

Medical image analysis

Using generative models to tackle the challenges of medical analysis has a long history. It is worth pointing out that, compared with common images, medical ones are normally low-resolution and noisier [ 22 ]. Furthermore, when applying generative solutions to medical areas, semantic preservation should be put first; otherwise, information loss may result in misdiagnosis.

At present, many studies target medical image style transformation, particularly for specific imaging technologies (CT, MRI, etc.). Yang et al. [ 22 ] built a GAN-based model that successfully transfers low-dose CT scans to high-dose ones, greatly reducing the noise of the original images. Besides, style-transfer techniques also play a considerable role in data augmentation. To increase the diversity and size of the training set, Frid et al. [ 15 ] introduced a multi-resolution transformation model and added generations into training, greatly improving the accuracy of liver lesion classification.

All of the medical research mentioned above is based on the idea of “style transfer + generative model”. We conclude that the main uses of style-transfer technology are diagnosis assistance and data augmentation. In this work, we focus on models’ performance when processing CT scans; all generations are used to scale up the training set.

Context-aware style transfer model

Since this research focuses on massive medical image processing, particularly cases of multi-style transformation, we follow the traditional generative architecture, using a decoding network rather than an adversarial process to avoid extra computational cost.

As illustrated in Fig.  3 , the proposed style-transfer model follows the pipeline shown in Fig.  1 . We let X be the context input, while \(Y:\{{Y}_{1},{Y}_{2}\dots {Y}_{n}\}\) stands for the images whose target styles we aim to learn. First, both X and Y are fed into the same feature extractor \(E\) . The feature codes (outputs from the extractor) are represented as \(E(X)\) and \(E(Y)\) . Then, with the help of the adaptive normalization layer, we align the mean and variance of \(E(X)\) to those of \(E(Y)\) channel by channel (details are discussed in the next section). In this way, all channels of the feature maps in \(E(X)\) learn the style information from \(E(Y)\) , which helps enforce the transformation effect. The output \(Adain(E\left(X\right),\) \(E(Y))\) is then input into the decoder network to produce the final generations.

figure 3

Contextual style-transfer Model

During training, having obtained the output from the decoder network, the generations are input into the extractor again, and the intermediate results are used to measure the gap between the original context and the target style. Throughout the entire experiment, we still use a two-part loss function (context and style loss, represented as \({L}_{context}\) and \({L}_{style}\) ).

Considering the specifics of medical images, we argue that their context preservation should be given greater importance than that of general images. Inspired by Mechrez [ 19 ], we design a context-aware loss \({L}_{context}\) . In this way, we reduce semantic loss by matching feature vectors in the generations with those from \(E(X)\) . Next, the details of the context-aware loss and AdaIN are introduced.

Adaptive instance normalization

Having obtained both feature spaces \(E\left(X\right)\) and \(E\left(Y\right)\) , the remaining job is to learn as much style information as possible from the target input. In the original stylization architecture, a batch normalization layer is used after each convolution layer; later research began to build normalization operations specific to style transformation. In this part, we introduce the adaptive instance normalization layer, which accelerates stylization with only a single style input.

With the given \(E(X)\) and \(E(Y)\) , we try to extract style vectors by contrast normalization. First, we compute the mean and standard deviation of \(E\left(X\right)\) and \(E\left(Y\right)\) as follows (denoted with \(\tau\) and \(\sigma\) respectively):

$$\tau {\left(Y\right)}_{nc}=\frac{1}{HW}\sum_{h=1}^{H}\sum_{w=1}^{W}{E\left(Y\right)}_{nchw} \quad \left(1\right)$$

$$\sigma {\left(Y\right)}_{nc}=\sqrt{\frac{1}{HW}\sum_{h=1}^{H}\sum_{w=1}^{W}{\left({E\left(Y\right)}_{nchw}-\tau {\left(Y\right)}_{nc}\right)}^{2}+\varepsilon } \quad \left(2\right)$$

where we let \(E\left(X\right)\) , \(E\left(Y\right)\) be 4-D input tensors ( \(E\left(X\right)\) , \(E\left(Y\right)\) \(\in {R}^{N*C*H*W}\) ), and let \({E(Y)}_{nchw}\) denote their inside elements. But unlike traditional batch normalization, \(\tau (Y)\) and \(\sigma \left(Y\right)\) are computed across the spatial dimensions independently for each channel.

Then we formulate the layer as:

$$Adain\left(E\left(X\right),E\left(Y\right)\right)=\sigma \left(E\left(Y\right)\right)\left(\frac{E\left(X\right)-\tau \left(E\left(X\right)\right)}{\sigma \left(E\left(X\right)\right)}\right)+\tau \left(E\left(Y\right)\right) \quad \left(3\right)$$

in which we scale the normalized context feature map set \(E\left(X\right)\) with \(\sigma \left(E\left(Y\right)\right)\) and shift the result with \(\tau \left(E\left(Y\right)\right)\) . Intuitively, from Eq. ( 3 ) we can see that the entire normalization needs no extra learnable weights, indicating a faster stylization speed.

As for the benefits of channel-wise computation, we maintain that the style of an image results from the interaction of all channels. When a certain style is present, E(Y) produces a high activation under the normalization. The output of AdaIN has the same average activation value for each channel but preserves the context of \(E\left(X\right)\) at the same time.
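The per-channel mean/std alignment described above can be sketched in NumPy as follows; the function name `adain` and the tensor layout are our own illustrative assumptions, not the authors' code:

```python
import numpy as np

def adain(ex, ey, eps=1e-5):
    # ex, ey: (N, C, H, W) content/style feature tensors.
    # tau = per-sample, per-channel mean over spatial dims; sigma = the std.
    tau_x = ex.mean(axis=(2, 3), keepdims=True)
    sig_x = ex.std(axis=(2, 3), keepdims=True) + eps
    tau_y = ey.mean(axis=(2, 3), keepdims=True)
    sig_y = ey.std(axis=(2, 3), keepdims=True) + eps
    # normalize the content features, then scale/shift with the style statistics
    return sig_y * (ex - tau_x) / sig_x + tau_y
```

Note that no learnable weights are involved, which is why this normalization stylizes quickly.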

Context-aware loss

In this section, we concentrate on the model’s ability to preserve context. The proposed model follows the two-part loss setting: context loss and style loss [ 8 ]. Considering the countless small but complicated signals in medical images, even a slight mismatch between generations and the original inputs may cause misunderstanding during diagnosis. While many related works use a pixel-wise MSE [ 8 , 21 ], we further highlight the importance of preventing semantic loss [ 19 ] during the transformation process, and we maintain context by matching feature vectors instead of pixel values.

Assume generation \(G\) and original input \(X\) have the same number of feature vectors. Then both can be written as:

$$G=\left\{{g}_{1},{g}_{2},\dots ,{g}_{N}\right\},\quad X=\left\{{x}_{1},{x}_{2},\dots ,{x}_{N}\right\} \quad \left(4\right)$$

in which \({g}_{i}\) and \({x}_{j}\) stand for the feature vectors in \(E(G)\) and \(E(X)\) and \(\left|G\right|=\left|X\right|=N\) . Next, we represent the image similarity between \(G\) and \(X\) as:

$$CA\left(G,X\right)=\frac{1}{N}\sum_{j}\underset{i}{\mathrm{max}}\;CA\left({g}_{i},{x}_{j}\right) \quad \left(5\right)$$

\(CA({g}_{i},{x}_{j})\) denotes the vector similarity between \({g}_{i}\) and \({x}_{j}\) . For each \({x}_{j}\) , we search all \({g}_{i}\) in \(G\) to find the one closest to it; the average of these best feature similarities then stands for the image similarity between \(G\) and \(X\) .

As for the details of the vector similarity \(CA({g}_{i},{x}_{j})\) , we introduce the cosine distance [ 22 ] \({CosD}_{i,j}\) ; the distance between \({g}_{i}\) and \({x}_{j}\) is formulated as:

$${CosD}_{i,j}=1-\frac{\left({g}_{i}-{\mu }_{t}\right)\cdot \left({x}_{j}-{\mu }_{t}\right)}{{\Vert {g}_{i}-{\mu }_{t}\Vert }_{2}\;{\Vert {x}_{j}-{\mu }_{t}\Vert }_{2}} \quad \left(6\right)$$

in which \({\mu }_{t}=\frac{1}{N}\sum_{j}{x}_{j}\) . When \({CosD}_{i,j}\ll {CosD}_{i,k},\;\forall k\ne j\) , we regard vectors \({g}_{i}\) and \({x}_{j}\) as similar. Besides, in the practical experiment, to quickly find the minimum \({CosD}_{i,j}\) for each \({x}_{j}\) , we start with distance normalization:

$${\widetilde{d}}_{i,j}=\frac{{CosD}_{i,j}}{\underset{k}{\mathrm{min}}\;{CosD}_{i,k}+\sigma } \quad \left(7\right)$$

where \(\sigma\) ( \(\sigma =1e-5\) ) denotes a smoothing parameter that stabilizes the normalization. Next, we turn the distance into a similarity metric by exponentiation:

$${s}_{i,j}=\mathrm{exp}\left(\frac{1-{\widetilde{d}}_{i,j}}{w}\right) \quad \left(8\right)$$

where \(w\) is a band-width parameter ( \(w>0\) ). Lastly, we adapt the vector similarity into a scaled version (for ease of large-scale calculation):

$$CA\left({g}_{i},{x}_{j}\right)=\frac{{s}_{i,j}}{\sum_{k}{s}_{k,j}} \quad \left(9\right)$$

In this way, the whole-image contextual loss [ 19 ] between \(G\) and \(X\) can be formulated as:

$${L}_{context}=-\sum_{l\in L}\mathrm{log}\left(CA\left({\varphi }_{l}\left(G\right),{\varphi }_{l}\left(X\right)\right)\right) \quad \left(10\right)$$

The parameter \(\varphi\) stands for the feature extractor (discussed in the next section), while \(L\) denotes the layer list pre-set by feature-map visualization and \({\varphi }_{l}\) the feature maps at layer \(l\) .
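The contextual-loss computation described above can be sketched in NumPy as follows. This is our illustrative reading of the procedure (the feature-matrix layout, the bandwidth `w`, and the axis of the min-normalization are assumptions), not the authors' implementation:

```python
import numpy as np

def contextual_loss(g_feats, x_feats, w=0.5, eps=1e-5):
    # g_feats, x_feats: (N, D) arrays of N feature vectors from E(G) and E(X)
    mu_t = x_feats.mean(axis=0)                        # mu_t, the mean feature of X
    gc = g_feats - mu_t
    xc = x_feats - mu_t
    gc = gc / (np.linalg.norm(gc, axis=1, keepdims=True) + eps)
    xc = xc / (np.linalg.norm(xc, axis=1, keepdims=True) + eps)
    cos_d = 1.0 - gc @ xc.T                            # cosine distances CosD[i, j]
    d_norm = cos_d / (cos_d.min(axis=1, keepdims=True) + eps)   # distance normalization
    sim = np.exp((1.0 - d_norm) / w)                   # similarity by exponentiation
    ca = sim / (sim.sum(axis=0, keepdims=True) + eps)  # scaled similarity CA[i, j]
    cx = ca.max(axis=0).mean()                         # best match per x_j, averaged
    return -np.log(cx + eps)
```

For identical feature sets the loss is near zero; the more the generation's features drift from the input's, the larger the loss becomes.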

Context-aware model

As illustrated in Fig.  3 , we follow the two-part loss function setting:

$$L={L}_{context}+\partial \;{L}_{style} \quad \left(11\right)$$

where \(\partial\) is the weight used for balancing training. As for the style loss, with the commonly used Gram-matrix loss [ 8 ] and the pre-recorded layer list [ 19 , 21 ], we define it as:

$${L}_{style}=\sum_{l\in \widehat{L}}{\Vert Gram\left({\varphi }_{l}\left(G\right)\right)-Gram\left({\varphi }_{l}\left(Y\right)\right)\Vert }_{2}^{2} \quad \left(12\right)$$

\(\widehat{L}\) here is also a list recording the layers that preserve the style information of \(Y\) .
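The Gram-matrix style loss over a layer list can be sketched as follows, assuming the per-layer feature maps have already been extracted (the function names and normalization constant are ours, chosen for illustration):

```python
import numpy as np

def gram(feat):
    # feat: (C, H, W) feature map -> normalized (C, C) Gram matrix
    c, h, w = feat.shape
    f = feat.reshape(c, h * w)
    return f @ f.T / (c * h * w)

def style_loss(g_layers, y_layers):
    # sum of squared Gram differences over the recorded layer list L-hat
    return float(sum(((gram(g) - gram(y)) ** 2).sum()
                     for g, y in zip(g_layers, y_layers)))
```

The total training loss then combines this with the contextual loss, weighted by the balancing factor.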

Experiment and discussion

This work focuses on style-transfer applications for medical images. Supported by Seoul National University Hospital, we were given over 50 thousand CT scans of the spine and were expected to complete organ segmentation with deep learning technology. We soon found that several gray scales exist among those CT images, even though all of them were produced by the same machines and processed by the same staff. Limited by the size of the training set, we decided to apply the proposed model to increase the diversity of the given data, further improving the generalizability of the original segmentation model [ 17 ].

On the other hand, although image semantics can be maintained by setting a specified loss function, we observe that both the style and context losses are computed over feature maps obtained from the extractor network. This means a reliable feature extractor plays a significant role. Looking back on previous related research, most works [ 3 , 4 , 18 ] choose object-detection networks, like VGG19 [ 18 ], DenseNet [ 4 ], and ResNet [ 3 ]. How do they actually perform on medical images? Secondly, the context and style layer candidates are selected by pre-visualizing intermediate results; does this really work?

Facing the above challenges, our experiments are divided into three parts: (1) style transfer, (2) semantic segmentation, and (3) extractor analysis.

Three-part experiment

Resulting from differences in imaging conditions or staff error, CT scans produced by the same machine have different gray scales, which poses a great threat to later processing. If trained with such an imbalanced dataset, models are certain to perform poorly, whether at segmentation or classification.

As shown in Fig.  4 , a great distribution difference can be observed even with the naked eye. In this case, the proposed contextual model is expected to learn the style of the last three images but produce images with the first one’s context.

figure 4

Style Transfer in CT images

Semantic segmentation

Our goal is to increase the diversity of the dataset and thereby improve the generalizability of models trained with the augmented training set.

We pick U-Net [ 17 ] as the baseline in this part. By comparing segmentation performance before and after adding the generations, we form a clear understanding of the usefulness of style-transfer techniques.

Extractor analysis

In previous sections, we mentioned that the selection of context/style layers is based on the visualization of feature maps. This means both the context and style losses rely entirely on the architecture of the extractor.

We consider six extractor architectures mainly used in current works: VGG19, ResNet50, ResNet101, DenseNet121, DenseNet169, and DenseNet201. All have demonstrated good classification and detection ability. But how do they perform on medical images?

Experiments in this section focus on encoding. By comparing their transformation performance when used as feature extractors, we try to choose the best architecture for style-transfer research.

Baselines and metrics

We have built a three-part experiment; the baselines and quantitative metrics are introduced as follows:

LPIPS distance. We aim to produce CT generations with diverse gray scales. To better evaluate the models’ context preservation and style transformation, we introduce the Learned Perceptual Image Patch Similarity (LPIPS) distance [ 13 ] as a numerical metric. A lower distance indicates greater similarity between a paired input (context + style).

Conditional Inception Score (CIS). This metric provides a numerical value for images’ performance on a classifier [ 13 ]. With a fine-tuned Inception-V3 [ 23 ], a lower CIS value means a poorer ability of style transformation.

In the segmentation part, we use pixel accuracy ( PA ) and the mean intersection over union ( MIoU ) as metrics to evaluate the models’ segmentation ability.
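The two segmentation metrics can be computed as in the straightforward NumPy sketch below (skipping classes with an empty union is our assumption, not stated in the paper):

```python
import numpy as np

def pixel_accuracy(pred, gt):
    # fraction of pixels whose predicted label matches the ground truth
    return (pred == gt).mean()

def mean_iou(pred, gt, num_classes):
    # average intersection-over-union across classes present in pred or gt
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:                      # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))

pred = np.array([[0, 1], [1, 1]])
gt = np.array([[0, 1], [0, 1]])
```

On this toy 2×2 example, three of four pixels match (PA = 0.75), with per-class IoUs of 1/2 and 2/3.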

AdaIN synthesis [ 9 ]. The AdaIN synthesis model realizes style transfer using the adaptive normalization method but uses MSE as the context loss.

Contextual transformation [ 19 ]. This generative model is trained with the contextual loss and enables style transfer with unpaired input.

Results and discussions

According to the generations produced above (Figs.  5 , 6 ), all three generative models perform well on medical image style transfer. Compared with those from the AdaIN method, no great style difference is found among the generations. But, as labeled with red circles, we observe clear semantic loss in the generations of AdaIN [ 9 ], which uses MSE to compute the context loss. For the methods that use a context-aware loss (contextual transformation and ours), all semantics of the context input are preserved well. Turning to style learning, however, it is clear that the outputs of the contextual transformation do not learn well, with unclear structure and a great style difference from the others.

figure 5

Outputs after style transformation 1

figure 6

Outputs after style transformation 2

Table 1 provides the diversity and similarity comparison among the three methods. Considering that we aim to test style learning and context preservation, a lower CIS and a higher LPIPS mean better style-transfer performance on CT image processing.

At the same time, both methods (contextual transformation [ 19 ] and ours) have lower LPIPS values than the MSE-based one, indicating that the contextual loss is better at semantic protection. As for style learning, the AdaIN method performs better, both in its speed of style normalization and in the numerical evaluation over CIS. The numerical evaluations in Table 1 agree with the visual comparison in Figs.  5 , 6 , and it can be concluded that our context-aware style transfer model outperforms existing works on medical image processing.

From Fig.  7 , a great improvement in segmentation can be seen after adding the style-transfer generations into the original training set, which had a fixed size and single-grayscale images; this can be viewed as a kind of data augmentation. It can not only alleviate class imbalance but also improve the generalizability of the model. Although U-NET [ 17 ] achieved good segmentation results on images of a certain style, it is not able to segment scans with different grayscales.

figure 7

Spine Segmentation

As shown in Table 2 , PA and MIoU greatly improve after augmentation, indicating better segmentation quality. This means style-transfer techniques can be a potential choice for data augmentation.

We mentioned that the choice of feature extractor plays a determining role in style-transfer research, both for the quality of the final generations and for practical training. But not all encoder-like architectures are applicable to medical image processing, even though some studies have applied them to general images.

In this work, we first experimented with VGG19, which has made good achievements in related research. Next, we took ResNet/DenseNet-based networks as candidates, exploring their performance on medical images.

Figures 8 and  9 demonstrate the extractor candidates’ performance on medical image style transfer when trained with MSE and context-aware loss respectively. Considering that context/style layers are selected by pre-visualizing feature maps, we assume the way of layer-wise calculation has no effect on the final generations. Observing the performance above, it is clear that VGG19 performs much better than the other two types of candidates (ResNet-based and DenseNet-based), no matter which loss function they are trained with. Style learning aside, ResNet50 and ResNet101 can barely maintain the structure of the context input (spine) during the transformation process, and only at the cost of fine tissue details. For the DenseNet-based networks, the semantics of the context inputs are totally destroyed, resulting in failed generations. From the experiments in this work, we consider VGG19 the best among all encoder-like architectures in terms of semantic preservation.

figure 8

Generation comparisons among feature extractors (trained with MSE)

figure 9

Generation comparisons among feature extractors (trained with context-aware loss)

To summarize, following the traditional style-transfer pipeline, we proposed a context-aware generative model. In this model, we design a new loss function that helps prevent semantic loss. In addition, the introduced adaptive normalization method greatly accelerates stylization and enables the model to learn style information from a single style input.
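One common form of such adaptive normalization is AdaIN (Huang and Belongie, 2017, cited above), which re-scales each content channel to the style input's per-channel statistics; the sketch below illustrates that mechanism, though the paper's exact variant may differ:

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """Adaptive instance normalization: give each channel of the content
    features the per-channel mean and std of the style features.
    content, style: arrays of shape (C, H, W)."""
    c_mean = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True)
    s_mean = style.mean(axis=(1, 2), keepdims=True)
    s_std = style.std(axis=(1, 2), keepdims=True)
    return s_std * (content - c_mean) / (c_std + eps) + s_mean
```

Because the style is injected as simple feature statistics rather than learned per-style weights, a single style image suffices and no per-style retraining is needed, which is what makes stylization fast.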

Experiments show that our work produces higher-quality medical images than existing research. We also treat this work as a new form of data augmentation: with the enlarged dataset, we greatly improve the segmentation ability of U-Net.

On the other hand, by experimenting with the feature extractors (ResNet50, ResNet101, DenseNet169, and DenseNet201), we find that although ResNet and DenseNet have better feature-extraction ability than VGG [3, 4], VGG19 remains the best feature extractor for medical images.

We conclude that, with the development of deep learning, encoder-like networks are becoming increasingly good at extracting high-level signals while using high frequencies to hide the low-level signals they deem unimportant, making those signals imperceptible to humans [24]. The encoding ability of neural networks is thus continually improving, which is why advances keep being made in many visual tasks. But this is not good news for generative research, especially for medical images, which contain countless low-intensity signals and noise. For now, we conclude that VGG19 remains the best choice for medical image processing.

Availability of data and materials

Not applicable.

References

1. Zhang R, Isola P, Efros AA (2016) Colorful image colorization. In: Leibe B, Matas J, Sebe N, Welling M (eds) Computer Vision—ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part III. Springer, Cham, pp 649–666

2. Galkin F, Aliper A, Putin E et al (2018) Human microbiome aging clocks based on deep learning and tandem of permutation feature importance and accumulated local effects. bioRxiv

3. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778

4. Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4700–4708

5. Nam H, Kim H-E (2018) Batch-instance normalization for adaptively style-invariant neural networks. In: Advances in neural information processing systems

6. He W, Xie Z, Li Y, Wang X, Cai W (2019) Synthesizing depth hand images with GANs and style transfer for hand pose estimation. Sensors 19(13):2919

7. Chang B, Zhang Q, Pan S, Meng L (2018) Generating handwritten Chinese characters using CycleGAN. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), IEEE, pp 199–207

8. Gatys LA, Ecker AS, Bethge M (2016) Image style transfer using convolutional neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2414–2423

9. Huang X, Belongie S (2017) Arbitrary style transfer in real-time with adaptive instance normalization. In: Proceedings of the IEEE International Conference on Computer Vision, pp 1501–1510

10. Zhu J-Y, Park T, Isola P, Efros AA (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE international conference on computer vision, pp 2223–2232

11. Liu M-Y, Breuel T, Kautz J (2017) Unsupervised image-to-image translation networks. In: Advances in neural information processing systems, pp 700–708

12. Wolterink JM, Dinkla AM, Savenije MHF, Seevinck PR, van den Berg CAT, Išgum I (2017) Deep MR to CT synthesis using unpaired data. In: International workshop on simulation and synthesis in medical imaging, Springer, Cham, pp 14–23

13. Huang X, Liu M-Y, Belongie S, Kautz J (2018) Multimodal unsupervised image-to-image translation. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 172–189

14. Zhu J-Y, Zhang R, Pathak D, Darrell T, Efros AA, Wang O, Shechtman E (2017) Toward multimodal image-to-image translation. In: Advances in neural information processing systems, pp 465–476

15. Frid-Adar M, Klang E, Amitai M, Goldberger J, Greenspan H (2018) Synthetic data augmentation using GAN for improved liver lesion classification. In: 2018 IEEE 15th international symposium on biomedical imaging (ISBI 2018), IEEE, pp 289–293

16. Yang Q, Yan P, Zhang Y, Yu H, Shi Y, Mou X, Kalra MK, Zhang Y, Sun L, Wang G (2018) Low-dose CT image denoising using a generative adversarial network with Wasserstein distance and perceptual loss. IEEE Trans Med Imag 37:1348–1357

17. Ronneberger O, Fischer P, Brox T (2015) U-Net: Convolutional networks for biomedical image segmentation. In: Navab N, Hornegger J, Wells WM, Frangi AF (eds) Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015: 18th International Conference, Munich, Germany, October 5–9, 2015, Proceedings, Part III. Springer, Cham, pp 234–241

18. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556

19. Mechrez R, Talmi I, Zelnik-Manor L (2018) The contextual loss for image transformation with non-aligned data. In: Ferrari V, Hebert M, Sminchisescu C, Weiss Y (eds) Computer Vision—ECCV 2018: 15th European Conference, Munich, Germany, September 8–14, 2018, Proceedings, Part XIV. Springer, Cham, pp 768–783

20. Zhao C, Ng TK, Wei N, Prabaswara A, Alias MS, Janjua B, Shen C, Ooi BS (2016) Facile formation of high-quality InGaN/GaN quantum-disks-in-nanowires on bulk-metal substrates for high-power light-emitters. Nano Lett 16(2):1056–1063

21. Li Y, Fang C, Yang J, Wang Z, Lu X, Yang M-H (2017) Universal style transfer via feature transforms. In: Advances in neural information processing systems, pp 386–396

22. Liao H, Xu Z (2015) Approaches to manage hesitant fuzzy linguistic information based on the cosine distance and similarity measures for HFLTSs and their application in qualitative decision making. Expert Syst Appl 42:5328–5336

23. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2818–2826

24. Chu C, Zhmoginov A, Sandler M (2017) CycleGAN, a master of steganography. arXiv preprint arXiv:1712.02950



Funding

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (No. NRF-2019R1A2C1090713).

Author information

Authors and affiliations

Department of Electrical and Computer Engineering, Inha University, 100 Inha-ro, Michuhol-gu, Incheon, 22212, Korea

Yin Xu, Yan Li & Byeong-Seok Shin



Author contributions

Conceptualization: XY, YL, and BSS; Methodology: XY; Conception and design of the experiments: XY and YL; Writing—original draft: XY; Writing—review and editing: YL and BSS. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Byeong-Seok Shin .

Ethics declarations

Ethics approval and consent to participate, consent for publication, and competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit .


About this article

Cite this article

Xu, Y., Li, Y. & Shin, BS. Medical image processing with contextual style transfer. Hum. Cent. Comput. Inf. Sci. 10, 46 (2020).


Received : 04 August 2020

Accepted : 18 October 2020

Published : 10 November 2020



  • Medical Image
  • Contextual transfer
  • Deep learning
  • Segmentation


Original Research Article

Trends and Hotspots in Research on Medical Images with Deep Learning: A Bibliometric Analysis from 2013 to 2023


  • 1 First School of Clinical Medicine, Fujian University of Traditional Chinese Medicine, Fuzhou, China
  • 2 College of Rehabilitation Medicine, Fujian University of Traditional Chinese Medicine, Fuzhou, China
  • 3 The School of Health, Fujian Medical University, Fuzhou, China

Background: With the rapid development of the internet, the improvement of computer capabilities, and the continuous advancement of algorithms, deep learning has developed rapidly in recent years and has been widely applied in many fields. Previous studies have shown that deep learning has an excellent performance in image processing, and deep learning-based medical image processing may help solve the difficulties faced by traditional medical image processing. This technology has attracted the attention of many scholars in the fields of computer science and medicine. This study mainly summarizes the knowledge structure of deep learning-based medical image processing research through bibliometric analysis and explores the research hotspots and possible development trends in this field.

Methods: The Web of Science Core Collection database was searched using the terms "deep learning," "medical image processing," and their synonyms. CiteSpace was used for visual analysis of authors, institutions, countries, keywords, co-cited references, co-cited authors, and co-cited journals.

Results: The analysis was conducted on 562 highly cited papers retrieved from the database. The trend chart of annual publication volume shows an upward trend. Pheng-Ann Heng, Hao Chen, and Klaus Hermann Maier-Hein are among the most active authors in this field. The Chinese Academy of Sciences has the highest number of publications, while the institution with the highest centrality is Stanford University. The United States has the highest number of publications, followed by China. The most frequent keyword is "Deep Learning," and the highest-centrality keyword is "Algorithm." The most cited author is Kaiming He, and the author with the highest centrality is Yoshua Bengio.

Conclusion: The application of deep learning in medical image processing is becoming increasingly common, and there are many active authors, institutions, and countries in this field. Current research in medical image processing mainly focuses on deep learning, convolutional neural networks, classification, diagnosis, segmentation, image, algorithm, and artificial intelligence. The research focus and trends are gradually shifting toward more complex and systematic directions, and deep learning technology will continue to play an important role.

1. Introduction

The origin of radiology can be seen as the beginning of medical image processing. The discovery of X-rays by Röntgen and its successful application in clinical practice ended the era of disease diagnosis relying solely on the clinical experience of doctors ( Glasser, 1995 ). The production of medical images provides doctors with more data, enabling them to diagnose and treat diseases more accurately. With the continuous improvement of computer performance and image processing technology represented by central processing units (CPUs; Dessy, 1976 ), medical image processing has become more efficient and accurate in medical research and clinical applications. Initially, medical image processing was mainly used in medical imaging diagnosis, such as analyzing and diagnosing X-rays, CT, MRI, and other images. Nowadays, medical image processing has become an important research tool in fields such as radiology, pathology, and biomedical engineering, providing strong support for medical research and clinical diagnosis ( Hosny et al., 2018 ; Hu et al., 2022 ; Lin et al., 2022 ).

Deep learning originated from artificial neural networks, which can be traced back to the 1940s and 1950s, when scientists proposed the perceptron and neuron models to simulate the working principles of the human nervous system ( Rosenblatt, 1958 ; McCulloch and Pitts, 1990 ). However, limited by the weak performance of computers at that time, these models were quickly abandoned. In 2006, Canadian computer scientist Geoffrey Hinton and his team proposed a model called the "deep belief network," which adopted a deep structure and overcame the shortcomings of traditional neural networks. This is considered the starting point of deep learning ( Hinton et al., 2006 ).

In recent years, with the rapid development of the Internet, massive data are constantly generated and accumulated, which are very favorable for deep learning networks that require a large amount of data for training ( Misra et al., 2022 ). Additionally, the development of computing devices such as graphics processing units (GPUs) and tensor processing units (TPUs) has made the training of deep learning models faster and more efficient ( Alzubaidi et al., 2021 ; Elnaggar et al., 2022 ). Furthermore, the continuous improvement and optimization of deep learning algorithms have also led to the continuous improvement of the performance of deep learning models ( Minaee et al., 2022 ). Therefore, the application of deep learning is becoming more and more widespread in various fields, including medical image processing.

Deep learning has many advantages in processing medical images. Firstly, it does not require human intervention and can automatically learn and extract features, achieving automation in processing ( Yin et al., 2021 ). Secondly, it can process a large amount of data simultaneously, with processing efficiency far exceeding traditional manual methods ( Narin et al., 2021 ). Thirdly, its accuracy is also high, able to learn more complex features and discover subtle changes and patterns that are difficult for humans to perceive ( Han et al., 2022 ). Lastly, it is less affected by subjective human factors, leading to relatively more objective results ( Kerr et al., 2022 ).

Bibliometrics is a quantitative method for evaluating the research achievements of researchers, institutions, countries, or subject areas, and can be traced back to the 1960s ( Schoenbach and Garfield, 1956 ). In bibliometric analysis, the citation half-life of an article has two characteristics: first, classical articles are continuously cited; second, some articles are frequently cited within a certain period and quickly reach a peak. The length of time that classical articles are continuously cited is closely related to the speed of development of basic research, while the frequent citation of certain articles within a specific period represents the dynamic changes in the corresponding field. Generally speaking, articles that reflect dynamic changes in the field are more common than classical articles. In Web of Science, papers that are cited in one or more fields and rank in the top 1% of citation counts for their publication year are included as highly cited papers. Visual analysis of highly cited papers is more effective in identifying popular research areas and trends compared to visual analysis of all search results. CiteSpace is a visualization software that employs bibliometric methods, developed by Professor Chaomei Chen at Drexel University ( Chen, 2006 ).

Therefore, to gain a deeper understanding of the research hotspots and possible development trends of deep learning-based medical image processing, this study analyzes highly cited papers published between 2013 and 2023 using bibliometric methods, identifies the authors, institutions, and countries with the most research achievements, and provides an overall review of the knowledge structure of the highly cited papers, which is expected to be helpful for researchers in this field.

2.1. Search strategy and data source

A search was conducted in the Web of Science Core Collection database using the search terms “deep learning” and “medical imaging,” along with their synonyms and related terms. The complete search string is as follows: (TS = Deep Learning OR “Deep Neural Networks” OR “Deep Machine Learning” OR “Deep Artificial Neural Networks” OR “Deep Models” OR “Hierarchical Learning” OR “Deep architectures” OR “Multi-layer Neural Networks” OR “Large-scale Neural Networks” OR “Deep Belief Networks”) AND (TS = “Medical imaging” OR “Radiology imaging” OR “Diagnostic imaging” OR “Clinical imaging” OR “Biomedical imaging” OR “Radiographic imaging” OR “Tomographic imaging” OR “Imaging modalities” OR “Medical visualization” OR “Medical image analysis”). The search was refined to include only articles published between 2013 and 2023, with a focus on highly cited papers. The search yielded a total of 562 results. The article type was restricted to papers, and the language was limited to English.

2.2. Scientometric analysis methods

Due to the Web of Science export limit, the records were exported in two batches (1–500 and 501–562), with the record content including full records and cited references. This plain text file served as the source file for the analysis. Next, a new project was established in CiteSpace 6.1.R6, with the project location and data storage location set up. The data import and export function of CiteSpace was used to convert the plain text file into a format that could be analyzed in CiteSpace. The remaining parameters were set as follows: time slicing from 2013 to 2023 with a yearly interval; node types including authors, institutions, countries, keywords, co-cited references, co-cited authors, and co-cited journals; the thresholds for "Top N," "Top N%," and "g-index" left at their defaults; network pruning set to pathfinder with pruning of the merged network; and visualization set to the static cluster view, showing the merged network to display the overall network.

In the map generated by CiteSpace, there are multiple elements. The nodes available for analysis are represented as circles, with their size generally indicating quantity: the larger the circle, the greater the quantity. The circles are composed of annual rings, with the color of each ring representing the year and its thickness determined by the number of corresponding records in that year; the more records in a year, the thicker the ring. The "Centrality" option in the CiteSpace menu refers to betweenness centrality ( Chen, 2005 ). CiteSpace uses this metric to discover and measure the importance of nodes, and highlights nodes with purple circles when their centrality is greater than or equal to 0.1; only such nodes are considered important enough to emphasize. The calculation follows the formulation introduced by Freeman (1977) :

$$BC(i) = \sum_{s \neq i \neq t} \frac{n_{st}(i)}{g_{st}}$$

In this formula, $g_{st}$ represents the number of shortest paths from node $s$ to node $t$, and $n_{st}(i)$ represents the number of those shortest paths that pass through node $i$. From the information-transmission perspective, the higher the betweenness centrality, the greater the importance of the node: removing such nodes has a larger impact on transmission through the network.
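The definition can be illustrated with a brute-force computation over ordered node pairs; this toy sketch is fine for small graphs (CiteSpace itself uses the efficient Brandes algorithm and may normalize the value):

```python
from collections import deque
from itertools import permutations

def shortest_paths(graph, s, t):
    """All shortest paths from s to t via breadth-first search.
    graph: dict mapping each node to a set of neighbours."""
    paths, best = [], None
    queue = deque([[s]])
    while queue:
        path = queue.popleft()
        if best is not None and len(path) > best:
            continue  # longer than a known shortest path
        node = path[-1]
        if node == t:
            best = len(path)
            paths.append(path)
            continue
        for nb in graph[node]:
            if nb not in path:
                queue.append(path + [nb])
    return paths

def betweenness(graph, i):
    """Sum over ordered pairs (s, t) with s != i != t of the fraction of
    s-t shortest paths that pass through node i (Freeman's definition)."""
    total = 0.0
    for s, t in permutations(graph, 2):
        if i in (s, t):
            continue
        paths = shortest_paths(graph, s, t)
        if paths:
            total += sum(i in p for p in paths) / len(paths)
    return total
```

On a simple chain a–b–c, every shortest path between the endpoints passes through b, so b's betweenness is high while the endpoints score zero, matching the intuition that removing b disconnects the network.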

3.1. Analysis of annual publication volume

The trend of annual publication volume shows that from 2013 to 2023, the number of related studies fluctuated slightly from year to year but rose overall. The period can be divided into three stages: before 2016, the number of papers was relatively small; from 2016 to 2019, the count grew by about 20 papers per year over the previous year; after 2019, the growth rate slowed, but annual output remained high ( Figure 1 ).

Figure 1 . Annual quantitative distribution of publications.

3.2. Analysis of authors

Among the 562 articles included, there are a total of 364 authors ( Figure 2 ). Pheng-Ann Heng and Hao Chen rank first with seven publications each, Klaus Hermann Maier-Hein ranks second with six, while Fabian Isensee, Jing Qin, Qi Dou, and Dinggang Shen are tied for third with five publications each. From Figure 2 , it can be seen that there are many small author groups but no very large research groups, and many authors have no collaborative relationships with one another.

Figure 2 . The collaborative relationship map of researchers in the field of medical image processing with deep learning from 2013 to 2023. The size of nodes represents the number of papers published by the author. The links between nodes reflect the strength of collaboration.

3.3. Analysis of institutions

In the 562 papers included, there are a total of 311 institutions ( Figure 3 ; Table 1 ). The institution with the highest publication output is Chinese Academy of Sciences, and the institution with the highest centrality is Stanford University. The map shows that there are close collaborative relationships between institutions, but these relationships are based on one or more institutions with high publication output and centrality. There is less collaboration between institutions with low publication output and no centrality. As shown in Table 1 , there is no necessary relationship between publication output and centrality, and the institution with the highest publication output does not necessarily have the highest centrality.

Figure 3 . The collaborative relationship map of institutions in the field of medical image processing with deep learning from 2013 to 2023. The size of nodes represents the number of papers published by the institution. The links between nodes reflect the strength of collaboration.

Table 1 . Top 10 institutions by publication volume and centrality.

3.4. Analysis of countries

In the 562 included papers, there are a total of 62 countries represented ( Figure 4 ; Table 2 ). The United States has the highest publication output, while Germany has the highest centrality. The map shows that all countries have at least some collaboration with other countries. In general, there are three situations: some countries have a high publication output and centrality; some have a low publication output but high centrality, and some have a high publication output but low centrality.

Figure 4 . The collaborative relationship map of countries in the field of medical image processing with deep learning from 2013 to 2023. The size of nodes represents the number of papers published by the country. The links between nodes reflect the strength of collaboration.

Table 2 . Top 10 countries by publication volume and centrality.

3.5. Analysis of keywords

Among the 562 papers included, there were a total of 425 keywords ( Figure 5 ; Table 3 ). The most frequently occurring keyword is “Deep Learning,” and the one with the highest centrality is “algorithm.” Clustering analysis of the keywords resulted in 20 clusters: management, laser radar, biomarker, mild cognitive impairment, COVID-19, image restoration, breast cancer, feature learning, major depressive disorder, pulmonary embolism detection, precursor, bioinformatics, computer vision, annotation, change detection, information, synthetic CT, auto-encoder, brain networks, and ultrasound.

Figure 5 . The clustering map of keywords in the field of medical image processing with deep learning from 2013 to 2023. The smaller the cluster number, the larger its size, and the more keywords it contains.

Table 3 . Top 10 keywords by quantity and centrality.

The evolution of burst keywords in recent years can be summarized as follows ( Figure 6 ): it began in 2015 with a focus on "image." By 2016, "feature," "accuracy," "algorithm," and "machine learning" took center stage. The year 2017 brought prominence to "diabetic retinopathy," "classification," and "computer-aided detection." In 2020, attention shifted to "COVID-19," "pneumonia," "lung," "coronavirus," "transfer learning," and "X-ray." In 2021, the focus moved to "feature extraction," "framework," and "image segmentation."

Figure 6 . Top 17 keywords with the strongest citation bursts in publications of medical image processing with deep learning from 2013 to 2023. The blue line represents the overall timeline, while the red line represents the appearance year, duration, and end year of the burst keywords.

3.6. Analysis of references

In the 562 articles included, there are a total of 584 references ( Figure 7 ; Table 4 ). The most cited reference is "ImageNet Classification with Deep Convolutional Neural Networks" by Alex Krizhevsky. Krizhevsky and his team developed a powerful convolutional neural network (CNN) to classify a vast dataset of high-resolution images into 1,000 categories, achieving significantly improved top-1 and top-5 error rates of 37.5% and 17.0% compared to previous methods ( Krizhevsky et al., 2017 ).

Figure 7 . The co-cited reference map in the field of medical image processing with deep learning from 2013 to 2023. The size of nodes reflects the number of citations, while the links between nodes reflect the strength of co-citations.

Table 4 . Top 10 references in quantity ranking.

There are three articles with centrality greater than or equal to 0.1, authored by Dan Claudiu Ciresan, Liang-Chieh Chen, and Marios Anthimopoulos. Ciresan used deep max-pooling convolutional neural networks to detect mitosis in breast histology images and won the ICPR 2012 mitosis detection competition ( Ciresan et al., 2013 ). Chen addressed semantic image segmentation with deep learning and made three main contributions: first, convolution with upsampled filters, known as "atrous convolution"; second, atrous spatial pyramid pooling (ASPP); and last, improved object-boundary localization achieved by integrating deep convolutional neural networks with probabilistic graphical models ( Chen et al., 2018 ). Anthimopoulos proposed and evaluated a convolutional neural network (CNN) designed for the classification of interstitial lung disease (ILD) patterns ( Anthimopoulos et al., 2016 ).

The eighth- and ninth-ranked articles share the same title but differ in authorship. The eighth-ranked article, by Nicole Rusk, was published in the Comments & Opinion section of Nature Methods and provides a concise introduction to deep learning ( Rusk, 2016 ). The ninth-ranked article, authored by Yann LeCun, is a comprehensive review that, in comparison to Rusk's article, extensively elaborates on the fundamental principles of deep learning and its applications in domains such as speech recognition, visual object recognition, and object detection, as well as fields like drug discovery and genomics ( LeCun et al., 2015 ).

3.7. Analysis of co-cited authors

In the 562 included articles, there are a total of 634 cited authors ( Figure 8 ). The most cited author is Kaiming He, whose papers have been cited 141 times; the author with the highest centrality is Yoshua Bengio, whose papers have been cited 45 times.

Figure 8 . The map of co-cited author in the field of medical image processing with deep learning from 2013 to 2023. The size of nodes reflects the number of citations, while the links between nodes reflect the strength of co-citations.

The most cited paper authored by Kaiming He in Web of Science is “Deep Residual Learning for Image Recognition.” This paper introduces a residual learning framework to simplify the training of networks that are much deeper than those used previously. These residual networks are not only easier to optimize but also achieve higher accuracy with considerably increased depth ( He et al., 2016 ). On the other hand, the most cited paper authored by Yoshua Bengio in Web of Science is “Representation Learning: A Review and New Perspectives.” This paper reviews recent advances in unsupervised feature learning and deep learning, covering progress in probabilistic models, autoencoders, manifold learning, and deep networks ( Bengio et al., 2013 ).

3.8. Analysis of co-cited journals

In the 562 articles included, a total of 345 journals were cited ( Figure 9 ; Table 5 ). The journal with the most citations is the IEEE Conference on Computer Vision and Pattern Recognition, with 339 articles citing papers from this journal; the journal with the highest centrality is Advances in Neural Information Processing Systems, with 128 articles citing papers from this journal.

Figure 9 . The collaborative relationship map of co-cited journals in the field of medical image processing with deep learning from 2013 to 2023. The size of nodes reflects the number of citations, while the links between nodes reflect the strength of co-citations.

Table 5 . Top 10 journals in citation frequency and centrality ranking.

The dual-map overlay shows that literature in the mathematics/systems disciplines cites literature in systems/computing/computers, in molecular biology/genetics, and in health/nursing/medicine. Literature in molecular biology/immunology cites literature in molecular biology/genetics and in health/nursing/medicine, and literature in medicine/medical/clinical cites the same two areas ( Figure 10 ).

Figure 10 . Dual-map overlap of journals. The map consists of two graphs, with the citing graph on the left and the cited graph on the right. The curves represent citation links, displaying the full citation chain. The longer the vertical axis of the ellipse, the more articles are published in the journal. The longer the horizontal axis of the ellipse, the more authors have contributed to the journal.

4. Discussion

From 2013 to 2023, the analysis of publication volume reveals an obvious stage transition around 2016, making 2016 a key year for deep learning-based medical image processing. Although deep learning began to be applied as early as 2012, it did not receive widespread attention in medical image processing, because traditional machine learning methods such as support vector machines (SVMs) and random forests ( Lehmann et al., 2007 ) were mainly used before then. At the same time, deep learning models require powerful computing capability and a large amount of training data ( Ren et al., 2022 ). Before 2016, high-performance computers were very expensive, which hindered large-scale research in this field, and large-scale medical image datasets were relatively scarce, so research was constrained by both computing capability and data availability. In 2016, however, deep learning achieved breakthroughs in computer vision, including image classification, object detection, and segmentation, providing more advanced and efficient solutions for medical image processing ( Girshick et al., 2016 ; Madabhushi and Lee, 2016 ). These breakthroughs accelerated progress in the field, leading to a year-by-year increase in publication volume.

The author analysis shows that research on deep learning in medical image processing is relatively scattered, and large-scale cooperative teams have not yet formed. This may be because deep learning research demands substantial computing resources and data, and therefore a strong background in mathematics and computer science, while its application in medicine is inherently interdisciplinary and also requires participants with medical backgrounds. Individuals with both backgrounds are relatively rare, making large research teams difficult to assemble. In addition, researchers in this field may focus more on individual achievements than on collaboration. This does not necessarily indicate a lack of cooperative spirit; rather, it reflects the characteristics and preferences of researchers in this field.

The institutional analysis reveals two main characteristics. First, inter-institutional cooperation centers on institutions with high publication volume and high centrality, although the two are not necessarily correlated. Institutions with high publication volume and centrality often have strong collaborative ability and influence, attracting other institutions to cooperate with them, whereas institutions with low publication volume and low centrality may collaborate less owing to a lack of resources or opportunities. Second, publication volume does not fully determine centrality: smaller institutions sometimes attract high attention and recognition through unique research contributions or directions ( Wuchty et al., 2007 ; Lariviere and Gingras, 2010 ). Institutional centrality therefore depends not only on publication volume but also on the depth, breadth, and innovativeness of the research. Overall, the leading institutions are internationally renowned, with broad disciplinary coverage and strong research capabilities, and their high centrality makes them important players in medical image processing. Frequent collaboration and communication among them jointly promote the development of the field. They are distributed globally, in countries and regions including China, the United States, Germany, and the United Kingdom, giving the field an international character. Among them, the United States has the largest number of such institutions, occupying two of the top three positions, indicating its strength and influence in medical image processing.
In addition, these institutions include universities, hospitals, and research institutes, demonstrating the interdisciplinary nature of medical image processing.
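The "centrality" discussed above is betweenness centrality ( Freeman, 1977 ), which CiteSpace computes on the underlying collaboration and citation networks. The brute-force sketch below uses a hypothetical toy collaboration graph (node names and edges are illustrative, not drawn from our data) to show why a node with few links can still score high: node E bridges two clusters, so it sits on many shortest paths despite having only two collaborators.

```python
from collections import deque
from itertools import permutations

def shortest_paths(graph, s, t):
    """Enumerate all shortest s-t paths by breadth-first search
    (fine for toy graphs; real tools use Brandes' algorithm)."""
    queue, found, best = deque([[s]]), [], None
    while queue:
        path = queue.popleft()
        if best is not None and len(path) > best:
            break                      # longer than a known shortest path
        node = path[-1]
        if node == t:
            found.append(path)
            best = len(path)
            continue
        for nb in graph[node]:
            if nb not in path:         # no revisiting, hence no cycles
                queue.append(path + [nb])
    return found

def betweenness(graph):
    """Betweenness centrality (Freeman, 1977): for each node, the
    fraction of all-pairs shortest paths that pass through it."""
    score = {v: 0.0 for v in graph}
    for s, t in permutations(graph, 2):
        paths = shortest_paths(graph, s, t)
        for path in paths:
            for v in path[1:-1]:       # interior nodes only
                score[v] += 1.0 / len(paths)
    # permutations() counts every unordered pair twice
    return {v: c / 2.0 for v, c in score.items()}

# Hypothetical collaboration network: a dense cluster {A, B, C, D}
# connected to the chain E-F-G solely through the edge A-E.
g = {
    "A": {"B", "C", "D", "E"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"A", "B", "C"},
    "E": {"A", "F"},
    "F": {"E", "G"},
    "G": {"F"},
}
print(betweenness(g))
# → {'A': 9.0, 'B': 0.0, 'C': 0.0, 'D': 0.0, 'E': 8.0, 'F': 5.0, 'G': 0.0}
```

In this toy graph, B, C, and D each have three links but zero betweenness, while the bridge node E has only two links yet nearly the highest score, mirroring the observation that publication volume and centrality need not correlate.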

The country-level analysis reveals three main patterns: some countries have both a large number of publications and high centrality; some have few publications but high centrality; and some have many publications but low centrality. This indicates that deep learning in medical image processing is a global research hotspot, with many countries publishing high-quality papers and maintaining close collaborative relationships. Countries with many publications typically have strong research capabilities and play a leading role in the field, and their high centrality reflects their importance in collaborative networks. Countries with relatively few publications but high centrality may make unique contributions in specific research directions or technologies ( Lee et al., 2018 ), or maintain close relationships with other countries in this field. Countries with many publications but low centrality may publish research of comparatively lower quality, or have relatively few collaborative relationships with other countries.

The keyword analysis indicates that in highly cited papers in medical image processing, the core concepts are "deep learning" and "machine learning." In terms of applications, the keywords emphasize COVID-19 diagnosis, image segmentation, and classification, while highlighting the significance of neural networks and convolutional neural networks. Additionally, the centrality-ranked keywords underscore the relevance of algorithms associated with deep learning and reiterate key themes in medical image processing, such as "cancer" and "MRI." Overall, these keywords reflect the diverse applications of deep learning in medical image processing and the importance of algorithms.

The keyword clusters can be grouped into four main domains, reflecting the diverse applications of deep learning in medical image processing. The first group focuses on medical image processing and diseases, encompassing biomarkers and the detection and diagnosis of specific diseases such as breast cancer and COVID-19 ( Chougrad et al., 2018 ; Altan and Karasu, 2020 ). The second group concentrates on image processing and computer vision, including image restoration, annotation, and change detection ( Zhang et al., 2016 ; Kumar et al., 2017 ; Tatsugami et al., 2019 ) to enhance the quality and analysis of medical images. The third group emphasizes data analysis and information processing, encompassing feature learning, bioinformatics, and information extraction ( Min et al., 2017 ; Chen et al., 2021 ; Hang et al., 2022 ), aiding in the extraction of valuable information from medical images. Lastly, the fourth group centers on neuroscience and medical imaging, studying brain networks and ultrasound images ( Kawahara et al., 2017 ; Ragab et al., 2022 ), highlighting the importance of deep learning in understanding and analyzing biomedical images for studying the nervous system and organs.

From the analysis of burst keywords, the evolution of these keywords reflects the changing trends and focal points in the field of deep learning in medical image processing. In 2015, the keyword “image” dominated, signifying an initial emphasis on basic image processing and analysis to acquire fundamental image information. By 2016, terms like “feature,” “accuracy,” “algorithm,” and “machine learning” ( Shin et al., 2016 ; Zhang et al., 2016 ; Jin et al., 2017 ; Lee et al., 2017 ; Zhang et al., 2018 ) were introduced, indicating a growing interest in feature extraction, algorithm optimization, accuracy, and machine learning methods, highlighting the shift toward higher-level analysis and precision in medical image processing. In 2017, terms like “diabetic retinopathy,” “classification,” and “computer-aided detection” ( Zhang et al., 2016 ; Lee et al., 2017 ; Quellec et al., 2017 ; Setio et al., 2017 ) were added, underlining an increased interest in disease-specific diagnoses (e.g., diabetic retinopathy) and computer-assisted detection of medical images. The year 2020 saw the emergence of “COVID-19,” “pneumonia,” “lung,” “coronavirus,” “transfer learning,” and “x-ray” ( Minaee et al., 2020 ) due to the urgent demand for analyzing lung diseases and infectious disease detection, prompted by the COVID-19 pandemic. Additionally, “transfer learning” reflected the trend of utilizing pre-existing deep learning models for medical image data. In 2021, keywords such as “feature extraction,” “framework,” and “image segmentation” ( Dhiman et al., 2021 ; Sinha and Dolz, 2021 ; Chen et al., 2022 ) became prominent, indicating a deeper exploration of feature extraction, analysis frameworks, and image segmentation to enhance the accuracy and efficiency of medical image processing. 
Overall, these changes illustrate the ongoing development in the field of medical image processing, evolving from basic image processing toward more precise feature extraction, disease diagnosis, lesion segmentation, and addressing the needs arising from disease outbreaks. This underscores the widespread application and continual evolution of deep learning in the medical domain.

Based on the analysis of reference citations, it is evident that these 10 highly cited papers cover significant research in the field of deep learning applied to medical image processing. They share a common emphasis on the outstanding performance of deep Convolutional Neural Networks (CNNs) in tasks such as image classification, skin cancer classification, and medical image segmentation. They explore the effectiveness of applying deep residual learning in large-scale image recognition and medical image analysis ( He et al., 2016 ). The introduction of the U-Net, a convolutional network architecture suitable for biomedical image segmentation, is another key aspect ( Ronneberger et al., 2015 ). Additionally, they develop deep learning algorithms for detecting diabetic retinopathy in retinal fundus photographs ( Gulshan et al., 2016 ). They also provide a review of deep learning in medical image analysis, summarizing the trends in related research ( LeCun et al., 2015 ; Rusk, 2016 ). However, these papers also exhibit some differences. Some focus on specific tasks like skin cancer classification and diabetic retinopathy detection, some concentrate on proposing new network structures (such as ResNet, U-Net, etc.) to enhance the performance of medical image processing, while others provide overviews and summaries of the overall application of deep learning in medical image processing. Overall, these papers collectively drive the advancement of deep learning in the field of medical image processing, achieving significant research outcomes through the introduction of new network architectures, effective algorithms, and their application to specific medical image tasks.

From the analysis of cited journals, it can be observed that these journals collectively highlight the important features of research in medical image processing. Firstly, they emphasize areas such as computer vision, image processing, and pattern recognition, which are closely related to medical image processing. Moreover, journals and conferences led by IEEE, such as IEEE Transactions on Neural Networks and Learning Systems, IEEE Transactions on Medical Imaging, and the IEEE Winter Conference on Applications of Computer Vision, hold significant influence in computer vision and pattern recognition, reflecting IEEE's leadership in the domain of medical image processing. These journals span multiple fields including computer science, medicine, and the natural sciences, underscoring the interdisciplinary nature of medical image processing research. Open-access publishing platforms like arXiv and Scientific Reports underscore the importance of open access and information sharing in the field. Additionally, specialized journals like "Medical Image Analysis" and "Radiology" play pivotal roles in research on medical image processing, and the comprehensive journal "Nature" covers a wide range of scientific disciplines, potentially including research related to medical image processing. In summary, these journals form a comprehensive research network spanning the academic disciplines involved in medical image processing, emphasizing the significance of open access and information sharing. They also highlight the crucial role of deep learning and neural network technologies in medical image processing, as well as the importance of image processing, analysis, and diagnosis.

From the analysis of dual-map overlap of journals, it can be observed that a particularly noteworthy citation relationship is the reference of computer science, biology, and medicine to mathematics. Computer science research has a strong connection to mathematics, as mathematical methods and algorithms are the foundation of computer science, while the development of computers and information technology provides a broader range of applications for mathematical research ( Domingos, 2012 ). Molecular biology and genetics are important branches of biological research, where mathematical methods are widely applied, such as for analyzing gene sequences and molecular structures, and studying interactions between molecules ( Jerber et al., 2021 ). Medicine is a field related to human health, where mathematical methods also have many applications, such as for statistical analysis of clinical trial results, predicting disease risk, and optimizing the allocation of medical resources ( Gong and Tang, 2020 ; Wang et al., 2021 ).

From our perspective, the future development of deep learning in the field of medical image processing can be summarized as follows. First, with the widespread application of deep learning models in medical image processing, the design and development of more efficient and lightweight network architectures will become necessary. This can improve the speed and portability of the model, making it possible for these models to run effectively in resource-limited environments such as mobile devices ( Ghimire et al., 2022 ). Second, traditional deep learning methods usually require a large amount of labeled data for training, while in the field of medical image processing, labeled data is often difficult to obtain. Therefore, weakly supervised learning will become an important research direction to improve the model’s performance using a small amount of labeled data and a large amount of unlabeled data. This includes the application of techniques such as semi-supervised learning, transfer learning, and generative adversarial networks ( Ren et al., 2023 ). Third, medical image processing involves different types of data such as CT scans, MRI, X-rays, and biomarkers. Therefore, multimodal fusion will become an important research direction to organically combine information from different modalities and provide more comprehensive and accurate medical image analysis results. Deep learning methods can be used to learn the correlations between multimodal data and perform feature extraction and fusion across modalities ( Saleh et al., 2023 ). Finally, deep learning models are typically black boxes, and their decision-making process is difficult to explain and understand. In medical image processing, the interpretability and reliability of the decision-making process are crucial. 
Therefore, researchers will focus on developing interpretable deep learning methods to enhance physicians’ and clinical experts’ trust in the model’s results and provide explanations for the decision-making process ( Chaddad et al., 2023 ).

In conclusion, deep learning is becoming increasingly important in medical image processing, with many active authors, institutions, and countries in this field. Among the highly cited papers in the Web of Science Core Collection, Pheng-Ann Heng, Hao Chen, and Dinggang Shen have published a relatively large number of papers. China has the most research institutions in this field, including the Chinese Academy of Sciences, the University of Chinese Academy of Sciences, The Chinese University of Hong Kong, Zhejiang University, and Shanghai Jiao Tong University. The United States ranks second in number of institutions, including Stanford University, Harvard Medical School, and Massachusetts General Hospital; Germany and the United Kingdom have relatively few institutions in this field. The number of publications from the United States far exceeds that of other countries, with China in second place. Output from the United Kingdom, Germany, Canada, Australia, and India is relatively high, while output from the Netherlands and France is relatively low, and South Korea's development and publication output in medical image processing are also relatively low. Current research focuses mainly on deep learning, convolutional neural networks, classification, diagnosis, segmentation, algorithms, and artificial intelligence, with research trends gradually moving toward more complex and systematic directions. Deep learning technology will continue to play an important role in this field.

This study has certain limitations. First, we selected only highly cited papers from the Web of Science Core Collection as our analysis material, so we may have missed highly cited papers from other databases, and our analysis may not cover the entire literature. Given the limitations of bibliometric software, however, it is difficult to merge and analyze multiple databases; the reasons for choosing highly cited papers from the Web of Science Core Collection are explained in the Introduction. Second, we may have overlooked some important non-English papers, introducing potential research bias.

Data availability statement

The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding authors.

Author contributions

BC: Writing – original draft. JJ: Writing – review & editing. HL: Writing – review & editing. ZY: Writing – review & editing. HZ: Writing – review & editing. YW: Writing – review & editing. JL: Writing – original draft. SW: Writing – original draft. SC: Writing – original draft.

Funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This work was supported by the National Natural Science Foundation of China (Grant No. 81973924) and Special Financial Subsidies of Fujian Province, China (Grant No. X2021003—Special financial).


Acknowledgments

We would like to thank Chaomei Chen for developing the CiteSpace visual analysis software.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.


Abbreviations

CNNs, Convolutional neural networks; CPUs, Central processing units; GPUs, Graphics processing units; TPUs, Tensor processing units; ASPP, Atrous spatial pyramid pooling.

References

Altan, A., and Karasu, S. (2020). Recognition of Covid-19 disease from X-Ray images by hybrid model consisting of 2D curvelet transform, chaotic salp swarm algorithm and deep learning technique. Chaos, Solitons Fractals 140:110071. doi: 10.1016/j.chaos.2020.110071


Alzubaidi, L., Zhang, J., Humaidi, A. J., Al-Dujaili, A., Duan, Y., Al-Shamma, O., et al. (2021). Review of deep learning: concepts, cnn architectures, challenges, applications, future directions. J. Big Data 8:53. doi: 10.1186/s40537-021-00444-8

Anthimopoulos, M., Christodoulidis, S., Ebner, L., Christe, A., and Mougiakakou, S. (2016). Lung pattern classification for interstitial lung diseases using a deep convolutional neural network. IEEE Trans. Med. Imaging 35, 1207–1216. doi: 10.1109/TMI.2016.2535865

Bengio, Y., Courville, A., and Vincent, P. (2013). Representation learning: a review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35, 1798–1828. doi: 10.1109/TPAMI.2013.50

Chaddad, A., Peng, J. H., Xu, J., and Bouridane, A. (2023). Survey of explainable AI techniques in healthcare. Sensors 23:634. doi: 10.3390/s23020634

Chen, C. (2005). "The centrality of pivotal points in the evolution of scientific networks" in Proceedings of the 10th International Conference on Intelligent User Interfaces; San Diego, California, USA: Association for Computing Machinery, 98–105.


Chen, C. M. (2006). Citespace II: detecting and visualizing emerging trends and transient patterns in scientific literature. J. Am. Soc. Inf. Sci. Technol. 57, 359–377. doi: 10.1002/asi.20317


Chen, R. J., Lu, M. Y., Wang, J. W., Williamson, D. F. K., Rodig, S. J., Lindeman, N. I., et al. (2022). Pathomic fusion: an integrated framework for fusing histopathology and genomic features for cancer diagnosis and prognosis. IEEE Trans. Med. Imaging 41, 757–770. doi: 10.1109/TMI.2020.3021387

Chen, L. C., Papandreou, G., Kokkinos, I., Murphy, K., and Yuille, A. L. (2018). Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans. Pattern Anal. Mach. Intell. 40, 834–848. doi: 10.1109/TPAMI.2017.2699184

Chen, M., Shi, X. B., Zhang, Y., Wu, D., and Guizani, M. (2021). Deep feature learning for medical image analysis with convolutional autoencoder neural network. IEEE Trans. Big Data 7, 750–758. doi: 10.1109/TBDATA.2017.2717439

Chougrad, H., Zouaki, H., and Alheyane, O. (2018). Deep convolutional neural networks for breast cancer screening. Comput. Methods Prog. Biomed. 157, 19–30. doi: 10.1016/j.cmpb.2018.01.011

Ciresan, D. C., Giusti, A., Gambardella, L. M., and Schmidhuber, J. (2013). Mitosis detection in breast cancer histology images with deep neural networks. Med. Image Comput. Comput. Assist. Intervent. 16, 411–418. doi: 10.1007/978-3-642-40763-5_51

Dessy, R. E. (1976). Microprocessors?—an end user's view. Science (New York, N.Y.) 192, 511–518. doi: 10.1126/science.1257787

Dhiman, G., Kumar, V. V., Kaur, A., and Sharma, A. (2021). DON: deep learning and optimization-based framework for detection of novel coronavirus disease using X-ray images. Interdiscip. Sci. 13, 260–272. doi: 10.1007/s12539-021-00418-7

Domingos, P. (2012). A few useful things to know about machine learning. Commun. ACM 55, 78–87. doi: 10.1145/2347736.2347755

Elnaggar, A., Heinzinger, M., Dallago, C., Rehawi, G., Wang, Y., Jones, L., et al. (2022). Prottrans: toward understanding the language of life through self-supervised learning. IEEE Trans. Pattern Anal. Mach. Intell. 44, 7112–7127. doi: 10.1109/TPAMI.2021.3095381

Freeman, L. C. (1977). A set of measures of centrality based on betweenness. Sociometry 40, 35–41. doi: 10.2307/3033543

Ghimire, D., Kil, D., and Kim, S. H. (2022). A survey on efficient convolutional neural networks and hardware acceleration. Electronics 11:945. doi: 10.3390/electronics11060945

Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2016). Region-based convolutional networks for accurate object detection and segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 38, 142–158. doi: 10.1109/TPAMI.2015.2437384

Glasser, O. (1995). W. C. Roentgen and the discovery of the roentgen rays. AJR Am. J. Roentgenol. 165, 1033–1040. doi: 10.2214/ajr.165.5.7572472

Gong, F., and Tang, S. (2020). Internet intervention system for elderly hypertensive patients based on hospital community family edge network and personal medical resources optimization. J. Med. Syst. 44:95. doi: 10.1007/s10916-020-01554-1

Gulshan, V., Peng, L., Coram, M., Stumpe, M. C., Wu, D., Narayanaswamy, A., et al. (2016). Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. J. Am. Med. Assoc. 316, 2402–2410. doi: 10.1001/jama.2016.17216

Han, Z., Yu, S., Lin, S.-B., and Zhou, D.-X. (2022). Depth selection for deep relu nets in feature extraction and generalization. IEEE Trans. Pattern Anal. Mach. Intell. 44, 1853–1868. doi: 10.1109/TPAMI.2020.3032422

Hang, R. L., Qian, X. W., and Liu, Q. S. (2022). Cross-modality contrastive learning for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 60, 1–12. doi: 10.1109/TGRS.2022.3188529

He, K., Zhang, X., Ren, S., and Sun, J. (2016). "Deep residual learning for image recognition" in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); June 27–30, 2016.

Hinton, G. E., Osindero, S., and Teh, Y.-W. (2006). A fast learning algorithm for deep belief nets. Neural Comput. 18, 1527–1554. doi: 10.1162/neco.2006.18.7.1527

Hosny, A., Parmar, C., Quackenbush, J., Schwartz, L. H., and Aerts, H. J. W. L. (2018). Artificial intelligence in radiology. Nat. Rev. Cancer 18, 500–510. doi: 10.1038/s41568-018-0016-5

Hu, K., Zhao, L., Feng, S., Zhang, S., Zhou, Q., Gao, X., et al. (2022). Colorectal polyp region extraction using saliency detection network with neutrosophic enhancement. Comput. Biol. Med. 147:105760. doi: 10.1016/j.compbiomed.2022.105760

Jerber, J., Seaton, D. D., Cuomo, A. S. E., Kumasaka, N., Haldane, J., Steer, J., et al. (2021). Population-scale single-cell RNA-Seq profiling across dopaminergic neuron differentiation. Nat. Genet. 53:304. doi: 10.1038/s41588-021-00801-6

Jin, K. H., McCann, M. T., Froustey, E., and Unser, M. (2017). Deep convolutional neural network for inverse problems in imaging. IEEE Trans. Image Process. 26, 4509–4522. doi: 10.1109/TIP.2017.2713099

Kawahara, J., Brown, C. J., Miller, S. P., Booth, B. G., Chau, V., Grunau, R. E., et al. (2017). Brainnetcnn: convolutional neural networks for brain networks; toward predicting neurodevelopment. NeuroImage 146, 1038–1049. doi: 10.1016/j.neuroimage.2016.09.046

Kerr, M. V., Bryden, P., and Nguyen, E. T. (2022). Diagnostic imaging and mechanical objectivity in medicine. Acad. Radiol. 29, 409–412. doi: 10.1016/j.acra.2020.12.017

Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2017). Imagenet classification with deep convolutional neural networks. Commun. ACM 60, 84–90. doi: 10.1145/3065386

Kumar, N., Verma, R., Sharma, S., Bhargava, S., Vahadane, A., and Sethi, A. (2017). A dataset and a technique for generalized nuclear segmentation for computational pathology. IEEE Trans. Med. Imaging 36, 1550–1560. doi: 10.1109/TMI.2017.2677499

Lariviere, V., and Gingras, Y. (2010). The impact factor's matthew effect: a natural experiment in bibliometrics. J. Am. Soc. Inf. Sci. Technol. 61, 424–427. doi: 10.1002/asi.21232

LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. Nature 521, 436–444. doi: 10.1038/nature14539

Lee, H., Tajmir, S., Lee, J., Zissen, M., Yeshiwas, B. A., Alkasab, T. K., et al. (2017). Fully automated deep learning system for bone age assessment. J. Digit. Imaging 30, 427–441. doi: 10.1007/s10278-017-9955-8

Lee, D., Yoo, J., Tak, S., and Ye, J. C. (2018). Deep residual learning for accelerated MRI using magnitude and phase networks. IEEE Trans. Biomed. Eng. 65, 1985–1995. doi: 10.1109/TBME.2018.2821699

Lehmann, C., Koenig, T., Jelic, V., Prichep, L., John, R. E., Wahlund, L.-O., et al. (2007). Application and comparison of classification algorithms for recognition of alzheimer's disease in electrical brain activity (EEG). J. Neurosci. Methods 161, 342–350. doi: 10.1016/j.jneumeth.2006.10.023

Lin, H., Wang, C., Cui, L., Sun, Y., Xu, C., and Yu, F. (2022). Brain-like initial-boosted hyperchaos and application in biomedical image encryption. IEEE Trans. Industr. Inform. 18, 8839–8850. doi: 10.1109/TII.2022.3155599

Madabhushi, A., and Lee, G. (2016). Image analysis and machine learning in digital pathology: challenges and opportunities. Med. Image Anal. 33, 170–175. doi: 10.1016/

McCulloch, W. S., and Pitts, W. (1990). A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biol. 52, 99–115. doi: 10.1016/S0092-8240(05)80006-0

Min, S., Lee, B., and Yoon, S. (2017). Deep learning in bioinformatics. Brief. Bioinform. 18, 851–869. doi: 10.1093/bib/bbw068

Minaee, S., Boykov, Y. Y., Porikli, F., Plaza, A. J., Kehtarnavaz, N., and Terzopoulos, D. (2022). Image Segmentation Using Deep Learning: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 44, 3523–3542. doi: 10.1109/TPAMI.2021.3059968

Minaee, S., Kafieh, R., Sonka, M., Yazdani, S., and Soufi, G. J. (2020). Deep-covid: predicting covid-19 from chest X-ray images using deep transfer learning. Med. Image Anal. 65:101794. doi: 10.1016/

Misra, N. N., Dixit, Y., Al-Mallahi, A., Bhullar, M. S., Upadhyay, R., and Martynenko, A. (2022). Iot, big data, and artificial intelligence in agriculture and food industry. IEEE Internet Things J. 9, 6305–6324. doi: 10.1109/JIOT.2020.2998584

Narin, A., Kaya, C., and Pamuk, Z. (2021). Automatic detection of coronavirus disease (Covid-19) using X-ray images and deep convolutional neural networks. Pattern. Anal. Applic. 24, 1207–1220. doi: 10.1007/s10044-021-00984-y

Quellec, G., Charriére, K., Boudi, Y., Cochener, B., and Lamard, M. (2017). Deep image mining for diabetic retinopathy screening. Med. Image Anal. 39, 178–193. doi: 10.1016/

Ragab, M., Albukhari, A., Alyami, J., and Mansour, R. F. (2022). Ensemble Deep-Learning-Enabled Clinical Decision Support System for Breast Cancer Diagnosis and Classification on Ultrasound Images. Biology 11:439. doi: 10.3390/biology11030439

Ren, Z. Y., Wang, S. H., and Zhang, Y. D. (2023). Weakly supervised machine learning. Caai Transact. Intellig. Technol. 8, 549–580. doi: 10.1049/cit2.12216

Ren, P., Xiao, Y., Chang, X., Huang, P.-Y., Li, Z., Gupta, B. B., et al. (2022). A survey of deep active learning. ACM Comput. Surv. 54, 1–40. doi: 10.1145/3472291

Ronneberger, O., Fischer, P., and Brox, T. (2015). "U-Net: convolutional networks for biomedical image segmentation" in International Conference on Medical Image Computing and Computer-Assisted Intervention.

Rosenblatt, F. (1958). The perceptron: a probabilistic model for information storage and organization in the brain. Psychol. Rev. 65, 386–408. doi: 10.1037/h0042519

Rusk, N. (2016). Deep learning. Nat. Methods 13:35. doi: 10.1038/nmeth.3707

Saleh, M. A., Ali, A. A., Ahmed, K., and Sarhan, A. M. (2023). A brief analysis of multimodal medical image fusion techniques. Electronics 12:97. doi: 10.3390/electronics12010097

Schoenbach, U. H., and Garfield, E. (1956). Citation indexes for science. Science (New York, N.Y.) 123, 61–62. doi: 10.1126/science.123.3185.61.b

Setio, A. A. A., Traverso, A., de Bel, T., Berens, M. S. N., van den Bogaard, C., Cerello, P., et al. (2017). Validation, comparison, and combination of algorithms for automatic detection of pulmonary nodules in computed tomography images: the Luna16 challenge. Med. Image Anal. 42, 1–13. doi: 10.1016/

Shin, H. C., Roth, H. R., Gao, M. C., Lu, L., Xu, Z. Y., Nogues, I., et al. (2016). Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Trans. Med. Imaging 35, 1285–1298. doi: 10.1109/TMI.2016.2528162

Sinha, A., and Dolz, J. (2021). Multi-scale self-guided attention for medical image segmentation. IEEE J. Biomed. Health Inform. 25, 121–130. doi: 10.1109/JBHI.2020.2986926

Tatsugami, F., Higaki, T., Nakamura, Y., Yu, Z., Zhou, J., Lu, Y. J., et al. (2019). Deep learning-based image restoration algorithm for coronary CT angiography. Eur. Radiol. 29, 5322–5329. doi: 10.1007/s00330-019-06183-y

Wang, S., Zhang, Y., and Yao, X. (2021). Research on spatial unbalance and influencing factors of ecological well-being performance in China. Int. J. Environ. Res. Public Health 18:9299. doi: 10.3390/ijerph18179299

Wuchty, S., Jones, B. F., and Uzzi, B. (2007). The increasing dominance of teams in production of knowledge. Science 316, 1036–1039. doi: 10.1126/science.1136099

Yin, L., Zhang, C., Wang, Y., Gao, F., Yu, J., and Cheng, L. (2021). Emotional deep learning programming controller for automatic voltage control of power systems. IEEE Access 9, 31880–31891. doi: 10.1109/ACCESS.2021.3060620

Zhang, J., Gajjala, S., Agrawal, P., Tison, G. H., Hallock, L. A., Beussink-Nelson, L., et al. (2018). Fully automated echocardiogram interpretation in clinical practice: feasibility and diagnostic accuracy. Circulation 138, 1623–1635. doi: 10.1161/CIRCULATIONAHA.118.034338

Zhang, P. Z., Gong, M. G., Su, L. Z., Liu, J., and Li, Z. Z. (2016). Change detection based on deep feature representation and mapping transformation for multi-spatial-resolution remote sensing images. ISPRS-J Photogramm Remote Sens 116, 24–41. doi: 10.1016/j.isprsjprs.2016.02.013

Keywords: deep learning, medical images, bibliometric analysis, CiteSpace, trends, hotspots

Citation: Chen B, Jin J, Liu H, Yang Z, Zhu H, Wang Y, Lin J, Wang S and Chen S (2023) Trends and hotspots in research on medical images with deep learning: a bibliometric analysis from 2013 to 2023. Front. Artif. Intell. 6:1289669. doi: 10.3389/frai.2023.1289669

Received: 06 September 2023; Accepted: 27 October 2023; Published: 09 November 2023.

Copyright © 2023 Chen, Jin, Liu, Yang, Zhu, Wang, Lin, Wang and Chen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Jianping Lin, [email protected] ; Shizhong Wang, [email protected] ; Shaoqing Chen, [email protected]

† These authors have contributed equally to this work and share first authorship

AI in Medical Imaging Informatics: Current Challenges and Future Directions

Andreas S. Panayides

Department of Computer Science, University of Cyprus, 1678 Nicosia, Cyprus

Electrical and Computer Engineering Department, University of Louisville, Louisville, KY 40292 USA

Nenad D. Filipovic

University of Kragujevac, 2W94+H5 Kragujevac, Serbia

Ashish Sharma

Emory University Atlanta, GA 30322 USA

Sotirios A. Tsaftaris

School of Engineering, The University of Edinburgh, EH9 3FG, U.K.

The Alan Turing Institute, U.K.

Alistair Young

Department of Anatomy and Medical Imaging, University of Auckland, Auckland 1142, New Zealand

David Foran

Department of Pathology and Laboratory Medicine, Robert Wood Johnson Medical School, Rutgers, The State University of New Jersey, Piscataway, NJ 08854 USA

U.S. Department of Veterans Affairs Boston Healthcare System, Boston, MA 02130 USA

Spyretta Golemati

Medical School, National and Kapodistrian University of Athens, Athens 10675, Greece

Tahsin Kurc

Stony Brook University, Stony Brook, NY 11794 USA

School of Medicine, Regenstrief Institute, Indiana University, IN 46202 USA

Konstantina S. Nikita

Biomedical Simulations and Imaging Lab, School of Electrical and Computer Engineering, National Technical University of Athens, Athens 157 80, Greece

Ben P. Veasey

Michalis Zervakis

Technical University of Crete, Chania 73100, Crete, Greece

Joel H. Saltz

Constantinos S. Pattichis

Department of Computer Science of the University of Cyprus, 1678 Nicosia, Cyprus, and also with the Research Centre on Interactive Media, Smart Systems and Emerging Technologies (RISE CoE), 1066 Nicosia, Cyprus

This paper reviews state-of-the-art research solutions across the spectrum of medical imaging informatics, discusses clinical translation, and provides future directions for advancing clinical practice. More specifically, it summarizes advances in medical imaging acquisition technologies for different modalities, highlighting the necessity for efficient medical data management strategies in the context of AI in big healthcare data analytics. It then provides a synopsis of contemporary and emerging algorithmic methods for disease classification and organ/tissue segmentation, focusing on AI and deep learning architectures that have already become the de facto approach. The clinical benefits of in-silico modelling advances linked with evolving 3D reconstruction and visualization applications are further documented. In conclusion, integrative analytics approaches driven by the associated research branches highlighted in this study promise to revolutionize imaging informatics as known today across the healthcare continuum, for both radiology and digital pathology applications. The latter is projected to enable informed, more accurate diagnosis, timely prognosis, and effective treatment planning, underpinning precision medicine.

I. Introduction

Medical imaging informatics covers the application of information and communication technologies (ICT) to medical imaging for the provision of healthcare services. A wide spectrum of multi-disciplinary medical imaging services has evolved over the past 30 years, ranging from routine clinical practice to advanced human physiology and pathophysiology. Originally, the field was defined by the Society for Imaging Informatics in Medicine (SIIM) as follows [ 1 ]–[ 3 ]:

“Imaging informatics touches every aspect of the imaging chain from image creation and acquisition, to image distribution and management, to image storage and retrieval, to image processing, analysis and understanding, to image visualization and data navigation; to image interpretation, reporting, and communications. The field serves as the integrative catalyst for these processes and forms a bridge with imaging and other medical disciplines.”

The objective of medical imaging informatics is thus, according to SIIM, to improve the efficiency, accuracy, and reliability of services within the medical enterprise [ 3 ], concerning medical image usage and exchange throughout complex healthcare systems [ 4 ]. In that context, linked with the associated technological advances in big-data imaging, -omics, and electronic health record (EHR) analytics, dynamic workflow optimization, context-awareness, and visualization, a new era is emerging for medical imaging informatics, prescribing the way towards precision medicine [ 5 ]–[ 7 ]. This paper provides an overview of prevailing concepts, highlights challenges and opportunities, and discusses future trends.

Following the key areas of medical imaging informatics in the definition given above, the rest of the paper is organized as follows: Section II covers advances in medical image acquisition, highlighting the primary imaging modalities used in clinical practice. Section III discusses emerging trends pertaining to data management and sharing in the medical imaging big data era. Section IV then introduces emerging data processing paradigms in radiology, providing a snapshot of the timeline that has led to the increasing adoption of AI and deep learning analytics approaches. Likewise, Section V reviews the state of the art in digital pathology. Section VI describes the challenges pertaining to 3D reconstruction and visualization in view of different application scenarios; digital pathology visualization challenges are further documented in this section, while in-silico modelling advances are presented next, arguing the need for new integrative, multi-compartment modelling approaches. Section VII discusses the need for integrative analytics and the emerging radiogenomics paradigm for both radiology and digital pathology. Finally, Section VIII provides concluding remarks along with a summary of future directions.

II. Image Formation and Acquisition

Biomedical imaging has revolutionized the practice of medicine, providing unprecedented ability to diagnose disease through imaging of the human body and high-resolution viewing of cells and pathological specimens. Broadly speaking, images are formed through the interaction of electromagnetic waves at various wavelengths (energies) with biological tissues, for all modalities other than ultrasound, which uses mechanical sound waves. Images formed with high-energy, short-wavelength radiation such as X-rays and gamma rays at one end of the spectrum are ionizing, whereas at longer wavelengths (optical) and still longer wavelengths (MRI and ultrasound) imaging is non-ionizing. The imaging modalities covered in this section are X-ray, ultrasound, magnetic resonance (MR), X-ray computed tomography (CT), nuclear medicine, and high-resolution microscopy [ 8 ], [ 9 ] (see Table I ). Fig. 1 shows some examples of images produced by these modalities.


Fig. 1. Typical medical imaging examples. (a) Cine angiography X-ray image after injection of iodinated contrast; (b) An axial slice of a 4D, gated planning CT image taken before radiation therapy for lung cancer; (c) Echocardiogram: 4-chamber view showing the four cardiac chambers (ventricular apex located at the top); (d) First row: axial MRI slices in diastole (left), mid-systole (middle), and peak systole (right). Note the excellent contrast between the blood pool and the left ventricular myocardium. Second row: tissue-tagged MRI slices at the same slice location and time point during the cardiac cycle. The modality creates noninvasive magnetic markers within the moving tissue [ 40 ]; (e) A typical Q SPECT image displaying lung perfusion in a lung-cancer patient; (f) A 2D slice from a 3D FDG-PET scan that shows a region of high glucose activity corresponding to a thoracic malignancy; (g) A magnified, digitized image of brain tissue to look for signs of glioblastoma (taken from the TCGA Glioblastoma Multiforme collection).

Table I. Summary of imaging modality characteristics.

MRI: Magnetic Resonance Imaging, CT: Computed Tomography, RF: Radiofrequency.

X-ray imaging's low cost and quick acquisition time have made it one of the most commonly used imaging techniques. The image is produced by passing X-rays generated by an X-ray source through the body and detecting the attenuated X-rays on the other side via a detector array; the resulting image is a 2D projection with resolutions down to 100 microns, in which the intensities indicate the degree of X-ray attenuation [ 9 ]. To improve visibility, iodinated contrast agents that attenuate X-rays are often injected into a region of interest (e.g., imaging arterial disease through fluoroscopy). Phase-contrast X-ray imaging can also improve soft-tissue image contrast by using the phase shifts of the X-rays as they traverse the tissue [ 10 ]. X-ray projection imaging has been pervasive in cardiovascular, mammography, musculoskeletal, and abdominal imaging applications, among others [ 11 ].
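The attenuation underlying X-ray image formation follows the standard Beer–Lambert law, I = I₀·exp(−μx); a minimal sketch with illustrative attenuation coefficients (the specific μ values are assumptions for demonstration, not taken from this paper):

```python
import math

def transmitted_intensity(i0, mu_per_cm, thickness_cm):
    """Beer-Lambert law for a homogeneous slab: I = I0 * exp(-mu * x)."""
    return i0 * math.exp(-mu_per_cm * thickness_cm)

# Illustrative coefficients: soft tissue attenuates less than bone,
# which is what produces contrast in the 2D projection.
soft = transmitted_intensity(1.0, 0.2, 10.0)  # 10 cm of soft tissue
bone = transmitted_intensity(1.0, 0.5, 10.0)  # same path length through bone
```

A detector pixel behind bone thus records a lower intensity than one behind soft tissue, and the projection image encodes exactly this line integral of attenuation.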

Ultrasound imaging (US) employs pulses in the range of 1–10 MHz to image tissue in a noninvasive and relatively inexpensive way. The backscattering of the acoustic pulse as it interacts with internal structures produces the echo that is measured to form the image. Ultrasound imaging is fast, enabling, for example, real-time imaging of blood flow in arteries through the Doppler shift. A major benefit of ultrasonic imaging is that no ionizing radiation is used, making it less harmful to the patient. However, bone and air hinder the propagation of sound waves and can cause artifacts. Still, ultrasound remains one of the most widely used imaging techniques, employed extensively for real-time cardiac and fetal imaging [ 11 ]. Contrast-enhanced ultrasound, which uses injected microbubbles to increase reflection in specific areas, has allowed for greater contrast and imaging accuracy in some applications [ 12 ]. Ultrasound elasticity imaging has also been used to measure tissue stiffness for virtual palpation [ 13 ]. Importantly, ultrasound is not limited to 2D imaging, and the use of 3D and 4D imaging is expanding, though with reduced temporal resolution [ 14 ].
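Pulse-echo depth estimation follows directly from the speed of sound in soft tissue; a minimal sketch (assuming the conventional average of 1540 m/s, which scanners use regardless of the actual tissue):

```python
SPEED_OF_SOUND = 1540.0  # m/s, conventional average for soft tissue

def echo_depth_m(round_trip_time_s):
    """Depth of a reflector: the pulse travels to it and back, hence the /2."""
    return SPEED_OF_SOUND * round_trip_time_s / 2.0

# A 65-microsecond round trip places the reflector ~5 cm deep
depth = echo_depth_m(65e-6)
```

This relation also explains the depth/frame-rate trade-off noted above: imaging deeper requires waiting longer for each echo, which reduces temporal resolution in 3D/4D modes.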

MR imaging [ 15 ] produces high-spatial-resolution volumetric images, primarily of hydrogen nuclei, using an externally applied magnetic field in conjunction with non-ionizing radio-frequency (RF) pulses [ 1 ]. MRI is commonly used in numerous applications, including musculoskeletal, cardiovascular, and neurological imaging, offering superb soft-tissue contrast [ 16 ], [ 17 ]. Additionally, functional MRI has evolved into a large sub-field of study with applications in areas such as mapping functional connectivity in the brain [ 18 ]. Similarly, diffusion-weighted MRI images the diffusion of water molecules in the body and has found much use in neuroimaging and oncology applications [ 19 ]. Moreover, Magnetic Resonance Elastography (MRE) allows virtual palpation with significant applications in liver fibrosis [ 20 ], while 4D flow methods permit exquisite visualization of flow in 3D + t [ 17 ], [ 21 ]. Techniques that accelerate the acquisition time of scans, e.g., compressed sensing, non-Cartesian acquisitions [ 22 ], and parallel imaging [ 23 ], have led to increased growth and utilization of MR imaging. In 2017, 36 million MRI scans were performed in the US alone [ 24 ].
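The RF pulse frequency in MRI is dictated by the Larmor relation f = γ̄·B₀; a minimal sketch using the well-known gyromagnetic ratio of hydrogen-1 (42.576 MHz/T):

```python
GAMMA_BAR_H1_MHZ_PER_T = 42.576  # gyromagnetic ratio of hydrogen-1, MHz/T

def larmor_frequency_mhz(b0_tesla):
    """Resonance (Larmor) frequency of 1H protons at field strength B0."""
    return GAMMA_BAR_H1_MHZ_PER_T * b0_tesla

f_1p5t = larmor_frequency_mhz(1.5)  # ~64 MHz at a clinical 1.5 T field
f_3t = larmor_frequency_mhz(3.0)    # ~128 MHz at 3 T
```

These frequencies sit in the non-ionizing radio band, which is why, unlike X-ray and CT, MRI deposits no ionizing dose.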

X-ray CT imaging [ 25 ] also offers volumetric scans like MRI. However, CT produces a 3D image via the reconstruction of a set of 2D axial slices of the body. Similar to MRI, 4D scans are also possible by gating to the ECG and respiration. Improved solid-state detectors, common in modern CT scanners, have improved spatial resolutions to 0.25 mm [ 26 ], while multiple detector rows enable larger spatial coverage with slice thicknesses down to 0.625 mm. Spectral computed tomography (SCT) utilizes multiple X-ray energy bands to produce distinct attenuation data sets of the same organs. The resulting data permit material composition analysis for a more accurate diagnosis of disease [ 27 ]. CT is heavily used due to its quick scan time and excellent resolution, despite concerns over radiation dosage. Around 74 million CT studies were performed in the US alone in 2017 [ 24 ], and this number is bound to grow due to CT's increased application in screening in emergency care.
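CT voxel values are conventionally reported in Hounsfield units, a linear rescaling of the reconstructed attenuation coefficient against that of water; a minimal sketch (the μ value for water is an illustrative placeholder):

```python
def hounsfield_units(mu, mu_water):
    """HU = 1000 * (mu - mu_water) / mu_water; water -> 0 HU, air -> -1000 HU."""
    return 1000.0 * (mu - mu_water) / mu_water

MU_WATER = 0.19  # illustrative linear attenuation coefficient of water, 1/cm
hu_water = hounsfield_units(MU_WATER, MU_WATER)  # 0.0 by definition
hu_air = hounsfield_units(0.0, MU_WATER)         # -1000.0 by definition
```

Because the scale is anchored to water and air, HU values are comparable across scanners, which is one reason CT lends itself well to quantitative analysis such as the material decomposition performed in spectral CT.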

In contrast to transmission energy used in X-ray based modalities, nuclear medicine is based on imaging gamma rays that are emitted through radioactive decay of radioisotopes introduced in the body. The radioisotopes emit radiation that is detected by an external camera before being reconstructed into an image [ 11 ]. Single photon emission computed tomography (SPECT) and positron emission tomography (PET) are common techniques in nuclear medicine. Both produce 2D image slices that can be combined into a 3D volume; however, PET imaging uses positron-emitting radiopharmaceuticals that produce two gamma rays when a released positron meets a free electron. This allows PET to produce images with higher signal-to-noise ratio and spatial resolution as compared to SPECT [ 9 ]. PET is commonly used in combination with CT imaging (PET/CT) [ 28 ] and more recently PET/MR [ 29 ] to provide complementary information of a potential abnormality. The use of fluorodeoxyglucose (FDG) in PET has led to a powerful method for diagnosis and cancer staging. Time-of-flight PET scanners offer improved image quality and higher sensitivity during shorter scan times over conventional PET and are particularly effective for patients with a large body habitus [ 30 ].
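Quantitation in nuclear medicine must account for radioactive decay of the tracer between injection and acquisition; a minimal sketch using the well-known ~110-minute half-life of the ¹⁸F isotope in FDG (the example activities are illustrative):

```python
HALF_LIFE_F18_MIN = 109.77  # half-life of fluorine-18, in minutes

def remaining_activity(a0, elapsed_min, half_life_min=HALF_LIFE_F18_MIN):
    """Radioactive decay: A(t) = A0 * 2^(-t / T_half)."""
    return a0 * 2.0 ** (-elapsed_min / half_life_min)

# After exactly one half-life, half of the injected activity remains
half = remaining_activity(100.0, HALF_LIFE_F18_MIN)
```

This decay correction is applied routinely so that PET uptake values reflect physiology rather than the elapsed time since tracer injection.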

Last but not least, the use of microscopy in imaging of cells and tissue sections is of paramount importance for disease diagnosis, e.g., for biopsy and/or surgical specimens. Conventional tissue slides contain one case per slide: a single tissue specimen taken from a patient is fixed on a glass slide and stained. Staining enhances the visual representation of tissue morphology, enabling a pathologist to view and interpret the morphology more accurately. Conventional staining methods include Hematoxylin and Eosin (H&E), the most common staining system, which stains nuclei, as well as immunohistochemical staining systems. Light microscopes use the combination of an illuminator and two or more lenses to magnify samples up to 1,000x, although lower magnifications are often used in histopathology. This allows objects to be viewed at resolutions of approximately 0.2 μm and acts as the primary tool in diagnosing histopathology. Light microscopy is often used to analyze biopsy samples for potential cancers as well as for studying tissue-healing processes [ 1 ], [ 31 ].
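The ~0.2 μm figure quoted above follows from the Abbe diffraction limit, d = λ/(2·NA); a minimal sketch with typical values (green light, high-NA oil-immersion objective):

```python
def abbe_limit_nm(wavelength_nm, numerical_aperture):
    """Abbe diffraction limit for light microscopy: d = lambda / (2 * NA)."""
    return wavelength_nm / (2.0 * numerical_aperture)

# 550 nm green light through an NA 1.4 oil-immersion objective
d = abbe_limit_nm(550.0, 1.4)  # ~196 nm, i.e. roughly the 0.2 um quoted above
```

Since NA and visible wavelengths are both bounded, this limit is physical rather than instrumental, which motivates fluorescence-based and other super-resolution techniques.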

While conventional microscopy uses the principle of transmission to view objects, the emission of light at a different wavelength can help increase contrast in objects that fluoresce by filtering out the excitatory light and only viewing the emitted light – called fluorescence microscopy [ 32 ]. Two-photon fluorescence imaging uses two photons of similar frequencies to excite molecules which allows for deeper penetration of tissue and lower phototoxicity (damage to living tissue caused by the excitation source) [ 33 ]. These technologies have seen use in neuro [ 34 ], [ 33 ] and cancer [ 35 ] imaging among other areas.

Another tissue slide mechanism is the Tissue Microarray (TMA). TMA technology enables investigators to extract small cylinders of tissue from histological sections and arrange them in a matrix configuration on a recipient paraffin block such that hundreds can be analyzed simultaneously [ 36 ]. Each spot on a tissue microarray is a complex, heterogeneous tissue sample, which is often prepared with multiple stains. While single-case tissue slides remain the most common slide type, TMA is now recognized as a powerful tool that can provide insight into the underlying mechanisms of disease progression and patient response to therapy. With recent advances in immuno-oncology, TMA technology is rapidly becoming indispensable, augmenting single-case slide approaches. TMAs can be imaged using the same whole-slide scanning technologies used to capture images of single-case slides. Whole-slide scanners are becoming increasingly ubiquitous in both research and remote pathology interpretation settings [ 37 ].

For in-vivo imaging, Optical Coherence Tomography (OCT) can produce 3D images from a series of cross-sectional optical images by measuring the echo delay time and intensity of backscattered light from internal microstructures of the tissue in question [ 38 ]. Hyperspectral imaging generates an image from many (sometimes hundreds of) spectral bands of light to gain a better understanding of the reflectance properties of the object being imaged [ 39 ].

The challenges and opportunities in the area of biomedical imaging include continuing to acquire images at faster speeds and, for anatomical imaging methods, at lower radiation dose. Variations in imaging parameters (e.g., in-plane resolution, slice thickness, etc.), which were not discussed here, may have strong impacts on image analysis and should be considered during algorithm development. Moreover, the prodigious amount of imaging data generated creates a significant need for informatics in the storage and transmission, as well as the analysis and automated interpretation, of the data, underpinning the use of big data science for improved utilization and diagnosis.

III. Interoperable and Fair Data Repositories For Reproducible, Extensible and Explainable Research

Harnessing the full potential of available big data for healthcare innovation necessitates a change management strategy across both research institutions and clinical sites. In its present form, heterogeneous healthcare data, ranging from imaging to genomic to clinical data, further augmented by environmental data, physiological signals, and other sources, cannot be used for integrative analysis (see Section VII ) and new hypothesis testing. This is attributed to a number of factors, a non-exhaustive list extending to the data being scattered across and within institutions in a poorly indexed fashion, not being openly available to the research community, and not being well curated or semantically annotated. Additionally, these data are typically semi- or un-structured, adding a significant computational burden to making them ready for data mining.

A cornerstone for overcoming the aforementioned limitations is the establishment of efficient, enterprise-wide clinical data repositories (CDR). CDRs can systematically aggregate information arising from: (i) Electronic Health and Medical Records (EHR/EMR; terms used interchangeably); (ii) Radiology and Pathology archives (relying on picture archive and communication systems (PACS)); (iii) a wide range of genomic sequencing devices, Tumor Registries, and Biospecimen Repositories; as well as (iv) Clinical Trial Management Systems (CTMS). Here, it is important to note that EHR/EMR is now increasingly used as the umbrella term, instead of CDR, encompassing the wealth of available medical data. We adopt this approach in the present study. As these systems become increasingly ubiquitous, they will decisively contribute as fertile resources for evidence-based clinical practice, patient stratification, and outcome assessment, as well as for data mining and drug discovery [ 41 ]–[ 45 ].

Toward this direction, many clinical and research sites have developed such data management and exploration tools to track patient outcomes [ 46 ]. Yet, many of them see limited adoption by the clinical and research communities because they require manual data entry and do not furnish the tools end-users need to perform advanced queries. More recently, much greater emphasis has been placed on developing automated extraction, transformation, and load (ETL) interfaces. ETLs can accommodate the full spectrum of clinical information, imaging studies, and genomic information. Hence, it is possible to interrogate multi-modal data in a systematic manner, guide personalized treatment, refine best practices, and provide objective, reproducible insight into the underlying mechanisms of disease onset and progression [ 47 ].

One of the most significant challenges in establishing enterprise-wide EHRs stems from the fact that a tremendous amount of clinical data is found in unstructured or semi-structured format, with a significant number of reports generated at third-party laboratories. Many institutions simply scan these documents into images or PDFs so that they can be attached to the patient's EHR. Other reports arrive in Health Level 7 (HL7) format with the clinical content of the message aggregated into a continuous ASCII (American Standard Code for Information Interchange) string. Unfortunately, such solutions address only the most basic requirements of interoperability by allowing the information to flow into another Healthcare Information Technology (HIT) system; but since the data are not discrete, they cannot be easily migrated into a target relational or document-oriented (non-relational) database.
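The "continuous ASCII string" nature of an HL7 v2 message can be made concrete with a minimal parsing sketch. This is an illustrative toy (the message content and the `parse_hl7_v2` helper are hypothetical; it ignores component `^` and repetition `~` separators, and production systems would use a dedicated HL7 library):

```python
def parse_hl7_v2(message):
    """Split an HL7 v2 message into {segment_id: [field lists]} entries.

    Segments are separated by carriage returns; fields within a segment
    are separated by '|'. Repeated segment types accumulate in a list.
    """
    segments = {}
    for raw in filter(None, message.split("\r")):
        fields = raw.split("|")
        segments.setdefault(fields[0], []).append(fields[1:])
    return segments

# Hypothetical two-segment message: header (MSH) plus patient identification (PID)
msg = ("MSH|^~\\&|LAB|HOSP|EHR|HOSP|202001011200||ORU^R01|123|P|2.3"
       "\rPID|1||MRN001||DOE^JANE")
parsed = parse_hl7_v2(msg)
```

Even this toy shows why such messages are "not discrete" from a database perspective: the meaning of each field is positional, so migrating the content into a relational or document store requires mapping every position to a named, typed column.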

To effectively incorporate this information into EHRs and achieve semantic interoperability, it is necessary to develop and optimize software that endorses and relies on interoperability profiles and standards. Such standards are defined by Integrating the Healthcare Enterprise (IHE), HL7 Fast Healthcare Interoperability Resources (FHIR), and Digital Imaging and Communications in Medicine (DICOM), the latter also extending to medical video communications [ 48 ]. It is moreover necessary to adopt clinical terminology coding (e.g., the Systematized Nomenclature of Medicine-Clinical Terms (SNOMED CT) and the International Statistical Classification of Diseases and Related Health Problems (ICD) by the World Health Organization (WHO)). In this fashion, software systems will be in a position to reliably extract, process, and share data that would otherwise remain locked in paper-based documents [ 49 ]. Importantly, (new) data entry (acquisition) in a standardized fashion underpins extensibility, which in turn results in increased statistical power for research studies relying on larger cohorts.

The availability of metadata is central to unambiguously describing processes throughout the data-handling cycle. Metadata underpin medical dataset sharing by providing descriptive information that characterizes the underlying data. This can be further capitalized on for the joint processing of medical datasets constructed in different contexts, such as clinical practice, research, and clinical trials data [ 50 ]. A key medical imaging concept relevant to metadata usage comes from image retrieval. Traditionally, image retrieval relied on image metadata such as keywords, tags, or descriptions. However, with the advent of machine and deep learning AI solutions (see Section IV ), content-based image retrieval (CBIR) systems evolved to exploit rich contents extracted from images (e.g., imaging, statistical, and object features) stored in a structured manner. Today, querying for other images with similar contents typically relies on a content-metadata similarity metric. Supervised, semi-supervised, and unsupervised methods can be applied for CBIR across imaging modalities [ 51 ].
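The content similarity metric at the heart of CBIR can be as simple as cosine similarity between extracted feature vectors; a minimal sketch (the three-dimensional feature vectors and scan names are toy placeholders, not a real feature-extraction pipeline):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve(query_features, database):
    """Rank (name, features) entries by descending similarity to the query."""
    ranked = sorted(database,
                    key=lambda item: cosine_similarity(query_features, item[1]),
                    reverse=True)
    return [name for name, _ in ranked]

# Toy feature vectors (e.g. texture/intensity statistics extracted per image)
db = [("scan_a", [1.0, 0.1, 0.0]),
      ("scan_b", [0.0, 1.0, 0.9]),
      ("scan_c", [0.9, 0.2, 0.1])]
ranking = retrieve([1.0, 0.0, 0.0], db)
```

In a real CBIR system the feature vectors would come from the structured image content described above (hand-crafted or deep features), but the retrieval step reduces to exactly this kind of nearest-neighbor ranking.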

The FAIR guiding principles initiative attempts to overcome (meta)data availability limitations by establishing a set of recommendations for making (meta)data findable, accessible, interoperable, and reusable (FAIR) [ 52 ]. At the same time, privacy-preserving data publishing (PPDP) is an active research area aiming to provide the necessary means for openly sharing data. The PPDP objective is to preserve patients' privacy while achieving the minimum possible loss of information [ 53 ]. Sharing such data can increase the likelihood of novel findings and the replication of existing research results [ 54 ]. To accomplish the anonymization of medical imaging data, approaches such as k-anonymity [ 55 ], [ 56 ], l-diversity [ 57 ], and t-closeness [ 58 ] are typically used. Toward this direction, multi-institutional collaboration is quickly becoming the vehicle driving the creation of well-curated and semantically annotated large cohorts that are further enhanced with research methods and results metadata, underpinning reproducible, extensible, and explainable research [ 59 ], [ 60 ]. From a medical imaging research perspective, the quantitative imaging biomarkers alliance (QIBA) [ 61 ] and, more recently, the image biomarker standardisation initiative (IBSI) [ 62 ] set the stage for multi-institution collaboration across imaging modalities. The QIBA and IBSI vision is to promote reproducible results emanating from imaging research methods by removing interoperability barriers and adopting software, hardware, and nomenclature standards and guidelines [ 63 ]–[ 66 ]. Disease-specific as well as horizontal examples include the Multi-Ethnic Study of Atherosclerosis (MESA), the UK Biobank, the Cancer Imaging Archive (TCIA), the Cancer Genome Atlas (TCGA), and the Alzheimer's Disease Neuroimaging Initiative (ADNI).
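Of the anonymization criteria above, k-anonymity is the simplest to state: every combination of quasi-identifiers must be shared by at least k records. A minimal checking sketch (the records and the generalization scheme are toy examples, not a complete anonymization pipeline):

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs in >= k records."""
    combos = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in combos.values())

# Toy records after generalizing age to a decade and zip code to a prefix
records = [
    {"age": "30-39", "zip": "114**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "114**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "115**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "115**", "diagnosis": "diabetes"},
]
ok = is_k_anonymous(records, ["age", "zip"], k=2)          # satisfied
too_strict = is_k_anonymous(records, ["age", "zip"], k=3)  # violated
```

l-diversity and t-closeness tighten this criterion further by additionally constraining the distribution of the sensitive attribute (here, the diagnosis) within each quasi-identifier group.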
In a similar context, the CANDLE project (CANcer Distributed Learning Environment) focuses on the development of open-source AI-driven predictive models under a single scalable deep neural network umbrella code. Exploiting the ever-growing volumes and diversity of cancer data and leveraging exascale computing capabilities, it aspires to advance and accelerate cancer research.

The co-localization of such a broad number of correlated data elements, representing a wide spectrum of clinical information, imaging studies, and genomic information, coupled with appropriate tools for data mining, is instrumental for integrative analytics approaches and will lead to unique opportunities for improving precision medicine [ 67 ], [ 68 ].

IV. Processing, Analysis, and Understanding in Radiology

This section reviews the general field of image analysis and understanding in radiology; the next section takes a similar approach for digital pathology.

Medical image analysis typically involves the delineation of objects of interest (segmentation) or the assignment of labels (classification) [ 69 ]–[ 72 ]. Examples include segmentation of the heart for cardiology and identification of cancer for pathology. To date, medical image analysis has been hampered by a lack of theoretical understanding of how to optimally choose and process visual features. A number of ad hoc (or hand-crafted ) feature analysis approaches have achieved some success in different applications by explicitly defining a prior set of features and processing steps. However, no single method has provided robust, cross-domain application solutions. The recent advent of machine learning approaches has provided good results in a wide range of applications. These approaches attempt to learn the features of interest and optimize parameters based on training examples. However, these methods are often difficult to engineer, since they can fail in unpredictable ways and are subject to bias or spurious feature identification due to limitations in the training dataset. An important mechanism for advancing the field is open-access challenges, in which participants can benchmark methods on standardized datasets. Notable examples of challenges include dermoscopic skin lesions [ 73 ], brain MRI [ 74 ], [ 75 ], heart MRI [ 76 ], quantitative perfusion [ 77 ], classification of heart disease from statistical shape models [ 78 ], retinal blood vessel segmentation [ 79 ], [ 80 ], general anatomy (e.g., the VISCERAL project evaluated the subjectivity of 20 segmentation algorithms [ 81 ]), segmentation of several organs together (the decathlon challenge) [ 82 ], and many others. An up-to-date list of open and ongoing biomedical challenges appears in [ 83 ].
These challenges have provided a footing for advances in medical image analysis and helped push the field forward; however, a recent analysis of challenge design has shown that biases exist, raising questions about how easily methods would translate to clinical practice [ 84 ].
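Segmentation challenges like those above typically score submissions with overlap metrics such as the Dice coefficient, 2|A∩B|/(|A|+|B|); a minimal sketch on toy flattened binary masks (illustrative values):

```python
def dice_coefficient(mask_a, mask_b):
    """Dice overlap of two flat binary masks (1 = foreground, 0 = background)."""
    intersection = sum(a and b for a, b in zip(mask_a, mask_b))
    total = sum(mask_a) + sum(mask_b)
    return 2.0 * intersection / total if total else 1.0  # empty/empty -> perfect

prediction   = [0, 1, 1, 1, 0, 0]
ground_truth = [0, 1, 1, 0, 0, 0]
score = dice_coefficient(prediction, ground_truth)  # 2*2 / (3+2) = 0.8
```

Reporting a standardized metric on a standardized hidden test set is precisely what makes challenge leaderboards comparable across methods.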

A. Feature Analysis

There has been a wealth of literature on medical image analysis using signal analysis, statistical modelling, etc. [ 71 ]. Some of the most successful approaches include multi-atlas segmentation [ 85 ], graph cuts [ 86 ], and active shape models [ 87 ], [ 88 ]. Multi-atlas segmentation utilizes a set of labelled cases (atlases) selected to represent the variation in the population. The image to be segmented is registered to each atlas (e.g., using voxel-based morphometry [ 89 ]) and the propagated labels from each atlas are fused into a consensus label for that image. This procedure adds robustness, since errors associated with a particular atlas are averaged out to form a maximum-likelihood consensus. A similarity metric can then be used to weight the candidate segmentations. A powerful alternative method attempts to model the object as a deformable structure and optimize the position of its boundaries according to a similarity metric [ 87 ]–[ 90 ]. Active shape models contain information on the statistical variation of the object in the population and the characteristics of its images [ 91 ]. These methods are typically iterative and may thus get stuck in a local minimum. On the other hand, graph cut algorithms facilitate a globally optimal solution [ 86 ]. Despite the initial graph construction being computationally expensive, updates to the weights (interaction) can be computed in real time.
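The label-fusion step of multi-atlas segmentation can be illustrated with unweighted majority voting over propagated labels; a minimal per-voxel sketch (toy 5-voxel label maps; similarity-weighted variants replace the raw count with per-atlas weights):

```python
from collections import Counter

def majority_vote_fusion(propagated_label_maps):
    """Fuse per-atlas label maps (equal-length flat lists) voxel by voxel.

    Each voxel receives the most frequent label across atlases, so the
    error of any single atlas is outvoted by the consensus.
    """
    fused = []
    for voxel_labels in zip(*propagated_label_maps):
        fused.append(Counter(voxel_labels).most_common(1)[0][0])
    return fused

# Labels from three atlases propagated onto a 5-voxel target image;
# atlases 2 and 3 each make one isolated error that the vote corrects.
atlases = [
    [0, 1, 1, 2, 0],
    [0, 1, 2, 2, 0],
    [0, 1, 1, 2, 1],
]
consensus = majority_vote_fusion(atlases)
```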

B. Machine Learning

Machine learning (prior to deep learning, which we analyse below) involves the definition of a learning problem to solve a task based on inputs [ 92 ]. To reduce data dimensionality and induce necessary invariances and covariances (e.g. robustness to intensity changes or scale), early machine learning approaches relied on hand-crafted features to represent data. In imaging data, several transforms have been used to capture local correlation and disentangle frequency components, spanning from the Fourier, Cosine, or Wavelet transforms to the more recent Gabor filters, which also offer directionality of the extracted features and superior texture information (when this is deemed useful for the decision). To reduce data dimensionality or to learn features in a data-driven fashion, Principal and Independent Component Analyses have been used [ 93 ], as has the somewhat related (under some assumptions) K-means algorithm [ 94 ]. These approaches formulate feature extraction within a reconstruction objective, imposing different criteria on the reconstruction and the projection space (e.g. PCA assumes the projection space is orthogonal). Each application then required a significant effort in identifying the proper features (known as feature engineering), which would then be fed into a learnable decision algorithm (for classification or regression). A plethora of algorithms have been proposed for this purpose, a common choice being support vector machines [ 95 ], due to their ease of implementation and well-understood nonlinear kernels. Alternatively, random forest methods [ 96 ] employ an ensemble of decision trees, where each tree is trained on a different subset of the training cases, improving the robustness of the overall classifier. An alternative classification method is provided by probabilistic boosting trees [ 97 ], which form a binary tree of strong classifiers, using a boosting approach to train each node by combining a set of weak classifiers.
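The classical pipeline of feature extraction followed by a learnable decision rule can be sketched as follows (a minimal NumPy example on synthetic data; PCA is computed via SVD, and a nearest-centroid rule stands in for the SVMs or random forests cited above):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "images": two classes of 16-dim vectors with different means.
X0 = rng.normal(0.0, 1.0, (50, 16))
X1 = rng.normal(2.0, 1.0, (50, 16))
X = np.vstack([X0, X1])
y = np.array([0] * 50 + [1] * 50)

# PCA via SVD of the centred data: the projection space is orthogonal.
mean = X.mean(axis=0)
U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
components = Vt[:2]                      # keep 2 principal components
Z = (X - mean) @ components.T            # low-dimensional features

# A simple learnable decision rule on the features
# (nearest class centroid, standing in for an SVM).
centroids = np.stack([Z[y == c].mean(axis=0) for c in (0, 1)])

def predict(x):
    z = (x - mean) @ components.T
    d = np.linalg.norm(centroids - z, axis=1)
    return int(np.argmin(d))

train_acc = np.mean([predict(x) == c for x, c in zip(X, y)])
```

The point is the separation of concerns: the projection (feature engineering or PCA) is chosen independently of the decision rule, which is exactly what deep learning later collapsed into one optimization.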
However, recent advances in GPU processing and the availability of data for training have led to a rapid expansion in neural nets and deep learning for regression and classification [ 98 ]. Deep learning methods instead optimize simultaneously for the decision (classification or regression) whilst identifying and learning suitable input features. Thus, in lieu of feature engineering, learning how to represent data and how to solve for the decision are now done in a completely data-driven fashion, notwithstanding the existence of approaches combining feature engineering and deep learning [ 99 ]. Exemplar deep learning approaches for medical imaging purposes are discussed in the next subsections.

C. Deep Learning for Segmentation

One of the earliest applications of convolutional neural networks (CNNs, currently the most common form of deep learning) appeared as early as 1995, when a CNN was used for lung nodule detection in chest x-rays [ 100 ]. Since then, fueled by the revolutionary results of AlexNet [ 101 ] and incarnations of patch-based adaptations of Deep Boltzmann Machines and stacked autoencoders, deep learning based segmentation of anatomy and pathology has witnessed a revolution (see also Table II ), with human-level performance now observed for some tasks [ 102 ]. In this section, we aim to analyse key works and trends in the area, while pointing readers to relevant, thorough reviews in [ 69 ], [ 70 ].

Selected Deep Learning Methods for Medical Image Segmentation and Classification

US: Ultrasound; MRI: Magnetic Resonance Imaging; DCE-MRI: Dynamic Contrast Enhancement MRI; CT: Computed Tomography; PET: Positron Emission Tomography; GBM: Glioblastoma; LGG: Lower-Grade Glioma; CNN: Convolutional Neural Networks.

The major draw of deep learning and convolutional architectures is the ability to learn suitable features and decision functions in tandem. While AlexNet quickly set the standard for classification (and was widely adapted for medical classification tasks; see the next subsection), it was the realisation that dense predictions can be obtained from classification networks by convolutionalization that enabled powerful segmentation algorithms [ 103 ]. The limitations of such approaches for medical image segmentation were quickly realised and led to the development of the U-Net [ 104 ], which remains even today one of the most successful architectures for medical image segmentation.

The U-Net is simple in its conception: an encoder-decoder network that goes through a bottleneck but contains skip connections from encoding to decoding layers. The skip connections allow the model to be trained even with little input data and offer highly accurate segmentation boundaries, albeit perhaps at the “loss” of a clearly determined latent space. While the original U-Net was 2D, the 3D U-Net proposed in 2016 allowed full volumetric processing of imaging data [ 105 ] while maintaining the same principles as the original.
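To make the skip-connection idea concrete, here is a shape-level NumPy sketch of one U-Net level; convolutions and training are deliberately omitted, so this only illustrates the encode-bottleneck-decode-concatenate data flow, not the actual network of [ 104 ]:

```python
import numpy as np

def avg_pool2(x):
    """2x2 average pooling on a (C, H, W) feature map."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def upsample2(x):
    """2x nearest-neighbour upsampling on a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def tiny_unet_forward(img):
    """Shape-level sketch of one U-Net level: encode, bottleneck,
    decode, then concatenate the skip connection channel-wise.
    (Learned convolutions are omitted; this only shows the data flow.)"""
    enc = img                      # encoder features, kept for the skip
    bottleneck = avg_pool2(enc)    # spatial resolution halved
    dec = upsample2(bottleneck)    # back to the encoder's resolution
    return np.concatenate([enc, dec], axis=0)  # the skip connection

x = np.arange(16, dtype=float).reshape(1, 4, 4)
out = tiny_unet_forward(x)
```

Note that the skip path carries the full-resolution encoder features to the decoder untouched, which is what preserves fine boundary detail through the bottleneck.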

Several works were inspired by treating image segmentation as an image-to-image translation (and synthesis) problem. This introduced a whole cadre of approaches that permit unsupervised and semi-supervised learning, working in tandem with adversarial training [ 106 ] to augment training data by leveraging label maps or input images from other domains. The most characteristic examples are works inspired by CycleGAN [ 107 ]. CycleGAN allows mapping of one image domain to another even without having pairs of images. Early on, Chartsias et al. used this idea to generate new images and corresponding myocardial segmentations by mapping CT to MRI images [ 108 ]. Similarly, Wolterink et al. used it in the context of brain imaging [ 109 ]. These two approaches used paired and unpaired information (defining a pair as an input image and its segmentation) differently to map between different modalities (MR to CT) or different MR sequences.
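The cycle-consistency idea behind CycleGAN can be illustrated with a toy example (NumPy; the two "generators" below are hypothetical invertible pixel maps standing in for learned networks):

```python
import numpy as np

# Hypothetical stand-ins for the two generators G_AB (e.g. CT -> MRI)
# and G_BA (MRI -> CT); here simple invertible pixel maps for illustration.
def g_ab(x):
    return 2.0 * x + 1.0

def g_ba(y):
    return (y - 1.0) / 2.0

def cycle_consistency_loss(x, y):
    """L1 cycle loss: images mapped to the other domain and back
    should return to themselves, even without paired training data."""
    forward = np.abs(g_ba(g_ab(x)) - x).mean()
    backward = np.abs(g_ab(g_ba(y)) - y).mean()
    return forward + backward

x = np.random.default_rng(1).random((8, 8))  # toy "CT" image
y = np.random.default_rng(2).random((8, 8))  # toy "MRI" image
loss = cycle_consistency_loss(x, y)
```

In the full method this loss is minimized jointly with adversarial losses on both domains; because it never compares an image to a paired counterpart, no paired data are required.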

Concretely rooted in the area of semi-supervised learning [ 110 ] are approaches that use discriminators to approximate distributions of shapes (and thus act as shape priors), to solve the segmentation task in an unsupervised manner in the heart or the brain [ 111 ]. However, in the context of cardiac segmentation, the work of Chartsias et al. showed that, when combined with auto-encoding principles and factorised learning, a shape prior aided by reconstruction objectives offers a compelling solution to semi-supervised learning for myocardial segmentation [ 112 ].

We highlight that all the above works treat expert delineations as ground truth, whereas our community is well aware of the variability in agreement between experts in delineation tasks. Inspired by this, Kohl et al. devised a probabilistic U-Net, in which the network learns from a variety of annotations without the need to externally provide a consensus [ 113 ]. However, we note that supervision via training exemplars alone could be a limited signal and may not fully realize the potential of deep learning.

D. Deep Learning for Classification

Deep learning algorithms have been extensively used for disease classification, or screening, and have resulted in excellent performance in many tasks (see Table II ). Applications include screening for acute neurologic events [ 114 ], diabetic retinopathy [ 115 ], and melanoma [ 116 ].

Like segmentation, these classification tasks have also benefited from CNNs. Many of the network architectures that have been proven on the ImageNet image classification challenge [ 117 ] have seen reuse for medical imaging tasks by fine-tuning previously trained layers. References [ 118 ] and [ 119 ] were among the first to assess the feasibility of using CNN-based models trained on large natural image datasets for medical tasks. In [ 118 ], the authors showed that pre-training a model on natural images and fine-tuning its parameters for a new medical imaging task gave excellent results. These findings were reinforced in [ 120 ], which demonstrated that fine-tuning a pre-trained model generally performs better than a model trained from scratch. Ensembles of pre-trained models can also be fine-tuned to achieve strong performance, as demonstrated in [ 121 ].
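The transfer-learning idea can be illustrated at its simplest, a "linear probe" on frozen features (a NumPy sketch; the random "backbone" is a hypothetical stand-in for pre-trained ImageNet layers, and the data are synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen pre-trained backbone: a fixed random nonlinear
# feature extractor (a real system would reuse ImageNet-trained layers).
W_frozen = rng.normal(size=(16, 32))

def features(x):
    return np.maximum(x @ W_frozen, 0.0)     # frozen ReLU features

# Small "medical" training set: two classes with shifted means.
X = np.vstack([rng.normal(0, 1, (40, 16)), rng.normal(1.5, 1, (40, 16))])
y = np.array([0] * 40 + [1] * 40)

# Fine-tune only a new linear head on top (least-squares fit),
# leaving the backbone untouched.
F = features(X)
head, *_ = np.linalg.lstsq(np.hstack([F, np.ones((80, 1))]),
                           2 * y - 1.0, rcond=None)

def predict(x):
    f = features(x[None])
    return int((np.hstack([f, np.ones((1, 1))]) @ head)[0] > 0)

acc = np.mean([predict(x) == c for x, c in zip(X, y)])
```

Full fine-tuning would additionally update the backbone weights with a small learning rate; the frozen-feature variant shown here is the cheapest point on that spectrum and often suffices when the new dataset is small.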

This transfer learning approach is not straightforward, however, when the objective is tissue classification of 3D image data. Here, transfer learning from natural images is not possible without first condensing the 3D data into two dimensions. Practitioners have proposed a myriad of choices on how to handle this issue, many of which have been quite successful. Alternative approaches directly exploit the 3D data by using architectures that perform 3D convolutions and then train the network from scratch on 3D medical images [ 122 ]–[ 126 ]. Other notable techniques include slicing 3D data into different 2D views before fusing them to obtain a final classification score [ 127 ]. Learning lung nodule features using a 2D autoencoder [ 128 ] and then employing a decision tree to distinguish between benign and malignant nodules was proposed in [ 129 ].
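The multi-view strategy mentioned above can be sketched as follows (NumPy; `toy_score` is a hypothetical stand-in for a per-view 2D classifier, and averaging is one simple fusion rule):

```python
import numpy as np

def view_slices(volume):
    """Extract the three central orthogonal 2D views (axial,
    coronal, sagittal) from a (D, H, W) volume."""
    d, h, w = volume.shape
    return [volume[d // 2], volume[:, h // 2], volume[:, :, w // 2]]

def toy_score(view):
    # Hypothetical per-view classifier score; a real system would run
    # a (possibly pre-trained) 2D CNN here. Mean intensity is used
    # purely for illustration.
    return float(view.mean())

def fused_score(volume):
    """Fuse per-view scores by averaging, as in multi-view approaches."""
    return float(np.mean([toy_score(v) for v in view_slices(volume)]))

vol = np.zeros((4, 4, 4))
vol[2] = 1.0                      # a bright axial slab
score = fused_score(vol)
```

Because each view is 2D, each per-view scorer can reuse networks pre-trained on natural images, which is exactly what makes this workaround attractive for 3D data.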

Development of an initial network, on which transfer learning depends, is often difficult and time-consuming. Automated Machine Learning (AutoML) has eased this burden by finding optimal network hyperparameters [ 130 ] and, more recently, optimal network architectures [ 131 ]. We suspect these high-level training paradigms will soon impact medical image analysis.

Overall, irrespective of the training strategy used, classification tasks in medical imaging are dominated by some formulation of a CNN – often with fully-connected layers at the end to perform the final classification. With bountiful training data, CNNs can often achieve state-of-the-art performance; however, deep learning methods generally struggle with limited training data. As discussed, transfer learning has been beneficial in coping with scant data, but the continued availability of large, open datasets of medical images will play a big part in strengthening classification tasks in the medical domain.

E. CNN Interpretability

Although Deep CNNs have achieved extremely high accuracy, they are still black-box functions with multiple layers of nonlinearities. It is therefore essential to be able to trust the output of these networks and to verify that the predictions come from learning appropriate representations, and not from overfitting the training data. Deep CNN interpretability is an emerging area of machine learning research targeting a better understanding of what the network has learned and how it derives its classification decisions. One simple approach consists of visualizing the nearest neighbors of image patches in the fully connected feature space [ 101 ]. Another common approach that is used to shed light on the predictions of Deep CNNs is based on creating saliency maps [ 132 ] and guided backpropagation [ 133 ], [ 134 ]. These approaches aim to identify voxels in an input image that are important for classification based on computing the gradient of a given neuron at a fixed layer with respect to voxels in the input image. Another similar approach, which is not specific to an input image, uses gradient ascent optimization to generate a synthetic image that maximally activates a given neuron [ 135 ]. Feature inversion, in which an input image is compared with its reconstruction from the representation at a given layer, is another approach that can capture the relevant patches of the image at the considered layer [ 136 ]. Other methods for interpreting and understanding deep networks can be found in [ 137 ]–[ 139 ]. Specifically for medical imaging, techniques described in [ 140 ] interpret predictions in a visually and semantically meaningful way, while task-specific features in [ 141 ] are developed such that their deep learning system can make transparent classification predictions.
Another example uses multitask learning to model the relationship between benign-malignant and eight other morphological attributes in lung nodules with the goal of an interpretable classification [ 142 ]. Importantly, due diligence must be done during the design of CNN systems in the medical domain to ensure spurious correlations in the training data are not incorrectly learned.
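A gradient-based saliency map of the kind described above can be sketched with finite differences (NumPy; the linear `model_score` is a hypothetical stand-in for a trained network, for which backpropagation would supply the same gradient exactly):

```python
import numpy as np

def model_score(img):
    # Hypothetical classifier score for illustration: a fixed linear
    # model, so the true saliency is simply its weight map.
    w = np.linspace(0.0, 1.0, img.size).reshape(img.shape)
    return float((w * img).sum())

def saliency_map(img, eps=1e-5):
    """Gradient of the score w.r.t. each input pixel/voxel, estimated
    by central finite differences (backprop would give this exactly)."""
    grad = np.zeros_like(img)
    for idx in np.ndindex(img.shape):
        hi, lo = img.copy(), img.copy()
        hi[idx] += eps
        lo[idx] -= eps
        grad[idx] = (model_score(hi) - model_score(lo)) / (2 * eps)
    return np.abs(grad)            # saliency: magnitude of sensitivity

img = np.random.default_rng(0).random((3, 3))
sal = saliency_map(img)
```

For this toy linear model the saliency map recovers the weight map, so the highest-weighted pixel is the most salient; for a real CNN the map highlights the voxels driving the prediction.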

F. Interpretation and Understanding

Once object geometry and function has been quantified, patient cohorts can be studied in terms of the statistical variation of shape and motion across large numbers of cases. In the Multi-Ethnic Study of Atherosclerosis, heart shape variations derived from MRI examinations were associated with known cardiovascular risk factors [ 143 ]. Moreover, application of imaging informatics methodologies in the cardiovascular system has produced important new knowledge and has improved our understanding of normal function as well as of pathophysiology, diagnosis and treatment of cardiovascular disorders [ 144 ]. In the brain, atlas-based neuroinformatics enables structural information to be used to predict neurodegenerative diseases [ 145 ].

At the same time, it is also possible to extract information on biophysical parameters of tissues and organs from medical imaging data. For example, in elastography, it is possible to estimate tissue compliance from the motion of waves imaged using ultrasound or MRI [ 146 ], whereas in the heart, myocardial stiffness is associated with disease processes. Given knowledge of the boundary loading, and imaged geometry and displacements, finite element analysis can estimate material properties compatible with the imaged deformation [ 147 ].

V. Processing, Analysis, and Understanding in Digital Pathology

Pathology classifications and interpretations have traditionally been developed through pathologist examination of tissue prepared on glass slides using microscopes. Analyses of single tissue and tissue microarray (TMA) images have the potential to extract highly detailed and novel information about the morphology of normal and diseased tissue and to characterize disease mechanisms at the sub-cellular scale. Studies have validated and shown the value of digitized tissue slides in biomedical research [ 148 ]–[ 152 ]. Whole slide images can contain hundreds of thousands of cells and nuclei, or more. Detection, segmentation and labeling of slide tissue image data can thus lead to massive, information-rich datasets. These datasets can be correlated to molecular tumor characteristics and can be used to quantitatively characterize tissue at multiple spatial scales to create biomarkers that predict outcome and treatment response [ 150 ], [ 152 ]–[ 154 ]. In addition, multiscale tissue characterizations can be employed in epidemiological and surveillance studies. The National Cancer Institute SEER program is exploring the use of features extracted from whole slide images to add cancer biology phenotype data to its surveillance efforts. Digital pathology has made great strides in the past 20 years. A good review of challenges and advancements in digital pathology is provided in several publications [ 155 ]–[ 157 ]. Whole slide imaging is also now employed at some sites for primary anatomic pathology diagnostics. In light of advances in imaging instruments and software, the FDA approved in 2017 the use of a commercial digital pathology system in clinical settings [ 158 ]. A summary of AI-based medical imaging systems that have obtained FDA approval appears in Table III .

AI-Based Medical Imaging Systems With FDA-Approval

US: Ultrasound; MRI: Magnetic Resonance Imaging; CT: Computed Tomography; PET: Positron Emission Tomography.

A. Segmentation and Classification

Routine availability of digitized pathology images, coupled with well-known issues associated with inter-observer variability in how pathologists interpret studies [ 159 ], has led to increased interest in computer-assisted decision support systems. Image analysis algorithms, however, have to tackle several challenges in order to efficiently, accurately and reliably extract information from tissue images. Tissue images contain a much denser amount of information than many other imaging modalities, encoded at multiple scales (pixels, objects such as nuclei and cells, and regions such as tumor and stromal tissue areas). This is further compounded by heterogeneity in structure and texture characteristics across tissue specimens from different disease regions and subtypes. A major challenge in pathology decision support also arises from the complex and nuanced nature of many pathology classification systems. Classifications can hinge on the fraction of the specimen found to have one or another pattern of tissue abnormality. In such cases, the assessment of abnormality and the estimate of tissue area are both subjective. When interpretation could only be carried out using glass slides, the principal way of reducing inter-observer variability was for multiple pathologists to view the same glass slides and to confer on interpretation. These challenges have motivated many efforts to develop image analysis methods that automate whole slide image pathology interpretation. While few of these methods have found their way into clinical practice, results are promising and seem almost certain to ultimately lead to the development of effective methods to routinely provide algorithmic anatomic pathology second opinions. A comprehensive review of these initiatives appears in [ 160 ]–[ 162 ].

Some of the earlier works employed statistical techniques and machine learning algorithms to segment and classify tissue images. Bamford and Lovell, for example, used active contours to segment nuclei in Pap stained cell images [ 163 ]. Malpica et al. applied watershed-based algorithms for separation of nuclei in cell clusters [ 164 ]. Kong et al. utilized a combination of grayscale reconstruction, thresholding, and watershed-based methods [ 165 ]. Gao et al. adapted a hierarchical approach based on mean-shift and clustering analysis [ 166 ]. Work by Al-Kofahi et al. implemented graph-cuts and multiscale filtering methods to detect nuclei and delineate their boundaries [ 167 ]. In recent years, deep learning methods have rapidly grown in importance in pathology image analysis [ 160 ]. Deep learning approaches make it possible to automate many aspects of the information extraction and classification process. A variety of methods have been developed to classify tissue regions or whole slide images, depending on the context and the disease site. Classifications can hinge on whether regions of tissue contain tumor, necrosis or immune cells. Classification can also target algorithmic assessment of whether tissue regions are consistent with pathologist descriptions of tissue patterns. An automated system for the analysis of lung adenocarcinoma based on nuclear features and WHO subtype classification using deep convolutional neural networks and computational imaging signatures was developed, for example, in [ 168 ]. There has been a wealth of work over the past twenty years to classify histological patterns in different disease sites and cancer types (e.g. Gleason Grade in prostate cancer, lung cancer, breast cancer, melanoma, lymphoma and neuroblastoma) using statistical methods and machine and deep learning techniques [ 154 ], [ 169 ], [ 170 ].
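A drastically simplified version of such a segmentation pipeline, global thresholding followed by connected-component labelling, can be sketched in NumPy (purely illustrative; the helper below is hypothetical, and real, touching nuclei require the watershed- or learning-based methods cited above):

```python
import numpy as np
from collections import deque

def label_components(mask):
    """4-connected component labelling of a binary mask via BFS;
    a minimal stand-in for the separation pipelines cited above."""
    labels = np.zeros(mask.shape, dtype=int)
    current = 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue
        current += 1               # found a new, unlabelled object
        queue = deque([start])
        labels[start] = current
        while queue:
            r, c = queue.popleft()
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if (0 <= nr < mask.shape[0] and 0 <= nc < mask.shape[1]
                        and mask[nr, nc] and not labels[nr, nc]):
                    labels[nr, nc] = current
                    queue.append((nr, nc))
    return labels, current

# Toy "stained tissue" image: two bright nuclei on a dark background.
img = np.zeros((6, 6))
img[1:3, 1:3] = 0.9
img[4:6, 4:6] = 0.8
mask = img > 0.5                   # global threshold
labels, n_nuclei = label_components(mask)
```

The label map produced here is the kind of intermediate result that downstream steps summarize into per-nucleus size, shape, and texture features.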

Detection of cancer metastases is an important diagnostic problem to which machine-learning methods have been applied. The CAMELYON challenges target methods for algorithmic detection and classification of breast cancer metastases in H&E whole slide lymph node sections [ 171 ]. The best performing methods employed convolutional neural networks differing in network architecture, training methods, and methods for pre- and post-processing. Overall, there has been ongoing improvement in the performance of algorithms that detect, segment and classify cells and nuclei. These algorithms often form crucial components of cancer biomarker algorithms. Their results are used to generate quantitative summaries and maps of the size, shape, and texture of nuclei as well as statistical characterizations of spatial relationships between different types of nuclei [ 172 ]–[ 176 ]. One of the challenges in nuclear characterization is to generalize the task across different tissue types. This is especially problematic because generating ground truth datasets for training is a labor intensive and time-consuming process that requires the involvement of expert pathologists. Deep learning generative adversarial networks (GANs) have proved to be useful in generalizing training datasets in that respect [ 177 ].

B. Interpretation and Understanding

There is increasing attention paid to the role of tumor-immune interaction in determining outcome and response to treatment. In addition, immune therapy is increasingly employed in cancer treatment. High levels of lymphocyte infiltration have been related to longer disease-free survival or improved overall survival (OS) in multiple cancer types [ 178 ], including early stage triple-negative and HER2-positive breast cancer [ 179 ]. The spatial distribution of lymphocytes with respect to tumor, tumor boundary and tumor associated stroma are also important factors in cancer prognosis [ 180 ]. A variety of recent efforts rely on deep learning algorithms to classify tumor-infiltrating lymphocyte (TIL) regions in H&E images. One recent effort targeted characterization of TIL regions in lung cancer, while another, carried out in the context of the TCGA Pan Cancer Immune group, looked across tumor types to correlate deep learning derived spatial TIL patterns with molecular data and outcome. A third study employed a structured crowd-sourcing method to generate tumor infiltrating lymphocyte maps [ 152 ], [ 181 ]. These studies showed that there are correlations between characterizations of TIL patterns, as analyzed by computerized algorithms, and patient survival rates and groupings of patients based on subclasses of immunotypes. These studies demonstrate the value of whole slide tissue imaging in producing quantitative evaluations of sub-cellular data and opportunities for richer correlative studies.

Although there has been some progress made in the development of automated methods for assessing TMA images, most of these systems are limited by the fact that they are closed and proprietary; do not exploit the potential of advanced computer vision techniques; and/or do not conform with emerging data standards. In addition to the significant analytical issues, the sheer volume of data, text, and images arising from even limited studies involving tissue microarrays poses significant computational and data management challenges (see also Section VI.B ). Tumor expression of immune system-related proteins may reveal the tumor immune status, which in turn can be used to determine the most appropriate choices for immunotherapy. Objective evaluation of tumor biomarker expression is needed but often challenging. For instance, human leukocyte antigen (HLA) class I tumor epithelium expression is difficult to quantify by eye due to its presence on both tumor epithelial cells and tumor stromal cells, as well as tumor-infiltrating immune cells [ 182 ].

To maximize the flexibility and utility of the computational imaging tools that are being developed, it will be necessary to address the challenge of batch effect, which arises because histopathology tissue slides from different institutions show heterogeneous appearances as a result of differences in tissue preparation and staining procedures. Prediction models have been investigated as a means of reliably learning from one domain to map into a new domain directly. This was accomplished by introducing unsupervised domain adaptation to transfer the discriminative knowledge obtained from the source domain to the target domain without requiring re-labeling of images at the target domain [ 183 ]. This paper has focused on analysis of Hematoxylin and Eosin (H&E) stained tissue images. H&E is one of the main tissue stains and the one most commonly used in histopathology. Tissue specimens taken from patients are routinely stained with H&E for evaluation by pathologists for cancer diagnosis. There is a large body of image analysis research that targets H&E stained tissue, as covered in this paper. In research and clinical settings, other types of staining and imaging techniques, such as fluorescence microscopy and immunohistochemical techniques, are also employed [ 184 ]–[ 185 ]. These staining techniques can be used to boost the signal of specific morphological features of tissue, e.g., emphasizing proteins and macromolecules in cells and tissue samples. An increasing number of histopathology imaging projects are targeting methods for analysis of images obtained from fluorescence microscopy and immunostaining techniques (e.g., [ 186 ]–[ 192 ]).
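A minimal sketch of stain normalization, one simple way to reduce such batch effects, is channel-wise statistics matching (NumPy; this is a deliberate simplification in RGB of Reinhard-style colour normalization, and not the domain-adaptation method of [ 183 ]):

```python
import numpy as np

def match_stain_stats(src, ref):
    """Channel-wise mean/std matching (in the spirit of Reinhard
    colour normalisation) to reduce staining 'batch effects' between
    slides from different sites. A fuller method would operate in a
    decorrelated colour space; RGB is used here for brevity."""
    src, ref = np.asarray(src, float), np.asarray(ref, float)
    out = np.empty_like(src)
    for ch in range(src.shape[-1]):
        s, r = src[..., ch], ref[..., ch]
        std = s.std() if s.std() > 0 else 1.0
        # Shift/scale source statistics onto the reference statistics.
        out[..., ch] = (s - s.mean()) / std * r.std() + r.mean()
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(3)
ref = rng.normal(0.6, 0.1, (32, 32, 3))     # "target" staining statistics
src = rng.normal(0.3, 0.05, (32, 32, 3))    # slide from another site
norm = match_stain_stats(src, ref)
```

After matching, per-channel means and spreads of the source slide approximate those of the reference slide, which stabilizes downstream feature extraction across sites.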

VI. Visualization and Navigation

A. Biomedical 3D Reconstruction and Visualization

Three-dimensional (3D) reconstruction concerns the detailed 3D surface generation and visualization of specific anatomical structures, such as arteries, vessels, organs, body parts and abnormal morphologies, e.g. tumors, lesions, injuries, scars and cysts. It entails meshing and rendering techniques for completing the seamless boundary surface and generating the volumetric mesh, followed by smoothing and refinement. By enabling precise positioning and orientation of the patient’s anatomy, 3D visualization can contribute to the design of aggressive surgery and radiotherapy strategies, with realistic testing and verification, with extensive applications in spinal surgery, joint replacement, neuro-interventions, as well as coronary and aortic stenting [ 193 ]. Furthermore, 3D reconstruction constitutes the necessary step towards biomedical modeling of organs, dynamic functionality, diffusion processes, hemodynamic flow and fluid dynamics in arteries, as well as mechanical loads and properties of body parts, tumors, lesions and vessels, such as wall / shear stress and strain and tissue displacement [ 194 ].

In medical imaging applications with human tissues, registration of slices must be performed in an elastic manner [ 195 ]. In this respect, feature-based registration appears more suitable for vessel contours and centerlines [ 196 ], while intensity-based registration can be effectively used for image slices depicting abnormal morphologies such as tumors [ 197 ]. The selection of appropriate meshing and rendering techniques depends highly on the imaging modality and the corresponding tissue type. To this end, Surface Rendering techniques are exploited for the reconstruction of the 3D boundaries and geometry of arteries and vessels through the iso-contours extracted from each slice of intravascular ultrasound or CT angiography. Furthermore, NURBS are effectively used as a meshing technique for generating and characterizing lumen and media-adventitia surfaces of vascular geometric models, such as aortic, carotid, cerebral and coronary arteries, deployed for the reconstruction of aneurysms and atherosclerotic lesions [ 196 ], [ 198 ]. The representation of solid tissues and masses, e.g. tumors, organs and body parts, is widely performed by means of Volume Rendering techniques, such as ray-casting, since they are capable of visualizing the entire medical volume as a compact structure, yet with great transparency, even when derived from relatively low contrast image data.
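As a minimal illustration of volume rendering, a maximum intensity projection (one simple ray-casting variant) can be computed in NumPy:

```python
import numpy as np

def max_intensity_projection(volume, axis=0):
    """One simple volume-rendering view: cast parallel rays along an
    axis and keep each ray's maximum sample (MIP). Full ray-casting
    composites colour and opacity along the ray; MIP keeps only the
    brightest hit, which suits high-contrast structures like vessels."""
    return np.asarray(volume).max(axis=axis)

vol = np.zeros((4, 5, 5))
vol[2, 2, 2] = 1.0                 # a small bright "lesion"
mip = max_intensity_projection(vol)
```

Compositing-based ray-casting generalizes this by accumulating an opacity-weighted sum along each ray instead of taking the maximum, which is what produces the semi-transparent views described above.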

The reconstruction process necessitates expert knowledge and guidance. However, this is particularly time-consuming and hence not applicable to the analysis of larger numbers of patient-specific cases. For those situations, automatic segmentation and reconstruction systems are needed. The biggest problem with automatic segmentation and 3D reconstruction is the inability to fully automate the segmentation process, because of different imaging modalities, varying vessel geometries, and the quality of source images [ 199 ]. Processing of large numbers of images requires fast algorithms for segmentation and reconstruction. There are several ways to address this challenge, such as parallel algorithms for segmentation and the application of neural networks as discussed in Sections IV – V , the use of multiscale processing techniques, as well as the use of multiple computer systems where each system works on an image in real time.

B. Data Management, Visualization and Processing in Digital Pathology

Digital pathology is an inherently interactive human-guided activity. This includes labeling data for algorithm development, visualization of images and features for tuning algorithms, as well as explaining findings, and finally gearing systems towards clinical applications. It requires interactive systems that can query the underlying data and feature management systems, as well as support interactive visualizations. Such interactivity is a prerequisite to wide-scale adoption of digital pathology in imaging informatics applications. There are a variety of open source systems that support visualization, management, and query of features extracted from whole slide images, along with the generation of whole slide image annotations and markups. One such system is the QuIP software system [ 201 ]. QuIP is an open-source system that uses the caMicroscope viewer [ 202 ] to support the interactive visualization of images, image annotations, and segmentation results as overlays of heatmaps or polygons. QuIP includes FeatureScape, a visual analytics tool that supports interactive exploration of feature and segmentation maps. Other open-source systems that carry out these or related tasks are QuPath [ 203 ], the Pathology Image Informatics Platform (PIIP) for visualization, analysis, and management [ 204 ], the Digital Slide Archive (DSA) [ 205 ] and Cytomine [ 206 ]. These platforms are designed for local (QuPath, PIIP) or web-based (QuIP, caMicroscope, DSA) visualization, management and analysis of whole slide images. New tools and methods are also being developed to support knowledge representation and indexing of imaged specimens based on advanced feature metrics. These metrics include computational biomarkers with similarity indices that enable rapid search and retrieval of similar regions of interest from large datasets of images.
Together, these technologies will enable investigators to conduct high-throughput analysis of tissue microarrays composed of large patient cohorts, store and mine large data sets and generate and test hypotheses [ 200 ].

The processing of digital pathology images is a challenging activity, in part due to the size of whole-slide images, but also because of an abundance of image formats and the frequent need for human guidance and intervention during processing. There are some efforts towards the adoption of DICOM in digital pathology, including the availability of tools such as the Orthanc DICOMizer [ 207 ] that can convert a pyramidal tiled tiff file into a DICOM pathology file. caMicroscope [ 202 ] supports the visualization of DICOM pathology files over the DICOMWeb API [ 208 ]. These efforts are few and far between, and most solutions adopt libraries such as OpenSlide [ 209 ] or Bio-Formats [ 210 ] to navigate the plethora of open and proprietary scanner formats. Digital pathology algorithms work well with high resolution images to extract detailed imaging features from tissue data. Since digital pathology images can grow to a few GBs per image, compressed, the local processing of digital pathology images can be severely limited by the computational capacity of an interactive workstation. In such cases, some algorithms can work on regions of interest (ROI) identified by a user or on lower-resolution, down-sampled images. The growing popularity of containerization technologies such as Docker [ 211 ] has opened a new mechanism to distribute algorithms and pathology pipelines. There is also growing interest in the use of cloud computing for digital pathology, driven by the rapid decline in costs, making it an increasingly cost-effective solution for large-scale computing. A number of groups, predominantly in the genomics community, have developed solutions for deploying genomic pipelines on the cloud [ 212 ]–[ 214 ]. QuIP includes cloud-based pipelines for tumor infiltrating lymphocyte analysis and nuclear segmentation.
These are available as APIs and deployed as containers, as well as pipelines in the Workflow Description Language (WDL) using a cross-platform workflow orchestrator that supports multiple cloud and high-performance computing (HPC) platforms. The work in this area is still preliminary but is likely to see widespread adoption in the forthcoming years. Applications include algorithm validation, deployment of algorithms in clinical studies and clinical trials, and algorithm development, particularly in systems that employ transfer learning.
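Tile-wise processing is the standard way around the memory limits noted above: rather than loading a multi-gigabyte slide at once, a pipeline enumerates small regions and fetches them one at a time. A minimal sketch, not tied to any particular library (the slide dimensions and tile size below are illustrative; a real pipeline would obtain them from a reader such as OpenSlide and read pixels per tile):

```python
# Minimal sketch: enumerate tile coordinates for level-wise processing of a
# whole-slide image, so that only one small region is in memory at a time.
# The dimensions and tile size are made-up, illustrative values.

def tile_grid(width, height, tile=1024):
    """Yield (x, y, w, h) boxes that exactly partition a width x height plane."""
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            yield (x, y, min(tile, width - x), min(tile, height - y))

# Example: a 100,000 x 80,000 pixel slide at full resolution.
boxes = list(tile_grid(100_000, 80_000, tile=1024))
assert len(boxes) == 98 * 79                       # 98 columns x 79 rows
assert sum(w * h for (_, _, w, h) in boxes) == 100_000 * 80_000
```

Each `(x, y, w, h)` box would then be passed to a region reader (e.g., OpenSlide's `read_region`) and processed independently, which also makes the loop trivially parallelizable across containers or cloud workers.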

C. In Silico Modeling of Malignant Tumors

Applications of in-silico models are evolving rapidly in early diagnosis and prognosis, personalized therapy planning, noninvasive and invasive interactive treatment, as well as planning of pre-operative stages, chemotherapy, and radiotherapy (see Fig. 2 ). The potential of inferring reliable predictions of macroscopic tumor growth is of paramount importance to clinical practice, since tumor progression dynamics can be estimated under the effect of several factors and the application of alternative therapeutic schemes. Several mathematical and computational models have been developed to investigate the mechanisms that govern cancer progression and invasion, aiming to predict its future spatial and temporal status with or without the effects of therapeutic strategies.

Fig. 2. In silico modelling paradigm of cardiovascular disease with application to heart.

Recent efforts towards in silico modeling focus on multi-compartment models that describe how subpopulations of various cell types proliferate and diffuse while remaining computationally efficient. Furthermore, multiscale approaches link in space and time the interactions at different biological levels, such as the molecular, microscopic cellular, and macroscopic tumor scales [ 215 ]. Multi-compartment approaches can reflect the macroscopic volume expansion while revealing particular tumor aspects, such as the spatial distributions of cellular densities of different phenotypes, taking into account tissue heterogeneity and anisotropy as well as the chemical microenvironment with the available nutrients [ 216 ]. The metabolic influence of oxygen, glucose, and lactate is incorporated in multi-compartment models of tumor spatio-temporal evolution, enabling the formation of cell populations with different metabolic profiles, proliferation and diffusion rates. Methodological limitations of such approaches relate mainly to a reduced ability to simulate specific cellular factors (e.g., cell-to-cell adhesion) and subcellular-scale processes [ 217 ], which play an important role in regulating cellular behavior and determining tumor expansion/metastasis.
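The basic building block behind such proliferation-and-diffusion compartments is a reaction-diffusion equation of the Fisher-Kolmogorov type, du/dt = D * d2u/dx2 + rho * u * (1 - u). The sketch below is illustrative only (it is not taken from the cited models, and the values of D, rho, grid size, and time step are made up): one explicit Euler discretization of this equation in 1-D for a single cell-density compartment.

```python
# Illustrative 1-D Fisher-Kolmogorov step: diffusion plus logistic proliferation.
# D (diffusion) and rho (proliferation) are hypothetical values; the explicit
# scheme is stable here because dt * D / dx**2 = 0.01 << 0.5.

def step(u, D=0.1, rho=0.05, dx=1.0, dt=0.1):
    """Advance the normalized cell-density profile u by one explicit time step."""
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else u[i]            # zero-flux boundary
        right = u[i + 1] if i < n - 1 else u[i]
        lap = (left - 2.0 * u[i] + right) / dx ** 2   # discrete Laplacian
        out.append(u[i] + dt * (D * lap + rho * u[i] * (1.0 - u[i])))
    return out

u = [0.0] * 50
u[25] = 1.0                       # seed a small tumor core
for _ in range(200):
    u = step(u)
assert u[20] > 0.0                # density has diffused outward
assert max(u) <= 1.0 + 1e-9      # logistic term caps density at carrying capacity
```

Multi-compartment models of the kind described above couple several such density fields (e.g., proliferative vs. hypoxic phenotypes) through shared nutrient fields, but each compartment follows this same proliferate-and-diffuse pattern.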

Recent trends in modeling seek to incorporate the macroscopic tumor progress along with dynamic changes of chemical ingredients (such as glucose, oxygen, and chemotherapeutic drugs), but also the influence of individual cell expressions resulting from intracellular signaling cascades and gene characteristics. Along this direction, multiscale cancer models allow linking in space and time the different biological scales affecting macroscopic tumor development. They facilitate model development in precision medicine under the 3R principles of in vivo experimentation, related to replacement, reduction and refinement [ 218 ] of experimentation on live samples. Distinct spatial and temporal scales have been considered, such as the subcellular scale of molecular pathways and gene expressions, the microscopic-cellular level of individual cell behavior and phenotypic properties, the microenvironmental scale of the diffusing chemical ingredients, the tissue-multicellular extent of different cell regions, and the macroscopic scale of the tumor volume. The interconnection of the different levels, through coupling of blood flow, angiogenesis, vascular remodeling, nutrient transport and consumption, as well as movement interactions between normal and cancer cells, is considered a major challenge for in-silico models [ 219 ].

Despite the progress, challenging issues still remain in cancer growth models. Important factors include the ability to simulate the tumor microenvironment as well as cell-to-cell interactions, the effectiveness of addressing body heterogeneity and anisotropy with diffusion tensors, the potential of engaging the dynamically changing metabolic profile of the tumor, and the ability to include interactions on cancer growth at the biomolecular level, considering gene mutations and the malignancy of endogenous receptors.

D. Digital Twins

In general, digital twins benefit not only from CAD reconstruction tools but also engage dynamic modelling stemming from either theoretical developments or real-life measurements, merging the Internet of Things with artificial intelligence and data analytics [ 220 ]–[ 221 ]. In this form, the digital equivalent of a complex human functional system enables the consideration of event dynamics, such as tumour growth or information transfer in an epilepsy network, as well as a systemic response to therapy, such as the response to pharmacogenomics or targeted radiotherapy [ 222 ].

Since the digital twin can incorporate modelling at different resolutions, from organ structure to cellular and genomic level, it may enable complex simulations [ 223 ] with the use of AI tools to integrate huge amounts of data and knowledge aiming at improved diagnostics and therapeutic treatments, without harming the patient. Furthermore, such a twin can also act as a framework to support human-machine collaboration in testing and simulating complex invasive operations without even engaging the patient.

VII. Integrative Analytics

A. Medical Imaging in the Era of Precision Medicine

Radiologists and pathologists are routinely called upon to evaluate and interpret a range of macroscopic and microscopic images to render diagnoses and to engage in a wide range of research activities. The assessments that are made ultimately lead to clinical decisions that determine how patients are treated and predict outcomes. Precision medicine is an emerging approach for administering healthcare that aims to improve the accuracy with which clinical decisions are rendered, towards improving the delivery of personalized treatment and therapy planning for patients as depicted in Fig. 3 [ 67 ]. In that context, physicians have become increasingly reliant upon sophisticated molecular and genomic tests, which can augment standard pathology and radiology practices in order to refine the stratification of patient populations and manage individual care. Recent advances in computational imaging, clinical genomics and high-performance computing now make it possible to consider multiple combinations of clinico-pathologic data points simultaneously. Such advances provide unparalleled insight regarding the underlying mechanisms of disease progression and could be used to develop a new generation of diagnostic and prognostic metrics and tools. From a medical imaging perspective, the radiogenomics paradigm integrates the afore-described objectives towards advancing precision medicine.

Fig. 3. Radiogenomics System Diagram: An abstract system diagram demonstrating the use of radiogenomics approaches in the context of precision medicine [ 68 ]. Based on the clinical case, (multi-modal) image acquisition is performed. Then, manual and/or automatic segmentation of the diagnostic regions of interest follows, driving quantitative and/or qualitative radiomic feature extraction and machine learning approaches for segmentation, classification and inference. Alternatively, emerging deep learning methods using raw pixel intensities can be used for the same purpose. Radiogenomics approaches investigate the relationships between imaging and genomic features and how radiomics and genomics signatures, when processed jointly, can better describe clinical outcomes. On the other hand, radiomics research is focused on characterizing the relationship between quantitative imaging and clinical features.

B. Radiogenomics for Integrative Analytics

Radiomics research has emerged as a non-invasive approach of significant prognostic value [ 224 ]. Through the construction of imaging signatures (i.e., fusing shape, texture, morphology, intensity, and other features) and their subsequent association with clinical outcomes, robust predictive models (or quantitative imaging biomarkers) can be devised [ 225 ]. Incorporating longitudinal and multi-modality radiology and pathology (see also Section VII.C ) image features further enhances the discriminatory power of these models. A dense literature demonstrates the potentially transforming impact of radiomics across diseases such as cancer and neurodegenerative and cardiovascular diseases [ 224 ]–[ 228 ]. Going one step further, radiogenomics methods extend radiomics approaches by investigating the correlation between, for example, a tumor’s characteristics in terms of quantitative imaging features and its molecular and genetic profiling [ 68 ]. A schematic representation of radiomic and radiogenomics approaches appears in Fig. 3 .
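To make the notion of an imaging signature concrete, the toy sketch below reduces an ROI to a handful of first-order intensity features that could feed a predictive model. The ROI values and the three features chosen are illustrative only; real radiomics pipelines compute standardized, much larger feature sets spanning shape, texture, and morphology.

```python
# Toy radiomics sketch: summarize an ROI's pixel intensities as a small
# first-order feature vector (mean, standard deviation, histogram entropy).
import math

def first_order_features(roi):
    """Return (mean, std, entropy) for a flat list of pixel intensities."""
    n = len(roi)
    mean = sum(roi) / n
    var = sum((v - mean) ** 2 for v in roi) / n
    # Shannon entropy over 8 equal-width intensity bins.
    lo, hi = min(roi), max(roi)
    width = (hi - lo) / 8 or 1.0
    counts = [0] * 8
    for v in roi:
        counts[min(int((v - lo) / width), 7)] += 1
    entropy = -sum((c / n) * math.log2(c / n) for c in counts if c)
    return (mean, math.sqrt(var), entropy)

roi = [10, 12, 11, 40, 42, 41, 90, 95, 92]   # hypothetical ROI intensities
mean, std, entropy = first_order_features(roi)
assert 0.0 <= entropy <= 3.0                  # bounded by log2(8) bins
```

In a radiomics study, vectors like `(mean, std, entropy)` computed per lesion would be associated with clinical outcomes; in radiogenomics, they would additionally be mined jointly with -omics features.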

During the transformation from a benign to malignant state and throughout the course of disease progression, changes occur in the underlying molecular, histologic and protein expression patterns, with each contributing a different perspective and complementary strength. Clearly then, the objective is to generate surrogate imaging biomarkers connecting cancer phenotypes to genotypes, providing a powerful and yet non-invasive prognostic and diagnostic tool in the hands of physicians. At the same time, the joint development of radiogenomic signatures involves the integrated mining of both imaging and -omics features, towards constructing robust predictive models that better correlate with and describe clinical outcomes, as compared with imaging, genomics or histopathology alone [ 68 ].

The advent of radiogenomics research is closely aligned with associated advances in inter- and multi-institutional collaboration and the establishment of well curated, FAIR-driven repositories that encompass the substantial amount of semantically annotated (big) data underpinning precision medicine (see Section III ). Such examples are the TCIA and TCGA repositories, which provide matched imaging, genetic and clinical data for over 20 different cancer types. Importantly, these data further facilitate consensus ratings on radiology images (e.g., MRI) by expert radiologists to alleviate inconsistencies that often arise due to subjective impressions and inter- and intra-observer variability [ 229 ]. Moreover, driven by the observation that objectivity and reproducibility improve when conclusions are based upon computer-assisted decision support [ 230 ]–[ 233 ], research initiatives from TCIA groups attempt to formalize methodological processes, thus accommodating extensibility and explainability.

1) The TCIA/ TCGA Initiatives Paradigm:

The breast and glioma phenotype groups in TCIA, investigating breast invasive carcinoma (BRCA) and glioblastoma (GBM) and lower grade glioma (LGG), respectively, are examples of such initiatives. In this sequence, the breast phenotype group defined a total of 38 radiomics features driving reproducible radiogenomics research hypothesis testing [ 234 ]. Stemming from T1-weighted Dynamic Contrast Enhancement (DCE) MRI, radiomics features are classified into six phenotype categories, namely: (i) size (4) , (ii) shape (3) , (iii) morphology (3) , (iv) enhancement texture (14) , (v) kinetic curve (10) , and (vi) enhancement-variance kinetics (4) . Likewise, the glioma phenotype group relies on the VASARI feature set to subjectively interpret MRI visual cues. VASARI is a reference consensus schema composed of 30 descriptive features classified with respect to (i) non-enhanced tumor , (ii) contrast-enhanced tumor , (iii) necrosis , and (iv) edema . VASARI is widely used in corresponding radiogenomics studies driving the quantitative imaging analysis from a clinical perspective [ 235 ]. In terms of genetic analysis, features are extracted from the TCGA website, using enabling software such as the TCGA-Assembler.

Breast phenotype group studies documented significant associations between specific radiomics features (e.g., size and enhancement texture) and breast tumor staging. Moreover, they performed relatively well in predicting clinical receptor status, multigene assay recurrence scores (poor vs. good prognosis), and molecular subtyping. Imaging phenotypes were further associated with miRNA and protein expressions [ 236 ]–[ 239 ].

At the same time, hypothesis testing in the glioma phenotype group verified the significant association of certain radiomic and genomic features with overall and progression-free survival, while joint radiogenomic signatures were found to increase the predictive ability of the generated models. Importantly, imaging features were linked to molecular GBM subtype classification (based on the Verhaak and/or Philips classification), providing for non-invasive prognosis [ 68 ], [ 240 ].

2) Deep Learning Based Radiogenomics:

While still in its infancy, relying mostly on transfer learning approaches, deep learning methods are projected to expand and transform radiomics and radiogenomics research. Indicative studies focusing on cancer research involve discriminating between Luminal A and other molecular subtypes of breast cancer [ 241 ], predicting bladder cancer treatment response [ 242 ], IDH1 mutation status for LGG [ 243 ], [ 244 ], and MGMT methylation status for GBM [ 245 ], as well as predicting overall survival for GBM patients [ 246 ] and non-disease-specific subjects [ 247 ].

C. Integrative Analytics in Digital Pathology

Recently, the scope of image-based investigations has expanded to include the synthesis of results from pathology images, genome information and correlated clinical information. For example, a recent set of experiments utilized 86 breast cancer cases from the Genomics Data Commons (GDC) repository to demonstrate that using a combination of image-based and genomic features served to improve classification accuracy significantly [ 248 ]. Other work demonstrated the potential of utilizing a combination of genomic and computational imaging signatures to characterize prostate cancer. The results of the study show that integrating image biomarkers from a CNN with a recurrent network model, long short-term memory (LSTM), and genomic pathway scores correlates more strongly with a patient’s recurrence of disease than standard clinical markers and image-based texture features [ 249 ]. An important computational issue is how to effectively integrate omics data with digitized pathology images for biomedical research. Multiple statistical and machine learning methods have been applied for this purpose, including consensus clustering [ 250 ], linear classifiers [ 251 ], LASSO regression modeling [ 252 ], and deep learning [ 253 ]. These methods have been applied to studies on cancers, including breast [ 250 ], lung [ 252 ], and colorectal [ 253 ]. The studies not only demonstrated that the integration of morphological features extracted from digitized pathology images and -omics data can improve the accuracy of prognosis, but also provided insights on the molecular basis of cancer cell and tissue organization. For instance, Yuan et al. [ 251 ] showed that morphological information on TILs combined with gene expression data can significantly improve prognosis prediction for ER-negative breast cancers, while the distribution patterns of TILs and the related genomics information are characterized for multiple cancers in [ 152 ].
These works led to new directions on integrative genomics for both precision medicine and biological hypothesis generation.

As an extension of the work that is already underway using multi-modal combinations of image and genomic signatures to help support the classification of pathology specimens, there have been renewed efforts to develop reliable, content-based retrieval (CBR) strategies. These strategies aim to automatically search through large reference libraries of pathology samples to identify previously analyzed lesions that exhibit the most similar characteristics to a given query case. They also support systematic comparisons of tumors within and across patient populations while facilitating future selection of appropriate patient cohorts. One of the advantages of CBR systems over traditional classifier-based systems is that they enable investigators to interrogate data while visualizing the most relevant profiles [ 254 ]. However, CBR systems have to deal with very large and high-dimensional datasets, the complexity of which can easily render simple feature concatenation inefficient and insufficiently robust. It is often desirable to utilize hashing techniques to encode the high-dimensional feature vectors extracted from computational imaging signatures and genomic profiles, so that they can be encapsulated into short binary vectors. Hashing-based retrieval approaches are gaining popularity in the medical imaging community due to their exceptional efficiency and scalability [ 255 ].
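As an illustration of the hashing idea, the sketch below encodes feature vectors with random-hyperplane (SimHash-style) codes, a classical locality-sensitive hashing scheme, and compares them by Hamming distance. The dimensions and vectors are made up, and production CBR systems typically learn the hash functions from data rather than drawing them at random.

```python
# SimHash-style binary codes: each bit records which side of a random
# hyperplane a feature vector falls on; similar vectors get similar codes.
import random

random.seed(0)
DIM, BITS = 16, 32
planes = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(BITS)]

def hash_code(vec):
    """Encode a DIM-dimensional feature vector as a BITS-bit binary code."""
    return tuple(1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
                 for plane in planes)

def hamming(a, b):
    """Number of differing bits between two codes."""
    return sum(x != y for x, y in zip(a, b))

query = [random.gauss(0, 1) for _ in range(DIM)]
near = [v + random.gauss(0, 0.01) for v in query]   # near-duplicate profile
far = [random.gauss(0, 1) for _ in range(DIM)]      # unrelated profile
assert hamming(hash_code(query), hash_code(near)) <= \
       hamming(hash_code(query), hash_code(far))
```

Because codes are short and comparisons are bitwise, a retrieval engine can rank millions of stored imaging/genomic profiles against a query without loading the full high-dimensional feature vectors, which is precisely the efficiency argument made above.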

VIII. Concluding Remarks & Future Directions

Medical imaging informatics has been driving clinical research, translation, and practice for over three decades. Advances in the associated research branches highlighted in this study promise to revolutionize imaging informatics as known today across the healthcare continuum, enabling informed, more accurate diagnosis, timely prognosis, and effective treatment planning. Among AI-based research-driven approaches that have obtained approval from the Food and Drug Administration (FDA), a significant percentage involves medical imaging informatics [ 256 ]. The FDA is the official US regulator of medical devices and, more recently, software-as-a-medical-device (SAMD) [ 257 ]. These solutions rely on machine- or deep-learning methodologies that perform various image analysis tasks, such as image enhancement (e.g., SubtlePET/MR, IDx-DR), segmentation and detection of abnormalities (e.g., Lung/LiverAI, OsteoDetect, Profound AI), as well as estimation of the likelihood of malignancy (e.g., Transpara). Radiology images are mostly addressed in these FDA-approved applications and, to a lesser degree, digital pathology images (e.g., Paige AI). Table III summarizes existing FDA-approved AI-based solutions. We expect the number of systems obtaining FDA approval to grow significantly in the near future.

Hardware breakthroughs in medical image acquisition facilitate high-throughput and high-resolution images across imaging modalities at unprecedented performance and lower induced radiation. Already deep in the big medical data era, imaging data availability is only expected to grow, complemented by massive amounts of associated data-rich EMR/ EHR, -omics, and physiological data, climbing to orders of magnitude higher than what is available today. As such, the research community is struggling to harness the full potential of the wealth of data that are now available at the individual patient level underpinning precision medicine.

Keeping up with storage, sharing, and processing while preserving privacy and anonymity [ 258 ], [ 259 ], has pushed boundaries in traditional means of doing research. Driven by the overarching goal of discovering actionable information, afore-described challenges have triggered new paradigms in an effort to standardize involved workflows and processes towards accelerating new knowledge discovery. Such initiatives include multi-institutional collaboration with extended research teams’ formation, open-access datasets encompassing well-annotated (extensible) large-cohorts, and reproducible and explainable research studies with analysis results augmenting existing data.

Imaging researchers are also faced with challenges in data management, indexing, query and analysis of digital pathology data. One of the main challenges is how to manage relatively large-scale, multi-dimensional data sets that will continue to expand over time since it is unreasonable to exhaustively compare the query data with each sample in a high-dimensional database due to practical storage and computational bottlenecks [ 255 ]. The second challenge is how to reliably interrogate the characteristics of data originating from multiple modalities.

In that sequence, data analytics approaches have allowed the automatic identification of anatomical areas of interest as well as the description of physiological phenomena, towards an in-depth understanding of regional tissue physiology and pathophysiology. Deep learning methods are currently dominating new research endeavours. Undoubtedly, research in deep learning applications and methods is expected to grow, especially in view of documented advances across the spectrum of healthcare data, including EHR [ 260 ], genomic [ 261 ], [ 262 ], physiological parameters [ 263 ], and natural language data processing [ 264 ]. Beyond the initial hype, deep learning models have managed in a short time to address critical issues pertaining to method generalization, overfitting, complexity, reproducibility and domain dependence.

However, the primary attribute behind deep learning success has been the unprecedented accuracy in classification, segmentation, and image synthesis performance, consistently, across imaging modalities, and for a wide range of applications.

Toward this direction, transfer learning approaches and their uptake in popular frameworks, supported by a substantial community base, have been catalytic. In fact, fine-tuning and feature-extraction transfer learning approaches, as well as inference using pre-trained networks, can now be invoked as would any typical programming function, widening the deep learning research base and hence adoption in new applications.

Yet, challenges remain, calling for breakthroughs ranging from explainable artificial intelligence methods leveraging advanced reasoning and 3D reconstruction and visualization, to exploiting the intersection and merits of the performance of traditional (shallow) machine learning techniques and the accuracy of deep learning methods, and, most importantly, facilitating clinical translation by overcoming generalization weaknesses induced by different populations, the latter potentially due to training with small datasets.

At the same time, we should highlight a key difference in the medical domain. Deep learning-based computer vision tasks have been developed on “enormous” data of natural images that go beyond ImageNet (see, for example, the efforts of Google and Facebook). This paradigm is rather worrying, as matching that data size is not readily possible in the medical domain. While in medicine we can still benefit from advances in transfer learning methods and computational efficiency [ 265 ], [ 266 ], in the future we have to consider how to devise methods that train on fewer data yet still generalize well. From an infrastructure perspective, the computational capabilities of exascale computing, driven by ongoing deep learning initiatives such as the CANDLE initiative, promise revolutionary solutions [ 267 ].

Emerging radiogenomics paradigms are concerned with developing integrative analytics approaches, in an attempt to facilitate the harvesting of new knowledge extracted from analysing heterogeneous (non-imaging), multi-level data jointly with imaging data. In that sequence, new insights with respect to disease aetiology, progression, and treatment efficacy can be generated. Toward this direction, integrative analytics approaches are systematically considered for in-silico modelling applications, where biological processes guiding, for example, a tumour’s expansion and metastasis need to be modelled in a precise and computationally efficient manner. For that purpose, investigating the association between imaging and -omics features is of paramount importance towards constructing advanced multi-compartment models that can accurately portray the proliferation and diffusion of subpopulations of various cell types.

In conclusion, medical imaging informatics advances are projected to elevate the quality of care witnessed today, once innovative solutions along the lines of the selected research endeavors presented in this study are adopted in clinical practice, potentially transforming precision medicine.

Contributor Information

Andreas S. Panayides, Department of Computer Science, University of Cyprus, 1678 Nicosia, Cyprus.

Amir Amini, Electrical and Computer Engineering Department, University of Louisville, Louisville, KY 40292 USA.

Nenad D. Filipovic, University of Kragujevac, 2W94+H5 Kragujevac, Serbia.

Ashish Sharma, Emory University Atlanta, GA 30322 USA.

Sotirios A. Tsaftaris, School of Engineering, The University of Edinburgh, EH9 3FG, U.K. The Alan Turing Institute, U.K.

Alistair Young, Department of Anatomy and Medical Imaging, University of Auckland, Auckland 1142, New Zealand.

David Foran, Department of Pathology and Laboratory Medicine, Robert Wood Johnson Medical School, Rutgers, The State University of New Jersey, Piscataway, NJ 08854 USA.

Nhan Do, U.S. Department of Veterans Affairs Boston Healthcare System, Boston, MA 02130 USA.

Spyretta Golemati, Medical School, National and Kapodistrian University of Athens, Athens 10675, Greece.

Tahsin Kurc, Stony Brook University, Stony Brook, NY 11794 USA.

Kun Huang, School of Medicine, Regenstrief Institute, Indiana University, IN 46202 USA.

Konstantina S. Nikita, Biomedical Simulations and Imaging Lab, School of Electrical and Computer Engineering, National Technical University of Athens, Athens 157 80, Greece.

Ben P. Veasey, Electrical and Computer Engineering Department, University of Louisville, Louisville, KY 40292 USA.

Michalis Zervakis, Technical University of Crete, Chania 73100, Crete, Greece.

Joel H. Saltz, Stony Brook University, Stony Brook, NY 11794 USA.

Constantinos S. Pattichis, Department of Computer Science of the University of Cyprus, 1678 Nicosia, Cyprus, and also with the Research Centre on Interactive Media, Smart Systems and Emerging Technologies (RISE CoE), 1066 Nicosia, Cyprus.


  1. Recent Advances in Medical Image Processing

    It helps medical personnel to make an early and more accurate diagnosis. Recently, the deep convolution neural network is emerging as a principal machine learning method in computer vision and has received significant attention in medical imaging. Key Message: In this paper, we will review recent advances in artificial intelligence, machine ...

  2. Medical image analysis based on deep learning approach

    Classification, detection, and segmentation are essential tasks in medical image processing . For specific deep learning tasks in medical applications, the training of deep neural networks needs a lot of labeled data. ... The potential future research for medical image analysis is the designing of deep neural network architectures using deep ...

  3. Recent advances and clinical applications of deep learning in medical

    1. Introduction. In the current clinical practice, accuracy of detection and diagnosis of cancers and/or many other diseases depends on the expertise of individual clinicians (e.g., radiologists, pathologists) (Kruger et al., 1972), which results in large inter-reader variability in reading and interpreting medical images.In order to address and overcome this clinical challenge, many computer ...

  4. Medical image analysis based on deep learning approach

    Medical imaging plays a significant role in different clinical applications such as medical procedures used for early detection, monitoring, diagnosis, and treatment evaluation of various medical conditions. Basicsof the principles and implementations of artificial neural networks and deep learning are essential for understanding medical image analysis in computer vision. Deep Learning ...

  5. [2106.12864] A Systematic Collection of Medical Image Datasets for Deep

    A Systematic Collection of Medical Image Datasets for Deep Learning. The astounding success made by artificial intelligence (AI) in healthcare and other fields proves that AI can achieve human-like performance. However, success always comes with challenges. Deep learning algorithms are data-dependent and require large datasets for training.

  6. (PDF) Deep Learning in Medical Image Analysis

    The purpose of this special issue (SI) "Deep Learning on Medical Image Analysis" is to present and highlight. novel algorithms, architectures, techniques, and applications of DL for medical ...

  7. Convolutional neural networks for medical image analysis: State-of-the

    In this paper, we provide a survey on convolutional neural networks in medical image analysis. First, we review the commonly used CNNs in medical image processing, including AlexNet, GoogleNet, ResNet, R-CNN, and FCNN. ... Recently, CNNs are being widely used by the medical imaging research community because of their outstanding performance in ...

  8. Advances in Deep Learning-Based Medical Image Analysis

    Although there exist a number of reviews on deep learning methods on medical image analysis [4-13], most of them emphasize either on general deep learning techniques or on specific clinical applications.The most comprehensive review paper is the work of Litjens et al. published in 2017 [].Deep learning is such a quickly evolving research field; numerous state-of-the-art works have been ...

  9. Critical Analysis of the Current Medical Image-Based Processing ...

    Medical image processing and analysis techniques play a significant role in diagnosing diseases. Thus, during the last decade, several noteworthy improvements in medical diagnostics have been made based on medical image processing techniques. In this article, we reviewed articles published in the most important journals and conferences that used or proposed medical image analysis techniques to ...

  10. Convolutional neural networks in medical image understanding ...

    Imaging techniques are used to capture anomalies of the human body. The captured images must be understood for diagnosis, prognosis and treatment planning of the anomalies. Medical image understanding is generally performed by skilled medical professionals. However, the scarce availability of human experts and the fatigue and rough estimate procedures involved with them limit the effectiveness ...

  11. Deep learning and medical image processing for ...

    In a desperate attempt to combat the COVID-19 pandemic, researches have been initiated on scientific studies in all directions, and DL integrated with medical image processing techniques have also been explored rigorously to find a definite solution (Hakak, Khan, Imran, Choo, & Shoaib, 2020; Iwendi et al., 2020).Numerous research publications have been published with similar objectives, as ...

  12. A Review of Deep-Learning-Based Medical Image Segmentation Methods

    As an emerging biomedical image processing technology, medical image segmentation has made great contributions to sustainable medical care. Now it has become an important research direction in the field of computer vision. With the rapid development of deep learning, medical image processing based on deep convolutional neural networks has become a research hotspot. This paper focuses on the ...

  13. Medical images classification using deep learning: a survey

    This paper discusses the different evaluation metrics used in medical imaging classification. Provides a conclusion and future directions in the field of medical image processing using deep learning. This is the outline of the survey paper. In Section 2, medical image analysis is discussed in terms of its applications.

  14. Artificial intelligence and machine learning for medical imaging: A

    Introduction. For the last decade, the locution Artificial Intelligence (AI) has progressively flooded many scientific journals, including those of image processing and medical physics. Paradoxically, though, AI is an old concept, starting to be formalized in the 1940s, while the term of artificial intelligence itself was coined in 1956 by John McCarthy.

  15. Medical Imaging 2020: Image Processing

    Medical Imaging 2020: Image Processing. Ivana Išgum, Bennett A. Landman. Proceedings Volume 11313. ... is a topic of active research. Many papers have formulated this question as a classification problem: one considers a fixed time of conversion and aims to discriminate between the patients who have converted to AD at that time and those who ...

  16. Medical image processing with contextual style transfer

    With recent advances in deep learning research, generative models have made great strides and play an increasingly important role in current industrial applications. At the same time, technologies derived from generative methods, such as style transfer and image synthesis, are also widely discussed among researchers. In this work, we treat generative methods as a possible ...

  17. Frontiers

    1. Introduction. The origin of radiology can be seen as the beginning of medical image processing. The discovery of X-rays by Röntgen and its successful application in clinical practice ended the era of disease diagnosis relying solely on the clinical experience of doctors (Glasser, 1995). The production of medical images provides doctors with more data, enabling them to diagnose and treat ...

  18. AI in Medical Imaging Informatics: Current Challenges and Future

    Abstract. This paper reviews state-of-the-art research solutions across the spectrum of medical imaging informatics, discusses clinical translation, and provides future directions for advancing clinical practice. More specifically, it summarizes advances in medical imaging acquisition technologies for different modalities, highlighting the ...

  19. Research in Medical Imaging Using Image Processing Techniques

    Image processing increases the percentage and amount of detected tissues. This chapter presents the application of both simple and sophisticated image analysis techniques in the medical imaging field.

  20. A Parallel DNA Crypto Algorithm for Medical Image

    In this paper, we proposed a parallel DNA crypto algorithm to provide substantial security with reduced time complexity. Parallel task processing and parallel instruction processing with multiple threads are exercised for the parallel process. The medical image is segregated into four subparts, and in parallel, each subpart is remodelled into DNA ...
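The split-into-four-subparts, encode-in-parallel idea in the snippet above can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: the 2-bit-to-base mapping (00→A, 01→C, 10→G, 11→T), the helper names, and the use of a four-worker thread pool are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

BASES = "ACGT"  # indexed by 2-bit value: 00->A, 01->C, 10->G, 11->T

def to_dna(block: bytes) -> str:
    """Remodel one byte block into a DNA strand, 4 bases per byte."""
    return "".join(
        BASES[(b >> shift) & 0b11] for b in block for shift in (6, 4, 2, 0)
    )

def split_four(image: bytes) -> list:
    """Segregate the raw image bytes into four roughly equal subparts."""
    q = (len(image) + 3) // 4
    return [image[i:i + q] for i in range(0, len(image), q)]

def parallel_encode(image: bytes) -> list:
    """Encode the four subparts concurrently, one task per subpart."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(to_dna, split_four(image)))
```

For example, `parallel_encode(bytes([0b00011011]))` yields `["ACGT"]`, since the byte's four 2-bit groups are 00, 01, 10, 11. A real crypto scheme would of course combine this encoding step with keying and diffusion; this sketch only shows the split-and-parallel-remodel structure described in the abstract.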