The Constantly Evolving Role of Medical Image Processing in Oncology: From Traditional Medical Image Processing to Imaging Biomarkers and Radiomics

Kostas Marias

1 Department of Electrical and Computer Engineering, Hellenic Mediterranean University, 71410 Heraklion, Greece; kmarias@hmu.gr

2 Computational Biomedicine Laboratory (CBML), Foundation for Research and Technology—Hellas (FORTH), 70013 Heraklion, Greece

The role of medical image computing in oncology is growing stronger, not least due to the unprecedented advancement of computational AI techniques, providing a technological bridge between radiology and oncology which could significantly accelerate the advancement of precision medicine throughout the cancer care continuum. Medical image processing has been an active field of research for more than three decades, focusing initially on traditional image analysis tasks such as registration, segmentation, fusion, and contrast optimization. However, with the advancement of model-based medical image processing, the field of imaging biomarker discovery has focused on transforming functional imaging data into meaningful biomarkers that are able to provide insight into a tumor's pathophysiology. More recently, the advancement of high-performance computing, in conjunction with the availability of large medical imaging datasets, has enabled the deployment of sophisticated machine learning techniques in the context of radiomics and deep learning modeling. This paper reviews and discusses the evolving role of image analysis and processing through the lens of the abovementioned developments, which hold promise for accelerating precision oncology, in the sense of improved diagnosis, prognosis, and treatment planning of cancer.

1. Introduction

To better understand the evolution of medical image processing in oncology, it is necessary to explain the importance of measuring tumor appearance from medical images. Medical images contain useful diagnostic and prognostic information that image processing approaches can extract to add precision to cancer care. In addition, because biology is a system of systems, it is reasonable to assume that image-based information may convey multi-level pathophysiology information. This has led to the establishment of many sophisticated predictive and diagnostic image-based biomarker extraction approaches in cancer. In more detail, medical image processing efforts are focused on extracting imaging biomarkers able to decipher the variation across individuals in terms of imaging phenotype, enabling the identification of patient subgroups for precision medicine strategies [1]. From the very beginning, the main prerequisite for clinical use has been that quantitative biomarkers must be precise and reproducible. If these conditions are met, imaging biomarkers have the potential to aid clinicians in assessing pathophysiologic changes in patients and in better planning personalized therapy. This is important, as in clinical practice subjective characterizations may be used (e.g., average heterogeneity, spiculated mass, necrotic core), which can decrease the precision of diagnostic processes.

Based on the above considerations, the extraction of quantitative parameters characterizing size, shape, texture, and activity can enhance the role of medical imaging in assisting diagnosis and therapy response assessment. However, in clinical practice, only simpler image metrics (e.g., linear) are often used in oncology, especially in the evaluation of solid tumor response to therapy (e.g., the longest lesion diameter in RECIST). Both the RECIST and WHO evaluation criteria rely on anatomical image measurements, mainly in CT or MRI data, and were originally developed mainly for cytotoxic therapies. Such linear measures suffer from high intra-/inter-observer variability, which in some cases can compromise the accurate assessment of tumor response; some studies report inter-observer RECIST variability of up to 30% [2]. Several studies have shown that 3D quantitative response assessments correlate better with disease progression than those based on 1D linear measurements [3]. Nevertheless, traditional tumor quantification approaches based on linear or 3D tumor measures have experienced substantial difficulties in assessing response to newer oncology therapies, such as targeted, anti-angiogenic treatments and immunotherapies [2]. Size-based tumor assessments do not always represent tumor response to therapy, since, for example, tumors may display internal necrosis formation with or without a reduction in lesion size (as in traditional cytotoxic treatments). Even though the RECIST criteria are constantly updated to address these issues, as in the case of Immune RECIST [4], such approaches still do not take into consideration a tumor's image structure and texture over time. In addition, the size and the location of metastases have been reported to play a significant role in assessing early tumor shrinkage and depth of response [5].

To address these limitations, medical image processing has provided over the last few decades the means to extract tumor texture and size descriptors, obtaining more detailed (e.g., pixel-based) descriptions of tissue structure and discovering feature patterns connected to disease or response. In this paper, it is argued that the evolution of medical image processing has been a gradual process, and the diverse factors that contributed to unprecedented progress in the field with the use of AI are explained. Initially, simplistic approaches to classifying benign and malignant masses, e.g., in mammograms, were based on traditional feature extraction and pattern recognition methods. Functional tomographic imaging such as PET gave rise to more sophisticated, model-based approaches from which quantitative markers of tissue properties could be extracted in an effort to optimize diagnosis and treatment stratification and to personalize response criteria. Lastly, the advancement of artificial intelligence enabled a more exhaustive search of imaging phenotype descriptors and led to the increased performance of modern diagnostic and predictive models.
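To make the coarseness of the linear-measurement paradigm concrete, the minimal sketch below classifies response from sums of longest target-lesion diameters using the published RECIST 1.1 thresholds (a decrease of at least 30% from baseline for partial response; an increase of at least 20% and at least 5 mm over the nadir for progression). The function and argument names are illustrative and are not taken from any clinical software.

```python
def recist_response(baseline_sum_mm: float, nadir_sum_mm: float,
                    current_sum_mm: float) -> str:
    """Classify response from sums of longest target-lesion diameters (RECIST 1.1)."""
    # Progressive disease: >=20% increase over the smallest sum on study (nadir),
    # with an absolute increase of at least 5 mm.
    if (current_sum_mm - nadir_sum_mm) / nadir_sum_mm >= 0.20 \
            and current_sum_mm - nadir_sum_mm >= 5.0:
        return "PD"
    # Complete response: disappearance of all target lesions.
    if current_sum_mm == 0.0:
        return "CR"
    # Partial response: >=30% decrease relative to the baseline sum.
    if (current_sum_mm - baseline_sum_mm) / baseline_sum_mm <= -0.30:
        return "PR"
    return "SD"  # stable disease otherwise

# Example: a 40 mm baseline sum shrinking to 25 mm is a partial response (-37.5%).
assert recist_response(baseline_sum_mm=40.0, nadir_sum_mm=40.0, current_sum_mm=25.0) == "PR"
```

Note how the classification depends only on a one-dimensional summary of each lesion; this is precisely the information loss that the texture- and volume-based descriptors discussed below aim to overcome.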

2. Traditional Image Analysis: The First Efforts towards CAD Systems

In the 1990s, one of the first challenges in medical image analysis was to facilitate the interpretation of mammograms in the context of national screening programs for breast cancer. In the United Kingdom, the design of the first screening program was undertaken by a working group under Sir Patrick Forrest, whose report was accepted by the government in 1986. As a consequence, the UK screening program for women aged between 50 and 64 was established in 1990 [6]. The implementation of such screening programs throughout Europe led to the establishment of specialist breast screening centers and the formal training of both radiographers and radiologists. X-ray mammography proved to be a cost-effective imaging modality for national screening, and population screening led to smaller and usually non-palpable masses being increasingly detected.

As a result, the radiologist's task became more complex: the interpretation of a mammogram is challenging due to the projective nature of mammography, while at the same time the need for early and accurate detection of cancer became pressing. To address these needs, medical image analysis became an active field of research in the early nineties, giving rise to numerous research efforts into cancer and microcalcification detection, as well as into mammogram registration for improving the comparison of temporal mammograms. Figure 1 depicts the temporal mammogram registration concept towards CAD systems that would facilitate comparison and aid clinicians in the early diagnosis of cancer in screening mammography [7]. When the ImageChecker system was certified by the FDA for screening mammography in 1998, R2 Technology became the first company to employ computer-assisted diagnosis (CAD) for mammography, and later for digital mammography as well.

Figure 1. Traditional medical image processing on temporal mammograms. From left to right: the most recent mammogram (a) is registered to the previous mammogram (b), with the registered result shown in (c). After registration, there is one predominant region of significant difference in the subtraction image (d), which corresponds to a mass developed in the breast.
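A minimal sketch of the registration-and-subtraction idea in Figure 1 is given below, assuming two roughly aligned digitized mammograms and using a translation-only phase-correlation alignment; the file names are placeholders, and clinical temporal mammogram registration typically requires non-rigid methods to compensate for breast deformation.

```python
import numpy as np
from skimage import io
from skimage.registration import phase_cross_correlation
from scipy.ndimage import shift

# Load the current and prior mammograms as grayscale arrays (file names are placeholders).
current = io.imread("mammo_current.png", as_gray=True).astype(np.float32)
prior = io.imread("mammo_prior.png", as_gray=True).astype(np.float32)

# Estimate the translation that best aligns the prior image to the current one.
offset, error, _ = phase_cross_correlation(current, prior)

# Resample the prior mammogram onto the current image grid.
prior_registered = shift(prior, offset)

# Pixel-wise subtraction: large residual regions flag candidate newly developed masses.
difference = np.abs(current - prior_registered)
```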

However, early diagnostic decision support systems suffered from low precision, which in turn could increase the number of unnecessary biopsies. In a relevant study [8], the positive predictive values of the interpretations of three independent observers changed from 100%, 92.7%, and 95.5% without CAD to 86.4%, 97.3%, and 91.1% with CAD. This limitation was representative of the low generalizability of such cancer detection tools in those early days. At the same time, the lack of more sophisticated imaging modalities hampered research efforts towards predicting therapy response and optimizing therapy based on imaging data.

3. Quantitative Imaging Based on Models

With the advent of more sophisticated imaging modalities enabling functional imaging, medical image analysis efforts shifted towards the quantification of tissue properties. This opened new horizons for CAD systems, towards translating image signals into cancer tissue properties such as perfusion and cellularity and developing more intuitive imaging biomarkers for several cancer imaging applications. For example, in the case of MRI, complex phenomena that occur after excitation are amenable to mathematical modeling, taking into consideration tissue interactions within the tumor microenvironment. In the context of evaluating a model-based approach, the model can be regarded as reliable when the predicted data converge on the observed signal intensities while at the same time providing useful insights to radiologists and oncologists. MRI perfusion and diffusion imaging have been the main focus of such modeling efforts, not least due to the fact that MRI is free of ionizing radiation.

Diffusion-weighted MRI (DWI-MRI) is based on sequences sensitized to microscopic water mobility by means of strong gradient pulses and can provide quantitative information on the tumor environment and architecture. Diffusivity can be assessed in the intracellular, extracellular, and intravascular spaces. Per-pixel apparent diffusion coefficient (ADC) values derived from DWI-MRI theoretically have an inverse relationship with tumor cell density. In addition, with the introduction of the intravoxel incoherent motion (IVIM) model, both cellularity and microvascular perfusion information could be assessed through parametric modeling [9]. Figure 2 presents a parametric map of the stretching parameter α from the DWI-MRI stretched-exponential model (SEM), revealing highly heterogeneous parts of a Grade 3 dedifferentiated liposarcoma (DDLS) [9].

Figure 2. Stretched-exponential model (SEM) DWI-MRI parametric map, revealing highly heterogeneous parts of a dedifferentiated liposarcoma (with permission from the Department of Medical Imaging, Heraklion University Hospital). The heterogeneity index α ranges from 0 to 1, with lower values of α indicating greater microstructural heterogeneity.
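For readers unfamiliar with how such parametric maps are produced, the following sketch fits the stretched-exponential model S(b) = S0·exp(−(b·DDC)^α) pixel by pixel over the acquired b-values. The array layout, initial values, and bounds are illustrative assumptions; production pipelines mask the background and vectorize the fit.

```python
import numpy as np
from scipy.optimize import curve_fit

def sem_signal(b, s0, ddc, alpha):
    """Stretched-exponential DWI model: S(b) = S0 * exp(-(b * DDC)**alpha)."""
    return s0 * np.exp(-((b * ddc) ** alpha))

def fit_alpha_map(dwi, b_values):
    """Fit the SEM per pixel; dwi has shape (n_b, H, W), b_values in s/mm^2."""
    _, h, w = dwi.shape
    alpha_map = np.full((h, w), np.nan)
    for i in range(h):
        for j in range(w):
            signal = dwi[:, i, j]
            if signal[0] <= 0:  # skip background pixels
                continue
            try:
                popt, _ = curve_fit(
                    sem_signal, b_values, signal,
                    p0=[signal[0], 1e-3, 0.8],  # S0, DDC (mm^2/s), alpha
                    bounds=([0.0, 1e-6, 0.01], [np.inf, 1e-1, 1.0]))
                alpha_map[i, j] = popt[2]  # heterogeneity index alpha
            except RuntimeError:
                pass  # leave NaN where the fit does not converge
    return alpha_map
```

A lower fitted α in a pixel indicates a larger deviation from mono-exponential decay, which is the microstructural heterogeneity information visualized in Figure 2.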

DWI-MRI has been tested in most solid tumors for discriminating malignant from benign lesions, automating tumor grading, predicting treatment response, and monitoring patients after treatment [10].

However, these results still lack standardization and generalization, as well as validation against histopathology. While in-depth DWI-MRI biomarker validation is difficult in clinical routine, recent pre-clinical studies have found that the derived parametric maps can serve as a non-invasive marker of cell death and apoptosis in response to treatment [11]. These studies also confirmed significant correlations of ADC with immunohistochemistry measurements of cell density, cell death, and apoptosis.

In a similar fashion, in dynamic contrast-enhanced MRI (DCE-MRI), T1-weighted sequences are acquired before, during, and after the administration of a paramagnetic contrast agent (CA). Tissue-specific information about pathophysiology can be inferred from the dynamics of signal intensity in every pixel of the studied area. Usually this is performed by visual or semi-quantitative analysis of the signal time curves in selected regions of interest. However, with the use of pharmacokinetic modeling, e.g., of CA exchange between the intravascular and the extravascular extracellular space, it became possible to map signal intensities per pixel to CA concentration and then fit model parameters describing, e.g., the interstitial space and the transfer constant (ktrans). This enabled the generation of parametric maps, e.g., for ktrans, providing a more quantitative representation of tumor perfusion and heterogeneity within the tumor image region of interest. Although promising, e.g., for assessing treatment efficacy, such approaches have found limited use in clinical practice, not least due to the low reported reproducibility of model parameter estimation. One aspect of this problem is presented in the example shown in Figure 3, where the use of image-driven methods based on multiple flip angles produces a parametric map of a tumor with different contrast compared to the one produced with the population-based Fritz–Hansen AIF [12]. This issue has several implications, including for the accuracy of assessing breast cancer response to neoadjuvant chemotherapy [13].

Figure 3. (a) ktrans map of a tumor from PK analysis using an AIF measured directly from the MR image, with the multiple flip angles (mFAs) method used for the conversion from signal to CA concentration; (b) ktrans map of the same tumor using a population-based AIF from Fritz and Hansen.
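To make the pharmacokinetic modeling step concrete, here is a minimal sketch of a standard Tofts model fit for a single pixel, assuming the signal has already been converted to CA concentration (e.g., via the multiple flip angles method) and that an AIF Cp(t) is available on the same uniform time grid; the names, starting values, and bounds are illustrative.

```python
import numpy as np
from scipy.optimize import curve_fit

def tofts_ct(t, ktrans, ve, cp):
    """Standard Tofts model: Ct(t) = Ktrans * int_0^t Cp(tau) exp(-kep (t - tau)) dtau."""
    kep = ktrans / ve            # efflux rate constant (1/min)
    dt = t[1] - t[0]             # uniform temporal sampling assumed
    irf = np.exp(-kep * t)       # tissue impulse response function
    return ktrans * np.convolve(cp, irf)[: len(t)] * dt

def fit_pixel(t, ct_pixel, cp):
    """Fit Ktrans (1/min) and ve (unitless) to one pixel's concentration-time curve."""
    model = lambda tt, ktrans, ve: tofts_ct(tt, ktrans, ve, cp)
    (ktrans, ve), _ = curve_fit(model, t, ct_pixel, p0=[0.1, 0.3],
                                bounds=([1e-4, 1e-3], [5.0, 1.0]))
    return ktrans, ve
```

Repeating the fit over every tumor pixel yields ktrans maps such as those in Figure 3, which illustrate how strongly the result depends on whether the AIF is measured from the image or taken from a population model.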

In conclusion, the clinical translation of DWI and DCE-MRI is hampered by low repeatability and reproducibility across several studies in oncology. To address this problem, initiatives such as the Quantitative Imaging Biomarkers Alliance (QIBA) propose clinical and technological requirements for quantitative DWI- and DCE-derived imaging biomarkers, as well as image acquisition, processing, and quality control recommendations aimed at reducing reproducibility error and improving precision and accuracy [14]. It is argued that this active area of medical image processing has not yet reached its full potential and still represents a complementary approach to AI-driven methods towards CAD systems for promoting precision oncology. In addition, the exploitation of multimodality imaging strategies (e.g., PET/MRI) can provide added value through the combination of anatomical and functional information.

4. Radiomics and Deep Learning Approaches in Oncology through the Cancer Continuum

Traditional cancer medical image analysis was for decades based on human-defined features, usually inspired by low-level image properties such as intensity, contrast, and a limited number of texture measures. Such methods were successfully used, e.g., in cancer subclassification, but it was hard to capture the high-level, complex patterns that an expert radiologist uses to determine the presence or absence of cancer [1].

However, with the advancement of machine learning and the availability of more powerful, high-performance computational infrastructures, it became possible to exhaustively analyze the texture and shape content of medical images in an effort to decipher high-level pathophysiology patterns. At the same time, the evolution of texture representation and feature extraction through a growing number of techniques over the last decades played a catalytic role in better capturing tumor appearance through medical image analysis [15]. Last but not least, the need to decipher the imaging phenotype in cancer became even more pressing, due to the fact that the vast majority of visible phenotypic variation is now considered attributable to non-genetic determinants in chronic and age-associated disorders [1].

All these factors played a central role in the advancement of radiomics, where, in analogy to genomics, high-throughput feature extraction followed by machine learning enabled the development of significant discriminatory and predictive signatures based on the imaging phenotype. Radiomics has been enhanced with deep learning techniques, which offer an alternative approach to medical image feature extraction by learning complex, high-level features in an automated fashion from a large number of medical images that contain variable instances of a particular tumor. Figure 4 illustrates the main AI/radiomics applications that can assist clinicians in adding precision to the management of cancer patients.

Figure 4. The main medical image processing applications enhanced with AI/radiomics towards precision oncology.
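As a minimal illustration of hand-crafted radiomics feature extraction, the sketch below computes first-order statistics and gray-level co-occurrence matrix (GLCM) texture descriptors from a 2D tumor ROI. The feature set, gray-level quantization, and names are illustrative assumptions; dedicated radiomics packages compute hundreds of standardized features over full 3D volumes.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def radiomic_features(image, mask, levels=32):
    """First-order and GLCM texture features from a 2D image and a binary tumor mask."""
    roi = image[mask > 0].astype(np.float64)

    # First-order (histogram) features of the intensities inside the mask.
    features = {
        "mean": roi.mean(),
        "std": roi.std(),
        "skewness": ((roi - roi.mean()) ** 3).mean() / (roi.std() ** 3 + 1e-12),
    }

    # Crop the tumor bounding box and quantize it to a few gray levels for the GLCM.
    ys, xs = np.nonzero(mask)
    crop = image[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    bins = np.linspace(crop.min(), crop.max(), levels)
    quantized = (np.digitize(crop, bins) - 1).clip(0, levels - 1).astype(np.uint8)

    glcm = graycomatrix(quantized, distances=[1], angles=[0, np.pi / 2],
                        levels=levels, symmetric=True, normed=True)
    for prop in ("contrast", "homogeneity", "correlation"):
        features["glcm_" + prop] = graycoprops(glcm, prop).mean()
    return features
```

In a typical radiomics pipeline, hundreds of such features feed a feature selection step and a classifier or survival model, which is where the machine learning component enters.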

4.1. Cancer Screening

Recent advancements in AI-driven medical image processing can have a positive impact on national cancer screening programs, alleviating the heavy workload of radiologists and helping clinicians reduce the number of missed cancers and detect them at an earlier stage. Compared to the initial efforts mentioned in the previous sections, recent AI-driven image processing can exceed the limits of human vision, potentially reducing the number of cancers missed in screening and coping with inter-observer variability.

Regarding lung cancer screening, early nodule detection and classification are of paramount importance for improving patient outcomes and quality of life. Despite the existence of such screening programs, the majority of lung cancers are detected at later stages, leading to increased mortality and a low 5-year survival rate [16]. To this end, radiomics and deep-learning-based methods have shown encouraging results towards precision pulmonary nodule evaluation [17]. A very interesting recent example is reported by Ardila et al., who developed a deep learning algorithm that uses a patient's current and prior computed tomography volumes to predict the risk of lung cancer. Their model achieved state-of-the-art performance (94.4% area under the curve) on 6716 cases and performed similarly on an independent clinical validation set of 1139 cases. When prior computed tomography imaging was not available, their model outperformed all six radiologists, with absolute reductions of 11% in false positives and 5% in false negatives [18].

Regarding breast cancer screening technologies, it is argued that AI may provide the means to limit the inherent drawbacks of mammography and enhance diagnostic performance and robustness. In a prospective clinical study, a commercially available AI algorithm was evaluated as an independent reader of screening mammograms, and adequate diagnostic performance was reported [19].

4.2. Precision Cancer Diagnosis

During the last decades, CAD-driven precision diagnosis has been the holy grail of medical image processing research efforts. However, clinical interest in such applications has grown significantly only recently, with the advancement of AI-driven efforts to generalize performance across diverse datasets. AI systems have achieved unprecedented performance in the segmentation and classification of cancer; a recent study reported increased performance in segmenting and classifying brain tumors into meningioma, glioma, and pituitary tumors [20].

In addition, a growing number of studies are concerned with automated tumor grading, which is a prerequisite for optimal therapy planning. Yang et al. presented a retrospective glioma grading study (grade II vs. grade III, corresponding to low-grade and high-grade glioma) on 113 glioma patients and used transfer learning with the AlexNet and GoogLeNet architectures, achieving an AUC of up to 0.939 [21].
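For illustration, here is a minimal PyTorch transfer-learning sketch in the spirit of such studies, starting from an ImageNet-pretrained GoogLeNet and re-targeting its classifier head to two grading classes; the exact training configuration of Yang et al. is not reproduced, and data loading and augmentation are omitted.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load ImageNet-pretrained GoogLeNet and replace the 1000-class head
# with a two-class head (low-grade vs. high-grade glioma).
model = models.googlenet(weights=models.GoogLeNet_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 2)

# Freeze the pretrained feature extractor; fine-tune only the new head.
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("fc")

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One fine-tuning step on a batch of 3-channel 224x224 MRI slices."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Unfreezing deeper layers with a lower learning rate is the usual next step when more training data are available.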

At the same time, the quest to decode the imaging phenotype has given rise to efforts to correlate imaging features with molecular and genetic markers in the context of radio-genomics [22]. This promising field of research can provide surrogate molecular information directly from medical images and is not prone to biopsy sampling errors, as the whole tumor can be analyzed. In a recent study, MRI radiomics were able to predict IDH1 mutation status with an AUC of up to 90% [23].

4.3. Treatment Optimization

There are many challenging problems in optimizing treatment for cancer patients, such as the accurate segmentation of organs at risk (OARs) in radiotherapy and the prediction of neoadjuvant chemotherapy response. Intelligent processing of medical images has opened new horizons for addressing these clinical needs. In the case of nasopharyngeal carcinoma radiotherapy planning, a deep learning OAR detection and segmentation network provides useful insights for clinicians for the accurate delineation of OARs [24]. Regarding the prediction of neoadjuvant chemotherapy response, the use of image-based algorithms to predict outcome has the potential to add precision, not least because the outcome can differ significantly depending on the tumor subtype. To this end, recent studies report promising preliminary results in applying AI to predict breast cancer neoadjuvant therapy response. Vulchi et al. reported improved prediction of response to HER2-targeted neoadjuvant therapy based on deep learning of DCE-MRI data [25]; notably, the AUC dropped from 0.93 to 0.85 in the external validation cohort.

5. Radiomics Limitations Regarding Clinical Translation

While promising, radiomics methodologies are still in a translational phase, and thorough clinical validation is needed before clinical translation. To this end, when these technologies are tested and reviewed, a number of important limitations become apparent. In a recent review of MRI-based radiomics in nasopharyngeal cancer [26], the authors reviewed the state of the art and applied a radiomics quality score (RQS) assessment. Several limitations were highlighted in the reviewed studies, including the absence of a validation cohort in 21% of them and the lack of external validation in 92% of them. In another RQS-based evaluation study of radiomics and radio-genomics papers, the RQS was low regarding clinical utility, test–retest analysis, prospective design, and open science [27]. Notably, no single study used phantoms to assess the robustness of radiomics features or performed a cost-effectiveness analysis. In a similar fashion, a lack of feature robustness assessment and external validation was reported in studies on prostate cancer [28], while the main shortcomings reported in the quality of MRI lymphoma radiomics studies concerned inconsistencies in the segmentation process and the lack of temporal data to increase model robustness [29]. All these recent studies clearly indicate that, although medical image processing in oncology has evolved significantly, the clinical translation of radiomics is still hampered by the lack of extensive, high-quality validation studies. In addition, the lack of standardization in radiomics extraction remains a problem, which is currently being investigated by several studies with respect to different software packages [30] and the reproducibility of standardized radiomics features using multi-modality patient data [31].

6. Discussion

Contrary to common belief, medical image processing has been evolving for the last few decades, and its main application is cancer image analysis. Traditional medical image processing was founded on classical image processing and computer vision principles, focusing on low-level feature extraction and simple classification tasks (e.g., benign vs. malignant), as well as on the geometrical alignment of temporal images and the segmentation of tumors for volumetric analyses. This early stage in the 1990s was an important milestone for further development, since several radiologists and oncologists understood the future potential and helped create a multidisciplinary community around medical image analysis and processing. More importantly, it laid the foundations of radiomics by proposing the shape and textural analysis of tumors as useful patterns for detection, segmentation, and classification. However, the main limitations were the high degree of fragmentation of such efforts, the limited computational resources, and the very low availability of cancer image data, which usually consisted of mammograms or MRIs.

Functional imaging was another important milestone for medical image computing, since the idea of transforming dynamic image signals into tissue properties paved the way for the discovery of reliable and reproducible image biomarkers for oncology. To achieve this goal, non-conventional medical image processing was deployed, based on compartmental models that link the imaging phenotype with microscopic tumor environment properties through diffusion and perfusion. Such model-based approaches include compartmental pharmacokinetic models for DCE-MRI and the IVIM model for DWI-MRI, often requiring laborious pre-processing to transform the original signal into quantitative parametric maps able to convey perfusion and cellularity information to the clinician. It is argued that this is still an evolving research field and that the potential for clinical translation is significant, especially since techniques such as DWI-MRI involve neither ionizing radiation nor the administration of a contrast agent. That said, significant standardization efforts are still required in order to converge on stable imaging protocols and model implementations that will guarantee reproducible parametric maps and robust cancer biomarkers. Another limitation, compared to modern radiomics/deep learning efforts, is that the processing of such functional data with compartmental models is a very demanding task, requiring a deeper understanding of imaging protocols as well as of numerical analysis methods for model fitting.

The gradual advancement of high-performance computing, machine learning, and neural networks has revolutionized research in the field, especially during the last decade. The field of radiomics has extended cancer medical image processing concepts regarding texture and shape descriptors to massive feature extraction and modeling. Such radiomics approaches have also been enhanced by convolutional neural networks, which outperformed traditional image analysis methods in tasks such as lesion segmentation, while introducing more sophisticated predictive, diagnostic, and correlative pipelines towards precision diagnostics, therapy optimization, and synergistic radio-genomic biomarker discovery. The availability of open-access computational tools for machine and deep learning, in combination with public cancer image resources such as The Cancer Imaging Archive (TCIA), has led to an unprecedented number of publications, AI start-ups, and accelerated discussions on the establishment of AI regulatory processes and the clinical translation of such technologies. At the same time, the main limitation of these impressive technologies has been their low explainability, which came as a tradeoff for their impressive performance in oncological applications throughout the cancer continuum. Low explainability also contributed to reduced trust in these models, while the vast number of features explored made generalization difficult, especially due to the large variability of image quality and imaging protocols across vendors and clinical sites.

Medical image processing is still evolving and will continue to provide useful tools and methodological concepts for improving cancer image analysis and interpretation. Data science approaches focusing on radiomics have paved the way for accelerating precision oncology [32]. However, most efforts to date use only imaging data, which limits the performance of diagnostic and prognostic tools. To this end, novel data integration paradigms exploiting both imaging and multi-omics data are a very promising field for future research [33]. Recent studies have started exploring the synergy of deep learning with quantitative parametric maps. In [34], the authors present a deep learning method for predicting good responders in locally advanced rectal cancer, trained on apparent diffusion coefficient (ADC) parametric scans from different vendors. The fusion of standard imaging representations with parametric maps, as well as integrative diagnostic approaches [35] involving medical images and other cancer-related data, holds promise for increasing accuracy and trustworthiness.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Conflicts of Interest

The author declares no conflict of interest.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

  • Lüneburg, N.; Reiss, N.; Feldmann, C.; van der Meulen, P.; van de Steeg, M.; Schmidt, T.; Wendl, R.; Jansen, S. Photographic LVAD Driveline Wound Infection Recognition Using Deep Learning. In dHealth 2019—From eHealth to dHealth ; IOS Press: Amsterdam, The Netherlands, 2019; pp. 192–199. [ Google Scholar ] [ CrossRef ]
  • Fink, O.; Wang, Q.; Svensén, M.; Dersin, P.; Lee, W.J.; Ducoffe, M. Potential, challenges and future directions for deep learning in prognostics and health management applications. Eng. Appl. Artif. Intell. 2020 , 92 , 103678. [ Google Scholar ] [ CrossRef ]
  • Ahmed, I.; Ahmad, M.; Jeon, G. Social distance monitoring framework using deep learning architecture to control infection transmission of COVID-19 pandemic. Sustain. Cities Soc. 2021 , 69 , 102777. [ Google Scholar ] [ CrossRef ]
  • Hussain, S.; Yu, Y.; Ayoub, M.; Khan, A.; Rehman, R.; Wahid, J.A.; Hou, W. IoT and Deep Learning Based Approach for Rapid Screening and Face Mask Detection for Infection Spread Control of COVID-19. Appl. Sci. 2021 , 11 , 3495. [ Google Scholar ] [ CrossRef ]
  • Kaur, J.; Kaur, P. Outbreak COVID-19 in Medical Image Processing Using Deep Learning: A State-of-the-Art Review. Arch. Comput. Methods Eng. 2022 , 29 , 2351–2382. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Groen, A.M.; Kraan, R.; Amirkhan, S.F.; Daams, J.G.; Maas, M. A systematic review on the use of explainability in deep learning systems for computer aided diagnosis in radiology: Limited use of explainable AI? Int. J. Autom. Comput. 2022 , 157 , 110592. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Hao, D.; Li, Q.; Feng, Q.X.; Qi, L.; Liu, X.S.; Arefan, D.; Zhang, Y.D.; Wu, S. SurvivalCNN: A deep learning-based method for gastric cancer survival prediction using radiological imaging data and clinicopathological variables. Artif. Intell. Med. 2022 , 134 , 102424. [ Google Scholar ] [ CrossRef ]
  • Cui, X.; Zheng, S.; Heuvelmans, M.A.; Du, Y.; Sidorenkov, G.; Fan, S.; Li, Y.; Xie, Y.; Zhu, Z.; Dorrius, M.D.; et al. Performance of a deep learning-based lung nodule detection system as an alternative reader in a Chinese lung cancer screening program. Eur. J. Radiol. 2022 , 146 , 110068. [ Google Scholar ] [ CrossRef ]
  • Liu, L.; Li, C. Comparative study of deep learning models on the images of biopsy specimens for diagnosis of lung cancer treatment. J. Radiat. Res. Appl. Sci. 2023 , 16 , 100555. [ Google Scholar ] [ CrossRef ]
  • Muniz, F.B.; de Freitas Oliveira Baffa, M.; Garcia, S.B.; Bachmann, L.; Felipe, J.C. Histopathological diagnosis of colon cancer using micro-FTIR hyperspectral imaging and deep learning. Comput. Methods Programs Biomed. 2023 , 231 , 107388. [ Google Scholar ] [ CrossRef ]
  • Gomes, S.L.; de S. Rebouças, E.; Neto, E.C.; Papa, J.P.; de Albuquerque, V.H.C.; Filho, P.P.R.; Tavares, J.M.R.S. Embedded real-time speed limit sign recognition using image processing and machine learning techniques. Neural Comput. Appl. 2017 , 28 , 573–584. [ Google Scholar ] [ CrossRef ]
  • Monga, V.; Li, Y.; Eldar, Y.C. Algorithm Unrolling: Interpretable, Efficient Deep Learning for Signal and Image Processing. IEEE Signal Process. Mag. 2021 , 38 , 18–44. [ Google Scholar ] [ CrossRef ]
  • Zhang, L.; Cheng, L.; Li, H.; Gao, J.; Yu, C.; Domel, R.; Yang, Y.; Tang, S.; Liu, W.K. Hierarchical deep-learning neural networks: Finite elements and beyond. Comput. Mech. 2021 , 67 , 207–230. [ Google Scholar ] [ CrossRef ]
  • Salahzadeh, Z.; Rezaei-Hachesu, P.; Gheibi, Y.; Aghamali, A.; Pakzad, H.; Foladlou, S.; Samad-Soltani, T. A mechatronics data collection, image processing, and deep learning platform for clinical posture analysis: A technical note. Phys. Eng. Sci. Med. 2021 , 44 , 901–910. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Singh, P.; Hrisheekesha, P.; Singh, V.K. CBIR-CNN: Content-Based Image Retrieval on Celebrity Data Using Deep Convolution Neural Network. Recent Adv. Comput. Sci. Commun. 2021 , 14 , 257–272. [ Google Scholar ] [ CrossRef ]
  • Varga, D.; Szirányi, T. Fast content-based image retrieval using convolutional neural network and hash function. In Proceedings of the 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Budapest, Hungary, 9–12 October 2016; pp. 2636–2640. [ Google Scholar ] [ CrossRef ]
  • Latif, A.; Rasheed, A.; Sajid, U.; Ahmed, J.; Ali, N.; Ratyal, N.I.; Zafar, B.; Dar, S.H.; Sajid, M.; Khalil, T. Content-Based Image Retrieval and Feature Extraction: A Comprehensive Review. Math. Probl. Eng. 2019 , 2019 , 9658350. [ Google Scholar ] [ CrossRef ]
  • Rani, P.; Kotwal, S.; Manhas, J.; Sharma, V.; Sharma, S. Machine Learning and Deep Learning Based Computational Approaches in Automatic Microorganisms Image Recognition: Methodologies, Challenges, and Developments. Arch. Comput. Methods Eng. 2022 , 29 , 1801–1837. [ Google Scholar ] [ CrossRef ]
  • Jardim, S.V.B.; Figueiredo, M.A.T. Automatic Analysis of Fetal Echographic Images. Proc. Port. Conf. Pattern Recognit. 2002 , 1 , 1–6. [ Google Scholar ]
  • Jardim, S.V.B.; Figueiredo, M.A.T. Automatic contour estimation in fetal ultrasound images. In Proceedings of the 2003 International Conference on Image Processing 2003, Barcelona, Spain, 14–17 September 2003; Volum 1, pp. 1065–1068. [ Google Scholar ] [ CrossRef ]
  • Devunooru, S.; Alsadoon, A.; Chandana, P.W.C.; Beg, A. Deep learning neural networks for medical image segmentation of brain tumours for diagnosis: A recent review and taxonomy. J. Ambient Intell. Humaniz. Comput. 2021 , 12 , 455–483. [ Google Scholar ] [ CrossRef ]
  • Anaya-Isaza, A.; Mera-Jiménez, L.; Verdugo-Alejo, L.; Sarasti, L. Optimizing MRI-based brain tumor classification and detection using AI: A comparative analysis of neural networks, transfer learning, data augmentation, and the cross-transformer network. Eur. J. Radiol. Open 2023 , 10 , 100484. [ Google Scholar ] [ CrossRef ]
  • Cao, Y.; Kunaprayoon, D.; Xu, J.; Ren, L. AI-assisted clinical decision making (CDM) for dose prescription in radiosurgery of brain metastases using three-path three-dimensional CNN. Clin. Transl. Radiat. Oncol. 2023 , 39 , 100565. [ Google Scholar ] [ CrossRef ]
  • Chakrabarty, N.; Mahajan, A.; Patil, V.; Noronha, V.; Prabhash, K. Imaging of brain metastasis in non-small-cell lung cancer: Indications, protocols, diagnosis, post-therapy imaging, and implications regarding management. Clin. Radiol. 2023 , 78 , 175–186. [ Google Scholar ] [ CrossRef ]
  • Mehrotra, R.; Ansari, M.; Agrawal, R.; Anand, R. A Transfer Learning approach for AI-based classification of brain tumors. Mach. Learn. Appl. 2020 , 2 , 100003. [ Google Scholar ] [ CrossRef ]
  • Drai, M.; Testud, B.; Brun, G.; Hak, J.F.; Scavarda, D.; Girard, N.; Stellmann, J.P. Borrowing strength from adults: Transferability of AI algorithms for paediatric brain and tumour segmentation. Eur. J. Radiol. 2022 , 151 , 110291. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Ranjbarzadeh, R.; Caputo, A.; Tirkolaee, E.B.; Jafarzadeh Ghoushchi, S.; Bendechache, M. Brain tumor segmentation of MRI images: A comprehensive review on the application of artificial intelligence tools. Comput. Biol. Med. 2023 , 152 , 106405. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Yedder, H.B.; Cardoen, B.; Hamarneh, G. Deep learning for biomedical image reconstruction: A survey. Artif. Intell. Rev. 2021 , 54 , 215–251. [ Google Scholar ] [ CrossRef ]
  • Manuel Davila Delgado, J.; Oyedele, L. Robotics in construction: A critical review of the reinforcement learning and imitation learning paradigms. Adv. Eng. Inform. 2022 , 54 , 101787. [ Google Scholar ] [ CrossRef ]
  • Íñigo Elguea-Aguinaco; Serrano-Muñoz, A.; Chrysostomou, D.; Inziarte-Hidalgo, I.; Bøgh, S.; Arana-Arexolaleiba, N. A review on reinforcement learning for contact-rich robotic manipulation tasks. Robot. Comput.-Integr. Manuf. 2023 , 81 , 102517. [ Google Scholar ] [ CrossRef ]
  • Ahn, K.H.; Na, M.; Song, J.B. Robotic assembly strategy via reinforcement learning based on force and visual information. Robot. Auton. Syst. 2023 , 164 , 104399. [ Google Scholar ] [ CrossRef ]
  • Jafari, M.; Xu, H.; Carrillo, L.R.G. A biologically-inspired reinforcement learning based intelligent distributed flocking control for Multi-Agent Systems in presence of uncertain system and dynamic environment. IFAC J. Syst. Control 2020 , 13 , 100096. [ Google Scholar ] [ CrossRef ]
  • Wang, X.; Liu, S.; Yu, Y.; Yue, S.; Liu, Y.; Zhang, F.; Lin, Y. Modeling collective motion for fish schooling via multi-agent reinforcement learning. Ecol. Model. 2023 , 477 , 110259. [ Google Scholar ] [ CrossRef ]
  • Jain, D.K.; Dutta, A.K.; Verdú, E.; Alsubai, S.; Sait, A.R.W. An automated hyperparameter tuned deep learning model enabled facial emotion recognition for autonomous vehicle drivers. Image Vis. Comput. 2023 , 133 , 104659. [ Google Scholar ] [ CrossRef ]
  • Silver, D.; Hubert, T.; Schrittwieser, J.; Antonoglou, I.; Lai, M.; Guez, A.; Lanctot, M.; Sifre, L.; Kumaran, D.; Graepel, T.; et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 2018 , 362 , 1140–1144. [ Google Scholar ] [ CrossRef ]
  • Ueda, M. Memory-two strategies forming symmetric mutual reinforcement learning equilibrium in repeated prisoners’ dilemma game. Appl. Math. Comput. 2023 , 444 , 127819. [ Google Scholar ] [ CrossRef ]
  • Wang, X.; Liu, F.; Ma, X. Mixed distortion image enhancement method based on joint of deep residuals learning and reinforcement learning. Signal Image Video Process. 2021 , 15 , 995–1002. [ Google Scholar ] [ CrossRef ]
  • Dai, Y.; Wang, G.; Muhammad, K.; Liu, S. A closed-loop healthcare processing approach based on deep reinforcement learning. Multimed. Tools Appl. 2022 , 81 , 3107–3129. [ Google Scholar ] [ CrossRef ]

Click here to enlarge figure

The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Valente, J.; António, J.; Mora, C.; Jardim, S. Developments in Image Processing Using Deep Learning and Reinforcement Learning. J. Imaging 2023, 9, 207. https://doi.org/10.3390/jimaging9100207


Search results

scikit-image: image processing in Python

1 code implementation • 23 Jul 2014

scikit-image is an image processing library that implements algorithms and utilities for use in research, education and industry applications.
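A minimal sketch of typical scikit-image usage along these lines; the input file name is hypothetical and assumed to be an RGB image:

```python
# Minimal scikit-image sketch: edges and Otsu segmentation on a grayscale image.
from skimage import color, filters, io

img = io.imread("specimen.png")            # hypothetical RGB input image
gray = color.rgb2gray(img)                 # float grayscale array in [0, 1]
edges = filters.sobel(gray)                # Sobel edge magnitude
threshold = filters.threshold_otsu(gray)   # Otsu global threshold
binary = gray > threshold                  # boolean segmentation mask

io.imsave("mask.png", (binary * 255).astype("uint8"))
```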

Loss Functions for Neural Networks for Image Processing

2 code implementations • 28 Nov 2015

Neural networks are becoming central in several areas of computer vision and image processing and different architectures have been proposed to solve specific problems.

Image Processing GNN: Breaking Rigidity in Super-Resolution

1 code implementation • CVPR 2024

Alternatively, we leverage the flexibility of graphs and propose the Image Processing GNN (IPG) model to break the rigidity that dominates previous SR methods.


Picasso: A Modular Framework for Visualizing the Learning Process of Neural Network Image Classifiers

1 code implementation • 16 May 2017

Picasso is a free open-source (Eclipse Public License) web application written in Python for rendering standard visualizations useful for analyzing convolutional neural networks.


MAXIM: Multi-Axis MLP for Image Processing

1 code implementation • CVPR 2022

In this work, we present a multi-axis MLP based architecture called MAXIM, that can serve as an efficient and flexible general-purpose vision backbone for image processing tasks.


Fast Image Processing with Fully-Convolutional Networks

2 code implementations • ICCV 2017

Our approach uses a fully-convolutional network that is trained on input-output pairs that demonstrate the operator's action.
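A minimal sketch of this idea, assuming PyTorch and torchvision: a tiny fully-convolutional network is trained on synthetic input-output pairs whose target "operator" is a fixed Gaussian blur. The architecture and training setup here are illustrative assumptions, not the paper's:

```python
# Minimal sketch: learn an image operator (here, Gaussian blur) from input-output pairs.
import torch
import torch.nn as nn
import torchvision.transforms.functional as TF

net = nn.Sequential(                               # small FCN, works at any resolution
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(200):
    x = torch.rand(8, 1, 64, 64)                   # random training images
    y = TF.gaussian_blur(x, kernel_size=7, sigma=2.0)  # the operator's action
    loss = nn.functional.mse_loss(net(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```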


Simple Image Signal Processing using Global Context Guidance

1 code implementation • 17 Apr 2024

First, we propose a novel module that can be integrated into any neural ISP to capture the global context information from the full RAW images.

Pre-Trained Image Processing Transformer

6 code implementations • CVPR 2021

To maximally excavate the capability of transformer, we present to utilize the well-known ImageNet benchmark for generating a large amount of corrupted image pairs.
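A minimal sketch of generating corrupted/clean training pairs from a clean image, in the spirit of that setup; the Gaussian noise model and its level are arbitrary assumptions, and the "clean" image is synthetic:

```python
import numpy as np

def make_denoising_pair(clean: np.ndarray, sigma: float = 25.0):
    """Return a (corrupted, clean) training pair for a denoising task."""
    noise = np.random.normal(0.0, sigma, clean.shape)
    corrupted = np.clip(clean.astype(np.float64) + noise, 0, 255).astype(np.uint8)
    return corrupted, clean

clean = np.full((64, 64), 128, dtype=np.uint8)   # stand-in for an ImageNet crop
corrupted, target = make_denoising_pair(clean)
```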


In Defense of Classical Image Processing: Fast Depth Completion on the CPU

2 code implementations • 31 Jan 2018

With the rise of data driven deep neural networks as a realization of universal function approximators, most research on computer vision problems has moved away from hand crafted classical image processing algorithms.
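In that classical spirit, the sketch below fills missing (zero) depth pixels by iterative morphological dilation with OpenCV; the kernel size and iteration count are illustrative choices, not the paper's tuned parameters:

```python
import cv2
import numpy as np

def complete_depth(depth: np.ndarray, iterations: int = 3) -> np.ndarray:
    """depth: float32 map where 0 marks missing pixels."""
    kernel = np.ones((5, 5), np.uint8)
    filled = depth.copy()
    for _ in range(iterations):
        dilated = cv2.dilate(filled, kernel)       # spread valid depths outward
        holes = filled == 0                        # fill only still-empty pixels
        filled[holes] = dilated[holes]
    return filled

sparse = np.zeros((240, 320), np.float32)
sparse[::8, ::8] = 10.0                            # synthetic sparse depth samples
dense = complete_depth(sparse)
```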


Image Processing Using Multi-Code GAN Prior

1 code implementation • CVPR 2020

Such an over-parameterization of the latent space significantly improves the image reconstruction quality, outperforming existing competitors.


EDITORIAL article

Editorial: Current Trends in Image Processing and Pattern Recognition

KC Santosh

  • PAMI Research Lab, Computer Science, University of South Dakota, Vermillion, SD, United States

Editorial on the Research Topic Current Trends in Image Processing and Pattern Recognition

Technological advancements in computing have opened up multiple opportunities in a wide variety of fields, ranging from document analysis (Santosh, 2018) and biomedical and healthcare informatics (Santosh et al., 2019; Santosh et al., 2021; Santosh and Gaur, 2021; Santosh and Joshi, 2021) to biometrics and intelligent language processing. These applications primarily leverage AI tools and/or techniques, drawing on topics such as image processing, signal and pattern recognition, machine learning, and computer vision.

With this theme, we opened a call for papers on Current Trends in Image Processing & Pattern Recognition, which followed the third International Conference on Recent Trends in Image Processing & Pattern Recognition (RTIP2R), 2020 (URL: http://rtip2r-conference.org ). Our call was not limited to RTIP2R 2020; it was open to all. Altogether, 12 papers were submitted, and seven of them were accepted for publication.

In Deshpande et al., the authors addressed the use of global fingerprint features (e.g., ridge flow, frequency, and other interest/key points) for matching. With a Convolutional Neural Network (CNN) matching model, which they called "Combination of Nearest-Neighbor Arrangement Indexing" (CNNAI), they achieved a highest rank-1 identification rate of 84.5% on the FVC2004 and NIST SD27 datasets. The authors claimed that their results are comparable with state-of-the-art algorithms and that their approach is robust to rotation and scale. Similarly, in Deshpande et al., using the same datasets, the same authors addressed the importance of minutiae extraction and matching on low-quality latent fingerprint images. Their minutiae extraction technique showed a remarkable improvement in their results, which, as claimed by the authors, are comparable to state-of-the-art systems.

In Gornale et al., the authors extracted distinguishing features that were geometrically distorted or transformed by taking Hu's invariant moments into account. With this, the authors focused on the early detection and grading of knee osteoarthritis, and they claimed that their results were validated by orthopedic surgeons and rheumatologists.

In Tamilmathi and Chithra, the authors introduced a new deep-learned quantization-based coding scheme for 3D airborne LiDAR point cloud images. In their experimental results, the authors showed that their model compressed an image into a constant 16 bits of data and decompressed it with a PSNR of approximately 160 dB, an execution time of 174.46 s, and an execution speed of 0.6 s per instruction. The authors claimed that their method compares favorably with previous algorithms/techniques in terms of space and time.
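For reference, the PSNR figure quoted above can be computed as below; a minimal sketch assuming 16-bit data, so the peak value is 2^16 − 1:

```python
import numpy as np

def psnr(original: np.ndarray, restored: np.ndarray,
         max_val: float = 2**16 - 1) -> float:
    """Peak signal-to-noise ratio in dB between an original and its reconstruction."""
    mse = np.mean((original.astype(np.float64) - restored.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 20 * np.log10(max_val) - 10 * np.log10(mse)
```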

In Tamilmathi and Chithra, the authors carefully inspected possible signs of plant leaf diseases. They employed the concept of feature learning and observed the correlation and/or similarity between disease-related symptoms, making disease identification possible.

In Das Chagas Silva Araujo et al., the authors proposed a benchmark environment for comparing multiple algorithms for depth reconstruction from two event-based sensors. In their evaluation, a stereo matching algorithm was implemented, and multiple experiments were conducted with various camera settings and parameters. The authors claimed that this work can serve as a benchmark for the robust evaluation of the multitude of new techniques in event-based stereo vision.

In Steffen et al. and Gornale et al., the authors employed handwritten signatures to better understand this behavioral biometric trait for document authentication/verification, such as letters, contracts, and wills. They used handcrafted features such as LBP and HOG to extract features from 4,790 signatures so that shallow learning could be applied efficiently. Using k-NN, decision tree, and support vector machine classifiers, they reported promising performance.
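A minimal sketch of this handcrafted-features-plus-shallow-classifier pipeline, assuming scikit-image and scikit-learn; the signature images and labels here are synthetic placeholders:

```python
import numpy as np
from skimage.feature import hog
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Hypothetical data: replace with real, equally sized grayscale signature images.
signatures = [np.random.rand(64, 64) for _ in range(40)]
labels = [i % 2 for i in range(40)]          # e.g., genuine (0) vs. forged (1)

features = [hog(img, orientations=9, pixels_per_cell=(8, 8),
                cells_per_block=(2, 2)) for img in signatures]
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, random_state=0)

clf = SVC(kernel="rbf").fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))
```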

Author Contributions

The author confirms being the sole contributor of this work and has approved it for publication.

Conflict of Interest

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Santosh, KC, Antani, S., Guru, D. S., and Dey, N. (2019). Medical Imaging: Artificial Intelligence, Image Recognition, and Machine Learning Techniques. United States: CRC Press. ISBN: 9780429029417. doi:10.1201/9780429029417

Santosh, KC, Das, N., and Ghosh, S. (2021). Deep Learning Models for Medical Imaging, Primers in Biomedical Imaging Devices and Systems. United States: Elsevier. eBook ISBN: 9780128236505.

Santosh, KC (2018). Document Image Analysis - Current Trends and Challenges in Graphics Recognition. United States: Springer. ISBN: 978-981-13-2338-6. doi:10.1007/978-981-13-2339-3

Santosh, KC, and Gaur, L. (2021). Artificial Intelligence and Machine Learning in Public Healthcare: Opportunities and Societal Impact. Spain: SpringerBriefs in Computational Intelligence Series. ISBN: 978-981-16-6768-8. doi:10.1007/978-981-16-6768-8

Santosh, KC, and Joshi, A. (2021). COVID-19: Prediction, Decision-Making, and its Impacts, Book Series in Lecture Notes on Data Engineering and Communications Technologies. United States: Springer Nature. ISBN: 978-981-15-9682-7. doi:10.1007/978-981-15-9682-7

Keywords: artificial intelligence, computer vision, machine learning, image processing, signal processing, pattern recognition

Citation: Santosh KC (2021) Editorial: Current Trends in Image Processing and Pattern Recognition. Front. Robot. AI 8:785075. doi: 10.3389/frobt.2021.785075

Received: 28 September 2021; Accepted: 06 October 2021; Published: 09 December 2021.


Copyright © 2021 Santosh. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: KC Santosh, [email protected]



Computer Science > Computer Vision and Pattern Recognition

Title: Research on Image Processing and Vectorization Storage Based on Garage Electronic Maps

Abstract: For the purpose of achieving a more precise definition and data analysis of images, this study conducted research on the vectorization and rasterization storage of electronic maps, focusing on a large underground parking garage map. During the research, image processing, vectorization, and rasterization storage were performed. The paper proposes a method for the vectorized, classified storage of indoor two-dimensional map raster data. This method involves converting raster data into vector data and classifying elements such as parking spaces, pathways, and obstacles based on their coordinate positions using a grid indexing method, thereby facilitating efficient storage and rapid querying of indoor maps. Additionally, interpolation algorithms were employed to extract vector data and convert it back into raster data. Navigation testing was conducted to validate the accuracy and reliability of the map model under this method, providing effective technical support for the digital storage and navigation of garage maps.
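A minimal sketch of the grid-indexing idea described in the abstract: classify raster cells into map elements and index them by (row, col) cell for fast queries. The element class codes are hypothetical:

```python
import numpy as np

PARKING, PATHWAY, OBSTACLE = 1, 2, 3   # hypothetical element class codes

def build_grid_index(raster: np.ndarray) -> dict:
    """Map each element class to the list of grid cells it occupies."""
    index = {PARKING: [], PATHWAY: [], OBSTACLE: []}
    for (row, col), code in np.ndenumerate(raster):
        if code in index:
            index[code].append((row, col))
    return index

raster = np.array([[1, 1, 2],
                   [3, 2, 2],
                   [1, 3, 2]])          # toy 3x3 garage raster
index = build_grid_index(raster)
print(index[PARKING])                   # cells occupied by parking spaces
```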
Subjects: Computer Vision and Pattern Recognition (cs.CV); Databases (cs.DB)

A Study on Various Image Processing Techniques

International Journal of Emerging Technology and Innovative Engineering Volume 5, Issue 5, May 2019

7 Pages Posted: 4 Jun 2019

Chithra P.L

University of Madras

Date Written: May 7, 2019

Image processing techniques play a vital role in image acquisition, pre-processing, clustering, segmentation, and classification across different kinds of images, such as fruit, medical, vehicle, and digital text images. In this study, various images are denoised and enhanced using techniques such as contrast-limited adaptive histogram equalization (CLAHE), Laplacian and Haar filtering, unsharp masking, sharpening, high-boost filtering, and color models. Clustering algorithms are then used to organize the data logically and to extract patterns for analysis, grouping, decision-making, and machine learning, while regions are segmented using binary, K-means, and Otsu segmentation algorithms. Finally, the images are classified using SVM and K-Nearest Neighbour (KNN) classifiers, which produce good results for these images.
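Two of the named techniques, CLAHE and Otsu thresholding, can be chained in a few lines of OpenCV; a minimal sketch with a hypothetical input file:

```python
# CLAHE contrast enhancement followed by Otsu thresholding with OpenCV.
import cv2

gray = cv2.imread("fruit.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
enhanced = clahe.apply(gray)                           # contrast-limited adaptive hist. eq.
_, mask = cv2.threshold(enhanced, 0, 255,
                        cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # Otsu segmentation
cv2.imwrite("mask.png", mask)
```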

Keywords: Image Acquisition, Image Preprocessing, Image Enhancement, Clustering, Region of Interest (ROI), Image Segmentation, Classification



SCA-YOLOv4: you only look once with squeeze-and-excitation, coordinate attention and adaptively spatial feature fusion

  • Original Paper
  • Published: 26 June 2024


  • Pengfei Liu 1,2 &
  • Qing Wang 1,2


How to effectively and efficiently identify multi-scale objects is one of the key challenges in object detection. To make the classification and regression of a single-stage object detector more accurate, an improved algorithm named you only look once with squeeze-and-excitation, coordinate attention and adaptively spatial feature fusion (SCA-YOLOv4) is proposed on the basis of the YOLOv4 algorithm. In this paper, firstly, by studying different combinations of the squeeze-and-excitation (SE) and coordinate attention (CA) modules, the SE/CA combination with the best detection performance was determined. On this basis, the squeeze-and-excitation coordinate attention (SECA) combination module was embedded in the head module of the YOLOv4 algorithm to highlight useful information in the feature map and suppress irrelevant information. Finally, an adaptively spatial feature fusion (ASFF) module was embedded between the neck and head modules to learn how to spatially filter features from other levels, retaining only useful information so that features at different levels are integrated correctly. The average accuracy of the proposed model was 91.1% and 46.4% on the PASCAL VOC2007 and MS COCO2017 datasets, respectively, which is 1.9% and 2.9% higher than that of the original YOLOv4 algorithm. The experiments therefore indicate that the proposed SCA-YOLOv4 model achieves strong performance on the PASCAL VOC and COCO datasets.
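For concreteness, here is a minimal PyTorch sketch of a standard squeeze-and-excitation block, the channel-attention unit referenced above; the reduction ratio of 16 is a common default, not necessarily the paper's setting:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Standard squeeze-and-excitation channel attention."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = x.mean(dim=(2, 3))            # squeeze: global average pool to (b, c)
        w = self.fc(w).view(b, c, 1, 1)   # excitation: per-channel weights in (0, 1)
        return x * w                      # reweight feature maps channel-wise

feats = torch.randn(2, 64, 32, 32)
print(SEBlock(64)(feats).shape)           # torch.Size([2, 64, 32, 32])
```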


Data availability

Data will be made available on request.


This work was supported by the State Key Laboratory of Satellite Navigation System and Equipment Technology Foundation (grant number CEPNT-2021KF-09) and the National Natural Science Foundation of China (No. 42074039).

Author information

Authors and Affiliations

State Key Laboratory of Satellite Navigation System and Equipment Technology, Shijiazhuang, 050081, Hebei, China

Pengfei Liu & Qing Wang

School of Instrument Science and Engineering, Southeast University, Nanjing, 210096, China


Contributions

Pengfei Liu and Qing Wang conceived and designed the study; Pengfei Liu performed the experiments and analyzed the data; all authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Qing Wang.

Ethics declarations

Conflict of Interest

The authors declare that they have no competing interests.

Consent to participate

Not applicable.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Liu, P., Wang, Q. SCA-YOLOv4: you only look once with squeeze-and-excitation, coordinate attention and adaptively spatial feature fusion. SIViP (2024). https://doi.org/10.1007/s11760-024-03378-9


Received : 17 July 2023

Revised : 09 June 2024

Accepted : 12 June 2024

Published : 26 June 2024

DOI : https://doi.org/10.1007/s11760-024-03378-9


Keywords

  • Object detection
  • Squeeze-and-excitation
  • Coordinate attention
  • Adaptively spatial feature fusion


  • Open access
  • Published: 27 June 2024

Application of improved and efficient image repair algorithm in rock damage experimental research

  • Mingzhe Xu 1,
  • Xianyin Qi 2 &
  • Diandong Geng 1

Scientific Reports, volume 14, Article number: 14849 (2024)


  • Civil engineering
  • Engineering

In the petroleum and coal industries, digital image technology and acoustic emission technology are employed to study rock properties, but both exhibit flaws during data processing. Digital image technology is vulnerable to interference from fractures and scaling, leading to potential loss of image data, while acoustic emission technology, though not hindered by these issues, is affected by the noise of rock destruction, which can interfere with the electrical signals and cause errors. The monitoring errors of these techniques can undermine the effectiveness of rock damage analysis. To address this issue, this paper focuses on restoring the image data acquired through digital image technology by leveraging deep learning techniques. Using soft and hard rocks made of similar materials as research subjects, an improved incremental Transformer image algorithm is employed to repair distorted or missing strain nephograms recorded during uniaxial compression experiments. The concrete implementation entails building a comprehensive training set of strain nephograms derived from digital image technology, fabricating masks for the absent image segments, and predicting strain nephograms with full strain detail. Additionally, we adopt depthwise separable convolutional networks to optimize the algorithm's operational efficiency. On this basis, rock damage is analyzed using the repaired strain nephograms, achieving a closer correlation with the actual physical processes of rock damage than conventional digital image technology and acoustic emission techniques. The improved incremental Transformer algorithm presented in this paper will help enhance the efficiency of digital image technology in the realm of rock damage, saving time and money, and offering an innovative alternative to traditional rock damage analysis.
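To make the mask-and-fill idea concrete, the sketch below uses OpenCV's classical inpainting as a simple stand-in for the Transformer-based repair described in the abstract; the file names are hypothetical, and the mask is assumed to mark missing speckle regions in white:

```python
# Classical mask-and-fill stand-in for learned strain-nephogram repair.
import cv2

nephogram = cv2.imread("strain_nephogram.png")                       # damaged nephogram
mask = cv2.imread("missing_region_mask.png", cv2.IMREAD_GRAYSCALE)   # 255 = missing

repaired = cv2.inpaint(nephogram, mask, inpaintRadius=3, flags=cv2.INPAINT_TELEA)
cv2.imwrite("strain_nephogram_repaired.png", repaired)
```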


Introduction

In rock mining, when the rock mass is subjected to external forces, phenomena such as the development, penetration, and fragmentation of microcracks occur, which can easily lead to collapse and instability. To ensure mining safety and structural stability, it is essential to delve into the damage mechanisms of sedimentary rock layers. In this context, the capabilities of digital image correlation techniques (DIC), which offer dynamic monitoring of the global displacement and strain fields, and acoustic emission (AE) technology, are increasingly favored in the exploration of rock mass damage.

DIC is a widely used non-contact optical technique for measuring material deformation. It uses high-resolution images of a specimen before and after deformation to compute correlations and obtain the displacement and strain fields on the specimen surface. DIC strain nephograms allow direct observation of the deformation behavior of material regions, providing a reliable reference for studying rock failure processes and crack development. DIC has yielded numerous theoretical and experimental findings on the mechanisms underlying rock damage and crack development. For example, Xing et al. have conducted studies in the field of crack propagation, utilizing DIC technology to monitor the full-field strain and strain rate fields during dynamic rock compression, revealing that the size of crack propagation is related to the duration of localized strain occurrence 1 . Tang et al. investigated the mechanical properties of rocks by analyzing the damage changes in Compression–Tension (C/T) cycle tests using 3D-DIC, discussing the characteristics both inside and outside the local damage zone 2 . Additionally, Wang et al. delved into the structure and physical properties of composite rocks, employing DIC technology to monitor the three-dimensional deformation of heterogeneous materials under uniaxial compression; their findings indicated that the damage difference increases with the material difference. DIC strain nephograms are valuable not only for observing crack propagation but also for studying the constitutive models of rocks 3 . For instance, Song et al. utilized DIC technology in uniaxial compression tests to calculate damage coefficients based on strain deviations, selecting larger strain points to effectively reflect the evolution of full-field damage 4 . Similarly, Xu et al. conducted generalized stress relaxation experiments using DIC technology to analyze the evolution law of surface strain in rocks, discovering that the difference in evolution rate between axial and radial strain values during rheology is positively correlated with overall strain changes 5 . Furthermore, Niu et al. combined DIC and indirect tensile tests to analyze the stress–strain development of rocks under tension, integrating fracture and statistical damage theories to establish a damage constitutive model and determining the stress intensity factor at the crack tip under tensile action 6 . These studies also show that there are errors in the strain data obtained through DIC, producing partial deviations between the backtracked curve and the curve recorded by the testing machine itself; the primary reasons for these defects are the heavy reliance of DIC on the camera's frame rate and on the integrity of the speckle pattern 7 . When capturing and recording material deformation, high-frame-rate cameras are better able to detect subtle surface deformations, whereas low-frame-rate cameras may produce blurred images due to overly long exposure. In the initial stages of the experiment, when the deformation of the specimen is minor, the impact of the camera's frame rate on the DIC strain nephograms is not significant.
However, as the experiment progresses and the specimen undergoes displacement, missing sections, and other load-induced damage, issues such as insufficient lighting, overexposure, and speckle drop-off may arise during the DIC camera's capture process. This, in turn, affects the recognition and tracking of feature points, leading to significant errors in the output data. Some scholars have devoted themselves to optimizing experimental protocols to reduce these errors 8 , 9 . For instance, in terms of adjusting lighting conditions, Badaloni et al. conducted a comprehensive analysis of the effects of illumination and noise on DIC imaging quality by continuously capturing 50 images using a Canon high-performance camera in an environment with fiber-optic light sources. Their study revealed that uncertainties such as noise and lighting conditions can be minimized by using high-performance light sources during experiments 8 . However, even with the best settings, image quality can still be affected, highlighting that every optical optimization method inevitably contends with the inherent and unavoidable noise of digital cameras. Regarding hardware, Rubino et al. attempted to address the issue of low-frame-rate cameras failing to capture rock fractures in time by combining high-speed cameras with DIC technology; their results demonstrated that high-speed cameras can record images at high frame rates and low noise levels 9 . However, the high-speed cameras and high-performance light sources used in the aforementioned studies are several times more expensive than standard equipment, making them prohibitively costly. Most small and medium-sized laboratories cannot afford such high-end equipment due to limited experimental conditions and financial constraints. Moreover, even after upgrading to high-frame-rate cameras with matching shutters and lenses, equipment-induced errors remain difficult to avoid. Although optimizing experimental protocols helps reduce errors, it cannot eliminate them. Furthermore, during DIC data processing, the image undergoes dimension reduction, which leads to the loss of some detailed information or to noise contamination, resulting in missing output images and affecting the accuracy of experimental results.
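The core DIC operation described here, tracking a speckle subset from a reference image to a deformed image, can be sketched with normalized cross-correlation; a minimal example using scikit-image, where the frame files and subset location are hypothetical:

```python
# Track one speckle subset between two frames via normalized cross-correlation.
import numpy as np
from skimage import color, io
from skimage.feature import match_template

ref = color.rgb2gray(io.imread("frame_000.png"))   # reference (undeformed) frame
cur = color.rgb2gray(io.imread("frame_100.png"))   # deformed frame

subset = ref[200:232, 200:232]                     # 32x32 speckle subset in the reference
corr = match_template(cur, subset)                 # correlation surface over the new frame
dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
print("displacement:", (dy - 200, dx - 200))       # subset shift between frames
```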

AE technology is frequently employed alongside DIC to monitor the failure process of rocks. AE technology detects acoustic emission phenomena by sensing transient pulse signals and determining the presence of stress wave signals. Each AE signal carries information about internal structural defects and the state of the rock mass, which can be analyzed to determine the displacement field and strain of the rock surface. In comparison to DIC alone, many experiments opt for AE alone or a combination of both technologies. For example, Du et al. conducted a series of experiments, including Brazilian indirect tensile tests, three-point bending tests, and uniaxial compression tests, to study the types of microcracks generated during rock fracturing 10 . By analyzing AE characteristics, they were able to determine the microcrack types and reveal the fracture mode of rocks, as well as the properties of the microcracks. Another study by Dai et al. combined DIC and AE technologies to quantitatively investigate the characteristics of rock fracture process zones; they obtained information such as crack-tip locations, crack extension lengths, and stress intensity factors from displacement fields and AE signals 11 . Li et al. used AE and DIC technologies to examine the effects of the spatial distribution of voids and joint combinations on rock mass instability. The DIC results showed that changes in joint angles affected displacement fields and crack types, while the AE results indicated that shear cracks played a dominant role in the ultimate failure of specimens 12 . Gu et al. utilized AE technology to explore the effects of crack closure behavior on rock deformation, deriving damage evolution equations for four brittle rocks using AE ring counts 13 . L.M. studied material crack development and evolution using AE technology, treating AE ring counts as characteristic parameters and proposing a damage constitutive model based on AE ring counts and strain 14 . When comparing damage strain curves constructed using AE and DIC technologies, it is evident that AE technology can compensate for the impact of missing DIC data on experimental results, thereby providing a theoretical reference for the missing DIC data. However, deviations also appear between the curves where DIC data are not missing, indicating that AE technology encounters issues of data loss or alteration in practical applications as well. AE technology effectively avoids problems related to missing images, but it can still be influenced by environmental noise, as well as by pressure and temperature changes during experiments, resulting in data discrepancies. AE signals are generally weak and vulnerable to external noise, and the substantial noise generated by rock failure and machine operation can easily overshadow AE signals, thus affecting the accuracy of experimental results.
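A minimal sketch of a ring-count-based damage variable in the spirit of the cited models, where damage at each load step is the cumulative AE ring count normalized by the total count at failure; the counts below are synthetic:

```python
import numpy as np

ring_counts = np.array([5, 12, 30, 80, 200, 450])   # AE ring counts per load step
D = np.cumsum(ring_counts) / ring_counts.sum()       # damage variable in [0, 1]
print(np.round(D, 3))                                # monotone damage evolution
```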

In recent years, with the rise of artificial intelligence (AI) methods, although experimental noise remains difficult to handle, AI-based image recognition and restoration provide new ways of improving DIC technology. Currently, AI methods are predominantly used for identifying and predicting rock cracks. For instance, Sidorenko et al. employed deep convolutional neural networks (CNN) to preprocess noisy Computed Tomography (CT) images and establish high-quality three-dimensional rock models 15 . Li et al. utilized multi-pixel point cloud coordinate region-growing segmentation models to accurately segment deformation areas of complex rock slopes under different rainfall conditions and prevent slope instability 16 . Tariq et al. combined support vector machines (SVM), artificial neural networks (ANN), and adaptive neuro-fuzzy inference systems (ANFIS) to predict crack growth and mitigate issues such as wellbore instability 17 . Similarly, Robson et al. employed convolutional neural networks to identify images and subsequently utilized object-based image analysis (OBIA) to classify them, enabling the study of spectral similarity between rocks and surrounding materials 18 . These studies primarily focus on identifying and predicting damaged images during the rock failure process. While they leverage the advantages of AI methods in image processing, their treatment of damaged images is limited to clarity and target segmentation. Therefore, further exploration of the constitutive relationship of rocks is required. The study of rock damage evolution necessitates the use of DIC technology; thus, addressing the existing gaps in DIC images through efficient restoration algorithms is crucial to enhancing the integrity and accuracy of DIC images. This will provide more reliable research data for investigating the process of rock damage evolution.

This article conducts uniaxial compression tests on soft and hard rocks using DIC and AE technology, with the aim of exploring the limitations of these two technologies in tracking rock damage evolution. To address the missing strain and displacement fields caused by speckle shedding in DIC measurements, a Transformer-based incremental algorithm (ZITS) is used to repair the speckle-shedding areas in DIC images. After repair, the damaged areas with complete strain field information are extracted, thereby resolving the consistency issues between DIC and AE technology in the analysis of rock damage.

Compression test based on DIC and AE

Test preparation and process.

Due to the significant challenges in extracting cores from native rock, this experiment selected mudstone and sandstone from the dense composite strata of the Jimsar Basin as prototypes 19 , 20 , 21 , 22 . Utilizing principles of similarity theory, we fabricated artificial rock-like materials with physical and mechanical properties adhering to similarity laws 23 . The relationship between the original rocks and the model specimens is as follows:

\(C_{\sigma } = C_{\gamma } C_{L}\)

where \(C_{\sigma }\), \(C_{\gamma }\), and \(C_{L}\) respectively represent the stress ratio, bulk density ratio, and geometric similarity ratio.

Core specimens were drilled using coring equipment into standard cylindrical rock specimens measuring 50 mm × 100 mm, as depicted in Fig.  1 . The material mixture comprised cement, quartz sand, silica fume, water, a water-reducing agent, and a defoamer. A mix ratio of cement:sand:water:defoamer = 1:0.7:0.35:0.003 was used to simulate the characteristics of soft rock (mudstone). To simulate the characteristics of hard rock (sandstone) 24 , a mix ratio of cement:sand:silica fume:water:water-reducing agent:defoamer = 1:0.8:0.1:0.3:0.003:0.003 was employed; the high-purity silica fume combined with the water-reducing agent enhances the strength and density of the rock. Experiments conducted with these parameters yielded two types of artificial rocks whose mechanical properties, including elastic modulus, are presented in Table 1 .

figure 1

50 mm × 100 mm cylindrical rock specimens.

In this experiment, a total of five groups of specimens were prepared, with each group consisting of 4 to 5 specimens each of the artificial soft rock and artificial hard rock, amounting to 47 specimens in total. The relevant physical indicators of the specimens were averaged. Following experimental testing and research analysis, the results from the 5 sets of specimens exhibited similar trends.

For the deep learning experiments discussed below, a sufficient number of images is required as the foundation of the dataset. This paper therefore selects the images from the first four groups of specimens as the source of the training set, while the images from the fifth group serve as the source of the validation set 25 . The first four groups of experiments encompass different strain distributions and variation patterns under various conditions, providing a rich data foundation for the algorithm; the aim is to further enhance the accuracy and precision of the algorithm in strain measurement and repair through training. The fifth group of specimens is used as the validation set to evaluate the effectiveness and generalization capability of the algorithm, ensuring that it performs consistently under different conditions. Due to space constraints, this paper analyzes representative specimens from the fifth group, including specimens resembling soft and hard rocks.

The experiment was conducted using the HYAS-1000C rock triaxial testing system with a loading rate of 0.001 mm/s. The testing system is equipped with DIC and AE modules. The DIC hardware includes two cameras with resolutions of 2448 × 2048 pixels and a frame rate of 70 fps, along with lenses of 8 mm, 16 mm, 25 mm, and 35 mm, as depicted in Fig.  2 . This setup enables the simultaneous acquisition of DIC dynamic strain data and AE data during uniaxial testing. The experimental procedure is as follows:

(1) Random speckles were created on the surface of the rock specimens using spray paint, based on the resolution and measurement area of the DIC cameras.

(2) The DIC and AE equipment were cleaned, and the cameras underwent a pre-heating check to ensure a stable frame rate during operation.

(3) Before the start of the experiment, the DIC and AE devices were assembled and connected to the system. The DIC measurement head was initialized, and the measurement distance and camera spacing were determined based on the lighting conditions. The cameras were calibrated using a calibration plate. Additionally, an ultrasonic transmitter and receiver were installed above and below the rock specimen to ensure full contact with the specimen.

(4) The experimental parameters for AE testing were adjusted according to the measured ambient noise level. The DIC acquisition frequency and step size best suited for image acquisition were determined through preliminary tests.

(5) As stress loading commenced, AE and DIC testing began simultaneously. The cameras captured images of the rock specimen at a set frequency, with an interval of 1 s per image, while the AE system continuously collected and analyzed the AE signals in real time.

(6) After the experiment concluded, the collected AE and DIC data were organized and analyzed to determine the deformation behavior of the rock specimen throughout the loading process.

figure 2

Diagram of uniaxial compression test system of rock acting together with 3D-DIC and AE.

Some scholars have conducted research on reducing the errors inherent in such experiments 8 , 9 , focusing on aspects such as lighting, camera shutter, and lens processing. Drawing on these findings, this paper optimizes the experimental settings before testing to mitigate these issues to some extent. The experiment at hand is optimized through the following approaches:

Adjusting lighting conditions. LED lights are selected as the light source, and their position and direction are adjusted by moving the light source or using brackets to ensure uniform illumination of the specimen. Light propagation is controlled with a light shield to prevent light leakage. Pre-experiments were carried out to compare and select the optimal lighting conditions, ensuring that the images suffer neither from resolution differences nor from occlusions. A comparison of images before and after the change in lighting conditions is shown in Fig.  3 .

Speckle creation. During speckle creation, the contrast is adjusted to approximately 50% under the most suitable lighting conditions, and the particle size is controlled between 3 and 6 pixels to ensure accurate recognition of the speckles by the camera. To capture DIC strain nephograms more effectively, the camera shutter speed is set to a maximum of 200 ms, based on the optimal speckle size of 3–5 pixels for the DIC system and the laboratory environment, avoiding image blurring due to overexposure or incomplete exposure.

Peripheral assistance. To ensure accurate capture of strain information before and after specimen failure by the DIC camera, a phase-locked loop system is utilized to adjust the DIC camera. Since the DIC camera lacks high-frequency shooting capabilities, the phase-locked loop system is employed for signal frequency division and delay processing, achieving stable high-frequency measurements by low-frequency devices.

Speckle patch reconstruction. Speckle information is systematically output as DIC strain nephograms. For various reasons, data loss or invalid areas may leave voids in the DIC strain nephograms. This paper therefore employs interpolation to fill such voids as an experiment-optimization technique. Interpolation is a post-processing step that uses functions of the neighboring pixel values to fill in the missing information. A comparison of the reconstruction before and after is shown in Fig.  4 .
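As a concrete illustration of this post-processing step, the sketch below fills voids in a strain field by interpolating from the surrounding valid pixels. It is a minimal sketch, assuming the strain field is stored as a 2-D NumPy array with voids marked as NaN; the array layout and the function name are illustrative, not those of the DIC software used here.

```python
# Minimal sketch of interpolation-based void filling for a DIC strain
# field. Assumption: the field is a 2-D array with voids stored as NaN.
import numpy as np
from scipy.interpolate import griddata

def fill_strain_voids(strain, method="linear"):
    """Fill NaN voids by interpolating from the surrounding valid pixels."""
    rows, cols = np.indices(strain.shape)
    valid = ~np.isnan(strain)
    coords = np.column_stack((rows[valid], cols[valid]))
    filled = griddata(coords, strain[valid], (rows, cols), method=method)
    # Linear interpolation leaves NaN outside the convex hull of valid
    # points; fall back to nearest-neighbour values there.
    missing = np.isnan(filled)
    if missing.any():
        nearest = griddata(coords, strain[valid], (rows, cols), method="nearest")
        filled[missing] = nearest[missing]
    return filled
```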

figure 3

Before and after light conditions.

figure 4

Interpolation and hole filling reconstruction before and after comparison.

Rock damage analysis based on compression test

Analysis of AE ring-down count–time curve.

The AE technique monitors the deformation process of rocks during uniaxial compression by measuring the number of electrical signals exceeding a threshold, known as AE events 26 . In Fig.  5 , the graph illustrates the changes in AE counts and stress over time for both hard rock and soft rock specimens.

figure 5

The variations of AE hit and loads versus time in different rock specimens: ( a ) hard rock, ( b ) soft rock.

As shown in Fig.  5 , the overall curve for hard rock and soft rock specimens exhibits similar changes. Initially, during the compaction and elastic stages, hard rocks, with their lower porosity and incomplete closure of internal pores, experience greater local deformation and damage accumulation compared to soft rock specimens. Consequently, hard rocks exhibit a higher frequency of AE events, resulting in small fluctuations in the ring-down count curve. As both specimens enter the plastic stage, surface cracks begin to form, leading to a slight increase in the AE ring-down count. During the post-peak failure stage, cracks in hard rock specimens expand and penetrate, causing extensive detachment of the specimen surface and a significant increase in AE events. The ring-down count curve reaches its peak when the specimen is destroyed, followed by a gradual decrease and convergence to a straight line. In contrast, soft rock specimens only exhibit one distinct ring-down peak during the complete failure stage.

After extensive research, scholars have discovered a strong correlation between the AE ring-down count and the progression of internal structural defects in rocks. L.M. proposed a damage index defined by the change in ring-down count over the bearing cross-section 14 , as follows:

\(D = \frac{{N_{d} }}{{N_{0} }}\)

where \(N_{d}\) is the cumulative AE ring-down count when the micro-defects in the bearing section reach a certain level, and \(N_{0}\) is the cumulative AE ring-down count when the bearing section is completely destroyed.
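For illustration, the damage index above can be evaluated from the recorded ring-down counts as in the following minimal sketch; the per-interval layout of ring_counts and the toy numbers are assumptions made for the example, not the output format of the AE software.

```python
# Minimal sketch of the AE ring-down damage index D = N_d / N_0.
# Assumption: ring_counts holds the ring-down count per sampling interval.
import numpy as np

def ae_damage_curve(ring_counts):
    """Damage index D at every sampling instant."""
    cumulative = np.cumsum(ring_counts, dtype=float)  # N_d over time
    n0 = cumulative[-1]                               # N_0 at complete failure
    return cumulative / n0

# Toy example: D rises monotonically from 0 toward 1 as failure approaches.
print(ae_damage_curve([0, 2, 1, 5, 12, 40, 110, 30]).round(3))
```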

As shown in Fig.  6 , during the initial compaction stage, the damage progression in both types of rock is relatively gradual. In the plastic phase, which corresponds to the development period of microcracks, the damage curve rises increasingly quickly. In the post-peak failure stage, the damage values rapidly escalate to 1. In reality, however, the damage barely changes during initial compaction, yet the rock damage estimated from AE statistics may exceed the actual damage values by 2 to 3% owing to the inherent noise from the operation of the testing machine. Additionally, during the plastic and post-peak failure stages, factors such as extensive specimen damage, intense noise from failure, and sensor movement can inflate the ringing count. Consequently, AE-based statistics tend to overestimate rock damage values compared with the actual values.

figure 6

AE damage-strain curves of two specimens.

Damage analysis based on DIC strain nephogram

The DIC strain nephogram offers comprehensive strain distributions across the specimen surface by sampling points 27 . By comparing and analyzing the DIC strain nephogram at the intersection points of different loading stages, it becomes possible to effectively observe and understand the damage and failure characteristics of the specimen. Figure  7 showcases the rock failure images and instantaneous DIC strain nephograms for both hard rock and soft rock specimens under uniaxial compression at various loading stages.

figure 7

Failure process diagram of different rock specimens and corresponding DIC strain nephogram: ( a ) hard rock, ( b ) soft rock.

According to Fig.  7 a, during the initial compaction stage the strain distribution is uniform and the specimen remains relatively unchanged; the local maximum strain is only 0.325%. Subsequently, as the hard rock passed through the densification and elastic stages, its high compressive strength and load-bearing capacity led to minimal surface alterations 28 , with only a slight amount of strain visible on the strain nephogram. With increasing load, small cracks started appearing on the specimen surface during the plastic deformation stage 29 , 30 , gradually expanding until they penetrated the specimen from top to bottom and caused fragment spalling. At this juncture, the local maximum strain peaked at 6.084%. Once the peak strength is reached, the stress–strain curve drops rapidly and the specimen enters the failure stage. The cracks within the specimen develop quickly and intersect to create a visible macroscopic fracture surface. Distinct boundaries can be observed between the differently loaded areas of the strain nephogram, highlighting obvious stress concentration, with a maximum local strain of 6.770%. Ultimately, the specimen underwent complete failure, forming macroscopic tensile cracks.

In Fig.  7 b, the strain trend of the soft rock is similar to that of the hard rock. However, compared with the hard rock, the soft rock curve exhibits a longer post-peak failure process, indicating certain ductile characteristics. Based on the strain nephogram, the local maximum strain on the surface of the soft rock is 13.147%, which is 7.063% higher than that of the hard rock, indicating that the soft rock has poor compressive strength. While the deformation on the rock surface is still small, the specimen has already reached its ultimate compressive strength and failure occurs. The specimen fails mainly in shear, with some local tensile failure 31 , 32 .

The comprehensive analysis indicates that while DIC strain nephograms can somewhat intuitively reflect the process of rock specimens under loading, from the initiation and expansion of local cracks to their coalescence, there are inevitable shortcomings when using DIC strain nephograms to detect the progression from initial loading to failure in rocks, due to the nature of their measurement. DIC strain nephograms are generated by tracking the deformation of speckle patterns on the object’s surface and calculating changes in the grayscale values of speckle domains, to obtain the deformation and strain data of the tested surface. Consequently, their use in rock diagnostics during the entire loading to failure process can result in the following deficiencies: (1) Blurred boundaries: As shown in Fig.  7 , when the rock is in the initial compaction or elastic phase, the deformation and strain produced by the rock surface are inherently small. When influenced by objective factors such as ambient lighting, camera resolution, and speckle domain deviations 33 , the strain nephograms derived from DIC algorithms are prone to blurred boundaries and mixed areas. (2) Loss of information: As shown in Figs. 8 and 9 , the formation of DIC strain nephograms heavily depends on the speckle patterns on the rock surface 34 . The loading forces can create various cracks and even cause delamination of the surface, thereby damaging the speckle domains and leading to loss of information from these regions. Therefore, to ensure the completeness of the detection results and the accuracy of the observed strain patterns, the DIC technique for rock diagnostics still requires further refinement.

figure 8

The process diagram of speckle shedding when rock specimen is damaged.

figure 9

Defect diagram of DIC and AE system in uniaxial compression test of rock.

Typically, the DIC technique is employed to analyze the quantitative relationship between damage and strain in rocks during uniaxial compression tests. Essentially, this involves using DIC technology to acquire global strain data of the rock throughout the loading-to-failure process; these data are then converted through the relationship between damage and strain to calculate the final damage curve of the rock. Song identified in uniaxial rock experiments that the first 5 to 10% of high-strain points are closely related to the evolution of damage concentration 4 , and that the average value over all strain points reflects that most measurement points are in a state of uniform deformation. Based on these findings, the extent of specimen damage is evaluated by the difference between the averages of these two quantities, denoted \(\overline{\varepsilon }\), that is:

\(\overline{\varepsilon } = \frac{1}{M}\sum\limits_{i = 1}^{M} {\left( {\varepsilon_{x} } \right)_{i} } - \frac{1}{N}\sum\limits_{i = 1}^{N} {\left( {\varepsilon_{x} } \right)_{i} }\)

where \(\frac{1}{M}\sum\limits_{i = 1}^{M} {\left( {\varepsilon_{x} } \right)_{i} }\) is the average value of the first M larger strain points and \(\frac{1}{N}\sum\limits_{i = 1}^{N} {\left( {\varepsilon_{x} } \right)_{i} }\) is the average strain across all N measuring points.

On this basis, the damage severity factor D was defined as follows:

\(D = \frac{{\overline{\varepsilon }}}{{\overline{\varepsilon }_{\max } }}\)

where \(\overline{\varepsilon }_{\max }\) is the maximum value of \(\overline{\varepsilon }\), i.e., the critical failure value.
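A minimal sketch of this two-average damage severity measure is given below; the 10% high-strain fraction and the 2-D array layout of the strain field are illustrative assumptions consistent with the 5–10% range reported by Song.

```python
# Minimal sketch of the DIC damage severity factor D = eps_bar / eps_bar_max.
import numpy as np

def mean_strain_difference(strain_field, fraction=0.1):
    """eps_bar: mean of the top-M strain points minus the mean of all N points."""
    values = np.sort(strain_field.ravel())[::-1]   # descending strains
    m = max(1, int(fraction * values.size))        # first M larger strain points
    return values[:m].mean() - values.mean()

def damage_severity(eps_bar_series):
    """Normalize eps_bar by its maximum (the critical failure value)."""
    eps_bar_series = np.asarray(eps_bar_series, dtype=float)
    return eps_bar_series / eps_bar_series.max()
```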

As shown in Fig.  10 , damage mutation points appear at 1.0% strain for the hard rock and at 1.4% strain for the soft rock, after which the damage value rapidly rises to 1. This occurs because, although DIC strain nephograms allow the dynamic failure process of the specimens to be observed, actual specimen failure can cause speckle swelling or detachment, resulting in the loss or corruption of some strain data. The global strain averages obtained through DIC technology consequently decrease, so the damage values on the curve are smaller than those of the actual compression process of the rock. Obtaining valid full-field strain information across the entire compression process is therefore one of the key technical challenges addressed in the subsequent sections of this study.

figure 10

DIC damage-strain curves of different rock specimens.

Difference analysis of rock damage curves between DIC and AE techniques

As depicted in Fig.  11 , notable differences exist between two monitoring technologies, DIC and AE, in the analysis of rock damage. Firstly, there is a discrepancy in the onset point of damage between the two curves. In actual conditions, rocks generally show no damage variation during the initial consolidation and elastic stages. However, as Fig.  11 suggests, the AE-based damage curve indicates some damage occurring at this stage. This could be due to the influence of noise generated by the testing machine during the monitoring of AE signals, leading to irrelevant signals passing through the AE counting threshold, thereby causing monitoring errors and resulting in damage generation during curve fitting. On the other hand, the DIC-obtained curve shows damage onset occurring much later during or after the plastic phase transition. This delay is attributed to the DIC software losing edge data through dimension reduction processing, shifting the starting point of damage. Secondly, there are distinct differences in the transition trends of the two curves as they enter the plastic phase and post-peak failure stages. In the damage analysis based on DIC technology, when the specimen enters the plastic phase under uniaxial compression conditions, the initiation of microcracks and the subsequent expansion of surface cracks cause speckle division. When DIC technology maps the strains from the speckles, the cracked areas appear as voids in the strain nephograms, resulting in data loss and an inability for the DIC damage curve to correctly reflect the inflection at this phase. In contrast, with the AE-based damage analysis, when specimens suffer a breakage due to stress during this phase, the noise generated interferes with the propagation of AE signals, leading to a significant deviation between the AE and DIC damage curves. In the post-peak failure phase, the rock suffers extensive damage, with a portion of the speckles peeling off with the outer surface, resulting in widespread data loss and a severely affected strain dataset. This causes the DIC-based damage curve to ascend rapidly during this stage. Meanwhile, the AE-based damage curve is prone to misjudgments under high-intensity environmental noise, mistaking noise for AE signals and thus increasing ringing counts. As a result, the damage values obtained from AE measurements are higher than the actual damage values of the rocks.

figure 11

Comparison of damage-strain curves in different rock specimens based on DIC and AE techniques: ( a ) hard rock, ( b ) soft rock.

In summary, it can be concluded that damage analysis based on AE technology relies heavily on precise ringing count measurements, while the analysis based on DIC technology depends on the full-process speckle pattern strain nephogram. During the uniaxial compression test of rocks, AE technology can detect the initiation and propagation of microcracks in real-time. However, the operational noise from the testing machine can either amplify or diminish the AE signals, as shown in Fig.  12 . AE ringing count refers to the number of oscillations after the waveform exceeds a threshold voltage, and high-intensity environmental noise can lead to false counts from irrelevant signals, diminishing the precision of the data. Furthermore, during the experimental process, the detachment of AE sensors can also result in data anomalies. The sensors, which are attached to the surface of the specimen to capture AE signals and convert them into electrical signals for software analysis, can detach, shift, or be obstructed during rock failure, leading to challenges in signal collection and localization, and thus affecting the assessment of damage severity.

figure 12

The process diagram of acoustic emission signal changes under the influence of noise.

The main issue with DIC technology in rock compression damage analysis is its heavy reliance on the full-process coordinate changes of the speckle points; the loss of these points leads to missing global strain field information, which in turn degrades the damage analysis. Current solutions to noise interference in AE technology include using sensors with high sensitivity and low noise, positioning sensors away from noise sources with appropriate shielding, and employing filtering and noise-reduction algorithms during signal processing to extract useful AE signals and suppress noise. Sensor improvement plays a significant role in noise reduction but is hindered by the high cost of precision sensors, which limits further development of AE technology. Meanwhile, faced with missing data in DIC speckle strain nephograms, numerous scholars have used artificial intelligence algorithms to infer missing-area information through the diffusion of boundary information, or to generate pixel blocks from the existing image information to supplement the image data 35 , 36 , 37 , 38 . Compared with the sensor-improvement approach in AE, algorithm-based DIC image enhancement offers a better cost–benefit ratio. This article therefore addresses the damage-analysis errors arising from both monitoring technologies by applying image restoration to DIC strain nephograms. A Transformer-based ZITS algorithm is proposed to repair and predict the missing speckle areas in DIC strain nephograms, with the objective of recovering the lost strain information of the monitoring points and of segmenting and distinguishing between different strains. On this basis, the damaged pixels in the repaired DIC images are recognized as areas of rock damage, and a new damage quantification model is compared against two traditional damage models to assess the impact of data loss on the research outcomes.

An improved image crack recognition method based on deep learning

Before deep learning became prevalent, image restoration relied on traditional algorithms, such as patch-based restoration and pixel-diffusion-based restoration. These methods suffered from low accuracy and poor handling of detail and texture 38 , 39 . With the rise of deep learning, a wealth of algorithmic research has opened opportunities for breakthroughs in the challenging field of image restoration. Numerous scholars have found that CNNs and autoencoders hold unique advantages in image processing 40 , 41 , 42 . Among them, the U-Net algorithm, which is based on a CNN architecture, and the Transformer algorithm, rooted in an autoencoder structure, are currently the most widely used models for image processing tasks. Chapter 3.1 will analyze and compare the strengths and weaknesses of these two image restoration algorithms.

Comparison of common repair algorithms

The U-Net algorithm is based on convolutional neural networks, which consist of convolutional layers and pooling layers 43 , as shown in Fig.  13 . Convolutional layers process image pixels into feature values, which are then classified into corresponding features through matrix transformations. The pooling layer performs screening by selecting the important feature values from the output of the convolutional layer, yielding a target feature image. The U-Net algorithm primarily relies on pixel reconstruction for image restoration, offering benefits such as translation invariance and strong performance in image classification and segmentation. However, the U-Net algorithm is time-consuming in image processing and requires a large dataset for label creation, so the quality of the results is heavily reliant on the dataset used. Moreover, the restoration outcome is significantly affected by the receptive field, i.e., the area of the input image to which each layer's feature map corresponds during pixel mapping. The size of the convolution kernels limits the receptive field, so the restored image may not accurately reflect the desired output features.

figure 13

Convolution with Max pooling graph.

The Transformer algorithm consists of a left half with six layers of encoders and a right half with six layers of decoders. As shown in Fig.  14 , the encoder is responsible for extracting features, while the decoder is responsible for restoring the image information 44 . After these two stages, the data is classified using the Softmax function to generate the final restoration result. The Softmax function is defined as follows:

\(\text{Softmax} (z_{i} ) = \frac{{e^{{z_{i} }} }}{{\sum\nolimits_{j = 1}^{K} {e^{{z_{j} }} } }}\)

where \(z_{i}\) is the i-th element of the input vector and K is the number of classifier classes.

figure 14

Transformer structure diagram.

The image information is processed through the encoder and decoder, whose computation consists mainly of two components: the multi-head attention layer and the feed-forward neural network. These two components are connected end to end, with feature processing starting at the multi-head attention layer. The multi-head attention layer is composed of multiple self-attention mechanisms. The self-attention mechanism involves three matrices derived from learned weight matrices: Q (query), K (key), and V (value), which are used to compute attention weights at each position of the input; a weighted sum then generates features at different levels. The self-attention mechanism is computed as follows:

\(\text{Attention} (Q,K,V) = \text{softmax} \left( {\frac{{QK^{T} }}{{\sqrt {d_{k} } }}} \right)V\)

where \(d_{k}\) is the column dimension of the matrices Q and K, i.e., the vector dimension.
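The following is a minimal NumPy sketch of this scaled dot-product attention; q, k, and v stand for the projected query, key, and value matrices of a single head (names chosen for the example).

```python
# Minimal sketch of scaled dot-product self-attention for one head.
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """q, k, v: (sequence_length, d_k) matrices from the weight projections."""
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)  # position-wise attention weights
    return softmax(scores) @ v       # weighted sum of the value vectors
```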

Feed-forward neural networks are one-way, multi-layer network structures in which each layer of neurons receives signals only from the previous layer and outputs to the next. The features output by the multi-head attention layer undergo nonlinear processing in the feed-forward network, yielding more accurate feature values for the subsequent outputs. The framework adopts multiple self-attention mechanisms, each computing features at a different level of the image; after computation, the features from the different levels are concatenated and then input into the feed-forward network. The output of every layer passes through layer normalization and a residual connection: layer normalization standardizes the data across samples, while the residual connection addresses the problem of vanishing gradients. The goal of each layer is to reduce data loss during transmission and to filter out redundant, invalid feature values.

Compared to the U-Net algorithm, the Transformer algorithm demonstrates higher accuracy in processing feature values. However, the Transformer algorithm also has certain drawbacks, such as poor interpretability and limited long-distance learning capabilities. This is due to the attention mechanism’s diminishing ability to capture information as the feature sequence grows, resulting in incomplete feature convergence in large-scale image restoration. To overcome the limitations of both algorithms, some scholars have proposed an improved Transformer algorithm. Experimental results show that this approach not only optimizes the time-consuming training set creation process but also resolves the blurring issue in image restoration when dealing with large-scale image loss.

Incremental Transformer algorithm

Incremental transformer algorithm principle.

This article adopts ZITS, which combines convolutional neural networks and Transformer algorithms 45 . The algorithm aims to enhance generalization by improving image preprocessing, optimizing the overall structure to eliminate long-range feature dependencies, and addressing the issue of incomplete image restoration. Image preprocessing plays a crucial role in eliminating irrelevant information and restoring the useful real information in the image; the quality and integrity of the output image depend on it. Different preprocessing methods primarily aim to better align the original images with the model's recognition and computation, for example by resizing images, segmenting them, and recognizing features. While commonly used restoration models typically apply simple techniques such as cropping and flipping, this paper employs preprocessing techniques such as bilinear interpolation and mask generation.

Bilinear interpolation is a texture mapping method that performs linear interpolation in two directions to map pixel values. It calculates the weighted average value of the four nearest pixels to the mapping point as the pixel attribute value. Through two rounds of image mapping, the image size changes while preserving the original pixel values, leading to a more realistic image post-processing.

Before applying image input algorithms, standardizing image sizes is an important preparatory task. The purpose of uniform size is to effectively eliminate pixel differences caused by varying image sizes, simplify image processing and algorithmic computation processes, facilitate the input of training sets into algorithms for training, and reduce the burden on computer resources. Van et al. mentioned that 480 × 480 pixel-sized images exhibit unique advantages in the field of image processing 46 . This square size not only helps maintain feature consistency but also preserves sufficient image details while maintaining lower computational complexity. Leveraging the benefits of bilinear interpolation, which preserves frame integrity after image size transformation, this paper utilizes bilinear interpolation to resize images to 480 × 480 dimensions, facilitating subsequent mask processing.
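As a sketch of this resizing step, assuming OpenCV is used (the original preprocessing code is not specified), cv2.INTER_LINEAR performs exactly the bilinear mapping described above:

```python
# Minimal sketch of the uniform-size preprocessing step.
import cv2

def resize_to_model_input(image, size=(480, 480)):
    """Bilinear resize of a strain nephogram to the 480 x 480 model input."""
    return cv2.resize(image, size, interpolation=cv2.INTER_LINEAR)
```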

As shown in Fig.  15 , a mask is a single-channel matrix that controls the region of image processing. In the mask, values are divided into zero and non-zero. Non-zero values indicate areas where pixel points from the original image are copied, represented as a white area in the mask. When the mask value is zero, no copying is performed, represented as a black area in the mask. After copying the pixel values into the mask, it is inputted into the model along with the original image. The model processes the masked region while preserving the areas outside the mask. Therefore, the mask specifies the restoration area of the image, significantly improving image processing time and accuracy.
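A minimal sketch of how such a mask gates the input follows; the 0/255 single-channel convention and the function name are illustrative assumptions.

```python
# Minimal sketch of mask application: white (non-zero) mask pixels keep the
# original image values, black (zero) pixels mark the region to be restored.
import numpy as np

def apply_mask(image, mask):
    keep = mask > 0
    if image.ndim == 3:              # broadcast over colour channels
        keep = keep[..., None]
    return np.where(keep, image, 0)  # zero out the repair region only
```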

figure 15

Mask creation process diagram.

With the advancement of deep learning, significant progress has been made in image restoration. However, restoring images with realistic textures and coherent structures remains a challenging issue. Convolutional neural networks have limited receptive fields and are only able to handle certain regular textures, making them less effective in overall structural restoration, particularly in edge and line recovery. The U-Net algorithm, which is based on convolutional neural networks, has shown great performance in image segmentation. However, it has been observed that there are issues such as texture disappearance and texture discontinuity in the restored images. On the other hand, the Transformer algorithm has addressed these texture-related deficiencies, but it faces challenges in integrating the overall structure of the image.

This paper introduces an enhanced model based on the Transformer architecture, as shown in Fig.  16 . The model comprises three primary components. The first component is the Transformer layer, which processes the image using upsampling and downsampling convolutions: downsampling reduces the image size and generates thumbnail images 47 , while upsampling enlarges the image to display higher resolution. The second component consists of gate-convolution (GateConv) and Recursive Networks (ReNet). GateConv is an improved convolutional network: unlike traditional convolutions, which treat all input pixels in each channel as equally valid, GateConv adds normalization layers and a multilayer perceptron (a type of feed-forward neural network) and employs a Sigmoid function for pixel classification, providing a learnable dynamic selection mechanism. The Sigmoid function is represented as follows:

\(\text{Sigmoid} (x) = \frac{1}{{1 + e^{ - x} }}\)

figure 16

Structure diagram of the incremental Transformer algorithm.

The ReNet network is employed for target recognition and is composed of ReLU activation functions and convolutional layers. The activation function plays a crucial role in filtering out unnecessary feature values. The ReLU function can be represented by the following equation 48 :

\(\text{ReLU} (x) = \max (0,x)\)

Compared to the Transformer algorithm, the second section of this approach eliminates unnecessary feature values. This enables better repair of irregular surfaces, enhances feature highlighting, and ensures a more complete transfer of the required features to the third section.
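A minimal PyTorch sketch of the gated-convolution idea follows; it is an illustrative re-implementation of the general GateConv mechanism, not the exact ZITS module (the normalization layers and multilayer perceptron are omitted for brevity).

```python
# Minimal sketch of a gated convolution: a sigmoid-gated branch learns,
# per pixel, how valid each feature is, instead of treating every input
# pixel as equally valid.
import torch
import torch.nn as nn

class GatedConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        self.feature = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding)
        self.gate = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding)

    def forward(self, x):
        # Learnable dynamic selection: gate in (0, 1) scales each feature.
        return torch.relu(self.feature(x)) * torch.sigmoid(self.gate(x))
```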

The third module consists of convolutional layers and Fast Fourier convolution (FFC) layers. The FFC layer is designed to expand the receptive field. It is composed of two interconnected paths: one performing ordinary local convolution and the other performing Fourier spectral convolution in the spectral domain. These two pathways interact and complement each other by capturing information from different receptive fields. The output of the Transformer stage may suffer from resolution loss or partial texture blur; the third module addresses this resolution problem in the output.
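Below is a minimal PyTorch sketch of the spectral path of such an FFC layer, written to illustrate the general idea (a 1 × 1 convolution applied in the frequency domain acts on the whole image at once, giving a global receptive field); it is not the exact ZITS implementation.

```python
# Minimal sketch of the spectral branch of a Fast Fourier convolution.
import torch
import torch.nn as nn

class SpectralTransform(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # 1x1 conv over the stacked real/imaginary parts of the spectrum.
        self.conv = nn.Conv2d(channels * 2, channels * 2, kernel_size=1)

    def forward(self, x):
        freq = torch.fft.rfft2(x, norm="ortho")           # to frequency domain
        stacked = torch.cat([freq.real, freq.imag], dim=1)
        stacked = torch.relu(self.conv(stacked))          # global mixing
        real, imag = stacked.chunk(2, dim=1)
        return torch.fft.irfft2(torch.complex(real, imag),
                                s=x.shape[-2:], norm="ortho")
```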

After preprocessing, the original image and the mask are inputted into the model. The algorithm performs feature extraction based on the selected region indicated by the mask. The first module is responsible for the overall structural reconstruction. The output features then pass through the second module for line and edge restoration, and finally, the third module restores textures. The result is the repaired image as the output.

Image repair results

This study selected a total of 1213 DIC strain nephograms from various stages of damage as the basic dataset. Two sets of algorithms were used to create different training sets. The U-Net algorithm group requires classification training using labels, so during the creation process, 16 different color strains were labeled using LabelMe software to produce 16 labels. The ZITS algorithm group annotated the missing areas using LabelMe software and then generated masks using Open-Source Computer Vision Library (OpenCV). The NVIDIA Tesla A100 graphics card was chosen as the experimental server, and two sets of experiments were conducted. The U-Net algorithm group took approximately 12 h to train, while the algorithm group in this study took approximately 6 h.

During the rock destruction stage after peak load, image defects can occur due to factors such as speckle detachment 49 . Due to space constraints, this article focuses on comparing two models of DIC strain nephogram during the post-peak failure stage. Figure  17 shows that the restored image by the U-Net algorithm does not completely repair the original missing part, resulting in some loss of information. However, the algorithm proposed in this article successfully repairs the missing area in the image without leaving any gaps. Additionally, notable differences are observed in terms of image texture and structure. In terms of image texture, the U-Net algorithm disregarded the division of grids in the original strain nephogram. DIC strain nephograms primarily rely on the displacement changes of monitoring points (i.e., speckle points in Fig.  2 ) to obtain strain data 7 . The missing grid in the U-Net model affects subsequent experimental observations as it cannot predict the area of lost strain. On the other hand, the algorithm proposed in this paper not only restores the missing region but also maintains the characteristics of the original image and ensures the continuity of the grid in the restored region. This demonstrates the superiority of the Transformer structure in handling complex textures. Regarding the structure, the U-Net algorithm’s restoration image shows misalignment and severe mismatches at the boundaries between the repair and intact areas. In contrast, the algorithm in this article achieves a smooth grid connection between the missing and non-missing areas, with consistent colors at the intersection point of image blocks. Therefore, it can be concluded that the algorithm in this article effectively addresses the discontinuity problem between lines and edges, making it superior to the U-Net algorithm.

figure 17

DIC strain nephograms of different rocks are compared with repaired images obtained by two algorithms: ( a ) hard rock, ( b ) soft rock.

As shown in Fig.  18 , the predicted images of the proposed algorithm at different stages of elasticity, plasticity, and post-peak failure are compared with the original images for two types of rocks. It can be observed that each DIC strain nephogram in the hard rock contains 702 grids, while each DIC strain nephogram in the soft rock contains 455 grids. By examining the image pixels, it was found that each grid in the hard rock contains 48 pixels, while each grid in the soft rock contains 132 pixels. During the experimental process, grid loss and dimensionality reduction of displacement data at monitoring points are the main causes of strain data distortion, which mainly occurs at the edges of the images, with a data distortion rate between 8 and 10%.

figure 18

The DIC strain nephograms of different rocks were repaired by ZITS algorithm and compared with the original images: ( a ) hard rock, ( b ) soft rock.

In Fig.  18 a, the original DIC strain nephograms of the hard rock at the elastic, plastic, and post-peak failure stages contain 702 grids, of which 177 are distorted. After processing by the proposed algorithm, 177 grids and 8496 pixels are repaired, improving the information integrity by 8.41%. In Fig.  18 b, the original DIC strain nephograms of the soft rock at the same stages contain 455 grids, of which 129 are distorted. After processing, 129 grids and 17,028 pixels are repaired, improving the information integrity by 9.45%. The repaired DIC strain nephograms provide more complete and accurate global strain data, supplying reliable experimental support for the establishment of the subsequent soft and hard rock constitutive equations.

Improved incremental transformer algorithm

While using the ZITS algorithm for image restoration of DIC strain nephograms, two issues were identified. Firstly, the quality of image restoration is closely related to the content in the dataset, requiring a sufficient number of DIC strain nephograms to support dataset creation. Although the process of creating masks is faster compared to annotation with LabelMe software, it still poses some obstacles during practical use. Secondly, the ZITS algorithm can only be used on high-performance servers, which are not only expensive but also scarce. Therefore, it is extremely challenging to use the ZITS algorithm for DIC strain nephogram restoration either in field trials or for long-term use in the laboratory. To address these two issues, the following sections will focus on lightweight algorithm processing and the use of suitable datasets to save time and computational resources.

Deep separable convolutional networks

In the field of deep learning, most algorithms are designed under the premise of handling complex computations, involving various feature selection processes that increase computational load. Therefore, when utilizing high-performance algorithms to process secondary operations, lightweight processing is employed to reduce the computational complexity and parameters while maintaining system functionality. The purpose is to lower the algorithm’s demands on computational resources, reduce processing time, and investigate the feasibility of deployment on mobile devices. As the DIC strain nephograms requiring restoration in this study do not exhibit complex textures and structures, there are numerous unnecessary computations during the restoration process, leading to prolonged algorithm training time. To address this issue, this study implemented lightweight improvements to the ZITS algorithm by replacing the computationally intense convolutional neural network modules with depthwise separable convolutional network modules.

Depthwise separable convolutional networks are a variant of convolutional neural networks. They split the convolutional layer into a depthwise convolutional layer and a pointwise convolutional layer, reducing computation and parameter counts to achieve a lightweight design, as shown in the sketch below. As illustrated in Figs. 19 and 20 , the depthwise convolutional layer applies a separate convolution to each of the R (red), G (green), and B (blue) channels, generating different weight values for each channel. These values are then processed by the pointwise convolutional layer, which uses 1 × 1 convolutional kernels spanning all three channels (i.e., 1 × 1 × 3) to reduce the model size and perform channel-wise fusion. After the three per-channel outputs have passed through the 1 × 1 kernels, the channel pixel values are fused, yielding output feature maps of the same size as those of a regular convolution. Whereas traditional convolutions consider channels and regions simultaneously in their weight calculations, depthwise separable convolutions handle regions first and channels second; computing across channels separately removes a significant amount of redundant calculation and shortens the training time. The following sections compare the models in terms of training time and evaluation metrics.
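The sketch below illustrates the general depthwise separable construction in PyTorch; it is not the exact module swapped into ZITS. The parameter comparison in the trailing comment shows where the savings come from.

```python
# Minimal sketch of a depthwise separable convolution: a per-channel
# (depthwise) convolution followed by a 1x1 (pointwise) channel fusion.
import torch.nn as nn

def depthwise_separable(in_ch, out_ch, kernel_size=3, padding=1):
    return nn.Sequential(
        # groups=in_ch -> one spatial kernel per input channel (depthwise)
        nn.Conv2d(in_ch, in_ch, kernel_size, padding=padding, groups=in_ch),
        # 1x1 kernels fuse the per-channel outputs (pointwise)
        nn.Conv2d(in_ch, out_ch, kernel_size=1),
    )

# Weight count for 3 -> 64 channels with 3x3 kernels (biases ignored):
# standard conv: 3 * 64 * 3 * 3 = 1728; separable: 3 * 3 * 3 + 3 * 64 = 219.
```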

figure 19

Deep convolutional layer feature processing process diagram.

figure 20

Feature process diagram through point convolution layer.

Regarding image accuracy analysis, the ZITS algorithm uses three evaluation metrics based on the confusion matrix, Precision , Accuracy , and F1-Score , to assess model training 50 , 51 . A confusion matrix is an error matrix commonly used for visualizing the performance of deep learning models 52 . As illustrated in Fig.  21 , the matrix is a special kind of cross-tabulation whose two dimensions represent the true and predicted values; each cell corresponds to a combination of the classes to be identified, visualized in tabular form, with a decision threshold set between predicted and true values.

figure 21

Confusion matrix.

The Precision metric is calculated from the confusion matrix and represents the ratio of correctly classified positive samples to the total number of samples classified as positive by the classifier; it is a statistical measure over a subset of the samples, where each sample is an actual data point involved in the study or test. The formula for the Precision metric is as follows:

\(Precision = \frac{TP}{{TP + FP}}\)

where TP is the number of correctly classified positive samples and FP is the number of negative samples incorrectly predicted as positive.

The Accuracy metric represents the proportion of correctly detected samples among all samples. It performs well in classification problems, reflects the quality of the classifier, and is used to evaluate the overall performance of the model. The formula for the Accuracy metric is as follows:

\(Accuracy = \frac{TP + TN}{{TP + TN + FP + FN}}\)

where TN is the number of true negative samples (predicted negative and actually negative) and FN is the number of false negative samples (actually positive but mistakenly labeled negative).

The F1-score is the harmonic mean of the Precision and Recall functions, treating both as equally important in performance evaluation; the overall rise or fall of the result comes from the combined effect of these two functions, making it a common evaluation metric for deep learning models. The formula for the F1-score is as follows:

\(F1 = \frac{2 \times Precision \times Recall}{{Precision + Recall}}\)

where Recall is the recall rate, indicating the proportion of correctly predicted positive samples among all actual positive samples:

\(Recall = \frac{TP}{{TP + FN}}\)

The Loss function was chosen as the model-completion indicator to evaluate the gap between the model's predicted values and the actual values. The speed at which the loss curve converges shows whether the model can quickly find the optimal solution: the faster the convergence, the faster the optimum is found, whereas slow convergence indicates an overly long training time and inadequate predictions. In this study, the cross-entropy function was selected; the formula for the Loss function is as follows 53 :

\(Loss = - \frac{1}{N}\sum\limits_{i = 1}^{N} {\hat{y}_{i} \log y_{i} }\)

where N is the number of training samples, \(y_{i}\) is the predicted value (the probability of the predicted category), and \(\hat{y}_{i}\) is the label value (the probability of the true category).
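For concreteness, the five quantities above can be computed as in the following sketch (plain NumPy; the vectorized cross-entropy assumes one-hot label vectors, a common convention not spelled out in the text).

```python
# Minimal sketch of the confusion-matrix metrics and cross-entropy loss.
import numpy as np

def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def f1_score(tp, fp, fn):
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)  # harmonic mean of precision and recall

def cross_entropy(y_pred, y_true):
    """Mean cross-entropy over N samples; y_pred are predicted class
    probabilities, y_true the one-hot label vectors."""
    y_pred = np.clip(y_pred, 1e-12, 1.0)  # guard against log(0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=-1))
```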

Dataset optimization

The dataset serves as the foundation and core of deep learning algorithms, providing the data support for the training and validation sets. When repairing graphics with complex textures and structures, the original graphic data is typically used as the dataset: the data most relevant to the repaired region is first extracted as the training set and input into the model, which trains on the corresponding features to identify feature patterns and obtain target weights, producing targeted training outcomes. For repairing graphics with simple textures and structures, where the feature calculation is simpler, and considering time and cost constraints, the choice of training set is more flexible.

In most cases, larger datasets train more stable and higher-performing models, primarily because more data provides more information and helps the model learn more complex patterns and relationships. However, once the dataset grows beyond a certain scale, the improvement in model performance begins to diminish, and overfitting may even occur. A larger dataset is therefore not automatically better; the size should be chosen according to the specific task and data conditions. Incorporating diverse features, by contrast, can enhance the model's generalization capability.

This paper obtained a dataset comprising a total of 1213 DIC strain nephograms from various stages of damage. Careful selection of a portion of the region from each DIC strain nephogram was made to create masks, forming the dataset. This process was time-consuming. To address this issue, the author made changes to the selection of the dataset.

By observing the DIC strain nephograms, it was found that they can be divided into 16 differently colored grid blocks, each containing one or more colors. Based on this observation, the original data was characterized by color blocks, and the open-source Places2 dataset was selected. The Places dataset is a commonly used image dataset for visual tasks; through continuous improvement, the Places1 version has been iterated to Places2, which offers superior image categories and quality. The Places2 dataset contains color blocks from a wide variety of scenes, which can effectively repair the color blocks here. Using an existing dataset avoids the considerable time required to create a targeted dataset. To rule out errors in the training results caused by the difference between the targeted dataset and the open-source dataset, both training sets were deployed on the same server for training, so that the repair outcomes could be compared.

Comparing the repair images generated by the two training sets, one DIC strain nephogram with high color contrast was selected for illustration, as shown in Fig.  22 . The two repair results were evaluated for errors in terms of similarity and color vividness. Regarding similarity, three missing areas in the two repaired images were compared with reference to colors, grid curves, and color boundaries, revealing minimal differences between them. In terms of color vividness, a section of the missing area with the same color was selected for comparison, showing that the repair area trained on the Places2 dataset had brighter colors and blended more harmoniously with the unrepaired areas of the original image. This is attributed to imperfect noise handling during the creation of the DIC strain nephograms, whereas the images in the Places2 dataset have undergone more refined technical processing. Overall, the repair images generated with the Places2 dataset exhibited extremely high similarity to those produced with the targeted dataset and excelled in color vividness.

figure 22

The algorithm is trained by two datasets to repair the result comparison graph.

Result comparison

To compare the differences between the improved ZITS algorithm and the original ZITS algorithm, experiments were conducted using an NVIDIA Tesla A100 graphics card and an NVIDIA GeForce RTX 3070 Ti as contrasting servers. The Places2 dataset was used as the training set, and both servers were trained simultaneously so that the resulting repair images could be compared and analyzed.

As shown in Fig.  23 , the graph compares the training times of the ZITS algorithm and the improved ZITS algorithm. On the A100 server, training with the ZITS algorithm took approximately 6 h, with a result output time of over 3 min, whereas the improved algorithm took about 3 h to train, with a result output time of 1–3 min, halving the time required. On the 3070 Ti server, the ZITS algorithm took around 12 h to train, with a result output time of over 3 min, while the improved algorithm took about 6 h, with a result output time of 1–3 min, again halving the time. From these data it can be concluded that the improved algorithm requires less time for both training and result prediction than the ZITS algorithm, saving a significant amount of image-restoration time on comparable devices.

figure 23

Comparison of training time between two algorithms.

After training, the two sets of models obtained different evaluation metrics to assess the quality of the repaired images output by the models. The evaluation metrics are shown in Table 2 . It can be observed from the table that, due to the reduction in parameter size after algorithm lightweight processing and the removal of some unimportant parameters, the evaluation metrics of the improved ZITS algorithm are slightly lower than those of the original ZITS algorithm.

A single image from the validation set typically cannot represent the entire model’s level of restoration. The DIC strain nephogram comprises distinct stages including initial compaction, elastic phase, plastic phase, and post-peak failure stage, each exhibiting unique characteristics. As mentioned earlier, the visual representation of DIC strain nephograms consists of 16 color bands, rendering traditional methods for distinguishing four damage stages ineffective. Algorithms address this issue by deeply training color features to balance performance across different stages.

Researchers have found that selecting representative images is subjective and non-reproducible 54 . To address this, a two-step approach is proposed. Firstly, features of images in the collection are prominently extracted. Secondly, the selection of representative images is influenced by feature values. To evaluate the disparity in repair image effectiveness between the two algorithms, image quality, clarity of the repaired area, and pre-and post-repair image comparisons serve as assessment criteria. In terms of image quality, DIC strain nephograms rich in strain information with intricate lines are chosen to fully showcase their capability in handling complex details. Regarding the clarity of the restored region, strain nephograms with the largest missing range in the post-peak failure stage are selected as validation samples, ensuring superior repair capability when algorithms repair extensively damaged strain nephograms. Regarding pre-and post-repair image comparisons, images with repair effects at a moderate level are selected. Moderately repaired images to some extent reflect the overall level of the model and its feature advantages. In summary, strains representing the post-peak failure stage of soft rock-like specimens are selected as representative DIC strain nephograms for comparative verification of overall repair effectiveness, as depicted in Fig.  24 .

figure 24

Comparison of repair results between the two algorithms.

In this study, the repaired images are compared in terms of image content and grid lines. In terms of image content, both algorithms demonstrate a high degree of consistency in colorizing the damaged regions, with no significant differences observed in the color transitions. Further observation of the grid lines reveals that, when isolated and visualized through binarization, the repaired grid sections exhibit smooth connections in some areas and discontinuities in others. However, the disconnected segments are identical in both images, and no additional disconnections are present. Moreover, an analysis of the tilt, quantity, and length of the grid lines indicates a consistent reconstruction of the grid lines. Thus, based on comprehensive comparison, it can be concluded that although the evaluation metrics of the improved ZITS algorithm slightly decrease after training, the content features of the repaired images remain unchanged, and the minor decline in the metrics has minimal impact on the quality of the image restoration results.

Discussions

A quantitative damage model.

In Chapter 3, this study clarified the local strain boundaries in the different DIC strain nephograms through image restoration, ensuring the accuracy and integrity of the images. As observed in Fig.  18 , there are concentrated strain regions, which are densely populated areas of crack development. To assess the rock damage caused by these concentrated strain regions, some researchers propose that all pixels near the micro-crack regions be considered damaged 55 . The damage factor is accordingly defined as the ratio of the number of pixels in the crack-damaged region to the total number of pixels in the image:

\(D_{f} = \frac{{A_{0} }}{A}\)

where \(A_{0}\) is the number of damaged pixels and \(A\) is the total number of pixels.

Because the areas surrounding cracks are themselves potentially vulnerable, the pixels adjacent to damaged pixels are also assumed damaged when counting crack-induced damage:

$$D_f = \frac{A_s}{A},$$

where A_s is the total number of pixels in the damaged region together with its neighboring grids. The pixel counts can be computed directly in MATLAB from the DIC strain nephogram, combined with a Monte Carlo method, yielding experiment-based damage values; a sketch of this estimate follows.
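Below is a minimal Python sketch of the Monte Carlo estimate; the paper performs the computation in MATLAB, and the one-pixel dilation used here to include the neighboring grids is an illustrative assumption.

```python
import numpy as np
from scipy.ndimage import binary_dilation

def damage_factor_mc(damaged_mask, n_samples=100_000, seed=0):
    """Estimate D_f = A_s / A by sampling random pixel coordinates."""
    rng = np.random.default_rng(seed)
    # grow the damaged region so neighboring pixels also count as damaged
    a_s_mask = binary_dilation(damaged_mask)
    h, w = a_s_mask.shape
    ys = rng.integers(0, h, n_samples)
    xs = rng.integers(0, w, n_samples)
    return float(a_s_mask[ys, xs].mean())  # fraction of samples landing in A_s
```

With enough samples the estimate converges to the exact pixel ratio `a_s_mask.mean()`, which can serve as a correctness check.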

According to previous research, rocks deform when subjected to stress [56], but damage occurs only once the deformation reaches a certain degree. During failure under load, a rock first passes through a pore-compaction stage in which the external pressure mainly densifies the material, gradually closing the original structural surfaces, fractures, and pores; at this stage almost no stress is generated internally. Hence the concept of a damage threshold ε_d is introduced, set at the strain corresponding to 45% of the rock's peak stress. The distribution of internal damage is then determined by comparing the maximum strain ε_max with ε_d: regions where ε_max > ε_d are damaged. In the DIC strain nephogram, the strain region with ε_max > ε_d is therefore identified as the damaged area, and the damage factor is defined piecewise as

$$D_f = \begin{cases} 0, & \varepsilon_{\max} \le \varepsilon_d \\ A_s / A, & \varepsilon_{\max} > \varepsilon_d \end{cases} \qquad (16)$$

When D_f = 0 the soft or hard rock is undamaged; when D_f = 1 it is completely damaged. Note that Eq. (16) differs from the elastic-stage model during the initial densification stage of uniaxial compression. Consequently, when inverting the stress–strain curve, the strain ε_d terminating the densification stage should be taken as the starting point of the rock's elastic stage.

As the above analysis shows, the model in Eq. (16) depends heavily on the accuracy of the DIC strain nephogram: missing data or blurring would significantly distort the computed damage values. The correction and prediction algorithm proposed in this paper effectively resolves these issues. By establishing the strain threshold, the absence of damage during the initial densification stage is accounted for, and a segmented damage-curve model applicable both to the initial densification stage and to the subsequent damage evolution is established; a sketch of evaluating this segmented model over a loading sequence follows.
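The sketch below evaluates the segmented model of Eq. (16) over a loading sequence; `nephograms` is assumed to be a list of 2-D strain fields ordered by load step, and the dilation step mirrors the neighboring-grid assumption above.

```python
import numpy as np
from scipy.ndimage import binary_dilation

def damage_curve(nephograms, eps_d):
    """Damage factor D_f per load step, per the segmented model of Eq. (16)."""
    d = []
    for strain in nephograms:
        damaged = strain > eps_d                  # regions with eps_max > eps_d
        if not damaged.any():
            d.append(0.0)                         # densification stage: no damage
        else:
            a_s = binary_dilation(damaged).sum()  # damaged pixels plus neighbors
            d.append(a_s / strain.size)           # D_f = A_s / A
    return np.array(d)
```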

Model verification

To validate the reliability of the quantitative damage models for soft and hard rocks, as well as the rationality of the data optimization achieved by the algorithm proposed in this paper, simulation analyses were conducted on both rock types. The improved quantitative damage model of Eq. (16) is compared with the DIC-based damage analysis model of Song et al. and the AE-based model of Kachanov [4,14]. The fitted damage–strain curves are shown in Fig. 25.

Figure 25. Comparison of damage–strain curves of different rocks based on three damage constitutive models: (a) hard rock, (b) soft rock.

Research by Gong and colleagues indicates that damage-variable growth exhibits three distinct phases [57]: the stable damage evolution stage, the intensified damage evolution stage, and the post-peak residual strength stage, corresponding to the consolidation phase, elastic–plastic phase, and post-failure phase of the stress–strain curve, respectively. During the consolidation phase, internal porosity is compacted to closure, so the rock sustains essentially zero damage. As the material enters the elastic–plastic phase, internal micro-cracks expand at an accelerating rate and new cracks continually extend, causing a rapid increase in the damage rate. During the failure stage, the specimen's overall strength has already been compromised beyond the peak, the damage growth rate stabilizes, and the damage variable finally approaches 1, signifying complete fracture of the rock.

According to Fig. 25, the damage–strain curve obtained from the DIC-based damage analysis model for the soft rock specimens is concave: the damage growth rate increases steadily up to final failure, an overall trend of accelerating growth that contradicts the damage pattern typically observed during rock failure. Moreover, in that model the rock sustains significant damage only late in the plastic phase, which also contradicts the physical changes expected during loading. The damage curve derived from this method therefore lacks physical plausibility. The damage model based on acoustic emission, by contrast, rises slowly during the elastic stage, compensating for the damage variation there, and avoids the plastic-stage curve accelerating too quickly toward 1. However, in soft rocks it registers damage from the very start of the loading and compaction stage, which is again inconsistent with the expected physical behavior; this discrepancy is likely caused by the large amount of test noise contaminating the AE signals and being misinterpreted as damage. The damage curve based on the DIC images repaired and optimized by the algorithm presented in this paper, compared with the other two curves, is smoother, with a damage growth rate that follows a "slow rise, then decline" trend and two distinct inflection points, in the elastic and failure stages respectively. Physically, during the elastic phase the rock pores close, the external load is transferred directly onto the rock matrix, cracks begin to propagate, and damage increases abruptly; in the post-peak failure stage the rock has largely lost its overall strength, leaving only residual strength to resist the load, so the damage growth rate decreases and the curve flattens. The damage model optimized by the algorithm in this paper therefore aligns most closely with the actual damage evolution.

By assuming isotropic internal damage of the rock and applying the principle of equivalent strain, the relationship among strain, damage, and stress under uniaxial loading can be expressed quantitatively [58]:

$$\sigma = E\,(1 - D_f)\,\varepsilon, \qquad (17)$$

where E is the Young's modulus of the rock, expressed in MPa.

Substituting Eq. (16) into Eq. (17) yields the stress–strain relationship implied by the damage model optimized by the algorithm presented in this paper:

$$\sigma = \begin{cases} E\varepsilon, & \varepsilon_{\max} \le \varepsilon_d \\ E\varepsilon\,(1 - A_s/A), & \varepsilon_{\max} > \varepsilon_d \end{cases} \qquad (18)$$

It is particularly noteworthy that Eq. (18) differs from the model in the initial consolidation and elastic phases of uniaxial compression. Consequently, when inverting the stress–strain curve, the strain endpoint of the consolidation phase should correspond to the starting point of the rock's elastic phase. On this basis, the stress–strain curve implied by the optimized damage model can be derived, as illustrated in Fig. 26; a sketch of this back-calculation follows.
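The back-calculation of Eq. (18) can be illustrated as below; the Young's modulus, threshold strain, and toy damage curve are assumed values for demonstration only, standing in for the experimentally derived D_f.

```python
import numpy as np

E = 20_000.0                        # Young's modulus in MPa (assumed value)
eps_d = 0.004                       # densification/elastic threshold strain (assumed)
eps = np.linspace(0.0, 0.02, 200)   # axial strain grid

# toy damage curve consistent with Eq. (16): zero up to eps_d, then rising toward 1
d_f = np.where(eps > eps_d,
               np.clip(((eps - eps_d) / (0.012 - eps_d)) ** 2, 0.0, 1.0),
               0.0)

sigma = E * (1.0 - d_f) * eps       # Eq. (17)/(18): back-calculated stress in MPa
```

Plotting `sigma` against `eps` reproduces the qualitative shape discussed above: a linear rise while D_f = 0, a peak as damage accelerates, and a post-peak decline.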

Figure 26. Stress–strain curves of different rocks compared across three damage constitutive models and the testing machine: (a) hard rock, (b) soft rock.

Comparing the stress–strain curves back-calculated from the damage model optimized by this paper's algorithm with those from the DIC-based damage model, the AE-based damage model, and the original test data, the optimized model clearly aligns most closely with the original data. The strain data derived from DIC suffer missing values caused by factors such as speckle dropout during the post-peak failure stage, which prevents an accurate representation of damage evolution at that phase; consequently, the back-calculated curves show a pronounced drop that is inconsistent with the uniaxial compression process. AE data do not suffer from image loss, but noise during the experiment distorts the curves, making the response to damage evolution at the various stages less reliable. Indeed, the stress–strain curves based on AE technology diverge substantially from those recorded by the testing machine: during the compaction stage the internal pores gradually close and almost no damage occurs, yet environmental noise pushes some signals above the detection threshold, introducing errors into the damage analysis. In the curves obtained from the damage model applied to the DIC images repaired by the optimized algorithm, the initiation of damage coincides with the start of the rock's plastic phase. During the compaction and elastic stages, damage is virtually zero; during the plastic phase, stress-driven crack growth produces a sudden increase in damage, with the corresponding stress climbing rapidly to peak strength; in the post-peak failure stage, as rock strength decreases, damage growth slows while the stress drops sharply. The stress–strain curves derived from this paper's model agree with the experimentally measured curves to better than 90%, indicating that the model more accurately reflects the specimens' damage evolution and validating its rationality.

Conclusions

In response to the issue of partial distortion in traditional DIC technology images, this paper focuses on both soft and hard rocks as research subjects. By employing the improved ZITS algorithm for the identification, correction, and prediction of DIC images, the following conclusions have been drawn:

When constructing damage constitutive models using strain maps generated by DIC technology, data loss due to speckle fragmentation and detachment leads to significant errors in damage analysis. Because the stress–strain curves inferred from the damage–strain curves exhibit substantial discrepancies relative to those measured by the testing machine, repairing the DIC strain nephograms becomes a crucial step in improving DIC-based rock damage analysis. The repaired DIC strain nephograms show an 8% to 10% increase in information completeness, and the enhanced damage constitutive models built on the restored image data yield inferred stress–strain curves that align more closely with those from the testing machine. This demonstrates that the repaired DIC strain nephograms effectively improve the precision of damage analysis.

The original ZITS algorithm not only demands high server performance but also involves a complex and time-consuming computational process. In this study, lightweight measures are adopted to mitigate these challenges: replacing the traditional convolutional network with depthwise separable convolutions allows multiple features to be computed in parallel while reducing both the computational workload and the parameter count. The improved ZITS algorithm cuts computation time roughly in half compared with its predecessor, can run on standard low-configuration servers, and has no adverse impact on image detail. Adopting the modified ZITS algorithm proposed in this paper therefore saves substantial time and cost in DIC image restoration and improves the efficiency of DIC-based rock damage analysis. A minimal sketch of the depthwise separable building block follows.
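The following is a minimal PyTorch sketch of a depthwise separable convolution standing in for a standard convolution; the channel sizes and layer arrangement are illustrative and do not reproduce the paper's exact network.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1):
        super().__init__()
        # depthwise: one filter per input channel (groups=in_ch)
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size, stride,
                                   padding=kernel_size // 2, groups=in_ch)
        # pointwise: 1x1 convolution mixes information across channels
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

# parameter count drops from in_ch*out_ch*k*k (standard convolution)
# to in_ch*k*k + in_ch*out_ch (depthwise + pointwise)
x = torch.randn(1, 64, 128, 128)
y = DepthwiseSeparableConv(64, 128)(x)   # -> shape (1, 128, 128, 128)
```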

Data availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

Xing, H. Z. et al. Full-field measurement and fracture characterisations of rocks under dynamic loads using high-speed three-dimensional digital image correlation. Int. J. Impact Eng. 113 , 61–72. https://doi.org/10.1016/j.ijimpeng.2017.11.011 (2018).


Tang, Y., Okubo, S., Xu, J. & Peng, S. J. Experimental Study on damage behavior of rock in compression-tension cycle test using 3D digital image correlation. Rock Mech. Rock Eng. 52 (5), 1387–1394. https://doi.org/10.1007/s00603-018-1685-9 (2019).

Wang, W. et al. Experimental study on anisotropy of strength, deformation and damage evolution of contact zone composite rock with DIC and AE technique. Rock Mech. Rock Eng. 55 (2), 837–853. https://doi.org/10.1007/s00603-021-02682-x (2022).

Song, H. P. et al. Experimental study on damage evolution of rock under uniform and concentrated loading conditions using digital image correlation. Fatigue Fract. Eng. M 36 (8), 760–768. https://doi.org/10.1111/ffe.12043 (2013).


Xu, J. et al. Experimental study of generalized stress relaxation of rock based on 3D-DIC technology. Rock Soil Mech. 42 (1), 2. https://doi.org/10.16285/j.rsm.2020.5816 (2021).

Niu, H. et al. Damage constitutive model of microcrack rock under tension. Adv. Civ. Eng. 2020 , 1–10. https://doi.org/10.1155/2020/8835305 (2020).

Su, Y., Zhang, Q. C., Xu, X. H. & Zeren, G. Quality assessment of speckle patterns for DIC by consideration of both systematic errors and random errors. Opt. Lasers Eng. 86 , 132–142. https://doi.org/10.1016/j.optlaseng.2016.05.019 (2016).

Badaloni, M. et al. Impact of experimental uncertainties on the identification of mechanical material properties using DIC. Exp. Mech. 55 , 1411–1426. https://doi.org/10.1007/s11340-015-0039-8 (2015).


Rubino, V., Rosakis, A. J. & Lapusta, N. Full-field ultrahigh-speed quantification of dynamic shear ruptures using digital image correlation. Exp. Mech. 59 , 551–582. https://doi.org/10.1007/s11340-019-00501-7 (2019).

Du, K. et al. Experimental study on acoustic emission (AE) characteristics and crack classification during rock fracture in several basic lab tests. Int. J. Rock Mech. Min. Sci. 133 , 104411. https://doi.org/10.1016/j.ijrmms.2020.104411 (2020).

Dai, S., Liu, X. & Nawnit, K. Experimental study on the fracture process zone characteristics in concrete utilizing DIC and AE methods. Appl. Sci. 9 (7), 1346. https://doi.org/10.3390/app9071346 (2019).

Li, S. et al. Mechanical behavior of rock-like specimen containing hole-joint combined flaw under uniaxial loading: Findings from DIC and AE monitoring. J. Mater. Res. Technol. 26 , 3426–3449. https://doi.org/10.1016/j.jmrt.2023.08.102 (2023).

Gu, Q. et al. Damage constitutive model of brittle rock considering the compaction of crack. Geomech. Eng. 15 (5), 1081–1089. https://doi.org/10.12989/gae.2018.15.5.1081 (2018).

Kachanov, L. M. Rupture time under creep conditions. Int. J. Fract. 97 (1–4), 11–18. https://doi.org/10.1023/A:1018671022008 (1999).

Sidorenko, M. et al. Deep learning in denoising of micro-computed tomography images of rock samples. Comput. Geosci. 151 , 104716. https://doi.org/10.1016/j.cageo.2021.104716 (2021).

Li, Q. et al. An image recognition method for the deformation area of open-pit rock slopes under variable rainfall. Measurement 188 , 110544. https://doi.org/10.1016/j.measurement.2021.110544 (2022).

Tariq, Z., Elkatatny, S., Mahmoud, M. et al. A new technique to develop rock strength correlation using artificial intelligence tools. In SPE Reservoir Characterisation and Simulation Conference and Exhibition. https://doi.org/10.2118/186062-MS (2017).

Robson, B. A. et al. Automated detection of rock glaciers using deep learning and object-based image analysis. Remote Sens. Environ. 250 , 112033. https://doi.org/10.1016/j.rse.2020.112033 (2020).

Jiang, Y. F. et al. Research on dynamic cracking properties of cracked rock mass under the effect of thermal treatment. Theor. Appl. Fract. Mech. 122 , 103580. https://doi.org/10.1016/j.tafmec.2022.103580 (2022).

Zhang, D., Yang, Y. X., Ren, H. T., Huang, K. L. & Niu, S. W. Experimental research on efficiency and vibration of polycrystalline diamond compact bit in heterogeneous rock. J. Pet. Sci. Eng. 220 , 111175. https://doi.org/10.1016/j.petrol.2022.111175 (2023).

Zheng, C. M. et al. Statistical study of squeezing for soft rocks based on factor and regression analyses of effective parameters. Int. J. Rock Mech. Min. Sci. 163 , 105306. https://doi.org/10.1016/j.ijrmms.2022.105306 (2023).

Mishra, S., Kumar, A., Rao, K. S. & Gupta, N. K. Experimental and numerical investigation of the dynamic response of tunnel in soft rocks. Structures 29 , 2162–2173. https://doi.org/10.1016/j.istruc.2020.08.055 (2021).

Lu, H. F., Zhang, K., Yi, J. L. & Wei, A. C. A study on the optimal selection of similar materials for the physical simulation experiment based on rock mineral components. Eng. Fail. Anal. 140 , 106607. https://doi.org/10.1016/j.engfailanal.2022.106607 (2022).

Wu, K. et al. Characterizing rock transverse anisotropic spatial variations using digital drilling. Geoenergy Sci. Eng. 232 , 212451. https://doi.org/10.1016/j.geoen.2023.212451 (2024).

Sun, H., Du, W. & Liu, C. Uniaxial compressive strength determination of rocks using X-ray computed tomography and convolutional neural networks. Rock Mech. Rock Eng. 54 (8), 4225–4237. https://doi.org/10.1007/s00603-021-02503-1 (2021).

Cao, R. H. et al. Damage deterioration mechanism and damage constitutive modelling of red sandstone under cyclic thermal-cooling treatments. Arch. Civ. Mech. Eng. 22 (4), 188. https://doi.org/10.1007/s43452-022-00505-6 (2022).

Rossi, M. et al. Evaluation of volume deformation from surface DIC measurement. Exp. Mech. 58 (7), 1181–1194. https://doi.org/10.1007/s11340-018-0409-0 (2018).

Li, H. M., Li, H. G., Wang, K. L. & Liu, C. Effect of rock composition microstructure and pore characteristics on its rock mechanics properties. Int. J. Min. Sci. Technol. 28 (02), 303–308. https://doi.org/10.1016/j.ijmst.2017.12.008 (2018).

He, M. et al. Numerical simulation of rock bursts triggered by blasting disturbance for deep-buried tunnels in jointed rock masses. Comput. Geotech. 161 , 105609. https://doi.org/10.1016/j.compgeo.2023.105609 (2023).

Yang, B. et al. Effect of horizontal stress on fractal characteristics of rockburst fragments in coal mining. Energy 281 , 128181. https://doi.org/10.1016/j.energy.2023.128181 (2023).

Zheng, Z. et al. Microdynamic mechanical properties and fracture evolution mechanism of monzogabbro with a true triaxial multilevel disturbance method. Int. J. Min. Sci. Technol. https://doi.org/10.1016/j.ijmst.2024.01.001 (2024).

Zheng, Z. et al. Disturbance mechanical behaviors and anisotropic fracturing mechanisms of rock under novel three-stage true triaxial static-dynamic coupling loading. Rock Mech. Rock Eng. https://doi.org/10.1007/s00603-023-03696-3 (2023).

Martin, R., Marcel, A., Ondřej, J. & Anne, J. Improving DIC accuracy in experimental setups. Adv. Eng. Mater. 21 (7), 1900092. https://doi.org/10.1002/adem.201900092 (2019).

Dong, Y. L. & Pan, B. A review of speckle pattern fabrication and assessment for digital image correlation. Exp. Mech. 57 (8), 1161–1181. https://doi.org/10.1007/s11340-017-0283-1 (2017).

Juan, J. R. et al. Multi-class structural damage segmentation using fully convolutional networks. Comput. Ind. 112 , 103121. https://doi.org/10.1016/j.compind.2019.08.002 (2019).

Criminisi, A., Pérez, P. & Toyama, K. Region filling and object removal by exemplar-based image inpainting. IEEE Trans. Image Process. 13 (9), 1200–1212. https://doi.org/10.1109/TIP.2004.833105 (2004).


Guo, Q. et al. A method of blasted rock image segmentation based on improved watershed algorithm. Sci. Rep. 12 (1), 7143. https://doi.org/10.1038/s41598-022-11351-0 (2022).


Wang, N., Ma, S. H., Li, J. Y., Zhang, Y. P. & Zhang, L. F. Multistage attention network for image inpainting. Pattern Recognit. 106 , 107448. https://doi.org/10.1016/j.patcog.2020.107448 (2020).

Chen, Y. T. et al. The improved image inpainting algorithm via encoder and similarity constraint. VC Print 37 (7), 1691–1705. https://doi.org/10.1007/s00371-020-01932-3 (2021).

Liu, Z. Q., Cao, Y. W., Wang, Y. Z. & Wang, W. Computer vision-based concrete crack detection using U-net fully convolutional networks. Autom. Constr. 104 , 129–139. https://doi.org/10.1016/j.autcon.2019.04.005 (2019).

Yan, L. et al. Cascaded transformer U-net for image restoration. Signal Process. 206 , 108902. https://doi.org/10.1016/j.sigpro.2022.108902 (2023).

Bizhani, M., Ardakani, O. H. & Little, E. Reconstructing high fidelity digital rock images using deep convolutional neural networks. Sci. Rep. 12 (1), 4264. https://doi.org/10.1038/s41598-022-08170-8 (2022).

Yu, Z. et al. Optimization of postblast ore boundary determination using a novel sine cosine algorithm-based random forest technique and Monte Carlo simulation. Eng. Optimiz. 53 (9), 1467–1482. https://doi.org/10.1080/0305215X.2020.1801668 (2021).

Sadegh, K. & Pejman, T. Segmentation of digital rock images using deep convolutional autoencoder networks. Comput. Geosci. 126 , 142–150. https://doi.org/10.1016/j.cageo.2019.02.003 (2019).

Dong, Q., Cao, C. & Fu, Y. Incremental transformer structure enhanced image inpainting with masking positional encoding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 11358–11368. https://doi.org/10.1109/CVPR52688.2022.01107 (2022).

Van Eycke, Y. R. et al. Segmentation of glandular epithelium in colorectal tumours to automatically compartmentalise IHC biomarker quantification: A deep learning approach. Med. Image Anal. 49 , 35–45. https://doi.org/10.1016/j.media.2018.07.004 (2018).


Arthur, C., Daniel, P., Matthew, T. & Pierre, B. Deconvolution of ultrasonic signals using a convolutional neural network. Ultrasonics 111 (6), 106312. https://doi.org/10.1016/j.ultras.2020.106312 (2021).

Dmitry, Y. Error bounds for approximations with deep ReLU networks. Neural Netw. 94 , 103–114. https://doi.org/10.1016/j.neunet.2017.07.002 (2017).

Bai, F. Y., Fan, M. Q., Yang, H. L. & Dong, L. P. Fast recognition using convolutional neural network for the coal particle density range based on images captured under multiple light sources. Int. J. Min. Sci. Technol. 31 (06), 1053–1061. https://doi.org/10.1016/j.ijmst.2021.09.004 (2021).

Arora, M., Kanjilal, U. & Varshney, D. Evaluation of information retrieval: Precision and recall. Int. J. Indian Cult. Bus. Manag. 12 (2), 224–236. https://doi.org/10.1504/IJICBM.2016.074482 (2016).

Chicco, D. & Jurman, G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics 21 , 1–13. https://doi.org/10.1186/s12864-019-6413-7 (2020).

Hong, C. S. & Oh, T. G. TPR-TNR plot for confusion matrix. Commun. Stat. Appl. Methods 28 (2), 161–169. https://doi.org/10.29220/CSAM.2021.28.2.161 (2021).

Zhou, Y. F. et al. MPCE: A maximum probability based cross entropy loss function for neural network classification. IEEE Access 7 , 146331–146341. https://doi.org/10.1109/ACCESS.2019.2946264 (2019).

Markey, M. K., Boland, M. V. & Murphy, R. F. Toward objective selection of representative microscope images. Biophys. J. 76 (4), 2230–2237. https://doi.org/10.1016/S0006-3495(99)77379-0 (1999).

Maruyama, I. & Sasano, H. Strain and crack distribution in concrete during drying. Mater. Struct. 47 (3), 517–532. https://doi.org/10.1617/s11527-013-0076-7 (2014).

Lagier, F., Jourdain, X., Sa, D. C., Benboudjema, F. & Colliat, J. B. Numerical strategies for prediction of drying cracks in heterogeneous materials: Comparison upon experimental results. Eng. Struct. 33 (3), 920–931. https://doi.org/10.1016/j.engstruct.2010.12.013 (2010).

Gong, F. Q., Zhang, P. L., Luo, S., Li, J. C. & Huang, D. Theoretical damage characterisation and damage evolution process of intact rocks based on linear energy dissipation law under uniaxial compression. Int. J. Rock Mech. Min. Sci. 146 , 104858. https://doi.org/10.1016/j.ijrmms.2021.104858 (2021).

Lemaitre, J. How to use damage mechanics. Nucl. Eng. Des. 80 (2), 233–245. https://doi.org/10.1016/0029-5493(84)90169-9 (1984).



Acknowledgements

The authors gratefully acknowledge the support of the Hubei Provincial Natural Science Foundation of China (Grant No. 2020CFB367).

Author information

Present address: School of Urban Construction, Yangtze University, Jingzhou, 434023, China

Authors and Affiliations

School of Urban Construction, Yangtze University, Jingzhou, 434023, China

Mingzhe Xu & Diandong Geng

State Key Laboratory of Geomechanics and Geotechnical Engineering, Wuhan Institute of Rock and Soil Mechanics, Chinese Academy of Sciences, Wuhan, 430071, Hubei, China


Contributions

Qi and Xu were responsible for writing the main text of the manuscript, while Geng prepared Figures 1-26. All authors contributed to the review and finalization of the manuscript.

Corresponding author

Correspondence to Xianyin Qi.

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Xu, M., Qi, X. & Geng, D. Application of improved and efficient image repair algorithm in rock damage experimental research. Sci Rep 14 , 14849 (2024). https://doi.org/10.1038/s41598-024-65790-y

Download citation

Received : 17 March 2024

Accepted : 24 June 2024

Published : 27 June 2024

DOI : https://doi.org/10.1038/s41598-024-65790-y


Keywords

  • Digital image
  • Image restoration
  • Transformer algorithm
  • Neural network
  • Rock damage
