Original Article
10(2); 137-149
doi: 10.25259/JCCC_39_2025

Deep Learning Methods for Precise Cardiac Image Segmentation Review

Department of Data Science and Business Systems, Sri Ramaswamy Memorial (SRM) Institute of Science and Technology, Kattankulathur, Tamil Nadu, India.

*Corresponding author: Subramani Nandhagopal, Department of Data Science and Business Systems, Sri Ramaswamy Memorial (SRM) Institute of Science and Technology, Kattankulathur, Tamil Nadu, India. ns1118@srmist.edu.in

Licence
This is an open-access article distributed under the terms of the Creative Commons Attribution-Non Commercial-Share Alike 4.0 License, which allows others to remix, transform, and build upon the work non-commercially, as long as the author is credited and the new creations are licensed under the identical terms.

How to cite this article: Nandhagopal S, Sasikala E. Deep Learning Methods for Precise Cardiac Image Segmentation Review. J Card Crit Care TSS. 2026;10:137-49. doi: 10.25259/JCCC_39_2025

Abstract

Objectives:

This study aims to review state-of-the-art (SOTA) deep learning (DL) methods that segment cardiac images from magnetic resonance imaging (MRI), computed tomography (CT), and ultrasound imaging modalities, and to discuss the strengths and weaknesses of these approaches as well as current research challenges.

Material and Methods:

Recent advances in DL architecture include convolutional neural networks (CNN), fully convolutional networks (FCN), U-Nets, V-Nets, and recurrent neural networks (RNN). We reviewed the application of these architectures to cardiac segmentation tasks and considered preprocessing, loss functions, and hybrid approaches (i.e., DL combined with traditional segmentation methods).

Results:

DL methods consistently outperform conventional segmentation techniques in accuracy and efficiency. MRI segmentation is most widely studied due to the availability of public datasets, while CT and ultrasound present challenges such as motion artifacts, speckle noise, and annotation sparsity. Advanced models (e.g., Dense U-Net, OmegaNet, multi-view CNN) achieve high Dice scores; however, 3D models require greater computational resources and risk overfitting.

Conclusion:

Deep learning has transformed cardiac imaging segmentation by improving its predictive accuracy and clinical utility. However, challenges remain in data availability, computational complexity, and interpretability. Future work should focus on lightweight architectures, implementing semi-supervised learning, and utilizing explainable AI to facilitate broader clinical implementation.

Keywords

Boundary objects
Cardiac magnetic resonance imaging
Convolutional neural network
Deep learning
Image segmentation

INTRODUCTION

Globally, cardiovascular diseases (CVDs) are a major cause of death, according to the World Health Organization. About 18 million people die due to CVDs, more specifically due to stroke or heart disease, and this number is rising steeply.[1] Recent advances in cardiovascular research aim to improve the diagnosis and treatment of these diseases and thereby reduce mortality. Modern medical imaging approaches, such as computed tomography (CT), magnetic resonance imaging (MRI), and ultrasound, are widely used by the medical community. They enable non-invasive qualitative and quantitative assessment of cardiac anatomy and function to support disease monitoring, diagnosis, prognosis, and treatment planning.

Cardiac image segmentation is a pivotal step in medical imaging.[2] It partitions the image into several anatomical or semantic regions, from which quantitative measures such as wall thickness, myocardial mass, right ventricle (RV) and left ventricle (LV) volumes, and ejection fraction (EF) can be extracted. Anatomical structures targeted in cardiac image segmentation include the coronary arteries, RV, LV, right atrium, and left atrium (LA). The typical tasks associated with cardiac image segmentation are shown in Figure 1.

Figure 1: Automatic cardiac image segmentation partitions the image into anatomical regions.

Before the advent of deep learning (DL) techniques, conventional machine learning approaches, such as atlas-based techniques and model-based approaches (i.e., appearance and shape models), were widely used and achieved good performance in cardiac image segmentation.[3] However, they typically need appropriate feature engineering or extensive prior knowledge to attain the required prediction accuracy. Alternatively, DL-based techniques, known for their automatic identification of intricate features from images, are viewed as more promising for segmentation and object detection.[4] The features that are automatically extracted from the images are used for training the learning models and for end-to-end analysis. The medical community has exploited this property of DL in various image analysis applications.[5] Rapid advances in computer hardware (i.e., graphics processing units and tensor processing units), along with the abundance of data available for training, have led to the dominance of DL over other approaches.[6] MRI-based segmentation has been investigated more than the other two modalities.[7] Unlike CT and ultrasound data, MRI data are freely available in public datasets, so MRI analyses have been performed more frequently.

This survey presents a comprehensive overview of existing DL approaches for cardiac image segmentation in the three modalities common in clinical practice, namely ultrasound, CT, and MRI. It also weighs the benefits and limitations of existing DL-based segmentation methods that may hinder their wider clinical adoption. Many investigations have examined learning-based techniques for medical image analysis in general, and several have analyzed specific application models for cardiac image analysis. However, there is no extensive systematic overview that concentrates on cardiac segmentation applications. In particular, few works in the literature effectively review existing DL-based cardiac image segmentation for vessel, LV, and RV segmentation.

MATERIAL AND METHODS

DL-based techniques provide an effective way of segmenting tissues or organs (e.g., scars, vessels, and the LV) in certain modalities, thus supporting qualitative analysis of cardiovascular function and structure. They are used for ventricle segmentation, particularly in the ultrasound and MRI domains.[7] The ultimate objective of ventricle segmentation is to delineate the epicardium and endocardium of the RV and LV. This partitioning is essential for deriving clinical indices such as left ventricular end-systolic volume, left ventricular end-diastolic volume, right ventricular end-systolic volume, right ventricular end-diastolic volume, and EF.[8] Segmentation maps are also needed for 3D shape analysis, 3D+time motion analysis, and survival prediction.[9] Figure 1 shows how automatic cardiac image segmentation partitions the image into anatomical regions.[8]
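The clinical indices listed above follow directly from the segmentation masks. As a minimal sketch, assuming LV cavity masks at end-diastole and end-systole that have already been reduced to voxel counts, and a known voxel spacing (the function names and the example numbers are illustrative assumptions, not values from the reviewed studies), the volumes and EF could be computed as follows:

```python
def volume_ml(voxel_count, spacing_mm):
    """Volume in millilitres from a voxel count and (dx, dy, dz) spacing in mm."""
    dx, dy, dz = spacing_mm
    return voxel_count * dx * dy * dz / 1000.0  # 1 mL = 1000 mm^3


def ejection_fraction(edv_ml, esv_ml):
    """EF (%) = (EDV - ESV) / EDV * 100."""
    if edv_ml <= 0:
        raise ValueError("End-diastolic volume must be positive")
    return (edv_ml - esv_ml) / edv_ml * 100.0


# Hypothetical voxel counts of the LV cavity mask at the two cardiac phases.
spacing = (1.25, 1.25, 8.0)          # assumed voxel spacing in mm
edv = volume_ml(9600, spacing)       # end-diastolic volume
esv = volume_ml(3840, spacing)       # end-systolic volume
ef = ejection_fraction(edv, esv)
print(f"EDV={edv:.1f} mL, ESV={esv:.1f} mL, EF={ef:.1f}%")
```

Because every index is a function of the voxel counts, any segmentation error propagates directly into EDV, ESV, and EF, which is why boundary accuracy matters clinically.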

Segmentation with MRI images

Cardiac MRI is a non-invasive imaging approach that visualizes the structure of the heart and does not need ionizing radiation. It facilitates quantification of both function and anatomy, as well as of pathological tissues such as scars. Standardized quantitative cardiac analysis is conducted on the basis of MRI image segmentation. The process of MRI cardiac image segmentation is illustrated in Figure 2.[10]

Figure 2: Magnetic resonance imaging–based cardiac image segmentation. The schematic (left panel) demonstrates labeled regions: right ventricle (1), left ventricular cavity (2), left ventricular myocardium (3), and papillary muscles/trabeculations (4). The corresponding cardiac MRI slice (right panel) illustrates segmentation contours, with the red dotted line indicating the left ventricular endocardial border and the green dotted line representing the epicardial border; additional outer contour (blue, if present) denotes the right ventricular or overall cardiac boundary. Internal white markings highlight papillary muscles or intracavitary structures within the left ventricle.

Trans[10] used a fully convolutional network (FCN) to segment the RV, LV, and myocardium in short-axis cardiac MRI. This FCN-based approach outperforms conventional techniques in both accuracy and speed. Many notable works have since optimized the network structure to improve feature learning during segmentation. Bernard et al.[11] modeled a dense U-net to merge multi-scale features for segmentation under large anatomical variability. Zheng et al.[12] examined various loss functions, such as weighted Dice loss, weighted cross entropy, focal loss, and supervision loss, to enhance the segmentation process. Among FCN-based methods, two-dimensional networks are generally used instead of three-dimensional networks. This is due to the typically low planar resolution and the artefacts in cardiac magnetic resonance (MR) scans, which restrict the applicability of 3D networks. The main limitation of 2D networks is that they process the volume slice by slice without leveraging inter-slice dependencies. Therefore, a 2D network can fail to locate and segment the heart in basal and apical slices, where the ventricles are not well defined. To combat this issue, various works incorporate contextual information, such as shapes learned from multi-view images and labels, into 2D FCNs. Alternatively, spatial information can be extracted from adjacent slices using recurrent units. These units can also leverage information across temporal frames of the cardiac cycle to improve the temporal and spatial consistency of the segmentation.
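To make the loss functions named above concrete, the following is a minimal NumPy sketch for the binary (foreground/background) case. It is an illustration under stated assumptions rather than the formulation used in any cited study: the function names, the smoothing constant `EPS`, and the toy arrays are all hypothetical, and the supervision loss is not sketched.

```python
import numpy as np

EPS = 1e-7  # assumed smoothing constant to avoid log(0) and division by zero


def soft_dice_loss(prob, target):
    """1 - soft Dice overlap between predicted probabilities and a binary mask."""
    inter = np.sum(prob * target)
    dice = (2.0 * inter + EPS) / (np.sum(prob) + np.sum(target) + EPS)
    return float(1.0 - dice)


def weighted_cross_entropy(prob, target, w_fg=1.0, w_bg=1.0):
    """Pixel-wise binary cross entropy with separate class weights."""
    p = np.clip(prob, EPS, 1.0 - EPS)
    loss = -(w_fg * target * np.log(p) + w_bg * (1.0 - target) * np.log(1.0 - p))
    return float(loss.mean())


def focal_loss(prob, target, gamma=2.0):
    """Cross entropy down-weighted by (1 - p_t)^gamma for well-classified pixels."""
    p = np.clip(prob, EPS, 1.0 - EPS)
    p_t = np.where(target == 1, p, 1.0 - p)  # probability assigned to the true class
    return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t)))


# Toy example: four pixels, two of them foreground.
target = np.array([[0.0, 0.0, 1.0, 1.0]])
prob = np.array([[0.1, 0.4, 0.6, 0.9]])
print(soft_dice_loss(prob, target))
print(weighted_cross_entropy(prob, target, w_fg=2.0))
print(focal_loss(prob, target))
```

Raising `w_fg` penalizes missed foreground pixels more strongly, while `gamma` shifts the focal loss toward hard pixels; with `gamma=0` the focal loss reduces to unweighted cross entropy.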

Another predominant issue with 2D and 3D FCN segmentation is that the network is trained with a pixel-wise loss function, which is not suited to learning features specific to anatomical structures.[13] Various approaches therefore model anatomical constraints and apply them during training to enhance robustness and accuracy. These constraints are specified as regularization terms that consider contour, topology, and shape or region information, encouraging the network to produce anatomically plausible segmentations. Network regularization of this kind also provides more appropriate segmentations at the post-processing stage.

Multitask learning has also been combined with FCN-based cardiac segmentation to regularise it by training on auxiliary tasks relevant to segmentation, such as motion estimation, cardiac function estimation, image reconstruction, and ventricle size classification. Multitask training helps the network extract features that improve prediction accuracy and learning efficiency. Some researchers concentrate on multi-stage pipelines that break the segmentation process into sub-tasks. For instance, localization of a region of interest (ROI) is adopted in some cases. Dou et al.[14] proposed a network termed OmegaNet, which includes a U-net for localizing the cardiac chamber, a learnable module that normalizes image orientation, and a series of U-nets for fine-grained segmentation. With ROI localization and rotation of the input image into a canonical orientation, the model can perform well across varying orientations and sizes.

Considerable work has merged traditional segmentation techniques, for instance atlas-based models, deformable models, level sets, and graph-cut methods, with neural networks (NNs). Here, NNs are used for initialization and feature extraction to reduce manual interaction and improve the accuracy of the traditional methods. For LV segmentation in cardiac MRI, one such approach used a convolutional NN (CNN) to detect the LV automatically and then an auto-encoder to estimate the LV shape. The estimated shape was used to initialize a deformable model for shape refinement. This model converges faster than traditional deformable models and achieves better accuracy.[15] The idea was later extended to RV segmentation, where the hybrid model shows higher segmentation accuracy than non-DL methods. FCNs have also been used for bi-ventricular segmentation, that is, simultaneous segmentation of the RV and LV. These methods achieve higher accuracy than traditional models in challenging regions such as the ventricular myocardium, which demonstrates the superiority of DL models.
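The ROI localization step of such multi-stage pipelines can be illustrated with a short sketch. It assumes a coarse binary mask is already available from a first-stage network and simply crops a padded bounding box for the second stage; the function name, the margin, and the toy arrays are hypothetical, and this is not the OmegaNet implementation.

```python
import numpy as np


def roi_bounding_box(mask, margin=2):
    """Return (r0, r1, c0, c1) of the mask's bounding box, padded and clipped."""
    rows = np.where(np.any(mask, axis=1))[0]
    cols = np.where(np.any(mask, axis=0))[0]
    if rows.size == 0:
        raise ValueError("mask is empty, no ROI to localize")
    h, w = mask.shape
    r0 = max(int(rows[0]) - margin, 0)
    r1 = min(int(rows[-1]) + margin + 1, h)   # exclusive upper bound
    c0 = max(int(cols[0]) - margin, 0)
    c1 = min(int(cols[-1]) + margin + 1, w)
    return r0, r1, c0, c1


# Toy image and a coarse first-stage mask covering rows 4-5, columns 3-6.
image = np.arange(100, dtype=float).reshape(10, 10)
coarse = np.zeros((10, 10), dtype=bool)
coarse[4:6, 3:7] = True

r0, r1, c0, c1 = roi_bounding_box(coarse, margin=2)
roi = image[r0:r1, c0:c1]   # smaller input for the fine-grained stage
print((r0, r1, c0, c1), roi.shape)
```

Cropping to the ROI shrinks the search space for the second stage, which is the main motivation the text gives for localization before fine-grained segmentation.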

Atrial fibrillation (AF) is a common cardiac electrical disorder that affects more than 1 million people. In clinical practice, segmentation of the atrial anatomy is used in both pre-operative planning and postoperative evaluation of AF. Atrium segmentation also serves as a basis for scar segmentation. Conventional methods such as region growing, non-rigid registration, and atlas-based label fusion have been applied to LA segmentation, and their accuracy depends on good initialization and pre-processing. Similarly, scar segmentation is performed on contrast-enhanced MRI with late gadolinium enhancement, which facilitates identification of myocardial scar and fibrosis. Before DL, scar segmentation relied on clustering and thresholding techniques, which are sensitive to local intensity changes. The foremost limitation of these techniques is the need for manual segmentation of the ROI to reduce the search space and computational cost. As a result, such semi-automated techniques are not appropriate for clinical deployment. Some DL techniques are merged with traditional approaches for scar segmentation. Dou et al.[15] used an atlas model to recognise the LA and a deep NN (DNN) to identify fibrotic tissue within that region. A multi-view CNN is used to fuse features for higher segmentation accuracy. This model provides a mean Dice score of 0.90 for the LA and 0.78 for atrial scars. Segmentation of the aortic lumen from MR images is needed for appropriate hemodynamic and mechanical characterization of the aorta. The main challenge here is annotation sparsity in image sequences, where some frames are not annotated. This limitation is an open research problem that continues to attract researchers' attention.
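The Dice scores quoted above measure the overlap between a predicted mask and a reference mask, with 1 indicating perfect agreement. A minimal sketch for binary masks is given below; the function name, the empty-mask convention, and the toy masks are assumptions for illustration.

```python
import numpy as np


def dice_score(pred, truth):
    """Dice = 2 |A ∩ B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return float(2.0 * np.logical_and(pred, truth).sum() / denom)


# Reference mask of 4 pixels; prediction of 6 pixels that fully covers it.
truth = np.zeros((4, 4), dtype=bool)
truth[1:3, 1:3] = True
pred = np.zeros((4, 4), dtype=bool)
pred[1:3, 1:4] = True

print(dice_score(pred, truth))
```

Small structures such as atrial scars are penalized more heavily per misclassified pixel than large chambers, which is one reason scar Dice scores tend to be lower than chamber Dice scores.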

Segmentation with CT images

CT imaging is a non-invasive method utilized for disease prediction and treatment planning, particularly in evaluating cardiac anatomy and coronary arteries. It includes two modalities: contrast-enhanced CT angiography, which provides superior visualization of cardiac chambers and vessels after contrast agent injection, and non-contrast CT imaging, which uses tissue density to generate images based on varying attenuation values of materials such as calcium and fat. This review also discusses prevalent DL techniques used for cardiac CT segmentation. The emphasis on diagnosis and treatment planning is crucial for managing critical care patients. As shown in Figure 3, CT image segmentation can be performed using a deep learning approach.[16]

Figure 3: Segmentation with computed tomography images. RA: Right atrium; RV: Right ventricle; LA: Left atrium; LV: Left ventricle.

DL techniques for segmentation involve extracting ROI and classifying them using CNNs. Various methods, including 3D FCN and bounding boxes for LV detection, enhance heart segmentation accuracy, particularly in CT images compared to MR images due to differences in intensity distributions and image quality. A volumetric representation of the heart is derived from training multiple CNN models on different views. In addition, advancements in segmentation refinement utilize shape context and adaptive fusion strategies, while hybrid loss functions are employed to address class imbalance and improve performance.

Simonyan and Zisserman[17] suggested an automatic patient identification approach that segments and analyzes the LV myocardium, using a multi-scale FCN for segmentation and a convolutional autoencoder for characterization. Analyzing coronary arteries is crucial for diagnosing CVD and planning surgery. The segmentation process involves lumen and centerline extraction, which is challenged by motion artefacts in CT imaging. DL models, particularly CNNs, are applied in both pre- and post-processing stages, with 3D dilated CNNs trained on posterior probability distributions to enhance extraction. A multi-task segmentation strategy across different modalities improves performance, while a multi-scale supervision method integrated into the U-net architecture gives better voxel-level predictions than baseline models and produces smooth surfaces without post-processing.[18]

Many works focus on directly segmenting chest CT and non-contrast cardiac CT using a combination of DenseNet and U-net for improved accuracy. For LV segmentation in echocardiograms, a two-stage DL pipeline has been used that first localizes the target ROI through a rigid transformation, which enhances segmentation robustness. The method uses a deep belief network (DBN) to identify the rigid transformation and perform localization. DBN-based feature extraction is more resilient to variations in image appearance, and sparse learning in the detection process reduces the complexity of inference and training.

Some works perform segmentation without resorting to a two-stage process in order to decrease computational complexity, using sparse manifold learning to maintain segmentation accuracy while reducing training and search complexity. Another approach uses an FCN to produce a coarse segmentation that is then refined with level set-based methods. Cardiac ultrasound images are recorded as temporal sequences, and various methods exploit coherence among frames to improve segmentation robustness. A common approach is dynamic modeling with particle filtering, which yields better outcomes than methods that disregard temporal information. Integrating long short-term memory (LSTM) with U-Net takes multiple frames into account, improving segmentation accuracy and robustness against image quality variations compared with a U-Net applied to a single frame. Moreover, machine learning models make efficient use of both unlabeled and labeled images to expand the training data, employing a deep network initialized on a small set of labeled data and utilizing U-Net with Kalman filtering for pre-training annotations to validate external classifiers.

Hu et al.[19] proposed training a CNN on partial labels from multiple sequences and then fine-tuning the network on manual segmentations. A semi-supervised framework facilitates training on both unlabeled and labeled images. This uses a generative network to produce an ultrasound image for segmentation of unlabeled frames and to identify segmentation to resourcefully generate input for ultrasound images. Unlabeled data may complement manually annotated data from multiple domains and assist in improving the segmentation of certain fields.

Segmentation with ultrasound images

Ultrasound cardiac imaging, known as echocardiography, is a key tool for assessing cardiovascular function due to its low cost and portability. Automation of anatomical structure segmentation employs methods like level sets, active contours, and active shapes, though accuracy suffers from issues like speckle noise and image contrast. To enhance image quality for tissue classification, DL techniques combined with deformable models improve 2D segmentation. Features extracted through DNN advance robustness, while one-shot segmentation reduces computational complexity in two-stage approaches.[20] Temporal ultrasound data recording aids coherence, enhancing segmentation accuracy. CNN effectively manages segmentation without post-processing complications, leveraging U-Net for annotation in 2D ultrasound images, which improves accuracy through extensive training data and real-time processing capabilities. Focusing on clinical situations involves the application of critical care monitoring and decision-making tailored to individual patients.

3D cardiac segmentation presents greater challenges than 2D segmentation due to issues like temporal resolution and image quality. The increased dimensionality of 3D images complicates DL processing. A two-stage approach is employed, utilizing a 2D CNN for initial coarse segmentation, which aids in refining the 3D model. Transfer learning is implemented to optimize training by utilizing a segmented dataset while addressing challenges related to device variation, patient conditions, and protocols. Figure 4 presents ultrasound image-based segmentation.[21]

Figure 4: Ultrasound image–based cardiac segmentation. Representative transthoracic echocardiographic frames from standard views illustrating variability in anatomy and image quality.

To reflect real clinical situations, clinical segmentation methods must be developed and evaluated on large, publicly available datasets. Results on a benchmarking dataset show that an auto-encoder-based CNN may reduce inter-observer variability and lower the error relative to cardiologists. A sparsely connected NN with fewer parameters is used to identify a bounding box that locates the target structure, and learning with a bounding box reduces computational complexity. To enhance clarity and reproducibility, studies should detail dataset acquisition, preprocessing steps (normalization, resizing, augmentation, slice selection), network architecture, training parameters (learning rate, batch size, epochs, optimizer, loss function), and performance evaluation metrics.

DL network models

DL network (DLN) models are deep artificial neural networks (D-ANN). Every NN comprises an input layer, hidden layers, and an output layer [Figure 5].[3] This section presents a detailed analysis of DLN models and the associated techniques for segmentation algorithms.

Figure 5: Generic view of convolutional neural network.

The methodology employed for dataset acquisition and preprocessing is detailed, encompassing normalization, resizing, augmentation, and slice selection. It describes the network architectures utilized alongside the training parameters, including learning rate, batch size, epoch count, optimizer, and loss function. Consistency in pre-processing was maintained across all datasets. The network architectures referenced are consistent with their original publications, and all training parameters are provided to enable replication of results. Publicly available datasets were prioritized to enhance replicability, and standardized evaluation metrics in medical imaging segmentation were used for performance comparison with previous studies.
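The preprocessing steps named above can be sketched in a few lines. The example below is an assumption-laden illustration, not the pipeline of any reviewed study: it uses z-score intensity normalization, substitutes symmetric cropping or zero-padding for resizing (interpolation-based resizing would need an additional library), and uses a random horizontal flip as the only augmentation. All function names and sizes are hypothetical.

```python
import numpy as np


def zscore(image):
    """Normalize intensities to zero mean and unit standard deviation."""
    std = image.std()
    return (image - image.mean()) / (std if std > 0 else 1.0)


def center_crop_or_pad(image, size):
    """Crop or zero-pad a 2D image symmetrically to (size, size)."""
    out = np.zeros((size, size), dtype=image.dtype)
    h, w = image.shape
    ch, cw = min(h, size), min(w, size)              # region that is copied
    src_r, src_c = (h - ch) // 2, (w - cw) // 2      # offset in the source
    dst_r, dst_c = (size - ch) // 2, (size - cw) // 2  # offset in the output
    out[dst_r:dst_r + ch, dst_c:dst_c + cw] = image[src_r:src_r + ch, src_c:src_c + cw]
    return out


def random_flip(image, mask, rng):
    """Flip image and mask together so the labels stay aligned."""
    if rng.random() < 0.5:
        image, mask = image[:, ::-1], mask[:, ::-1]
    return image, mask


rng = np.random.default_rng(0)
raw = rng.normal(100.0, 20.0, size=(12, 8))   # stand-in for one image slice
norm = zscore(raw)
fixed = center_crop_or_pad(norm, 10)          # common input size for the network
aug_img, aug_mask = random_flip(fixed, fixed > 0, rng)
print(fixed.shape, aug_img.shape)
```

The key design point is that any geometric augmentation must be applied identically to the image and its label mask; intensity normalization, by contrast, is applied to the image only.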

CNNs

CNNs consist of multiple layers, including convolutional layers that use kernels to extract features from images, and pooling layers that reduce the dimensions of the image. The input layer, directly connected to the image, features neurons corresponding to the image pixels. Convolutional layers generate an activation map to represent filter effects, followed by non-linear transformations.[22] Pooling layers, such as average and max pooling, reduce dimensionality, while fully connected layers provide higher-level abstraction. Weights and kernels are optimized through backpropagation during training.
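The layer types described above can be illustrated with a minimal NumPy sketch of one convolution, one non-linearity, and one pooling step. The hand-set edge kernel and the toy image are assumptions for illustration; in a trained CNN the kernel weights would be learned through backpropagation.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """'Valid' 2D cross-correlation, the operation CNN layers call convolution."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def relu(x):
    """Non-linear transformation applied to the activation map."""
    return np.maximum(x, 0.0)


def max_pool2x2(x):
    """2x2 max pooling with stride 2 (odd trailing rows/columns are dropped)."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


# Toy 6x6 image with a vertical edge between columns 2 and 3.
image = np.zeros((6, 6))
image[:, 3:] = 1.0
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)     # hand-set vertical edge detector

activation = relu(conv2d_valid(image, kernel))  # 4x4 activation map
pooled = max_pool2x2(activation)                # 2x2 after pooling
print(activation[0], pooled.shape)
```

The activation map responds only where the kernel straddles the edge, and pooling halves each spatial dimension while keeping the strongest response in each 2 × 2 block.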

Two-dimensional (2D) CNN

This section covers segmentation of 2D images with 2D filters within a CNN framework. Using multi-modality images, supplied through the RGB channels, improves segmentation results over single-modality inputs. Techniques such as transfer learning are employed for training low-level and high-level features. In addition, 2D CNNs extract spatial information efficiently at reduced computational cost, and multi-task segmentation has been used to assess network designs for segmenting various organs. Training benefits from 2D labelled data, which are generally more accessible than 3D data, and decomposing volumetric images into 2D slices aims to alleviate dimensionality issues. Challenges remain with 2D convolution, including the lower resolution along the depth axis. Figure 6 shows segmentation using a 2D CNN architecture.[12]

Figure 6: 2D convolutional neural network.

Three-dimensional (3D) CNN

3D techniques extract volumetric information across the X, Y, and Z axes, whereas 2D models emphasize in-plane spatial information. A 3D model predicts the label of a central voxel from the 3D patch around it and has an enhanced structure compared to 2D models. 3D medical imaging benefits from the improved spatial information, and a multi-scale 3D CNN can enhance resolution through its receptive field. This model employs 3 × 3 kernels for better accuracy and addresses dimensionality issues with shared spatial weights, reducing parameters and processing time. Despite its effectiveness in segmenting volumetric images and organs, the training of 3D models poses challenges.[23] The use of 3D pooling stabilizes features across 3D space, promoting quicker convergence rates compared to traditional 3D CNNs. While the model excels at segmenting large organs, it struggles with brain boundary detection and faces drawbacks when juxtaposed with 2D models.
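Patch-based 3D prediction, as described above, classifies one voxel at a time from the cube of intensities around it. The sketch below shows only the patch extraction step, with zero-padding at the volume border; the function name, patch size, and toy volume are assumptions, and the classifier itself is omitted.

```python
import numpy as np


def extract_patch(volume, center, size=3):
    """Return the size^3 patch centred on `center`, zero-padded at the borders."""
    r = size // 2
    padded = np.pad(volume, r, mode="constant")   # pad every axis by r voxels
    z, y, x = (c + r for c in center)             # shift indices into padded frame
    return padded[z - r:z + r + 1, y - r:y + r + 1, x - r:x + r + 1]


# Toy volume with axes ordered (Z, Y, X).
volume = np.arange(4 * 5 * 6, dtype=float).reshape(4, 5, 6)
patch = extract_patch(volume, center=(0, 2, 3), size=3)

# The label of volume[0, 2, 3] would be predicted from this patch.
print(patch.shape, patch[1, 1, 1])

# Weights per kernel and input channel, to show why 3D models cost more:
weights_2d = 3 * 3        # one 3 x 3 kernel
weights_3d = 3 * 3 * 3    # one 3 x 3 x 3 kernel
print(weights_2d, weights_3d)
```

Because the chosen voxel lies on the first slice, the first plane of the patch consists entirely of padding, which illustrates how border voxels see less real context than interior ones.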

FCN

In FCN, the fully connected layers are replaced by fully convolutional layers, enabling pixel-wise predictions [Figure 7].[12] Localization is enhanced by combining higher-resolution activation maps with up-sampled outputs, which are then fed into convolutional layers. This approach supports full-sized image processing instead of relying on patch-wise predictions, particularly in multi-organ segmentation for abdominal imaging.[24] A separate FCN model is employed for each 2D sectional view, and the segmentation outputs are merged at the pixel level to achieve higher accuracy, especially for liver segmentation, while also improving the results for smaller organs when applied to 3D image segmentation.

Figure 7: Fully convolutional network.

In a cascaded FCN, segmentation accuracy is enhanced by stacking a series of FCNs to utilize contextual features for predictive mapping, at the price of increased complexity and computational cost. Merging FCNs effectively segments the ROI using separate filters for each stage, significantly improving segmentation quality, particularly in fetal segmentation. A multi-streaming model within the 3D FCN optimizes the use of contextual information across different image resolutions, thereby increasing system robustness. However, FCN faces challenges due to its fixed receptive field size, which complicates object identification when object size varies. Multi-scaling addresses this issue, but parameter sharing remains suboptimal because objects at diverse scales require different parameters. The model is also affected by a high class imbalance between background and foreground, necessitating a two-step segmentation approach, with multi-streaming methods employed for multi-organ detection.
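The replacement of a fully connected layer by a convolutional one can be sketched as follows: a 1 × 1 convolution applies the same dense weights at every pixel, so the network outputs one label per pixel instead of one label per patch. The feature map, weights, and function names are hypothetical toy values, not parameters of any published FCN.

```python
def dense(features, weights, bias):
    """Fully connected layer: one feature vector in, one score per class out."""
    return [
        sum(w * x for w, x in zip(class_weights, features)) + b
        for class_weights, b in zip(weights, bias)
    ]


def conv1x1(feature_map, weights, bias):
    """1 x 1 convolution: reuse the same dense weights at every pixel."""
    return [[dense(pixel, weights, bias) for pixel in row] for row in feature_map]


def argmax(scores):
    """Index of the highest score, i.e. the predicted class."""
    return max(range(len(scores)), key=scores.__getitem__)


# feature_map[row][col] is a 2-channel feature vector (toy values).
feature_map = [
    [[0.1, 0.9], [0.8, 0.2]],
    [[0.7, 0.1], [0.2, 0.6]],
]
weights = [[1.0, 0.0], [0.0, 1.0]]  # class 0 reads channel 0, class 1 reads channel 1
bias = [0.0, 0.0]

scores = conv1x1(feature_map, weights, bias)
label_map = [[argmax(s) for s in row] for row in scores]
print(label_map)
```

Because the weights do not depend on the image size, the same layer can be applied to a full-sized image, which is the property that removes the need for patch-wise prediction.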

U-Net

The U-Net is a significant structural model for medical image segmentation. It employs deconvolution and consists of 19 layers with skip connections to address localization issues [Figure 8].[13] It balances patch size and pooling layers to enhance accuracy, utilizing connections between resolution layers to improve feature extraction in the deconvolution layers. The model demonstrates efficient and quick image segmentation, and later variants have improved segmentation accuracy over the traditional U-Net. The 3D U-Net variant further enhances spatial information, enabling volumetric segmentation from sparsely annotated 2D slices and utilizing a softmax loss function for improved sample annotation and densification.

Figure 8: U-Net model.
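A skip connection can be sketched on a single-channel toy feature map: the encoder output is pooled, up-sampled again, and concatenated with the original encoder features so that fine spatial detail lost in pooling remains available to the decoder. Nearest-neighbour up-sampling is used here as a simple stand-in for the learned deconvolution described in the text, and all names and values are illustrative assumptions.

```python
def max_pool_2x2(fmap):
    """Halve height and width by keeping the maximum of each 2 x 2 block."""
    return [
        [max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
         for c in range(0, len(fmap[0]), 2)]
        for r in range(0, len(fmap), 2)
    ]


def upsample_2x(fmap):
    """Nearest-neighbour up-sampling (stand-in for a learned deconvolution)."""
    out = []
    for row in fmap:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def concat_channels(a, b):
    """Skip connection: stack two same-sized maps as channels of one map."""
    return [[(x, y) for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


encoder = [
    [1, 2, 0, 0],
    [3, 4, 0, 5],
    [0, 0, 6, 0],
    [7, 0, 0, 0],
]
bottleneck = max_pool_2x2(encoder)   # 2 x 2 map, fine positions are lost
decoder = upsample_2x(bottleneck)    # back to 4 x 4, but blocky
merged = concat_channels(encoder, decoder)  # fine detail plus coarse context
```

In this toy case the up-sampled map alone cannot tell where inside each 2 × 2 block the maximum was located; the concatenated encoder channel restores that information, which is the localization benefit attributed to skip connections.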

V-Net model

The V-Net is a model resembling the U-Net that focuses on feature extraction and resolution reduction through convolutional pathways [Figure 9].[15] The resolution-reducing convolutions function similarly to pooling and reduce memory requirements, since no pooling inputs need to be retained for backpropagation. The process concatenates lower-resolution feature maps to enable probabilistic segmentation and employs softmax to distinguish between foreground and background. V-Net-based models potentially offer a higher Dice coefficient and present advantages over traditional methods, including the original V-Net model.

Figure 9: V-Net model.
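The two quantities mentioned above, the softmax foreground probability and the Dice coefficient, can be sketched for a handful of voxels. The logits and ground-truth labels are made-up values, and the 0.5 threshold and the small smoothing constant `eps` are assumptions of this example rather than settings from the cited work.

```python
import math


def foreground_probability(bg_logit, fg_logit):
    """Two-class softmax, returning the probability of the foreground class."""
    shift = max(bg_logit, fg_logit)  # subtract the maximum for numerical stability
    e_bg = math.exp(bg_logit - shift)
    e_fg = math.exp(fg_logit - shift)
    return e_fg / (e_bg + e_fg)


def dice_coefficient(prediction, truth, eps=1e-7):
    """Dice = 2 |P intersect T| / (|P| + |T|) for binary masks given as flat lists."""
    intersection = sum(p * t for p, t in zip(prediction, truth))
    return (2 * intersection + eps) / (sum(prediction) + sum(truth) + eps)


# Hypothetical (background, foreground) logits for four voxels.
logits = [(2.0, -1.0), (-1.0, 3.0), (0.0, 2.0), (1.0, 0.5)]
truth = [0, 1, 1, 1]

probabilities = [foreground_probability(bg, fg) for bg, fg in logits]
prediction = [1 if p > 0.5 else 0 for p in probabilities]
score = dice_coefficient(prediction, truth)
```

Here two of the three foreground voxels are recovered with no false positives, so the Dice coefficient is approximately 0.8.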

Recurrent NNs (RNN)

The RNN model facilitates pattern memorization through recurrent connections, using multiple adjacent slices to extract contextual information from sequential data [Figure 10].[12] Long short-term memory (LSTM), a subset of RNN, is employed for image segmentation, though it risks losing spatial information. The contextual LSTM enhances the U-Net structure by capturing slice context for improved segmentation.[24] Gated recurrent unit (GRU) models simplify the LSTM by removing the memory cell. The clockwork RNN optimises modeling with fewer parameters and better handling of long-term dependencies. While RNNs allow for parallel processing, they can be challenging and time-consuming to train; by exploiting inter-slice information, they perform notably well in segmenting organs such as the liver.[24,25]

Figure 10: Recurrent neural network.
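How a recurrent unit carries context from one slice to the next can be sketched with a scalar GRU cell, which keeps a single hidden state and no separate memory cell. The per-slice features, the parameter values, and the gating convention (new state = (1 - z) × previous state + z × candidate) are assumptions of this example; other texts use the complementary convention.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x, h_prev, p):
    """One GRU update for a scalar input x and scalar hidden state h_prev."""
    z = sigmoid(p["wz"] * x + p["uz"] * h_prev + p["bz"])  # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h_prev + p["br"])  # reset gate
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h_prev) + p["bh"])
    return (1.0 - z) * h_prev + z * h_cand


def run_gru(slice_features, p, h0=0.0):
    """Process slices in order; each state summarises the slices seen so far."""
    states, h = [], h0
    for x in slice_features:
        h = gru_step(x, h, p)
        states.append(h)
    return states


# Hypothetical parameters and one scalar feature per adjacent slice.
params = {"wz": 1.0, "uz": 0.5, "bz": 0.0,
          "wr": 1.0, "ur": 0.5, "br": 0.0,
          "wh": 1.5, "uh": 1.0, "bh": 0.0}
slice_features = [0.2, 0.9, 0.8, 0.1]

states = run_gru(slice_features, params)
```

The state returned for the last slice still depends on the earlier slices, which is how inter-slice context enters the prediction in this sketch.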

RESULTS

DL has significantly improved medical image segmentation accuracy and the handling of complex structures. However, it requires large annotated datasets that are difficult to obtain from clinical experts. Techniques such as transfer learning enable the adaptation of pre-trained models for specific applications. Segmentation often employs overlapping patches to enhance accuracy, and weighted loss functions help to train underrepresented classes when class imbalance is present. While 3D segmentation offers improved spatial information, it demands more computational resources and risks overfitting on small datasets; thus, 2D models are favoured for quicker training and reduced data needs. Data augmentation techniques enlarge training datasets and mitigate overfitting. Multi-organ segmentation poses challenges due to organ variability, prompting the development of new networks to improve accuracy. Although 3D models yield the best volumetric performance, 2D models remain more practical for resource-constrained scenarios. These findings emphasise the necessity for efficient and dependable segmentation tools tailored for critical care environments, where prompt patient monitoring is essential and resources are constrained. Further research focuses on lightweight architectures and enhancing data generalization for clinical use. Deep learning models for ventricle segmentation,[26] atrial segmentation,[27] aorta segmentation,[10] scar segmentation,[24] whole heart segmentation,[26] cardiac substructure segmentation,[28] coronary artery segmentation,[29] coronary artery calcium and plaque segmentation,[16] and four-chamber segmentation[30] have been widely explored in the literature and are summarized in Table 1.
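One common way to realise a weighted loss for underrepresented classes is a class-weighted cross-entropy. The sketch below assumes inverse-frequency class weights and a weighted-mean reduction; both are illustrative choices for this example and are not prescribed by the works reviewed here. The label list and probabilities are toy values.

```python
import math


def inverse_frequency_weights(labels, n_classes):
    """Weight each class by total / (n_classes * count), so rare classes count more."""
    total = len(labels)
    counts = [labels.count(c) for c in range(n_classes)]
    return [total / (n_classes * n) if n else 0.0 for n in counts]


def weighted_cross_entropy(probabilities, labels, weights):
    """Weighted mean of -log p(true class) over all pixels."""
    numerator = sum(-weights[y] * math.log(p[y])
                    for p, y in zip(probabilities, labels))
    denominator = sum(weights[y] for y in labels)
    return numerator / denominator


# Toy pixel labels: 8 background pixels (class 0) and 2 foreground pixels (class 1).
labels = [0] * 8 + [1] * 2
weights = inverse_frequency_weights(labels, n_classes=2)

# A model that is confident on background but unsure on foreground.
probabilities = [[0.9, 0.1]] * 8 + [[0.6, 0.4]] * 2
loss = weighted_cross_entropy(probabilities, labels, weights)
unweighted = weighted_cross_entropy(probabilities, labels, [1.0, 1.0])
```

With these weights the two foreground pixels contribute as much to the loss as the eight background pixels, so the poor foreground predictions are penalised more strongly than in the unweighted case.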

Table 1: Applications based on modality and structure.
| Application | Description | Structure | Modality |
|---|---|---|---|
| Ventricle segmentation | 2D FCN | Bi-ventricle | --- |
| | 2D U-net+3D U-net (ensemble) | Bi-ventricle | --- |
| | 2D M-Net with weighted cross-entropy loss | Bi-ventricle | --- |
| | 2D Dense U-net with inception module | Bi-ventricle | --- |
| | 2D FCN | Myo, LV | --- |
| Atrial segmentation | Multi-view CNN with adaptive fusion strategy | LA | --- |
| | Two-stage pipeline; 3D U-net (localization)+3D U-net (segmentation) | LA | --- |
| Aorta segmentation | RNN to learn temporal coherence | Aorta | --- |
| Scar segmentation | Fully automated; | Atrial scars, LA | --- |
| | Multi-atlas method for LA segmentation, followed by an AE to find the atrial scars | Atrial scars | --- |
| | Fully automated; Multi-view two-task recursive attention model | Atrial scars | --- |
| Whole heart segmentation | 3D U-net with deep supervision | Blood pool+myocardium of the heart | --- |
| | Dilated CNN with deep supervision | Blood pool+myocardium of the heart | --- |
| | 3D FCN with deep supervision | Myocardium of the heart | --- |
| Cardiac substructure segmentation | Orthogonal 2D U-nets with shape context | --- | MR/CT |
| | Multi-planar FCNs with an adaptive fusion strategy | --- | MR/CT |
| | 3D U-net with deep supervision | --- | MR/CT |
| | 3D deeply-supervised U-net with multi-depth fusion | --- | CT |
| | Multi-scale FCN | --- | CT |
| | Unsupervised segmentation with GANs | --- | MR/CT |
| Coronary artery segmentation | CNN as path pruning | --- | CT |
| | Multi-task FCN with a minimal patch extractor | --- | CT |
| | 3D FCN with level set | --- | CT |
| | CNN to estimate direction classification and radius regression | --- | CT |
| | Graph convolutional network | --- | CT |
| Coronary artery calcium and plaque segmentation | Patch-based CNN | --- | CT |
| | U-net and FC DenseNet | --- | CT |
| | U-DenseNet | --- | CT |
| | DenseRAU-net | --- | CT |
| View classification and four-chamber segmentation | 2D U-net | --- | Ultrasound |

FCN: Fully convolutional network, Myo: Myocardium, LV: Left ventricle, CNN: Convolutional neural network, LA: Left atrium, AF: Atrial fibrillation, MR: Magnetic resonance, CT: Computed tomography, RNN: Recurrent neural network, AE: Autoencoder, GAN: Generative adversarial network

DISCUSSION

The analysis in Table 1 highlights several factors linked to the segmentation model: (1) enhancement of network features through advanced building blocks; (2) addressing class imbalance using specific loss functions; (3) boosting generalization and robustness in multi-stage pipelines; and (4) producing anatomically plausible segmentations via anatomical constraints or adversarial loss for training regularization. In addition, a multi-scale FCN is employed to improve segmentation accuracy and ensure temporal consistency.[21,31-34] Performance in DL has improved with the use of publicly available training samples, enabling transparent comparison and evaluation. Challenges in 3D network models have prompted the adoption of 2D networks,[35-40] particularly for cardiac image segmentation. While 2D networks can manage larger slice thickness, 3D networks are more complex, costly to optimise, and prone to overfitting. Consequently, there is a trend toward developing simplified 3D networks to address these issues. Overall, 2D network models remain more popular than their 3D counterparts for segmentation across all modalities.[41-47]

Challenges

Design challenges in DL models include overfitting, primarily caused by small dataset sizes. Solutions involve training with larger datasets, creating multiple patch views, using dropout, and applying pooling layers to reduce parameter dimensionality. Batch normalization enhances convergence, but downsampling leads to information loss. Deeper networks may face exploding signal issues, which are addressed with supervised models. In medical image segmentation, challenges include the appearance of target organs and the contrast at their boundaries, especially with 3D models, which complicate training and increase computational demands. Further analysis is required to identify effective solutions to these challenges for critical care cardiac imaging.
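Creating multiple overlapping patch views from one image, as mentioned above, can be sketched as follows. The patch size, the stride, and the rule of adding a final patch so that the image border is always covered are assumptions of this example rather than settings reported in the reviewed studies.

```python
def patch_origins(length, patch, stride):
    """Start indices of patches along one axis, always covering the far edge."""
    if patch > length:
        raise ValueError("patch is larger than the image axis")
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] != length - patch:
        starts.append(length - patch)  # extra patch so the border is not dropped
    return starts


def extract_patches(image, patch, stride):
    """Return all patch x patch views of a 2D list, scanned row by row."""
    rows = patch_origins(len(image), patch, stride)
    cols = patch_origins(len(image[0]), patch, stride)
    return [
        [row[c:c + patch] for row in image[r:r + patch]]
        for r in rows for c in cols
    ]


# Toy 5 x 5 image whose pixel value encodes its position (row * 5 + column).
image = [[r * 5 + c for c in range(5)] for r in range(5)]

# A stride smaller than the patch size makes neighbouring patches overlap.
patches = extract_patches(image, patch=3, stride=2)
```

In this example a single 5 × 5 image yields four 3 × 3 training views that share their central row and column, which increases the number of training samples without new annotations.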

CONCLUSION

This review provides a comprehensive overview of recent advancements in medical imaging segmentation techniques, focusing on the strengths and weaknesses of 2D and 3D techniques for CT, MRI, and ultrasound imaging in cardiac critical care. It highlights the impact of CNNs, segmentation model complexity, and adequate dataset availability on segmentation output quality. A critical assessment of DL network architectures is covered, including CNNs, FCNs, U-Nets, V-Nets, and RNN-based models, which tackle challenges such as class imbalance, limited sample sizes, anatomical consistency, and multi-scale feature representation. The review also addresses common issues in segmentation models, such as overfitting, the computational cost of 3D networks, motion artefacts, and the difficulty in accurately defining organ boundaries. It underscores the need for more efficient architectures, improved generalization techniques, and training configurations to enhance segmentation effectiveness. The review outlines the current state of research in medical image segmentation, identifies limitations that require attention, and suggests future directions, including the development of lightweight and data-efficient DL architectures, strategies to mitigate overfitting from small sample sizes, and the enhancement of segmentation accuracy through anatomical priors, hybrid methods, and advanced optimization techniques.

Ethical approval:

Institutional Review Board approval is not required since this is a technology comparison article and no real experiments are involved.

Declaration of patient consent:

Patient consent is not required as there are no patients in this study.

Conflicts of interest:

There are no conflicts of interest.

Use of artificial intelligence (AI)-assisted technology for manuscript preparation:

The authors confirm that there was no use of artificial intelligence (AI)-assisted technology for assisting in the writing or editing of the manuscript and no images were manipulated using AI.

Financial support and sponsorship: Nil.

References

  1. A review of heart chamber segmentation for structural and functional analysis using cardiac magnetic resonance imaging. MAGMA. 2016;29:155-95.
  2. A survey of shaped-based registration and segmentation techniques for cardiac images. Comput Vis Image Understand. 2013;117:966-89.
  3. Guest editorial deep learning in medical imaging: Overview and future promise of an exciting new technique. IEEE Trans Med Imaging. 2016;35:1153-9.
  4. Deep learning in radiology: An overview of the concepts and a survey of the state of the art with focus on MRI. J Magn Reson Imaging. 2019;49:939-54.
  5. Evaluation of current algorithms for segmentation of scar tissue from late gadolinium enhancement cardiovascular magnetic resonance of the left atrium: An open-access grand challenge. J Cardiovasc Magn Reson. 2013;15:105.
  6. Evaluation of state-of-the-art segmentation algorithms for left ventricle infarct from late gadolinium enhancement MR images. Med Image Anal. 2016;30:95-107.
  7. Interactive whole-heart segmentation in congenital heart disease. Med Image Comput Comput Assist Interv. 2015;9351:80-8.
  8. Deep learning techniques for automatic MRI cardiac multi-structures segmentation and diagnosis: Is the problem solved? IEEE Trans Med Imaging. 2018;37:2514-25.
  9. Evaluation of algorithms for multi-modality whole heart segmentation: An open-access grand challenge. Med Image Anal. 2019;58:101537.
  10. A Fully Convolutional Neural Network for Cardiac Segmentation in Short-Axis MRI. ArXiv abs/1604.00494.
  11. Standardized Evaluation system for left ventricular segmentation algorithms in 3D echocardiography. IEEE Trans Med Imaging. 2016;35:967-77.
  12. 3-D consistent and robust segmentation of cardiac images by deep learning with spatial propagation. IEEE Trans Med Imaging. 2018;37:2137-48.
  13. Automatic 3D Bi-ventricular segmentation of cardiac images by a shape-constrained multi-task deep learning approach. IEEE Trans Med Imaging. 2019;38:2151-64.
  14. Unsupervised Cross-Modality Domain Adaptation of ConvNets for Biomedical Image Segmentations with Adversarial Loss. In: International Joint Conferences on Artificial Intelligence.
  15. PnPAdaNet: Plug-and-play adversarial domain adaptation network at unpaired cross-modality cardiac segmentation. IEEE Access. 2019;99:1.
  16. Automatic segmentation of the left ventricle in cardiac CT angiography using convolutional neural networks. In: 2016 IEEE 13th International Symposium on Biomedical Imaging (ISBI). 2016 Apr. p. 40-3.
  17. Very Deep Convolutional Networks for Large-scale Image Recognition. In: 3rd International Conference on Learning Representations, ICLR. 2015.
  18. Fully convolutional networks for semantic segmentation. IEEE Trans Pattern Anal Mach Intell. 2017;39:640-51.
  19. Squeeze-and-Excitation Networks. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018. Salt Lake City, UT: IEEE Computer Society. Available from: https://arxiv.org/abs/1709.01507 [Last accessed on 2025 Jul 10]
  20. 3D UNet: Learning Dense Volumetric Segmentation from Sparse Annotation. In: 19th International Conference on Medical Image Computing and Computer-Assisted Intervention-MICCAI. Available from: https://arxiv.org/abs/1606.06650 [Last accessed on 2025 Jul 10]
  21. Combining multiple dynamic models and deep learning architectures for tracking the left ventricle endocardium in ultrasound data. IEEE Trans Pattern Anal Mach Intell. 2013;35:2592-607.
  22. Densely Connected Convolutional Networks. In: Conference on Computer Vision and Pattern Recognition. Available from: https://arxiv.org/abs/1608.06993 [Last accessed on 2025 Jul 10]
  23. Med3D: Transfer Learning for 3D Medical Image Analysis. [arXiv Preprint]. Available from: https://arxiv.org/abs/1904.00625 [Last accessed on 2025 Jul 10]
  24. Deep fusion net for multi-atlas segmentation: Application to cardiac MR images. In: Medical Image Computing and Computer-Assisted Intervention - MICCAI 2016 - 19th International Conference, Proceedings. Springer Verlag. p. 521-8.
  25. Fully automatic left atrium segmentation from late gadolinium enhanced magnetic resonance imaging using a dual fully convolutional neural network. IEEE Trans Med Imaging. 2019;38:515-24.
  26. Fully automatic segmentation and objective assessment of atrial scars for long-standing persistent atrial fibrillation patients using late gadolinium-enhanced MRI. Med Phys. 2018;45:1562-76.
  27. Direct delineation of myocardial infarction without contrast agents using a joint motion feature learning architecture. Med Image Anal. 2018;50:82-94.
  28. Recurrent Neural Networks for Aortic Image Sequence Segmentation with Sparse Annotations. In: 21st International Conference on Medical Image Computing and Computer Assisted Intervention-MICCAI; 2018. Available from: https://arxiv.org/abs/1808.00273 [Last accessed on 2025 Jul 10]
  29. Automatic 3D Cardiovascular MR Segmentation with Densely-Connected Volumetric ConvNets. In: 20th International Conference on Medical Image Computing and Computer Assisted Intervention-MICCAI. Available from: https://arxiv.org/abs/1708.00573 [Last accessed on 2025 Jul 10]
  30. Deep Learning for Multi-Task Medical Image Segmentation in Multiple Modalities. In: 19th International Conference on Medical Image Computing and Computer-Assisted Intervention-MICCAI; 2016. Available from: https://arxiv.org/abs/1704.03379 [Last accessed on 2025 Jul 10]
  31. Automatic coronary artery calcium scoring in cardiac CT angiography using paired convolutional neural networks. Med Image Anal. 2016;34:123-36.
  32. Deep learning for segmentation using an open large-scale dataset in 2D echocardiography. IEEE Trans Med Imaging. 2019;38:2198-210.
  33. FR-Net: Joint reconstruction and segmentation in compressed sensing cardiac MRI. In: Functional imaging and modelling of the heart. Berlin: Springer International Publishing. p. 352-60.
  34. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: ICML [Paper]. p. 448-56.
  35. CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison. In: Conference on Artificial Intelligence. p. 590-7.
  36. Automatic Segmentation of LV and RV in Cardiac MRI. In: International Workshop on Statistical Atlases and Computational Models of the Heart (Springer). p. 161-9.
  37. Heart chambers and whole heart segmentation techniques. J Electron Imaging. 2012;21:10901.
  38. Algorithms for left atrial wall segmentation and thickness-evaluation on an open-source CT and MRI image database. Med Image Anal. 2018;50:36-53.
  39. Constrained-CNN losses for weakly supervised segmentation. Med Image Anal. 2019;54:88-99.
  40. Fully convolutional multi-scale residual densenets for cardiac segmentation and automated cardiac diagnosis using ensemble of classifiers. Med Image Anal. 2019;51:21-45.
  41. Deep convolutional neural networks for automatic coronary calcium scoring in a screening study with low-dose chest CT. In: Medical Imaging: Computer Aided Diagnosis. 2016;9785:255-60.
  42. AdaEn-Net: An ensemble of adaptive 2D-3D fully convolutional networks for medical image segmentation. Neural Netw. 2020;126:76-94.
  43. 2D to 3D evolutionary deep convolutional neural networks for medical image segmentation. IEEE Trans Med Imaging. 2021;40:712-21.
  44. APCP-NET: Aggregated Parallel Cross-Scale Pyramid Network for CMR Segmentation. In: 2019 IEEE 16th International Symposium on Biomedical Imaging. p. 784-8.
  45. Dilated-Inception net: Multi-scale feature aggregation for cardiac right ventricle segmentation. IEEE Trans Biomed Eng. 2019;66:3499-508.
  46. A survey on deep learning in medical image analysis. Med Image Anal. 2019;42:60-88.
  47. Graph Cut Segmentation of the Right Ventricle in Cardiac MRI using Multi-Scale Feature Learning. In: Proceedings of the 3rd International Conference on Cryptography, Security and Privacy (ACM). p. 231-5.