Deep learning-driven catheter tracking from bi-plane X-ray fluoroscopy of 3D printed heart phantoms

Minimally invasive surgery (MIS) has changed not only how specific operations are performed but also the strategic approach to surgery as a whole. Expanding MIS to more complex surgeries demands further development of new technologies, including robotic surgical systems, navigation, guidance, visualization, dexterity enhancement, and 3D printing. In the cardiovascular domain, 3D printed models can play a crucial role in visualizing anatomical details, guiding precise operations, and functionally evaluating various congenital and congestive heart conditions. In this work, we propose a novel deep learning-driven method for quantitative 3D tracking of mock cardiac interventions on custom-designed 3D printed heart phantoms. The position of the catheter tip is tracked from bi-plane fluoroscopic images, and the continuous catheter position is co-registered with the 3D printed model in a single coordinate system using external fiducial markers embedded in the model. Our proposed method has the potential to provide quantitative analysis for training exercises of percutaneous procedures guided by bi-plane fluoroscopy.
Torabinia et al. Mini-invasive Surg 2021;5:32. https://dx.doi.org/10.20517/2574-1225.2021.63


INTRODUCTION
Since minimally invasive surgery (MIS) emerged in the 1980s, surgical skills and minimally invasive equipment have advanced significantly [1][2][3]. The minimally invasive approach holds a unique place in various surgical specialties, such as general surgery, urology [4], thoracic surgery [5], plastic surgery [6], and cardiac surgery [7]. MIS has not only shortened patients' recovery time after specific procedures but has also improved outcomes [8,9]. These benefits to patients, hospitals, and physicians have contributed to the rapid development of new MIS procedures, including those for cardiovascular diseases. The success of cardiac interventions over the last three decades has significantly reduced the mortality and morbidity of coronary, valvular, and various congenital diseases [10,11]. However, expanding MIS to more complex surgeries demands further development of new technologies, including robotic surgical systems [12], navigation [13], guidance [14], visualization [15], dexterity enhancement [16], and 3D printing technology [17].
In recent years, 3D printing technology has attracted interest in diverse areas of medicine, including cardiovascular disease [18]. Increasing interest in anatomical modeling, and the growing need for pre-operative planning with personalized anatomical models to test device fit and practice catheter positioning, have encouraged the creation and evolution of 3D printed patient-specific models [19]. Several recent studies have reported implementations of 3D printed heart models for different stages of structural heart interventions, such as pre-operative planning [20][21][22][23], intra-operative models for enhanced structural orientation [24][25][26], and evaluation of novel procedural pathways [27,28]. Garekar et al. [29] used a 3D printed model of a double outlet right ventricle and showed that it provided better intuition for deciding on an operative approach than conventional imaging (i.e., echocardiography). Chaowu et al. [23] demonstrated a 3D printed model for transcatheter closure of secundum atrial septal defect, and their findings suggested that 3D printing has the potential to screen for appropriate candidates. Other examples include tetralogy of Fallot [22,30], hypoplastic left heart syndrome [31,32], and ventricular septal defect [33,34]. Despite these successes, existing surgical planning with 3D printed models lacks methods to analyze how a catheter was actually maneuvered within the model.
Our group recently reported a novel training system that provides catheter navigation in mixed reality (MR), with real-time visual feedback of a physical catheter's position within a patient-specific 3D heart model [35]. This method used electromagnetic (EM) sensors to track the catheter position. Although this approach is advantageous for portability, it has limited accuracy (errors of up to ~5 mm), requires manual integration of sensors into the catheter, and relies on hardware that is not readily available in catheterization labs.
To address these limitations, we propose a novel deep learning-driven method for tracking a catheter in a 3D printed model from bi-plane fluoroscopic images acquired during the procedure. The catheter and heart positions are co-registered in a single coordinate system using affine transformations based on four radiopaque fiducial markers located on the 3D printed model. In addition, the 3D trajectory of the catheter is reconstructed to visualize the path taken during the mock procedures. Our proposed method has the potential to provide quantitative analysis for training exercises of percutaneous procedures guided by bi-plane fluoroscopy.

Methodology
A schematic of the proposed training system is shown in Figure 1, where a physician conducts a mock catheterization procedure on a patient-specific 3D printed model using a bi-plane C-arm X-ray fluoroscopy machine. The proposed image-based tracking detects and co-registers the catheter's 3D position and provides its 3D trajectory as quantitative feedback. The components of the tracking system are described in detail in the following subsections, in the order in which they are used.

3D printed phantom model
To 3D print a patient-specific model, we imported an end-diastolic cardiac computed tomography (CT) scan, stored as DICOM (Digital Imaging and Communications in Medicine) files, into 3D image processing software (Materialise Mimics Research 21.0), as shown in Figure 2A. In Mimics, specific thresholds were set to segment the heart and the spine, producing a 3D representation of both in a single mask while preserving their relative positions. The 3D segmentation was then saved as an STL file. Vessels, ribs, and other elements not needed for the model were trimmed using Geomagic Wrap (3D Systems Geomagic Corporation, NC, USA). In addition, as depicted in Figure 2B and C, artifacts were removed and the mesh was smoothed. Finally, the "Shell" tool in Geomagic was used to give the model a watertight wall thickness, and the cleaned reconstructed objects were saved as STL files. We then used SolidWorks 2018 (Dassault Systèmes) to add a supporting base structure for the heart and spine, fixing their relative distance during printing and use [Figure 2D and E]. The model was printed on a Stratasys Objet260 Connex system in VeroClear, a rigid, translucent material [Figure 2F]. Post-processing (i.e., removing the SUP705 Stratasys support material) was performed with a high-flow water jet cleaner (i.e., Powerblast) and art-supply sculpting tools. To conduct mock catheterization procedures under a C-arm X-ray fluoroscopy machine, we mounted the phantom model in a 5-sided acrylic box (shoppopdisplays.com). The model was glued in the center of the box, with its inlet and outlet facing holes drilled at two opposite ends of the box [Figure 2G]. Throughout fluoroscopic imaging, the box was filled with water to eliminate artifacts from the 3D printed model.
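The thresholding step performed interactively in Mimics can be illustrated with a minimal sketch. It assumes the CT volume has already been loaded as nested Python lists of Hounsfield units (HU) indexed [slice][row][column], and the two HU ranges are hypothetical placeholders, not the thresholds used in this study. Because both masks are computed on the same voxel grid, their union keeps the heart and spine in their original relative positions.

```python
# Hypothetical HU ranges for illustration only (not the study's thresholds).
HEART_HU = (150, 600)   # e.g. contrast-enhanced blood pool
SPINE_HU = (700, 3000)  # e.g. dense bone


def threshold_mask(volume, lo, hi):
    """Return a boolean mask marking voxels whose value lies in [lo, hi]."""
    return [[[lo <= v <= hi for v in row] for row in sl] for sl in volume]


def union_mask(a, b):
    """Voxel-wise OR of two masks defined on the same grid."""
    return [
        [[p or q for p, q in zip(row_a, row_b)] for row_a, row_b in zip(sl_a, sl_b)]
        for sl_a, sl_b in zip(a, b)
    ]


# Tiny synthetic 1 x 2 x 3 volume: air, blood pool, bone / soft tissue, blood pool, bone.
volume = [[[-1000, 300, 900], [40, 450, 1200]]]
combined = union_mask(
    threshold_mask(volume, *HEART_HU),
    threshold_mask(volume, *SPINE_HU),
)
print(combined)
```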

Deep learning architecture
The advancement of deep learning architectures such as convolutional neural networks (CNNs) and deep autoencoders has not only transformed typical computer vision tasks like object detection [36], but has also proven effective in related tasks such as classification [37], localization [38], tracking [39], and image segmentation [40,41]. Ronneberger et al. [41] proposed U-Net, which supplements the contracting path of a Fully Convolutional Network [42] with successive layers in which pooling operators are replaced by upsampling operators, thereby increasing the resolution of the output. U-Net's performance in segmenting medical images, notably with small training datasets, demonstrates the potential of such encoder-decoder architectures. The U-Net model was later extended to other medical imaging tasks, including, but not limited to, segmentation of the Xenopus kidney [43], MRI volume segmentation of the prostate [44], and segmentation of retinal vessels, liver and tumors in CT scans, ischemic stroke lesions, intervertebral discs, and the pancreas [45][46][47][48][49][50][51][52]. In this work, to track the catheter's position from the bi-plane fluoroscopic images, we primarily leveraged the U-Net model to detect a radiopaque marker at the tip of the catheter. The implementation details and framework are discussed in the following sections.
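To make the encoder-decoder idea concrete, the sketch below traces feature-map shapes through a U-Net-style network. It assumes 'same'-padded 3x3 convolutions, 2x2 max pooling, 2x2 up-convolutions, 64 base channels, and four pooling levels, following the channel progression of the original U-Net; the 512 x 512 input size and the padding choice are hypothetical, since the configuration used in this study is not specified here. What it shows is that each skip connection concatenates an encoder map of the same spatial size, so the final per-pixel output matches the input resolution.

```python
def unet_shapes(height, width, in_ch=1, base_ch=64, depth=4, out_ch=1):
    """Trace (stage, channels, height, width) through a U-Net-style network.

    Assumes 'same'-padded 3x3 convolutions (so convs keep H and W), 2x2 max
    pooling, and 2x2 up-convolutions; an illustrative configuration only.
    """
    if height % 2**depth or width % 2**depth:
        raise ValueError("input size must be divisible by 2**depth")
    log = [("input", in_ch, height, width)]
    skips = []
    ch, h, w = base_ch, height, width
    # Contracting path: two convs, store the skip feature map, then pool.
    for level in range(depth):
        log.append((f"enc{level} conv x2", ch, h, w))
        skips.append((ch, h, w))
        h, w = h // 2, w // 2
        log.append((f"enc{level} maxpool", ch, h, w))
        ch *= 2  # channels double at the next level
    log.append(("bottleneck conv x2", ch, h, w))
    # Expanding path: up-convolution doubles H, W and halves channels, then
    # the stored encoder map is concatenated (skip connection) before two convs.
    for level in reversed(range(depth)):
        skip_ch, skip_h, skip_w = skips[level]
        ch, h, w = ch // 2, h * 2, w * 2
        assert (h, w) == (skip_h, skip_w), "skip and upsampled maps must align"
        log.append((f"dec{level} concat", ch + skip_ch, h, w))
        log.append((f"dec{level} conv x2", ch, h, w))
    # A 1x1 convolution + sigmoid gives a per-pixel marker probability map.
    log.append(("1x1 conv + sigmoid", out_ch, h, w))
    return log


for stage in unet_shapes(512, 512):
    print(stage)
```

The single-channel sigmoid output pairs naturally with the binary cross-entropy loss used for training, since each pixel is classified as marker or background.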

Collection and preparation of datasets
All fluoroscopic images for training the deep learning U-Net model were acquired during mock procedures in the catheterization lab at New York-Presbyterian Hospital. The dataset comprises 300 paired bi-plane images of a catheter (OSCAR Deflectable Steerable Guiding Sheath, Destino™ Twist) being maneuvered within the patient-specific 3D printed model. The dataset was divided into three parts: (1) a training set (60%; 180 images); (2) a validation set (20%; 60 images); and (3) a testing set (20%; 60 images). The training and validation sets were used during model training, and the testing set was used for model evaluation after training was complete. To ensure that the training, validation, and testing sets each contain a fair representation of catheter tip positions, we randomly shuffled the dataset before splitting it.
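A minimal sketch of the shuffled 60/20/20 split, using only the Python standard library. It shuffles pair indices rather than individual images so that the two views of each bi-plane acquisition always land in the same subset; the function name and the fixed seed are hypothetical choices made for reproducibility.

```python
import random


def split_indices(n_pairs, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle bi-plane pair indices and split them into train/val/test.

    Splitting on pair indices keeps the two views of each acquisition together.
    The seed is a hypothetical choice for reproducibility.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    indices = list(range(n_pairs))
    random.Random(seed).shuffle(indices)
    n_train = round(fractions[0] * n_pairs)
    n_val = round(fractions[1] * n_pairs)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


train, val, test = split_indices(300)
print(len(train), len(val), len(test))  # 180 60 60
```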

Training
Our model development proceeded as follows: (1) randomly initialize the model; (2) train the model on the training set; (3) evaluate the trained model's performance on the validation set; (4) choose the hyperparameters with the best validation performance; and (5) evaluate the chosen model on the test set. The network was trained with the Adam (adaptive moment estimation) optimizer [53], using binary cross-entropy as the loss function. An early stopping rule was applied with 200 epochs. Finally, we evaluated the performance of the DL model on the testing set by computing accuracy metrics, including the Dice coefficient.
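The optimizer and loss named above can be illustrated on a toy problem. The sketch below is not the U-Net training code: it fits a one-dimensional logistic model with binary cross-entropy and the Adam update rule, and stops early when the validation loss fails to improve. The learning rate, the patience of 10 epochs, and the use of 200 as the maximum epoch count are assumptions, since the text does not specify how the early stopping rule was configured.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one prediction p in (0, 1) and label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


class Adam:
    """Adam update rule (Kingma & Ba) with the commonly used default constants."""

    def __init__(self, n_params, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * n_params  # first-moment (mean) estimates
        self.v = [0.0] * n_params  # second-moment (uncentred variance) estimates
        self.t = 0

    def step(self, params, grads):
        self.t += 1
        new_params = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)  # bias correction
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            new_params.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return new_params


def mean_bce(params, data):
    w, b = params
    return sum(bce(sigmoid(w * x + b), y) for x, y in data) / len(data)


def train(train_data, val_data, max_epochs=200, patience=10, lr=0.1):
    """Fit a 1-D logistic model with Adam + BCE, stopping early on validation loss."""
    params = [0.0, 0.0]
    opt = Adam(len(params), lr=lr)
    best_loss, best_params, epochs_without_improvement = float("inf"), params, 0
    for epoch in range(1, max_epochs + 1):
        # Gradient of mean BCE w.r.t. (w, b); for a sigmoid output dL/dz = p - y.
        gw = gb = 0.0
        for x, y in train_data:
            err = sigmoid(params[0] * x + params[1]) - y
            gw += err * x
            gb += err
        n = len(train_data)
        params = opt.step(params, [gw / n, gb / n])
        val_loss = mean_bce(params, val_data)
        if val_loss < best_loss:
            best_loss, best_params, epochs_without_improvement = val_loss, params, 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_params, best_loss, epoch


params, val_loss, epochs_run = train(
    [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)], [(-1.5, 0), (1.5, 1)]
)
print(params, val_loss, epochs_run)
```

Keeping the parameters from the best validation epoch, rather than the last one, is what makes early stopping act as a guard against overfitting.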

Co-registration algorithms
A key step in this system is to co-register the catheter and heart model in a single coordinate system. To this end, four metal spheres were embedded in the heart phantom and used as fiducial markers. As shown in Figure 3A, the catheter and all four fiducial markers are visible in both bi-plane fluoroscopic images, so they can be detected and processed using the OpenCV library in Python. The OpenCV processing comprises a bitwise-NOT operation, a smoothing operation, and a contour operation, illustrated in Figure 3B and C. Next, the 2D coordinates of the radiopaque markers are identified in both fluoroscopic images (RAO30°, LAO55°) and fed into the co-registration algorithms. Using one of the radiopaque markers as a reference, the coordinates of the other markers are expressed as offsets from it. From these offsets and the known rotation angle, the 3D positions are solved using Eq. 3, as shown in Figure 3D. The positions of the four predefined fiducial markers are then used to calculate the affine transformation matrix into a single coordinate system using Eq. 4 and Eq. 5. Finally, the transformation matrix is applied to the position of the catheter's tip, as predicted by the U-Net model, to co-register it in that coordinate system.
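Because Eq. 3 to Eq. 5 are not reproduced in this text, the following is only a sketch of bi-plane reconstruction and fiducial-based affine fitting under explicitly assumed geometry, not the authors' equations. It assumes parallel (orthographic) projection with both views rotated about the patient's vertical axis, so a view at angle θ maps a point to u = x cos θ + z sin θ and v = y; cone-beam magnification is ignored, and treating RAO30° as -30° and LAO55° as +55° is a hypothetical sign convention. Subtracting the reference marker's image coordinates in each view before calling `reconstruct_3d` gives positions relative to that marker, since the model is linear. Four non-coplanar fiducial pairs supply exactly the 12 equations needed for the 12 unknowns of a 3D affine transform.

```python
import math


def reconstruct_3d(uv1, uv2, theta1_deg, theta2_deg):
    """Recover (x, y, z) from two parallel projections rotated about the y axis.

    Assumed projection model (hypothetical): u = x*cos(theta) + z*sin(theta), v = y.
    """
    (u1, v1), (u2, v2) = uv1, uv2
    t1, t2 = math.radians(theta1_deg), math.radians(theta2_deg)
    det = math.cos(t1) * math.sin(t2) - math.sin(t1) * math.cos(t2)  # sin(t2 - t1)
    if abs(det) < 1e-9:
        raise ValueError("the two views must not be parallel")
    # Cramer's rule for the 2x2 linear system in (x, z).
    x = (u1 * math.sin(t2) - u2 * math.sin(t1)) / det
    z = (u2 * math.cos(t1) - u1 * math.cos(t2)) / det
    y = 0.5 * (v1 + v2)  # both views measure y; average them
    return (x, y, z)


def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[piv][c]) < 1e-12:
            raise ValueError("singular system (are the fiducials coplanar?)")
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_affine(src, dst):
    """3x4 affine matrix A with dst = A @ [x, y, z, 1], from four point pairs."""
    rows = [[x, y, z, 1.0] for x, y, z in src]
    return [solve_linear(rows, [p[k] for p in dst]) for k in range(3)]


def apply_affine(A, p):
    x, y, z = p
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in A)


# Round trip: project a known point into both views, then reconstruct it.
point = (10.0, 5.0, -3.0)
views = []
for theta in (-30.0, 55.0):  # RAO30 / LAO55 under the assumed sign convention
    t = math.radians(theta)
    views.append((point[0] * math.cos(t) + point[2] * math.sin(t), point[1]))
print(reconstruct_3d(views[0], views[1], -30.0, 55.0))  # approximately (10, 5, -3)
```

In this setting, the four reconstructed fiducials would play the role of `src`, their known positions in the model's CAD coordinate system would be `dst`, and `apply_affine` would then map each reconstructed tip position into that frame.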

Bi-plane co-registration accuracy
To validate the accuracy of our 3D co-registration algorithm, we 3D printed a jig that holds an array of 50 metal spheres at various heights, shown in Figure 4. Using the bi-plane C-arm, two fluoroscopic images were acquired from two different angles and processed as described in the Co-registration algorithms subsection. The absolute error for each sphere was then determined as the difference between the true position measured from the 3D CAD file and the position calculated from the processed bi-plane images using our co-registration algorithm. As shown in Figure 4C, the mean error was 0.12 ± 0.11 mm, which is highly accurate for the purposes of cardiac interventions.
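The summary statistic reported above can be computed as sketched below. The sketch assumes the per-sphere absolute error is the Euclidean distance between the CAD position and the reconstructed position, and that the ± value is a sample standard deviation; the text does not state either choice, and the coordinates are made-up values for illustration.

```python
import math
import statistics


def registration_error(true_pts, est_pts):
    """Per-point Euclidean error plus its mean and sample standard deviation."""
    if len(true_pts) != len(est_pts) or len(true_pts) < 2:
        raise ValueError("need two equally long lists with at least two points")
    errors = [math.dist(t, e) for t, e in zip(true_pts, est_pts)]
    return errors, statistics.mean(errors), statistics.stdev(errors)


# Hypothetical CAD positions vs reconstructed positions (mm), for illustration.
cad = [(0.0, 0.0, 0.0), (10.0, 0.0, 5.0), (20.0, 10.0, 15.0)]
measured = [(0.1, 0.0, 0.0), (10.0, 0.2, 5.0), (20.0, 10.0, 15.3)]
errors, mean_err, sd_err = registration_error(cad, measured)
print(f"{mean_err:.2f} ± {sd_err:.2f} mm")
```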

Catheter tip detection
The primary region of interest of a catheter during a procedure is its tip. Intra-operative errors in maneuvering the catheter tip through the vascular system may raise the risk of puncture, embolization, or tissue damage [54,55]. We therefore trained a deep learning U-Net model to detect the catheter tip's radiopaque marker in each fluoroscopic frame. Figure 5 depicts the ground truth and predicted segmentations of the catheter tip's radiopaque marker for the testing dataset. To evaluate model performance, we compared the predicted segmentations with the ground truth using area-based indexes, namely the Dice similarity coefficient (DSC) [56] and Intersection over Union (IoU), together with the binary cross-entropy loss; these are reported in Table 1. To improve the performance of the U-Net model on our dataset and reduce overfitting during training, we performed extensive data augmentation [54], including random shifting, scaling, rotation, and brightness/contrast changes, as shown in Figure 6. For each augmentation experiment, the IoU was calculated for each image and averaged over the entire testing dataset (60 images). The best performance, a mean IoU of 83.67%, was obtained by applying 10 random translations per image (±20 pixels), scaling with a zoom range of 0.1, 10 regular rotations per image, and random brightness and contrast changes of 0.5. Notably, this segmentation performance (Dice of 0.8457 and IoU of 0.8367) corresponded to a tip localization accuracy better than 1 mm, which is well within the acceptable range for catheter tip tracking in cardiac applications.
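The area-based indexes named above can be computed from binary masks as in the minimal sketch below, which represents each mask as a 2D list of 0/1 values; the testing-set score would simply average the per-image values. The text does not say how the sub-millimetre tip accuracy was derived from the segmentation, so `tip_error_mm`, which compares mask centroids and scales by a pixel spacing, is a hypothetical approach, and the 0.3 mm spacing is a placeholder.

```python
import math


def mask_to_set(mask):
    """Pixel coordinates (row, col) of the foreground in a 2D 0/1 mask."""
    return {(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v}


def dice_iou(pred, truth):
    """Dice = 2|P & T| / (|P| + |T|); IoU = |P & T| / |P | T|. Empty vs empty scores 1."""
    p, t = mask_to_set(pred), mask_to_set(truth)
    inter, union, total = len(p & t), len(p | t), len(p) + len(t)
    dice = 2 * inter / total if total else 1.0
    iou = inter / union if union else 1.0
    return dice, iou


def centroid(mask):
    pts = mask_to_set(mask)
    if not pts:
        raise ValueError("empty mask")
    return (sum(r for r, _ in pts) / len(pts), sum(c for _, c in pts) / len(pts))


def tip_error_mm(pred, truth, pixel_spacing_mm):
    """Distance between predicted and true marker centroids, in millimetres."""
    return math.dist(centroid(pred), centroid(truth)) * pixel_spacing_mm


pred = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]
truth = [[0, 1, 1], [0, 1, 0], [0, 0, 0]]
print(dice_iou(pred, truth))  # Dice 6/7 (about 0.857), IoU 0.75
print(tip_error_mm(pred, truth, pixel_spacing_mm=0.3))  # hypothetical spacing
```

For a single image, Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU for the same prediction.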
To highlight the accuracy and efficiency of the deep learning segmentation, we compared the performance of the U-Net architecture with several classical image processing techniques (e.g., thresholding, watershed, and OpenCV contour finding and drawing). The appearance of the catheter's radiopaque marker is affected by partial occlusions, intensity saturation, and motion blur. As can be seen in Figure 5, despite their widespread use, such methods (e.g., adaptive thresholding) are sensitive to noise and produce unreliable measurements, mainly because of the assumptions built into their algorithms and their failure to identify separable boundaries.

Trajectory of catheter movement
Fluoroscopy provides only a 2D projection image, so no depth information is visible in the image [57]. Alternatively, fusion imaging allows 3D imaging data of the heart tissue to be overlaid on a fluoroscopic image, but with this technology the catheter and rendered tissue are still seen only as a 2D projection, providing little to no post-procedural quantitative analysis. To address this, we reconstruct the 3D trajectory of the catheter using the bi-plane co-registration method. The 3D trajectory of a catheter is vital information for determining how a procedure was performed and provides a quantitative basis for analysis and future improvement. Figure 7A shows selected fluoroscopic frames (LAO56°, RAO30°) acquired at the beginning and end of a mock procedure in the 3D printed model. After the catheter tip was detected in the two fluoroscopic images (i.e., RAO30°, LAO56°), the tip's coordinate (from LAO56°) and the derived transformation matrix (from Eq. 5) were used to co-register the catheter in a single coordinate system, as described in the Co-registration algorithms subsection. Figure 7B shows the 3D trajectory of the catheter tip for the mock test.
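Once a transformation matrix is available, building the trajectory amounts to applying it to the tip position in each frame. The sketch below assumes a 3x4 affine matrix whose rows are [a, b, c, t] and per-frame 3D tip positions as input; the total path length it computes is one hypothetical example of a quantitative metric that could be derived from the trajectory, not a result reported here.

```python
import math


def apply_affine(A, p):
    """Apply a 3x4 affine matrix (rows [a, b, c, t]) to a 3D point."""
    x, y, z = p
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in A)


def co_register_trajectory(A, tip_positions):
    """Map per-frame 3D tip positions into the phantom (model) coordinate system."""
    return [apply_affine(A, p) for p in tip_positions]


def path_length(points):
    """Total distance travelled along a polyline of 3D points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


# Hypothetical example: a pure translation by +5 mm in x, and three tip positions.
A = [[1.0, 0.0, 0.0, 5.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
tips = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
trajectory = co_register_trajectory(A, tips)
print(trajectory[0], path_length(trajectory))  # (5.0, 0.0, 0.0) 17.0
```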

CONCLUSION
This work demonstrates the implementation of a deep learning U-Net architecture to track the 3D movement of a catheter during a mock cardiac intervention under bi-plane fluoroscopy. We used an end-diastolic cardiac CT scan to 3D print a patient-specific phantom model. We integrated four radiopaque fiducial markers into the phantom model, allowing us to co-register fluoroscopic images taken at two different angles (RAO30°, LAO55°). The U-Net model was trained in a supervised manner on the training set, and the trained model's performance was evaluated on the validation set. Finally, we assessed the DL model's performance on the testing set by computing accuracy metrics, including the Dice coefficient. Additionally, we demonstrated that the 3D trajectory of the catheter tip's movement can be visualized graphically.
We believe the 3D trajectory analysis performed by this model can be used to analyze a physician's performance and/or provide quantitative feedback for training and educational purposes. This work serves as a proof of principle that deep learning can be used for catheter tracking in cardiac interventions. However, as a technical note, it has several limitations at its current stage, which we believe will seed future developments in both our lab and others. These limitations include: (1) Limited datasets. Our model is currently trained on data from only a single 3D printed heart model and catheter. A much more expansive dataset is therefore needed to train a model that can accurately track catheters of different shapes and sizes in hearts of differing anatomy. (2) Unrealistic background. Although these 3D printed models are patient-specific, meaning they accurately recapitulate the anatomy of the heart and spine, the fluoroscopic images do not include artifacts from other surrounding anatomy, as will be the case for clinical images. (3) Limited analysis. Our model currently provides only 3D tracking of the catheter's tip; there is no subsequent analysis to provide metrics for the performance of the intervention. This will require understanding the goals of the procedure and defining key metrics that can be quantified and are useful for the physician. (4) No motion compensation. The position of a catheter relative to the human heart varies over time due to both respiration and cardiac contraction. Because we used a 3D printed model, there was no motion to compensate for; however, motion compensation will need to be integrated for the catheter tracking to properly co-register the catheter tip to the heart in a clinical procedure. (5) Spherical fiducial markers. Because a 3D printed model was used, it was convenient to use metal spheres as extrinsic fiducial markers. However, placing these spheres on an individual will not be trivial, so methods that use the spine as an intrinsic fiducial marker should be used when acquiring clinical images, as described in our previous work [58].
Because of the limitations listed above, this work will have the most immediate impact on quantitative analysis of training procedures on 3D printed heart models. We expect that more sophisticated heart models that include motion and match disease states will be created, along with specific success criteria for each model/intervention, to provide feedback in the form of quantitative metrics. Furthermore, the ability to process images in real time and display the catheter in MR renderings will improve training by providing assistance during the training session, as described in our previous work that used EM sensors for tracking [35]. We believe this tracking system will help shorten the learning curve for new fellows and refine the procedural techniques of attendings.