Image Analysis Metrics in Evaluation of Generative Adversarial Network (GAN) Models Performance in Prediction of 177Lu Dose Voxel Kernels

Document Type: Original Paper

Authors

1 Department of Physics, University of Nairobi, P.O. Box 30197, 00100 Nairobi, Kenya; Department of Radiation Oncology, Cancer Treatment Center, The Nairobi Hospital, P.O. Box 30026, 00100 Nairobi, Kenya

2 Department of Physics, University of Nairobi, P.O. Box 30197, 00100 Nairobi, Kenya

3 Laboratorio di Fisica Medica Istituto Nazionale Tumori Regina Elena - IFO via Elio Chianesi, 53 - 00144 - Roma

4 Department of Physics, Faculty of Science and Technology, University of Nairobi, Nairobi, Kenya

DOI: 10.22038/ijmp.2025.89964.2590

Abstract

Introduction: In Generative Adversarial Networks (GANs), validation datasets are typically excluded from the adversarial optimization loop, unlike in many conventional machine learning architectures. This study evaluates GAN performance for predicting Dose Voxel Kernels (DVKs) using dedicated image-based metrics and compares these outcomes with training accuracy.
Materials and Methods: Density Kernels (DKs) of size 15×15×15 voxels (2.43 mm³ voxel size) were generated from homogeneous materials and CT images using ctcreate/EGSnrc. Each DK incorporated a centrally located isotropic 177Lu source, and the corresponding DVKs were simulated with the DOSXYZnrc/EGSnrc Monte Carlo code. Paired DK-DVK datasets were used to train multiple GAN models. Model performance on unseen validation data, comprising DKs from water, soft tissue, kidney, and bone, was assessed using the Structural Similarity Index Method (SSIM), Mean Squared Error (MSE), Peak Signal-to-Noise Ratio (PSNR), and the Relative Global Dimensionless Error of Synthesis (ERGAS).
Results: Twelve GAN models with training accuracies between 96.5% and 99.26% were evaluated. The model with the highest training accuracy (99.26%) did not exhibit the best predictive quality. Instead, the 98.4% model performed better, with lower MSE (0.020 vs. 0.029 mGy²/(MBq·s)²), higher PSNR (43.25 vs. 41.35 dB), and a markedly lower ERGAS (10.96 vs. 15.75). SSIM values were consistently high (>0.99) across all models, with no statistically significant differences (p > 0.05), indicating comparable structural fidelity.
Conclusion: Training accuracy alone does not reliably reflect GAN performance. Image-based similarity and error metrics provide a more comprehensive and discriminative evaluation of 177Lu DVK prediction quality.
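
As an illustration of the error metrics named above, the following is a minimal NumPy sketch and not the authors' implementation. It assumes each DVK is a single-band 15×15×15 array, that the PSNR data range is taken from the reference kernel, and that the ERGAS resolution ratio is 1; the function names, the synthetic kernel, and the 1% noise level are hypothetical. SSIM is omitted here; a windowed implementation is available in scikit-image as `skimage.metrics.structural_similarity`.

```python
import numpy as np


def mse(reference, predicted):
    """Mean squared error between two dose kernels of equal shape."""
    reference = np.asarray(reference, dtype=np.float64)
    predicted = np.asarray(predicted, dtype=np.float64)
    return float(np.mean((reference - predicted) ** 2))


def psnr(reference, predicted, data_range=None):
    """Peak signal-to-noise ratio in dB.

    Assumption: if data_range is not given, it is the max-min span
    of the reference kernel.
    """
    reference = np.asarray(reference, dtype=np.float64)
    if data_range is None:
        data_range = float(reference.max() - reference.min())
    err = mse(reference, predicted)
    if err == 0.0:
        return float("inf")  # identical kernels
    return float(10.0 * np.log10(data_range ** 2 / err))


def ergas(reference, predicted, resolution_ratio=1.0):
    """ERGAS for a single-band volume: 100 * ratio * RMSE / mean(reference).

    Assumption: the kernel is treated as one band and the
    resolution ratio defaults to 1.
    """
    reference = np.asarray(reference, dtype=np.float64)
    rmse = np.sqrt(mse(reference, predicted))
    return float(100.0 * resolution_ratio * rmse / reference.mean())


# Hypothetical stand-in for a Monte Carlo DVK: dose falling off
# exponentially with distance from the central source voxel.
grid = np.indices((15, 15, 15)) - 7
radius = np.sqrt((grid ** 2).sum(axis=0))
reference = np.exp(-radius)

# Hypothetical stand-in for a GAN prediction: reference with 1% relative noise.
rng = np.random.default_rng(0)
predicted = reference * (1.0 + 0.01 * rng.standard_normal(reference.shape))

print("MSE  :", mse(reference, predicted))
print("PSNR :", psnr(reference, predicted), "dB")
print("ERGAS:", ergas(reference, predicted))
```

Lower MSE and ERGAS and higher PSNR indicate closer agreement with the reference kernel, which is the sense in which the models are compared in the Results.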

