Document Type: Original Paper
Authors
1. Department of Physics, University of Nairobi, P.O. Box 30197, 00100 Nairobi, Kenya; Department of Radiation Oncology, Cancer Treatment Center, The Nairobi Hospital, P.O. Box 30026, 00100 Nairobi, Kenya
2. Department of Physics, University of Nairobi, P.O. Box 30197, 00100 Nairobi, Kenya
3. Medical Physics Laboratory, Istituto Nazionale Tumori Regina Elena (IFO), Via Elio Chianesi 53, 00144 Rome, Italy
4. Department of Physics, Faculty of Science and Technology, University of Nairobi, Nairobi, Kenya
DOI: 10.22038/ijmp.2025.89964.2590
Abstract
Introduction: In Generative Adversarial Networks (GANs), validation datasets are typically excluded from the adversarial optimization loop, unlike in many conventional machine learning architectures. This study evaluates GAN performance for predicting Dose Voxel Kernels (DVKs) using dedicated image-based metrics and compares these outcomes with training accuracy.
Material and Methods: Density Kernels (DKs) of size 15×15×15 voxels (2.43 mm³ voxel size) were generated from homogeneous materials and CT images using ctcreate/EGSnrc. Each DK incorporated a centrally located isotropic ¹⁷⁷Lu source, and the corresponding DVKs were simulated with DOSXYZnrc/EGSnrc Monte Carlo methods. Paired DK-DVK datasets were used to train multiple GAN models. Model performance on unseen validation data, comprising DKs of water, soft tissue, kidney, and bone, was assessed using the Structural Similarity Index Measure (SSIM), Mean Squared Error (MSE), Peak Signal-to-Noise Ratio (PSNR), and the Relative Global Dimensionless Error of Synthesis (ERGAS).
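The four validation metrics named in the Methods can be sketched in plain Python as below. This is a minimal, illustrative sketch, not the authors' implementation. It assumes each DVK is flattened into a list of voxel values. The abstract does not state the PSNR peak value, the SSIM window, or how ERGAS "bands" are defined for a 3D kernel, so the following are assumptions: the PSNR peak defaults to the maximum of the reference kernel, SSIM is computed over one global window instead of the usual sliding Gaussian window, and ERGAS treats caller-supplied voxel groups as bands with a resolution ratio of 1. All function names are hypothetical.

```python
import math

def mse(ref, pred):
    """Mean squared error over all voxels of two flattened kernels."""
    if not ref or len(ref) != len(pred):
        raise ValueError("kernels must be non-empty and the same size")
    return sum((r - p) ** 2 for r, p in zip(ref, pred)) / len(ref)

def psnr(ref, pred, peak=None):
    """PSNR in dB. Assumption: peak defaults to the reference maximum."""
    peak = max(ref) if peak is None else peak
    err = mse(ref, pred)
    if err == 0:
        return math.inf  # identical kernels
    return 10.0 * math.log10(peak ** 2 / err)

def ergas(ref_bands, pred_bands, ratio=1.0):
    """ERGAS = 100 * ratio * sqrt(mean over bands of (RMSE_k / mean_k)^2).

    Assumption: 'bands' are voxel groups chosen by the caller (e.g. axial
    slices); ratio is the resolution ratio, 1.0 when both share one grid.
    """
    if not ref_bands or len(ref_bands) != len(pred_bands):
        raise ValueError("need the same non-zero number of bands")
    terms = []
    for ref, pred in zip(ref_bands, pred_bands):
        rmse = math.sqrt(mse(ref, pred))
        mean_ref = sum(ref) / len(ref)
        terms.append((rmse / mean_ref) ** 2)
    return 100.0 * ratio * math.sqrt(sum(terms) / len(terms))

def global_ssim(ref, pred, data_range=None, k1=0.01, k2=0.03):
    """SSIM over a single global window (simplification of windowed SSIM)."""
    n = len(ref)
    span = (max(ref) - min(ref)) if data_range is None else data_range
    c1, c2 = (k1 * span) ** 2, (k2 * span) ** 2
    mx, my = sum(ref) / n, sum(pred) / n
    vx = sum((x - mx) ** 2 for x in ref) / (n - 1)   # sample variances
    vy = sum((y - my) ** 2 for y in pred) / (n - 1)
    cov = sum((x - mx) * (y - my) for x, y in zip(ref, pred)) / (n - 1)
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

For a 15×15×15 kernel, one possible (hypothetical) band choice for `ergas` is the 15 axial slices of 225 voxels each. Because MSE and PSNR depend on absolute dose scale while ERGAS normalizes each band by its mean, the two kinds of metric can rank models differently.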
Results: Twelve GAN models with training accuracies between 96.5% and 99.26% were evaluated. Although it achieved the highest training accuracy, the 99.26% model did not exhibit the best predictive quality. Instead, the 98.4% model performed best, with lower MSE (0.020 vs. 0.029 mGy²/(MBq·s)²), higher PSNR (43.25 vs. 41.35 dB), and markedly lower ERGAS (10.96 vs. 15.75). SSIM values were consistently high (>0.99) across all models, with no statistically significant differences (p > 0.05), indicating comparable structural fidelity.
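The central finding of the Results, that selecting a model by training accuracy and selecting it by validation metrics lead to different choices, can be checked with the two sets of figures reported above. This is a minimal sketch: the model labels are hypothetical, and the other ten models are omitted because their per-model metric values are not given in the abstract.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelScores:
    train_acc: float  # training accuracy, %
    mse: float        # validation MSE, lower is better
    psnr: float       # dB, higher is better
    ergas: float      # dimensionless, lower is better

# Values reported in the Results for the two models being compared.
models = {
    "acc_99.26": ModelScores(99.26, 0.029, 41.35, 15.75),
    "acc_98.4": ModelScores(98.4, 0.020, 43.25, 10.96),
}

def best_by(models, key, higher_is_better):
    """Return the label of the best model under one criterion."""
    pick = max if higher_is_better else min
    return pick(models, key=lambda name: key(models[name]))

choices = {
    "training accuracy": best_by(models, lambda m: m.train_acc, True),
    "MSE": best_by(models, lambda m: m.mse, False),
    "PSNR": best_by(models, lambda m: m.psnr, True),
    "ERGAS": best_by(models, lambda m: m.ergas, False),
}
for criterion, name in choices.items():
    print(f"{criterion:>17}: {name}")
```

Each criterion is paired with its direction, since lower MSE and ERGAS are better while higher PSNR is better. SSIM is left out because the abstract reports it only as a shared lower bound (>0.99) rather than per model.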
Conclusion: Training accuracy alone does not reliably reflect GAN performance. Image-based similarity and error metrics provide a more comprehensive and discriminative evaluation of ¹⁷⁷Lu DVK prediction quality.