[1] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), vol. 9351, pp. 234–241, May 2015, doi: 10.1007/978-3-319-24574-4_28.
[2] U. T. Salim, F. H. Ali, and S. A. Dawwd, “U-Net Convolutional Networks Performance Based on Software-Hardware Cooperation Parameters: A Review,” International Journal of Computing and Digital Systems, vol. 11, no. 1, 2022.
[3] Y. Oyama, N. Maruyama, N. Dryden, E. McCarthy, P. Harrington, J. Balewski, S. Matsuoka, P. Nugent, and B. Van Essen, “The Case for Strong Scaling in Deep Learning: Training Large 3D CNNs with Hybrid Parallelism,” IEEE Trans. Parallel Distrib. Syst., vol. 32, no. 7, pp. 1641–1652, 2021, doi: 10.1109/TPDS.2020.3047974.
[4] D. Pati, C. Favart, P. Bahl, V. Soni, Y. C. Tsai, M. Potter, J. Guan, X. Dong, and V. R. Saripalli, “Impact of Inference Accelerators on hardware selection,” pp. 1–5, 2019, [Online]. Available: http://arxiv.org/abs/1910.03060.
[5] L. Hou, Y. Cheng, N. Shazeer, N. Parmar, Y. Li, P. Korfiatis, T. M. Drucker, D. J. Blezek, and X. Song, “High Resolution Medical Image Analysis with Spatial Partitioning,” pp. 15–19.
[6] J. Civit-Masot, F. Luna-Perejón, S. Vicente-Díaz, J. M. Rodríguez Corral, and A. Civit, “TPU Cloud-Based Generalized U-Net for Eye Fundus Image Segmentation,” IEEE Access, vol. 7, pp. 142379–142387, 2019, doi: 10.1109/ACCESS.2019.2944692.
[7] S. Liu and W. Luk, “Towards an efficient accelerator for DNN-based remote sensing image segmentation on FPGAs,” Proc. - 29th Int. Conf. Field-Programmable Log. Appl. FPL 2019, pp. 187–193, 2019, doi: 10.1109/FPL.2019.00037.
[8] S. Liu, H. Fan, X. Niu, H. Ng, and Y. Chu, “Optimizing CNN-based Segmentation with Deeply Customized Convolutional and Deconvolutional Architectures on FPGA,” vol. 1, no. 1, 2018.
[9] B. K. Joardar, N. K. Jayakodi, J. R. Doppa, H. Li, P. P. Pande, and K. Chakrabarty, “GRAMARCH: A GPU-ReRAM based heterogeneous architecture for neural image segmentation,” in 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2020, pp. 228–233.
[10] D. Ojika, B. Patel, G. A. Reina, T. Boyer, C. Martin, and P. Shah, “Addressing the Memory Bottleneck in AI Model Training,” pp. 3–5, 2020, [Online]. Available: http://arxiv.org/abs/2003.08732.
[11] H. Imai, S. Matzek, T. D. Le, and Y. Negishi, “Fast and Accurate 3D Medical Image Segmentation with Data-swapping Method,” pp. 1–13.
[12] B. Niepceron, A. Nait-Sidi-Moh, and F. Grassia, “Moving Medical Image Analysis to GPU Embedded Systems: Application to Brain Tumor Segmentation,” 2020, doi: 10.1080/08839514.2020.1787678.
[13] N. Beheshti, “Squeeze U-Net: A Memory and Energy Efficient Image Segmentation Network.”
[14] S. Williams, A. Waterman, and D. Patterson, “Roofline: An insightful visual performance model for multicore architectures,” Commun. ACM, vol. 52, no. 4, pp. 65–76, 2009, doi: 10.1145/1498765.1498785.
[15] B. Da Silva, A. Braeken, E. H. D’Hollander, and A. Touhafi, “Performance and resource modeling for FPGAs using high-level synthesis tools,” Adv. Parallel Comput., vol. 25, pp. 523–531, 2014, doi: 10.3233/978-1-61499-381-0-523.
[16] A. Ilic, F. Pratas, and L. Sousa, “Beyond the roofline: Cache-aware power and energy-efficiency modeling for multi-cores,” IEEE Trans. Comput., vol. 66, no. 1, pp. 52–58, 2017, doi: 10.1109/TC.2016.2582151.
[17] J. Kwack, T. Applencourt, C. Bertoni, Y. Ghadar, H. Zheng, C. Knight, and S. Parker, “Roofline-based performance efficiency of HPC benchmarks and applications on current generation of processor architectures,” in 2019 Cray User Group Meeting, 2019, vol. 5.
[18] M. Hill and V. Janapa Reddi, “Gables: A roofline model for mobile SoCs,” Proc. - 25th IEEE Int. Symp. High Perform. Comput. Archit. HPCA 2019, pp. 317–330, 2019, doi: 10.1109/HPCA.2019.00047.
[19] C. Yang, “Hierarchical Roofline Analysis on GPUs,” Lawrence Berkeley National Laboratory, 2020.
[20] N. K. Jha and S. Mittal, “Modeling Data Reuse in Deep Neural Networks by Taking Data-Types into Cognizance,” IEEE Trans. Comput., vol. 70, no. 9, pp. 1526–1538, 2021, doi: 10.1109/TC.2020.3015531.
[21] NVIDIA Corporation, CUDA C Programming Guide - Version 4.2. 2012.
[22] B. Van Werkhoven, J. Maassen, H. E. Bal, and F. J. Seinstra, “Optimizing convolution operations on GPUs using adaptive tiling,” Futur. Gener. Comput. Syst., vol. 30, no. 1, pp. 14–26, 2014, doi: 10.1016/j.future.2013.09.003.