Group-Based Sample Partitioning kNN: A Computationally Efficient kNN Algorithm for Resource-Constrained Environments

Dalloo, Ayad M.; Humaidi, Amjad J.

doi:https://doi.org/10.33103/uot.ijccce.25.2.2

IRAQI JOURNAL OF COMPUTERS, COMMUNICATIONS, CONTROL AND SYSTEMS ENGINEERING
Volume 25, Issue 2, October 2025, Pages 16-32
Document Type: Research Paper
Authors
Ayad M. Dalloo*¹; Amjad J. Humaidi²
¹University of Technology, Department of Communication Engineering
²Control and Systems Engineering Department, University of Technology, Baghdad 10001, Iraq
Abstract
The k-Nearest Neighbors (kNN) algorithm is widely used for classification because it is simple and effective. Its computational cost, however, remains a significant obstacle, particularly on embedded systems with limited processing power and memory. To address this, we propose the Group-Based Sample Partitioning (k²NN) algorithm, a two-phase approach that reduces computational complexity while maintaining classification accuracy. In the first phase, the algorithm pre-groups the training samples by iteratively selecting anchor points and partitioning their k-nearest neighbors, thereby reducing redundancy in the dataset. In the second phase, the test sample dynamically selects local anchor points, constructing a smaller, more relevant neighborhood for efficient classification. Experimental results on the Breast Cancer Dataset from Kaggle (KGBC) show that k²NN significantly reduces training and testing iterations while preserving high classification accuracy (95.78%), with a recall of 100%. Compared with exhaustive kNN, our approach substantially reduces the number of distance computations (21.79% of exhaustive kNN) without requiring additional storage. Although KGBC is a relatively small dataset, k²NN shows promise for scalable implementation in embedded systems: in further tests on datasets ranging from 100 to 30,000 samples, the reduction in computation cost reached up to 75.5% for larger datasets. Future work will explore an extended kⁿNN framework that introduces multiple k-parameters for adaptive scaling to high-dimensional datasets while maintaining computational efficiency. Repository: https://github.com/AyadMDalloo/K2NN
Keywords
k-nearest neighbors; Data-level Approximate Computing; Approximate; Embedded System
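
The two phases described in the abstract can be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation (see the repository linked in the abstract for that). The abstract does not specify how anchors are chosen, how large the groups are, or how many local anchors a test sample uses, so the function names `build_groups` and `classify`, the parameters `k1`, `k2`, and `n_anchors`, and the rule "take the next unassigned sample as anchor" are all assumptions made here for illustration.

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_groups(X, k1):
    """Phase 1 (assumed form): take an unassigned sample as anchor, attach its
    k1 nearest unassigned samples to it, and repeat until none remain."""
    unassigned = list(range(len(X)))
    groups = []  # list of (anchor_index, member_indices including the anchor)
    while unassigned:
        anchor = unassigned.pop(0)  # assumption: next unassigned sample in order
        nearest = sorted(unassigned,
                         key=lambda i: euclidean(X[anchor], X[i]))[:k1]
        taken = set(nearest)
        unassigned = [i for i in unassigned if i not in taken]
        groups.append((anchor, [anchor] + nearest))
    return groups

def classify(X, y, groups, query, n_anchors, k2):
    """Phase 2 (assumed form): compare the query with anchors only, keep the
    n_anchors closest groups, then run plain kNN inside that candidate set.
    Returns (predicted_label, number_of_distance_computations)."""
    anchor_d = {a: euclidean(query, X[a]) for a, _ in groups}
    selected = sorted(groups, key=lambda g: anchor_d[g[0]])[:n_anchors]
    cand = {}
    for a, members in selected:
        for i in members:
            # Anchor distances are reused rather than recomputed.
            cand[i] = anchor_d[a] if i == a else euclidean(query, X[i])
    nearest = sorted(cand, key=cand.get)[:k2]
    label = Counter(y[i] for i in nearest).most_common(1)[0][0]
    n_dist = len(groups) + len(cand) - len(selected)
    return label, n_dist

# Toy data (made up for this sketch): two well-separated 2-D clusters.
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3),
     (5.0, 5.0), (5.1, 5.2), (5.2, 5.1), (5.3, 5.3)]
y = [0, 0, 0, 0, 1, 1, 1, 1]

groups = build_groups(X, k1=3)
label, n_dist = classify(X, y, groups, (4.9, 5.0), n_anchors=1, k2=3)
print(label, n_dist, "distance computations vs", len(X), "for exhaustive kNN")
```

On this toy set the query is compared with two anchors and then with the three remaining members of the closest group, so it needs 5 distance computations instead of 8. The saving depends entirely on the assumed group size and number of local anchors, and the figures reported in the abstract come from the authors' own experiments, not from this sketch.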
