Randomized Algorithm for Approximate Nearest Neighbor Search in High Dimensions
The Journal of Pattern Recognition Research (JPRR) provides an international forum for the electronic publication of high-quality research and industrial experience articles in all areas of pattern recognition, machine learning, and artificial intelligence. JPRR is committed to rigorous yet rapid reviewing. Final versions are published electronically
(ISSN 1558-884X) immediately upon acceptance.
Randomized Algorithm for Approximate Nearest Neighbor Search in High Dimensions
Ruben Buaba, Abdollah Homaifar, William Hendrix, Seung Woo Son, Wei-keng Liao, Alok Choudhary
JPRR Vol 9, No 1 (2014); doi:10.13176/11.599 
Download
Ruben Buaba, Abdollah Homaifar, William Hendrix, Seung Woo Son, Wei-keng Liao, Alok Choudhary
Abstract
A randomized algorithm employs a degree of randomness as part of its logic with uniform random bits as an auxiliary input to guide its behavior, in the hope of achieving good average runtime performance over all possible choices of the random bits. In this paper, we formulate a randomized algorithm capable of nding approximate nearest neighbors, speci cally in high dimensional datasets. The random bits of this algorithm are sets of kernels chosen from the Gaussian normal distribution, which are used to create a data structure that guarantees sublinear runtime complexity and retrieval accuracy. The algorithm focuses on selecting the optimal cardinalities of the kernels and their members. These cardinalities influence computational, memory and runtime complexities of the algorithm. We demonstrate that using the cardinality of the kernels to determine the kernel size guarantees the highest provable lower bound of the probability of collision of related items; thus accurate retrieval of nearest neighbors. When implemented on Cray XE6 ma- chine using 76 processes, our algorithm achieves speedup of 69 and approximately 48x performance gain in average query runtime with the retrieval accuracy of 90.5%.
JPRR Vol 9, No 1 (2014); doi:10.13176/11.599 | Full Text  | Share this paper: