This thesis presents RDH (Randomized Distributed Hashing), a method developed for fast
and accurate image search on large-scale image databases. ANN (Approximate
Nearest Neighbor) approaches are commonly used to find the samples nearest to a query
image in large-scale image databases. These methods return approximate nearest samples
instead of the true nearest samples. Because they are often implemented with hashing,
they can significantly reduce the query time. ANN search
methods are generally applied in a centralized manner. However, in real-world applications,
data are often stored in a distributed manner, which requires ANN search methods to be
implemented in a distributed manner as well. For this purpose, in our proposed approach, the LSH
(Locality Sensitive Hashing) method is applied in a distributed way. Data are distributed to
different nodes within a cluster, and then the data are hashed on each node using the same
hash function set. In the query phase, the query instance is searched locally on each node.
By exploiting parallelism, the query time is significantly decreased. In the experimental
studies, we obtain a speedup of 10 in query time for the distributed scheme with
10 nodes. The MAP (Mean Average Precision) scores used to evaluate system performance
are high and comparable to those of other methods in the literature. We
have also investigated the use of different, selected randomized hash functions on
different nodes rather than the same hash functions on every node. In this way, the
distributed uses of LSH are examined. The selected hash functions are chosen before
indexing according to their data division property. Since LSH is a data-independent
method, the results are similar to those obtained with the same hash functions on all
nodes. We compared our experimental results with state-of-the-art methods given in a
recent study. The proposed distributed scheme is promising for searching images in large
datasets across multiple nodes.
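
The distributed scheme summarized above (partition the data across nodes, hash each
partition with the same hash function set, search locally on each node, then merge) can
be illustrated with a minimal sketch. It is not the implementation used in this thesis.
It rests on several assumptions: the LSH family is taken to be random hyperplane (sign
projection) hashing, the nodes are simulated in a single process, a single hash table is
used per node, and all names (`Node`, `build_cluster`, `distributed_query`) are
hypothetical.

```python
import math
import random
from collections import defaultdict


def make_hash_functions(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes (assumed LSH family: sign projections)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(planes, vec):
    """One bit per hyperplane: which side of the plane the vector lies on."""
    return tuple(sum(p * x for p, x in zip(plane, vec)) > 0 for plane in planes)


class Node:
    """One simulated cluster node holding a partition and its local hash table."""

    def __init__(self, planes, items):
        self.planes = planes
        self.items = dict(items)  # global id -> feature vector
        self.buckets = defaultdict(list)
        for item_id, vec in self.items.items():
            self.buckets[hash_code(planes, vec)].append(item_id)

    def query(self, q, k):
        """Rank only the items in the query's bucket by Euclidean distance."""
        candidates = self.buckets.get(hash_code(self.planes, q), [])
        scored = [(math.dist(q, self.items[i]), i) for i in candidates]
        return sorted(scored)[:k]


def build_cluster(data, n_nodes, n_bits, seed=0):
    """Split data round-robin; every node uses the same hash function set."""
    planes = make_hash_functions(len(data[0]), n_bits, seed)
    parts = [[] for _ in range(n_nodes)]
    for item_id, vec in enumerate(data):
        parts[item_id % n_nodes].append((item_id, vec))
    return [Node(planes, part) for part in parts]


def distributed_query(nodes, q, k):
    """Query each node locally and merge the partial results.

    On a real cluster the per-node queries would run in parallel;
    here they run sequentially for simplicity.
    """
    merged = []
    for node in nodes:
        merged.extend(node.query(q, k))
    return sorted(merged)[:k]


if __name__ == "__main__":
    rng = random.Random(1)
    data = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(200)]
    nodes = build_cluster(data, n_nodes=4, n_bits=8)
    print(distributed_query(nodes, data[5], k=3))
```

Because each node scans only its own bucket, the per-node work shrinks roughly in
proportion to the number of nodes, which is the source of the parallel speedup described
above. The variant with different hash functions per node could be sketched by passing a
different `seed` to `make_hash_functions` for each node.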