赞
踩
This is part of my general survey on LSH in CDN class.
NN search
Given a set P of n points, design an algorithm that, given any point q, returns a point in P that is closest to q (its “nearest neighbor” in P).
ANN search
An approximate nearest neighbor search problem is to find the point, whose distance from the query is at most c times the distance from the query to its nearest points. Its formal definition is denoted as followed:
Given a set P and a point q, find a point pi in P so that
d ( p i , q ) ≤ c × m i n p j d ( p j , q ) {d(p_i, q) \leq c \times min_{p_j}d(p_j, q)} d(pi,q)≤c×minpjd(pj,q)
The appealing point of this approach is that it is almost as good as the exact one in many cases. In particular, if the distance measure accurately captures the notion of user quality, then small differences in the distance should not matter.
General approaches to ANN problem are mainly two kinds, tree-based approach and LSH.
LSH
LSH refers to a family of functions (LSH families) to hash data points into buckets so that data points near each other are located in the same buckets with high probability, while data points far from each other are likely to be in different buckets. This makes it easier to identify observations in high dimensional space. The formal definition is:
A family of hashing functions is ( r , c r , p 1 , p 2 ) − L S H {(r,cr,p_1,p_2)-LSH} (r,cr,p
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。