当前位置:   article > 正文

CDN笔记二 Locality Sensitive Hashing算法 续_broder. on the resemblance and containment of docu

broder. on the resemblance and containment of documents.

This is part of my general survey on LSH in CDN class.

NN search
Given a set P of n points, design an algorithm that, given any point q, returns a point in P that is closest to q (its “nearest neighbor” in P).

ANN search
An approximate nearest neighbor search problem is to find the point, whose distance from the query is at most c times the distance from the query to its nearest points. Its formal definition is denoted as followed:
Given a set P and a point q, find a point pi in P so that
d ( p i , q ) ≤ c × m i n p j d ( p j , q ) {d(p_i, q) \leq c \times min_{p_j}d(p_j, q)} d(pi,q)c×minpjd(pj,q)

The appealing point of this approach is that it is almost as good as the exact one in many cases. In particular, if the distance measure accurately captures the notion of user quality, then small differences in the distance should not matter.

General approaches to ANN problem are mainly two kinds, tree-based approach and LSH.

LSH
LSH refers to a family of functions (LSH families) to hash data points into buckets so that data points near each other are located in the same buckets with high probability, while data points far from each other are likely to be in different buckets. This makes it easier to identify observations in high dimensional space. The formal definition is:
A family of hashing functions is ( r , c r , p 1 , p 2 ) − L S H {(r,cr,p_1,p_2)-LSH} (r,cr,p

声明:本文内容由网友自发贡献,不代表【wpsshop博客】立场,版权归原作者所有,本站不承担相应法律责任。如您发现有侵权的内容,请联系我们。转载请注明出处:https://www.wpsshop.cn/w/代码探险家/article/detail/975901
推荐阅读
相关标签
  

闽ICP备14008679号