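Text similarity calculation
For example, two item records, each an item code followed by an item name (the name reads roughly "paint marker - [spec: red, Zebra]"):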
300030280004 油漆笔-[规格:红色,斑马]
300030280010 油漆笔-[规格:红色,斑马]
The data is stored in an Excel document.
Python implementation
Part of the algorithm code is shown below.
# Jaccard similarity
def similarity_jacard(set1, set2):
    """
    Jaccard similarity; identical non-empty sets give 1.
    similarity = len(A & B) / len(A | B), with 0 <= similarity <= 1
    :param set1: set
    :param set2: set
    :return: float (0.0 when both sets are empty)
    """
    union = set1 | set2
    if not union:
        # Both sets are empty: return 0 instead of dividing by zero.
        return 0.0
    return float(len(set1 & set2)) / len(union)
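The post does not say how the strings are turned into sets before the Jaccard function is called. A minimal sketch, assuming each item name is simply split into its set of characters: `char_set` and `jaccard` are hypothetical names used only for this illustration, and the third item name (black instead of red) is made up to show a non-identical pair.

```python
def char_set(text):
    """Hypothetical helper: represent a string as the set of its characters."""
    return set(text)


def jaccard(set1, set2):
    """Jaccard similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = set1 | set2
    if not union:
        return 0.0
    return len(set1 & set2) / len(union)


name_a = "油漆笔-[规格:红色,斑马]"
name_b = "油漆笔-[规格:红色,斑马]"
name_c = "油漆笔-[规格:黑色,斑马]"  # made-up variant, not from the original data

print(jaccard(char_set(name_a), char_set(name_b)))  # identical names
print(jaccard(char_set(name_a), char_set(name_c)))  # one character differs
```

Character sets ignore order and repetition, so two different names built from the same characters would also score 1. Word-level or n-gram sets are common alternatives when that matters.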
# Minimum edit distance
def minDistance(word1, word2):
    """
    Minimum edit distance (Levenshtein).
    :param word1: str
    :param word2: str
    :return: int, the distance between the two strings
    """
    if not word1:
        return len(word2 or '')
    if not word2:
        return len(word1)
    size1 = len(word1)
    size2 = len(word2)
    # tmp holds a single row of the DP table, starting with the row for an empty prefix of word1.
    tmp = list(range(size2 + 1))
    value = None
    for i in range(size1):
        tmp[0] = i + 1
        last = i  # diagonal cell: previous row, previous column
        for j in range(size2):
            if word1[i] == word2[j]:
                value = last
            else:
                value = 1 + min(last, tmp[j], tmp[j + 1])
            last = tmp[j + 1]  # save the previous-row value before overwriting it
            tmp[j + 1] = value
    return value
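An edit distance is a raw count, so it is not directly comparable with a Jaccard score in the 0 to 1 range. A minimal sketch of one common way to turn it into a similarity, assuming the distance is divided by the length of the longer string: this normalization is not from the original post, and `levenshtein` and `edit_similarity` are hypothetical names for a self-contained illustration.

```python
def levenshtein(a, b):
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from an empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def edit_similarity(a, b):
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(a, b) / longest


# The two item codes from the example records differ in their last two digits.
print(levenshtein("300030280004", "300030280010"))
print(edit_similarity("300030280004", "300030280010"))
```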