
Data Mining Notes - Feature Selection - Algorithm Implementation - 1


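For background on feature selection, see the following posts:

Data Mining Notes - Feature Selection - Chi-Square Test

Data Mining Notes - Feature Selection - Information Gain

Data Mining Notes - Feature Selection - Expected Cross Entropy

Data Mining Notes - Feature Selection - Mutual Information

Data Mining Notes - Feature Selection - Genetic Algorithm

Data Mining Notes - Feature Selection - Overall Summary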

The project source contains both Java and Python implementations; only the Python implementation is listed here.

Source repository: https://github.com/fighting-one-piece/repository-datamining.git

class Doc:
    """A document: its name, category, words, and per-word feature weights."""

    def __init__(self, name):
        self._name = name
        # The remaining fields are filled in through the setters below.
        # They are initialised here so a getter called first does not
        # raise AttributeError.
        self._category = None
        self._words = []
        self._tfidfWords = {}
        self._chiWords = None
        self._similarities = None

    def setName(self, name):
        self._name = name
    def getName(self):
        return self._name

    def setCategory(self, category):
        self._category = category
    def getCategory(self):
        return self._category

    def setWords(self, words):
        self._words = words
    def getWords(self):
        return self._words

    # tfidfWords maps each word to its TF-IDF weight
    def setTfidfWords(self, tfidfWords):
        self._tfidfWords = tfidfWords
    def getTfidfWords(self):
        return self._tfidfWords

    def getSortedTfidfWords(self):
        # (word, weight) pairs sorted by weight, highest first.
        # Note: the sorted list is wrapped in a one-element outer list.
        results = [sorted(self._tfidfWords.items(), key=lambda i: i[1], reverse=True), ]
        return results

    def setCHIWords(self, chiWords):
        self._chiWords = chiWords
    def getCHIWords(self):
        return self._chiWords

    def setSimilarities(self, similarities):
        self._similarities = similarities
    def getSimilarities(self):
        return self._similarities
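
As a small illustration of what getSortedTfidfWords does, the sketch below ranks a word-to-weight dictionary by weight, highest first. The weights are made-up values and sort_by_weight is a hypothetical helper name, not part of the project source.

```python
# Hypothetical TF-IDF weights for one document (values invented for illustration).
tfidf_words = {"mining": 0.42, "data": 0.17, "feature": 0.63}


def sort_by_weight(weights):
    """Return (word, weight) pairs ordered from highest to lowest weight."""
    return sorted(weights.items(), key=lambda item: item[1], reverse=True)


ranked = sort_by_weight(tfidf_words)
# Keep only the two highest-weighted words as candidate features.
top_two = [word for word, _ in ranked[:2]]
print(ranked)
print(top_two)
```

Cutting the ranked list at a fixed length is one simple way to use such weights for feature selection; the cut-off of two here is arbitrary.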

import os

# WordUtils (used below) is defined elsewhere in the project source
# and is not listed in this post.

# Utility class for document operations
class DocHelper:

    # Collect all documents under a directory
    @staticmethod
    def genDocs(path):
        docs = []
        DocHelper.genDocsIterator(path, docs)
        return docs

    # Recursively walk the directory tree and collect every document
    @staticmethod
    def genDocsIterator(path, docs):
        if os.path.isdir(path):
            for subPathName in os.listdir(path):
                subPath = os.path.join(path, subPathName)
                DocHelper.genDocsIterator(subPath, docs)
        else:
            # File name without its extension; os.path uses the
            # platform's own separator, so this is not Windows-only.
            name = os.path.splitext(os.path.basename(path))[0]
            doc = Doc(name)
            # The parent directory name is the document's category
            doc.setCategory(os.path.basename(os.path.dirname(path)))
            doc.setWords(WordUtils.splitFile(path))
            docs.append(doc)

    # Whether the document contains the given word
    @staticmethod
    def docHasWord(doc, word):
        return word in doc.getWords()

    # Count how often each word occurs in the document
    @staticmethod
    def docWordsStatistics(doc):
        counts = {}
        for word in doc.getWords():
            # The listing is truncated at this point in the post; the
            # increment and the return below are assumed.
            counts[word] = counts.get(word, 0) + 1
        return counts
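
Word containment and word counts like those above are the raw inputs to the chi-square test mentioned in the reference links. The following is a minimal, self-contained sketch of that idea and is not taken from the project source: documents are simplified to (category, words) pairs, word_counts and chi_square are hypothetical names, and the score uses the standard 2x2 contingency-table form N(AD - BC)^2 / ((A+B)(C+D)(A+C)(B+D)).

```python
from collections import Counter


def word_counts(words):
    """Word-frequency table for one document, using the standard library."""
    return Counter(words)


def chi_square(docs, word, category):
    """Chi-square score of `word` for `category`.

    docs is a list of (category, words) pairs (an assumed, simplified
    stand-in for the Doc objects in the post).
    """
    a = b = c = d = 0
    for doc_category, words in docs:
        has_word = word in set(words)
        in_category = doc_category == category
        if has_word and in_category:
            a += 1  # in the category, contains the word
        elif has_word:
            b += 1  # other category, contains the word
        elif in_category:
            c += 1  # in the category, lacks the word
        else:
            d += 1  # other category, lacks the word
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0  # the word or category never (or always) occurs
    return n * (a * d - b * c) ** 2 / denominator


# Tiny invented corpus for illustration only.
docs = [
    ("sports", ["ball", "game"]),
    ("sports", ["ball", "team"]),
    ("finance", ["stock", "market"]),
    ("finance", ["stock", "game"]),
]
print(word_counts(["ball", "game", "ball"]))
print(chi_square(docs, "ball", "sports"))
print(chi_square(docs, "game", "sports"))
```

In this toy corpus "ball" appears only in sports documents and scores higher than "game", which is spread evenly across both categories, so "ball" would be the better feature for that category.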