
Django Personal Blog Tutorial: Chinese Full-Text Search with haystack + whoosh + jieba


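A word before we start:

A simp has to be able to endure loneliness.

How can a blog site go without full-text search? I was short on time earlier, so although I kept meaning to build a proper search, all I had was a simple database query against article titles. Today I'm writing down the complete implementation.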

First, install the packages:

    pip install django-haystack
    pip install jieba
    pip install whoosh

Note: do not run

    pip install haystack

That installs a different package, and building the index later will fail with:

    from haystack import connections
    ImportError: cannot import name connections
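To see which of the two distributions is actually installed before building the index, a quick check along these lines may help. It is a minimal sketch that assumes Python 3.8+ (for the standard library's importlib.metadata); the helper name installed_version is made up for illustration.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "django-haystack" is the package this tutorial needs;
    # "haystack" is the one the note above warns against.
    for name in ("django-haystack", "haystack"):
        print(name, "->", installed_version(name))
```

If the second line prints a version, uninstalling that package and reinstalling django-haystack is probably the safest way back to a clean state.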

Then add the following to settings.py:

    INSTALLED_APPS = [
        # ...
        'haystack',  # register haystack
        # ...
    ]

    HAYSTACK_CONNECTIONS = {
        'default': {
            'ENGINE': 'haystack.backends.whoosh_cn_backend.WhooshEngine',
            'PATH': os.path.join(BASE_DIR, 'whoosh_index'),
        }
    }
    # Show 10 search results per page
    HAYSTACK_SEARCH_RESULTS_PER_PAGE = 10
    # Update the index automatically when models change
    HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'

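Add the search route to the project's main urls.py:

    from django.conf import settings
    from django.conf.urls import url  # url() only exists in Django < 4.0
    from django.conf.urls.static import static
    from django.contrib import admin
    from django.urls import include, path
    # `views` is the project's own views module; its import is not shown in the original

    urlpatterns = [
        url(r'^$', views.blog_index),
        url(r'^baidu_verify_iYpMqoGJf4.html/$', views.baiduyz),
        path('admin/', admin.site.urls),
        path('Blog/', include('Blog.urls', namespace="Blog")),
        path('JiaBlog/', include('JiaBlog.urls', namespace="JiaBlog")),
        path('mdeditor/', include('mdeditor.urls')),
        url(r'', include('social_django.urls', namespace='social')),
        url(r'^search/', include('haystack.urls')),  # add this line; the rest are used elsewhere in my project
        # path(r'^media/(?P<path>.*)$', serve, {"document_root": MEDIA_ROOT}),
    ] + static(settings.MEDIA_URL, document_root=settings.MEDIA_ROOT)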

Add a search_indexes.py file to your app:

    from haystack import indexes
    # Change this to your own model
    from JiaBlog.models import Articles


    # Change this: the class name is the model class name + Index,
    # e.g. for the model Articles the class is ArticlesIndex
    class ArticlesIndex(indexes.SearchIndex, indexes.Indexable):
        text = indexes.CharField(document=True, use_template=True)

        def get_model(self):
            # Change this to your own model
            return Articles

        def index_queryset(self, using=None):
            return self.get_model().objects.all()

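Locate the installed haystack package in your Python environment and open its backends folder. Make a copy of whoosh_backend.py named whoosh_cn_backend.py, then open the copy and replace its entire contents with the code below: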

    # encoding: utf-8
    from __future__ import absolute_import, division, print_function, unicode_literals

    import json
    import os
    import re
    import shutil
    import threading
    import warnings

    from django.conf import settings
    from django.core.exceptions import ImproperlyConfigured
    from django.utils import six
    from django.utils.datetime_safe import datetime
    from django.utils.encoding import force_text

    from haystack.backends import BaseEngine, BaseSearchBackend, BaseSearchQuery, EmptyResults, log_query
    from haystack.constants import DJANGO_CT, DJANGO_ID, ID
    from haystack.exceptions import MissingDependency, SearchBackendError, SkipDocument
    from haystack.inputs import Clean, Exact, PythonData, Raw
    from haystack.models import SearchResult
    from haystack.utils import log as logging
    from haystack.utils import get_identifier, get_model_ct
    from haystack.utils.app_loading import haystack_get_model
    # Chinese analyzer shipped with jieba (not part of the stock whoosh backend)
    from jieba.analyse import ChineseAnalyzer

    try:
        import whoosh
    except ImportError:
        raise MissingDependency("The 'whoosh' backend requires the installation of 'Whoosh'. Please refer to the documentation.")

    # Handle minimum requirement.
    if not hasattr(whoosh, '__version__') or whoosh.__version__ < (2, 5, 0):
        raise MissingDependency("The 'whoosh' backend requires version 2.5.0 or greater.")

    # Bubble up the correct error.
    from whoosh import index
    from whoosh.analysis import StemmingAnalyzer
    from whoosh.fields import ID as WHOOSH_ID
    from whoosh.fields import BOOLEAN, DATETIME, IDLIST, KEYWORD, NGRAM, NGRAMWORDS, NUMERIC, Schema, TEXT
    from whoosh.filedb.filestore import FileStorage, RamStorage
    from whoosh.highlight import highlight as whoosh_highlight
    from whoosh.highlight import ContextFragmenter, HtmlFormatter
    from whoosh.qparser import QueryParser
    from whoosh.searching import ResultsPage
    from whoosh.writing import AsyncWriter

    # Raw string so the \d escapes reach the regex engine untouched.
    DATETIME_REGEX = re.compile(r'^(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})T(?P<hour>\d{2}):(?P<minute>\d{2}):(?P<second>\d{2})(\.\d{3,6}Z?)?$')
    LOCALS = threading.local()
    LOCALS.RAM_STORE = None


    class WhooshHtmlFormatter(HtmlFormatter):
        """
        This is a HtmlFormatter simpler than the whoosh.HtmlFormatter.
        We use it to have consistent results across backends. Specifically,
        Solr, Xapian and Elasticsearch are using this formatting.
        """
        template = '<%(tag)s>%(t)s</%(tag)s>'


    class WhooshSearchBackend(BaseSearchBackend):
        # Word reserved by Whoosh for special use.
        RESERVED_WORDS = (
            'AND',
            'NOT',
            'OR',
            'TO',
        )

        # Characters reserved by Whoosh for special use.
        # The '\\' must come first, so as not to overwrite the other slash replacements.
        RESERVED_CHARACTERS = (
            '\\', '+', '-', '&&', '||', '!', '(', ')', '{', '}',
            '[', ']', '^', '"', '~', '*', '?', ':', '.',
        )

        def __init__(self, connection_alias, **connection_options):
            super(WhooshSearchBackend, self).__init__(connection_alias, **connection_options)
            self.setup_complete = False
            self.use_file_storage = True
            self.post_limit = getattr(connection_options, 'POST_LIMIT', 128 * 1024 * 1024)
            self.path = connection_options.get('PATH')

            if connection_options.get('STORAGE', 'file') != 'file':
                self.use_file_storage = False

            if self.use_file_storage and not self.path:
                raise ImproperlyConfigured("You must specify a 'PATH' in your settings for connection '%s'." % connection_alias)

            self.log = logging.getLogger('haystack')

        def setup(self):
            """
            Defers loading until needed.
            """
            from haystack import connections
            new_index = False

            # Make sure the index is there.
            if self.use_file_storage and not os.path.exists(self.path):
                os.makedirs(self.path)
                new_index = True

            if self.use_file_storage and not os.access(self.path, os.W_OK):
                raise IOError("The path to your Whoosh index '%s' is not writable for the current user/group." % self.path)

            if self.use_file_storage:
                self.storage = FileStorage(self.path)
            else:
                global LOCALS

                if getattr(LOCALS, 'RAM_STORE', None) is None:
                    LOCALS.RAM_STORE = RamStorage()

                self.storage = LOCALS.RAM_STORE

            self.content_field_name, self.schema = self.build_schema(connections[self.connection_alias].get_unified_index().all_searchfields())
            self.parser = QueryParser(self.content_field_name, schema=self.schema)

            if new_index is True:
                self.index = self.storage.create_index(self.schema)
            else:
                try:
                    self.index = self.storage.open_index(schema=self.schema)
                except index.EmptyIndexError:
                    self.index = self.storage.create_index(self.schema)

            self.setup_complete = True

        def build_schema(self, fields):
            schema_fields = {
                ID: WHOOSH_ID(stored=True, unique=True),
                DJANGO_CT: WHOOSH_ID(stored=True),
                DJANGO_ID: WHOOSH_ID(stored=True),
            }
            # Grab the number of keys that are hard-coded into Haystack.
            # We'll use this to (possibly) fail slightly more gracefully later.
            initial_key_count = len(schema_fields)
            content_field_name = ''

            for field_name, field_class in fields.items():
                if field_class.is_multivalued:
                    if field_class.indexed is False:
                        schema_fields[field_class.index_fieldname] = IDLIST(stored=True, field_boost=field_class.boost)
                    else:
                        schema_fields[field_class.index_fieldname] = KEYWORD(stored=True, commas=True, scorable=True, field_boost=field_class.boost)
                elif field_class.field_type in ['date', 'datetime']:
                    schema_fields[field_class.index_fieldname] = DATETIME(stored=field_class.stored, sortable=True)
                elif field_class.field_type == 'integer':
                    schema_fields[field_class.index_fieldname] = NUMERIC(stored=field_class.stored, numtype=int, field_boost=field_class.boost)
                elif field_class.field_type == 'float':
                    schema_fields[field_class.index_fieldname] = NUMERIC(stored=field_class.stored, numtype=float, field_boost=field_class.boost)
                elif field_class.field_type == 'boolean':
                    # Field boost isn't supported on BOOLEAN as of 1.8.2.
                    schema_fields[field_class.index_fieldname] = BOOLEAN(stored=field_class.stored)
                elif field_class.field_type == 'ngram':
                    schema_fields[field_class.index_fieldname] = NGRAM(minsize=3, maxsize=15, stored=field_class.stored, field_boost=field_class.boost)
                elif field_class.field_type == 'edge_ngram':
                    schema_fields[field_class.index_fieldname] = NGRAMWORDS(minsize=2, maxsize=15, at='start', stored=field_class.stored, field_boost=field_class.boost)
                else:
                    # Text fields are analyzed with the Chinese analyzer.
                    schema_fields[field_class.index_fieldname] = TEXT(stored=True, analyzer=ChineseAnalyzer(), field_boost=field_class.boost, sortable=True)

                if field_class.document is True:
                    content_field_name = field_class.index_fieldname
                    schema_fields[field_class.index_fieldname].spelling = True

            # Fail more gracefully than relying on the backend to die if no fields
            # are found.
            if len(schema_fields) <= initial_key_count:
                raise SearchBackendError("No fields were found in any search_indexes. Please correct this before attempting to search.")

            return (content_field_name, Schema(**schema_fields))

        def update(self, index, iterable, commit=True):
            if not self.setup_complete:
                self.setup()

            self.index = self.index.refresh()
            writer = AsyncWriter(self.index)

            for obj in iterable:
                try:
                    doc = index.full_prepare(obj)
                except SkipDocument:
                    self.log.debug(u"Indexing for object `%s` skipped", obj)
                else:
                    # Really make sure it's unicode, because Whoosh won't have it any
                    # other way.
                    for key in doc:
                        doc[key] = self._from_python(doc[key])

                    # Document boosts aren't supported in Whoosh 2.5.0+.
                    if 'boost' in doc:
                        del doc['boost']

                    try:
                        writer.update_document(**doc)
                    except Exception as e:
                        if not self.silently_fail:
                            raise

                        # We'll log the object identifier but won't include the actual object
                        # to avoid the possibility of that generating encoding errors while
                        # processing the log message:
                        self.log.error(u"%s while preparing object for update" % e.__class__.__name__,
                                       exc_info=True, extra={"data": {"index": index,
                                                                      "object": get_identifier(obj)}})

            if len(iterable) > 0:
                # For now, commit no matter what, as we run into locking issues otherwise.
                writer.commit()

        def remove(self, obj_or_string, commit=True):
            if not self.setup_complete:
                self.setup()

            self.index = self.index.refresh()
            whoosh_id = get_identifier(obj_or_string)

            try:
                self.index.delete_by_query(q=self.parser.parse(u'%s:"%s"' % (ID, whoosh_id)))
            except Exception as e:
                if not self.silently_fail:
                    raise

                self.log.error("Failed to remove document '%s' from Whoosh: %s", whoosh_id, e, exc_info=True)

        def clear(self, models=None, commit=True):
            if not self.setup_complete:
                self.setup()

            self.index = self.index.refresh()

            if models is not None:
                assert isinstance(models, (list, tuple))

            try:
                if models is None:
                    self.delete_index()
                else:
                    models_to_delete = []

                    for model in models:
                        models_to_delete.append(u"%s:%s" % (DJANGO_CT, get_model_ct(model)))

                    self.index.delete_by_query(q=self.parser.parse(u" OR ".join(models_to_delete)))
            except Exception as e:
                if not self.silently_fail:
                    raise

                if models is not None:
                    self.log.error("Failed to clear Whoosh index of models '%s': %s", ','.join(models_to_delete),
                                   e, exc_info=True)
                else:
                    self.log.error("Failed to clear Whoosh index: %s", e, exc_info=True)

        def delete_index(self):
            # Per the Whoosh mailing list, if wiping out everything from the index,
            # it's much more efficient to simply delete the index files.
            if self.use_file_storage and os.path.exists(self.path):
                shutil.rmtree(self.path)
            elif not self.use_file_storage:
                self.storage.clean()

            # Recreate everything.
            self.setup()

        def optimize(self):
            if not self.setup_complete:
                self.setup()

            self.index = self.index.refresh()
            self.index.optimize()

        def calculate_page(self, start_offset=0, end_offset=None):
            # Prevent against Whoosh throwing an error. Requires an end_offset
            # greater than 0.
            if end_offset is not None and end_offset <= 0:
                end_offset = 1

            # Determine the page.
            page_num = 0

            if end_offset is None:
                end_offset = 1000000

            if start_offset is None:
                start_offset = 0

            page_length = end_offset - start_offset

            if page_length and page_length > 0:
                page_num = int(start_offset / page_length)

            # Increment because Whoosh uses 1-based page numbers.
            page_num += 1
            return page_num, page_length

        @log_query
        def search(self, query_string, sort_by=None, start_offset=0, end_offset=None,
                   fields='', highlight=False, facets=None, date_facets=None, query_facets=None,
                   narrow_queries=None, spelling_query=None, within=None,
                   dwithin=None, distance_point=None, models=None,
                   limit_to_registered_models=None, result_class=None, **kwargs):
            if not self.setup_complete:
                self.setup()

            # A zero length query should return no results.
            if len(query_string) == 0:
                return {
                    'results': [],
                    'hits': 0,
                }

            query_string = force_text(query_string)

            # A one-character query (non-wildcard) gets nabbed by a stopwords
            # filter and should yield zero results.
            if len(query_string) <= 1 and query_string != u'*':
                return {
                    'results': [],
                    'hits': 0,
                }

            reverse = False

            if sort_by is not None:
                # Determine if we need to reverse the results and if Whoosh can
                # handle what it's being asked to sort by. Reversing is an
                # all-or-nothing action, unfortunately.
                sort_by_list = []
                reverse_counter = 0

                for order_by in sort_by:
                    if order_by.startswith('-'):
                        reverse_counter += 1

                if reverse_counter and reverse_counter != len(sort_by):
                    raise SearchBackendError("Whoosh requires all order_by fields"
                                             " to use the same sort direction")

                for order_by in sort_by:
                    if order_by.startswith('-'):
                        sort_by_list.append(order_by[1:])

                        if len(sort_by_list) == 1:
                            reverse = True
                    else:
                        sort_by_list.append(order_by)

                        if len(sort_by_list) == 1:
                            reverse = False

                sort_by = sort_by_list

            if facets is not None:
                warnings.warn("Whoosh does not handle faceting.", Warning, stacklevel=2)

            if date_facets is not None:
                warnings.warn("Whoosh does not handle date faceting.", Warning, stacklevel=2)

            if query_facets is not None:
                warnings.warn("Whoosh does not handle query faceting.", Warning, stacklevel=2)

            narrowed_results = None
            self.index = self.index.refresh()

            if limit_to_registered_models is None:
                limit_to_registered_models = getattr(settings, 'HAYSTACK_LIMIT_TO_REGISTERED_MODELS', True)

            if models and len(models):
                model_choices = sorted(get_model_ct(model) for model in models)
            elif limit_to_registered_models:
                # Using narrow queries, limit the results to only models handled
                # with the current routers.
                model_choices = self.build_models_list()
            else:
                model_choices = []

            if len(model_choices) > 0:
                if narrow_queries is None:
                    narrow_queries = set()

                narrow_queries.add(' OR '.join(['%s:%s' % (DJANGO_CT, rm) for rm in model_choices]))

            narrow_searcher = None

            if narrow_queries is not None:
                # Potentially expensive? I don't see another way to do it in Whoosh...
                narrow_searcher = self.index.searcher()

                for nq in narrow_queries:
                    recent_narrowed_results = narrow_searcher.search(self.parser.parse(force_text(nq)),
                                                                     limit=None)

                    if len(recent_narrowed_results) <= 0:
                        return {
                            'results': [],
                            'hits': 0,
                        }

                    if narrowed_results:
                        narrowed_results.filter(recent_narrowed_results)
                    else:
                        narrowed_results = recent_narrowed_results

            self.index = self.index.refresh()

            if self.index.doc_count():
                searcher = self.index.searcher()
                parsed_query = self.parser.parse(query_string)

                # In the event of an invalid/stopworded query, recover gracefully.
                if parsed_query is None:
                    return {
                        'results': [],
                        'hits': 0,
                    }

                page_num, page_length = self.calculate_page(start_offset, end_offset)

                search_kwargs = {
                    'pagelen': page_length,
                    'sortedby': sort_by,
                    'reverse': reverse,
                }

                # Handle the case where the results have been narrowed.
                if narrowed_results is not None:
                    search_kwargs['filter'] = narrowed_results

                try:
                    raw_page = searcher.search_page(
                        parsed_query,
                        page_num,
                        **search_kwargs
                    )
                except ValueError:
                    if not self.silently_fail:
                        raise

                    return {
                        'results': [],
                        'hits': 0,
                        'spelling_suggestion': None,
                    }

                # Because as of Whoosh 2.5.1, it will return the wrong page of
                # results if you request something too high. :(
                if raw_page.pagenum < page_num:
                    return {
                        'results': [],
                        'hits': 0,
                        'spelling_suggestion': None,
                    }

                results = self._process_results(raw_page, highlight=highlight, query_string=query_string, spelling_query=spelling_query, result_class=result_class)
                searcher.close()

                if hasattr(narrow_searcher, 'close'):
                    narrow_searcher.close()

                return results
            else:
                if self.include_spelling:
                    if spelling_query:
                        spelling_suggestion = self.create_spelling_suggestion(spelling_query)
                    else:
                        spelling_suggestion = self.create_spelling_suggestion(query_string)
                else:
                    spelling_suggestion = None

                return {
                    'results': [],
                    'hits': 0,
                    'spelling_suggestion': spelling_suggestion,
                }

        def more_like_this(self, model_instance, additional_query_string=None,
                           start_offset=0, end_offset=None, models=None,
                           limit_to_registered_models=None, result_class=None, **kwargs):
            if not self.setup_complete:
                self.setup()

            field_name = self.content_field_name
            narrow_queries = set()
            narrowed_results = None
            self.index = self.index.refresh()

            if limit_to_registered_models is None:
                limit_to_registered_models = getattr(settings, 'HAYSTACK_LIMIT_TO_REGISTERED_MODELS', True)

            if models and len(models):
                model_choices = sorted(get_model_ct(model) for model in models)
            elif limit_to_registered_models:
                # Using narrow queries, limit the results to only models handled
                # with the current routers.
                model_choices = self.build_models_list()
            else:
                model_choices = []

            if len(model_choices) > 0:
                if narrow_queries is None:
                    narrow_queries = set()

                narrow_queries.add(' OR '.join(['%s:%s' % (DJANGO_CT, rm) for rm in model_choices]))

            if additional_query_string and additional_query_string != '*':
                narrow_queries.add(additional_query_string)

            narrow_searcher = None

            if narrow_queries is not None:
                # Potentially expensive? I don't see another way to do it in Whoosh...
                narrow_searcher = self.index.searcher()

                for nq in narrow_queries:
                    recent_narrowed_results = narrow_searcher.search(self.parser.parse(force_text(nq)),
                                                                     limit=None)

                    if len(recent_narrowed_results) <= 0:
                        return {
                            'results': [],
                            'hits': 0,
                        }

                    if narrowed_results:
                        narrowed_results.filter(recent_narrowed_results)
                    else:
                        narrowed_results = recent_narrowed_results

            page_num, page_length = self.calculate_page(start_offset, end_offset)

            self.index = self.index.refresh()
            raw_results = EmptyResults()

            searcher = None
            if self.index.doc_count():
                query = "%s:%s" % (ID, get_identifier(model_instance))
                searcher = self.index.searcher()
                parsed_query = self.parser.parse(query)
                results = searcher.search(parsed_query)

                if len(results):
                    raw_results = results[0].more_like_this(field_name, top=end_offset)

                # Handle the case where the results have been narrowed.
                if narrowed_results is not None and hasattr(raw_results, 'filter'):
                    raw_results.filter(narrowed_results)

            try:
                raw_page = ResultsPage(raw_results, page_num, page_length)
            except ValueError:
                if not self.silently_fail:
                    raise

                return {
                    'results': [],
                    'hits': 0,
                    'spelling_suggestion': None,
                }

            # Because as of Whoosh 2.5.1, it will return the wrong page of
            # results if you request something too high. :(
            if raw_page.pagenum < page_num:
                return {
                    'results': [],
                    'hits': 0,
                    'spelling_suggestion': None,
                }

            results = self._process_results(raw_page, result_class=result_class)

            if searcher:
                searcher.close()

            if hasattr(narrow_searcher, 'close'):
                narrow_searcher.close()

            return results

        def _process_results(self, raw_page, highlight=False, query_string='', spelling_query=None, result_class=None):
            from haystack import connections
            results = []

            # It's important to grab the hits first before slicing. Otherwise, this
            # can cause pagination failures.
            hits = len(raw_page)

            if result_class is None:
                result_class = SearchResult

            facets = {}
            spelling_suggestion = None
            unified_index = connections[self.connection_alias].get_unified_index()
            indexed_models = unified_index.get_indexed_models()

            for doc_offset, raw_result in enumerate(raw_page):
                score = raw_page.score(doc_offset) or 0
                app_label, model_name = raw_result[DJANGO_CT].split('.')
                additional_fields = {}
                model = haystack_get_model(app_label, model_name)

                if model and model in indexed_models:
                    for key, value in raw_result.items():
                        index = unified_index.get_index(model)
                        string_key = str(key)

                        if string_key in index.fields and hasattr(index.fields[string_key], 'convert'):
                            # Special-cased due to the nature of KEYWORD fields.
                            if index.fields[string_key].is_multivalued:
                                # Compare with ==, not `is`, when testing for zero length.
                                if value is None or len(value) == 0:
                                    additional_fields[string_key] = []
                                else:
                                    additional_fields[string_key] = value.split(',')
                            else:
                                additional_fields[string_key] = index.fields[string_key].convert(value)
                        else:
                            additional_fields[string_key] = self._to_python(value)

                    del(additional_fields[DJANGO_CT])
                    del(additional_fields[DJANGO_ID])

                    if highlight:
                        sa = StemmingAnalyzer()
                        formatter = WhooshHtmlFormatter('em')
                        terms = [token.text for token in sa(query_string)]

                        whoosh_result = whoosh_highlight(
                            additional_fields.get(self.content_field_name),
                            terms,
                            sa,
                            ContextFragmenter(),
                            formatter
                        )
                        additional_fields['highlighted'] = {
                            self.content_field_name: [whoosh_result],
                        }

                    result = result_class(app_label, model_name, raw_result[DJANGO_ID], score, **additional_fields)
                    results.append(result)
                else:
                    hits -= 1

            if self.include_spelling:
                if spelling_query:
                    spelling_suggestion = self.create_spelling_suggestion(spelling_query)
                else:
                    spelling_suggestion = self.create_spelling_suggestion(query_string)

            return {
                'results': results,
                'hits': hits,
                'facets': facets,
                'spelling_suggestion': spelling_suggestion,
            }

        def create_spelling_suggestion(self, query_string):
            spelling_suggestion = None
            reader = self.index.reader()
            corrector = reader.corrector(self.content_field_name)
            cleaned_query = force_text(query_string)

            if not query_string:
                return spelling_suggestion

            # Clean the string.
            for rev_word in self.RESERVED_WORDS:
                cleaned_query = cleaned_query.replace(rev_word, '')

            for rev_char in self.RESERVED_CHARACTERS:
                cleaned_query = cleaned_query.replace(rev_char, '')

            # Break it down.
            query_words = cleaned_query.split()
            suggested_words = []

            for word in query_words:
                suggestions = corrector.suggest(word, limit=1)

                if len(suggestions) > 0:
                    suggested_words.append(suggestions[0])

            spelling_suggestion = ' '.join(suggested_words)
            return spelling_suggestion

        def _from_python(self, value):
            """
            Converts Python values to a string for Whoosh.
            Code courtesy of pysolr.
            """
            if hasattr(value, 'strftime'):
                if not hasattr(value, 'hour'):
                    value = datetime(value.year, value.month, value.day, 0, 0, 0)
            elif isinstance(value, bool):
                if value:
                    value = 'true'
                else:
                    value = 'false'
            elif isinstance(value, (list, tuple)):
                value = u','.join([force_text(v) for v in value])
            elif isinstance(value, (six.integer_types, float)):
                # Leave it alone.
                pass
            else:
                value = force_text(value)
            return value

        def _to_python(self, value):
            """
            Converts values from Whoosh to native Python values.
            A port of the same method in pysolr, as they deal with data the same way.
            """
            if value == 'true':
                return True
            elif value == 'false':
                return False

            if value and isinstance(value, six.string_types):
                possible_datetime = DATETIME_REGEX.search(value)

                if possible_datetime:
                    date_values = possible_datetime.groupdict()

                    for dk, dv in date_values.items():
                        date_values[dk] = int(dv)

                    return datetime(date_values['year'], date_values['month'], date_values['day'], date_values['hour'], date_values['minute'], date_values['second'])

            try:
                # Attempt to use json to load the values.
                converted_value = json.loads(value)

                # Try to handle most built-in types.
                if isinstance(converted_value, (list, tuple, set, dict, six.integer_types, float, complex)):
                    return converted_value
            except:
                # If it fails (SyntaxError or its ilk) or we don't trust it,
                # continue on.
                pass

            return value


    class WhooshSearchQuery(BaseSearchQuery):
        def _convert_datetime(self, date):
            if hasattr(date, 'hour'):
                return force_text(date.strftime('%Y%m%d%H%M%S'))
            else:
                return force_text(date.strftime('%Y%m%d000000'))

        def clean(self, query_fragment):
            """
            Provides a mechanism for sanitizing user input before presenting the
            value to the backend.
            Whoosh 1.X differs here in that you can no longer use a backslash
            to escape reserved characters. Instead, the whole word should be
            quoted.
            """
            words = query_fragment.split()
            cleaned_words = []

            for word in words:
                if word in self.backend.RESERVED_WORDS:
                    word = word.replace(word, word.lower())

                for char in self.backend.RESERVED_CHARACTERS:
                    if char in word:
                        word = "'%s'" % word
                        break

                cleaned_words.append(word)

            return ' '.join(cleaned_words)

        def build_query_fragment(self, field, filter_type, value):
            from haystack import connections
            query_frag = ''
            is_datetime = False

            if not hasattr(value, 'input_type_name'):
                # Handle when we've got a ``ValuesListQuerySet``...
                if hasattr(value, 'values_list'):
                    value = list(value)

                if hasattr(value, 'strftime'):
                    is_datetime = True

                if isinstance(value, six.string_types) and value != ' ':
                    # It's not an ``InputType``. Assume ``Clean``.
                    value = Clean(value)
                else:
                    value = PythonData(value)

            # Prepare the query using the InputType.
            prepared_value = value.prepare(self)

            if not isinstance(prepared_value, (set, list, tuple)):
                # Then convert whatever we get back to what pysolr wants if needed.
                prepared_value = self.backend._from_python(prepared_value)

            # 'content' is a special reserved word, much like 'pk' in
            # Django's ORM layer. It indicates 'no special field'.
            if field == 'content':
                index_fieldname = ''
            else:
                index_fieldname = u'%s:' % connections[self._using].get_unified_index().get_index_fieldname(field)

            filter_types = {
                'content': '%s',
                'contains': '*%s*',
                'endswith': "*%s",
                'startswith': "%s*",
                'exact': '%s',
                'gt': "{%s to}",
                'gte': "[%s to]",
                'lt': "{to %s}",
                'lte': "[to %s]",
                'fuzzy': u'%s~',
            }

            if value.post_process is False:
                query_frag = prepared_value
            else:
                if filter_type in ['content', 'contains', 'startswith', 'endswith', 'fuzzy']:
                    if value.input_type_name == 'exact':
                        query_frag = prepared_value
                    else:
                        # Iterate over terms & incorporate the converted form of each into the query.
                        terms = []

                        if isinstance(prepared_value, six.string_types):
                            possible_values = prepared_value.split(' ')
                        else:
                            if is_datetime is True:
                                prepared_value = self._convert_datetime(prepared_value)

                            possible_values = [prepared_value]

                        for possible_value in possible_values:
                            terms.append(filter_types[filter_type] % self.backend._from_python(possible_value))

                        if len(terms) == 1:
                            query_frag = terms[0]
                        else:
                            query_frag = u"(%s)" % " AND ".join(terms)
                elif filter_type == 'in':
                    in_options = []

                    for possible_value in prepared_value:
                        is_datetime = False

                        if hasattr(possible_value, 'strftime'):
                            is_datetime = True

                        pv = self.backend._from_python(possible_value)

                        if is_datetime is True:
                            pv = self._convert_datetime(pv)

                        if isinstance(pv, six.string_types) and not is_datetime:
                            in_options.append('"%s"' % pv)
                        else:
                            in_options.append('%s' % pv)

                    query_frag = "(%s)" % " OR ".join(in_options)
                elif filter_type == 'range':
                    start = self.backend._from_python(prepared_value[0])
                    end = self.backend._from_python(prepared_value[1])

                    if hasattr(prepared_value[0], 'strftime'):
                        start = self._convert_datetime(start)

                    if hasattr(prepared_value[1], 'strftime'):
                        end = self._convert_datetime(end)

                    query_frag = u"[%s to %s]" % (start, end)
                elif filter_type == 'exact':
                    if value.input_type_name == 'exact':
                        query_frag = prepared_value
                    else:
                        prepared_value = Exact(prepared_value).prepare(self)
                        query_frag = filter_types[filter_type] % prepared_value
                else:
                    if is_datetime is True:
                        prepared_value = self._convert_datetime(prepared_value)

                    query_frag = filter_types[filter_type] % prepared_value

            if len(query_frag) and not isinstance(value, Raw):
                if not query_frag.startswith('(') and not query_frag.endswith(')'):
                    query_frag = "(%s)" % query_frag

            return u"%s%s" % (index_fieldname, query_frag)


    class WhooshEngine(BaseEngine):
        backend = WhooshSearchBackend
        query = WhooshSearchQuery
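The paging arithmetic in calculate_page is the part of this backend that ties in with HAYSTACK_SEARCH_RESULTS_PER_PAGE, so here it is as a standalone function, copied from the method above with self removed. The offsets in the demo (0 to 10, 20 to 30) are assumed values for a page size of 10, chosen only for illustration.

```python
def calculate_page(start_offset=0, end_offset=None):
    """Turn slice offsets into Whoosh's 1-based page number and page length."""
    # Whoosh needs an end offset greater than 0.
    if end_offset is not None and end_offset <= 0:
        end_offset = 1

    page_num = 0

    if end_offset is None:
        end_offset = 1000000

    if start_offset is None:
        start_offset = 0

    page_length = end_offset - start_offset

    if page_length and page_length > 0:
        page_num = int(start_offset / page_length)

    # Whoosh uses 1-based page numbers.
    page_num += 1
    return page_num, page_length


if __name__ == "__main__":
    print(calculate_page(0, 10))   # first page of 10 results
    print(calculate_page(20, 30))  # third page of 10 results
```

The two calls return (1, 10) and (3, 10), so a slice of results 20 to 30 is fetched from Whoosh as page 3 with 10 results per page.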

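Next, create a ChineseAnalyzer.py file. Note that the backend above imports ChineseAnalyzer from jieba.analyse, which jieba already ships, so this custom version only takes effect if you change that import to point at this file.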

    import jieba
    from whoosh.analysis import Tokenizer, Token


    class ChineseTokenizer(Tokenizer):
        def __call__(self, value, positions=False, chars=False,
                     keeporiginal=False, removestops=True,
                     start_pos=0, start_char=0, mode='', **kwargs):
            t = Token(positions, chars, removestops=removestops, mode=mode,
                      **kwargs)
            # Full mode: emit every word jieba can find, so tokens may overlap
            seglist = jieba.cut(value, cut_all=True)
            for w in seglist:
                t.original = t.text = w
                t.boost = 1.0
                # value.find(w) returns the first occurrence of w,
                # so a repeated word gets the same offsets each time
                if positions:
                    t.pos = start_pos + value.find(w)
                if chars:
                    t.startchar = start_char + value.find(w)
                    t.endchar = start_char + value.find(w) + len(w)
                yield t


    def ChineseAnalyzer():
        return ChineseTokenizer()
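The tokenizer above locates each word with value.find(w), which always returns the first occurrence, so a word that appears twice gets the same offsets both times. One possible alternative, sketched here in plain Python without jieba, is to search from a moving cursor. It assumes the words arrive in order and do not overlap, which is not true of cut_all=True output; token_offsets is a hypothetical helper and the word list is written by hand, not produced by jieba.

```python
def token_offsets(text, words):
    """Yield (word, start, end) for in-order, non-overlapping words in text."""
    cursor = 0
    for w in words:
        start = text.find(w, cursor)
        if start == -1:
            # Word not found after the cursor; skip it rather than guess.
            continue
        end = start + len(w)
        yield w, start, end
        cursor = end


if __name__ == "__main__":
    text = "我爱北京我爱上海"
    words = ["我", "爱", "北京", "我", "爱", "上海"]  # hand-written segmentation
    print([text.find(w) for w in words])     # first-occurrence lookup
    print(list(token_offsets(text, words)))  # cursor-based lookup
```

jieba also provides jieba.tokenize, which yields (word, start, end) tuples directly and may be the simpler route if exact offsets matter, for example for highlighting.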

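Under templates, create a search folder laid out as follows:

    templates/
        search/
            search.html
            indexes/
                JiaBlog/
                    articles_text.txt

Here JiaBlog is your app name.

The .txt file looks like this: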

    {{ object.title }}
    {{ object.authorname }}
    {{ object.body }}

title, authorname and body are the fields I want indexed.

search.html looks like this; remember to adapt the displayed fields to your own model.

    <!DOCTYPE html>
    <html>
    <head>
        <title></title>
    </head>
    <body>
    {% if query %}
        <h3>Search results:</h3>
        {% for result in page.object_list %}
            <a href="/JiaBlog/article/{{ result.object.id }}/">{{ result.object.title }}</a><br/>
        {% empty %}
            <p>Nothing found</p>
        {% endfor %}
        {% if page.has_previous or page.has_next %}
            <div>
                {% if page.has_previous %}<a href="?q={{ query }}&amp;page={{ page.previous_page_number }}">{% endif %}&laquo; Previous{% if page.has_previous %}</a>{% endif %}
                |
                {% if page.has_next %}<a href="?q={{ query }}&amp;page={{ page.next_page_number }}">{% endif %}Next &raquo;{% if page.has_next %}</a>{% endif %}
            </div>
        {% endif %}
    {% endif %}
    </body>
    </html>

Next, build the index manually:

    python manage.py rebuild_index

If you have followed every step above, the index should build.

If you can't find where Python installed third-party packages on Linux, you can do this:

    root@iZuf647of4lcxljq1unaeeZ:/home/MyBlog# python -c "import django;print(django)"
    <module 'django' from '/usr/local/lib/python3.6/dist-packages/django/__init__.py'>
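The same lookup can be done for haystack itself, which is the package whose backends folder is needed for whoosh_cn_backend.py. This is a small sketch using the standard library's importlib.util.find_spec, which locates a top-level package without importing it; package_dir is a made-up helper name.

```python
import importlib.util
import os


def package_dir(module_name):
    """Return the directory holding a top-level module or package, or None."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ImportError:
        return None
    if spec is None or spec.origin is None:
        return None
    return os.path.dirname(spec.origin)


if __name__ == "__main__":
    root = package_dir("haystack")
    if root is None:
        print("django-haystack does not appear to be installed")
    else:
        # The folder where whoosh_cn_backend.py goes.
        print(os.path.join(root, "backends"))
```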

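One more pitfall.

Building the index failed with:

    django.template.exceptions.TemplateDoesNotExist: search/indexes/JiaBlog/articles_text.txt

It worked fine locally. It turned out that the file on the server was named Articles_text.txt, with a capital A; renaming it to start with a lowercase a fixed it. Ugh...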
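The file name matters because haystack derives it from the model. As far as I can tell, it looks for search/indexes/<app_label>/<model_name>_<field_name>.txt with the model name lowercased, which matches the path in the error above. The helper below only illustrates that naming rule; index_template_path is a hypothetical function, not part of haystack's API.

```python
def index_template_path(app_label, model_class_name, field_name):
    """Build the data-template path that haystack is expected to look up."""
    return "search/indexes/%s/%s_%s.txt" % (
        app_label,
        model_class_name.lower(),  # Django's model_name is the lowercased class name
        field_name,
    )


if __name__ == "__main__":
    print(index_template_path("JiaBlog", "Articles", "text"))
```

For the Articles model in the JiaBlog app with the text field, this gives search/indexes/JiaBlog/articles_text.txt. On a case-sensitive filesystem, a file named Articles_text.txt will not match that path.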

After that it worked:

    root@iZuf647of4lcxljq1unaeeZ:/home/MyBlog# python manage.py rebuild_index
    System check identified some issues:
    WARNINGS:
    Blog.Articles.tags: (fields.W340) null has no effect on ManyToManyField.
    JiaBlog.Articles.tags: (fields.W340) null has no effect on ManyToManyField.
    WARNING: This will irreparably remove EVERYTHING from your search index in connection 'default'.
    Your choices after this are to restore from backups or rebuild via the `rebuild_index` command.
    Are you sure you wish to continue? [y/N] y
    Removing all documents from your index because you said so.
    All documents removed.
    Indexing 50 articless
    Building prefix dict from the default dictionary ...
    Dumping model to file cache /tmp/jieba.cache
    Loading model cost 1.046 seconds.
    Prefix dict has been built succesfully.

With that, search works.

I then restyled the search.html page to match the rest of the site.

Note the URL:

http://www.guanacossj.com/search/?q=python
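The query goes in the q parameter, and the pagination links in search.html add a page parameter next to it. For building such links from Python (in a test, say), a sketch with the standard library's urlencode could look like this; build_search_url is a made-up helper, and urlencode takes care of percent-encoding Chinese queries.

```python
from urllib.parse import urlencode


def build_search_url(base, query, page=None):
    """Append the q (and optionally page) parameters to a search URL."""
    params = {"q": query}
    if page is not None:
        params["page"] = page
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    base = "http://www.guanacossj.com/search/"
    print(build_search_url(base, "python"))
    print(build_search_url(base, "中文分词", page=2))
```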

Haha, all done!!!
