How to use ElasticSearch in Django
by Adam Wattis
A while back I was working on a Django project and wanted to implement fast free text search. Instead of using a regular database for this search function — such as MySQL or PostgreSQL — I decided to use a NoSQL database. That is when I discovered ElasticSearch.
ElasticSearch indexes documents for your data instead of using data tables like a regular relational database does. This speeds up search, and offers a lot of other benefits that you don’t get with a regular database. I kept a regular relational database as well for storing user details, logins, and other data that ElasticSearch didn’t need to index.
After searching for a long time on how to properly implement ElasticSearch with Django, I didn’t really find any satisfying answers. Some guides or tutorials were convoluted and seemed to be taking unnecessary steps in order to index the data into ElasticSearch. There was quite a bit of information on how to perform searching, but not as much about how the indexing should be done. I felt like there must be a simpler solution out there, so I decided to give it a try myself.
I wanted to keep it as simple as possible, because in my opinion simple solutions tend to be the best ones. KISS (Keep It Simple, Stupid), Less is More and all of that stuff resonates with me a lot, especially when every other solution out there is complex. I decided to base my code on Honza Král's example in this video. I recommend watching it, although it is a bit outdated at this point.
Since I was using Django, which is written in Python, it was easy to interact with ElasticSearch. There are two client libraries for interacting with ElasticSearch from Python: elasticsearch-py, the official low-level client, and elasticsearch-dsl, which is built upon the former and gives a higher-level abstraction with a bit less functionality.
We will get into some examples soon, but first I need to clarify what we want to accomplish:
All right, that seems simple enough. Let's get started by installing ElasticSearch on our machine. All the code will also be available on my GitHub so that you can easily follow the examples.
Since ElasticSearch runs on Java, you must make sure you have an up-to-date JVM. Check which version you have by running java -version in the terminal. Then run the following commands to create a new directory, download, extract and start ElasticSearch:
mkdir elasticsearch-example
cd elasticsearch-example
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.1.1.tar.gz
tar -xzf elasticsearch-5.1.1.tar.gz
./elasticsearch-5.1.1/bin/elasticsearch
When ElasticSearch starts up, a lot of output should be printed to the terminal window. To check that it's up and running correctly, open a new terminal window and run this curl command:
curl -XGET http://localhost:9200
The response should be something like this:
{
  "name" : "6xIrzqq",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "eUH9REKyQOy4RKPzkuRI1g",
  "version" : {
    "number" : "5.1.1",
    "build_hash" : "5395e21",
    "build_date" : "2016-12-06T12:36:15.409Z",
    "build_snapshot" : false,
    "lucene_version" : "6.3.0"
  },
  "tagline" : "You Know, for Search"
}
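If you would rather run this check from Python than from curl, here is a minimal sketch that uses only the standard library. It asks the root endpoint for the cluster name and version. The function names are my own, and the default URL assumes the local install above on port 9200. It reads only keys that appear in the response shown above.

```python
import json
from urllib.request import urlopen


def parse_cluster_info(raw: str) -> dict:
    """Pick out the fields worth checking from the root-endpoint JSON."""
    info = json.loads(raw)
    return {
        "cluster_name": info["cluster_name"],
        "version": info["version"]["number"],
        "lucene_version": info["version"]["lucene_version"],
    }


def fetch_cluster_info(url: str = "http://localhost:9200", timeout: float = 2.0):
    """Return the parsed cluster info, or None if nothing answers at `url`."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return parse_cluster_info(resp.read().decode("utf-8"))
    except OSError:
        # urllib.error.URLError (e.g. connection refused) subclasses OSError,
        # as do socket timeouts.
        return None


if __name__ == "__main__":
    print(fetch_cluster_info() or "ElasticSearch is not reachable")
```

With the node from this guide running, the version field should read 5.1.1.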
Great, you now have ElasticSearch running on your local machine! It’s time to set up your Django project.
First, create a virtual environment with virtualenv venv and activate it with source venv/bin/activate to keep everything contained. Then install some packages:
pip install django
pip install elasticsearch-dsl
To start a new Django project you run:
django-admin startproject elasticsearchproject
cd elasticsearchproject
python manage.py startapp elasticsearchapp
After creating your new Django project, you need to create the model you will use. For this guide I chose to go with a good old-fashioned blog post example. In models.py, place the following code:
from django.db import models
from django.utils import timezone
from django.contrib.auth.models import User

# Create your models here.

# Blogpost to be indexed into ElasticSearch
class BlogPost(models.Model):
    author = models.ForeignKey(User, on_delete=models.CASCADE, related_name='blogpost')
    posted_date = models.DateField(default=timezone.now)
    title = models.CharField(max_length=200)
    text = models.TextField(max_length=1000)
Pretty straightforward so far. Don't forget to add elasticsearchapp to INSTALLED_APPS in settings.py and to register your new BlogPost model in admin.py like this:
from django.contrib import admin
from .models import BlogPost

# Register your models here.

# Need to register my BlogPost so it shows up in the admin
admin.site.register(BlogPost)
You must also run python manage.py makemigrations, python manage.py migrate and python manage.py createsuperuser to create the database and an admin account. Now run python manage.py runserver, go to http://localhost:8000/admin/ and log in. You should now see your BlogPost model there. Go ahead and create your first blog post in the admin.
Congratulations, you now have a functioning Django project! It’s finally time to get into the fun stuff — connecting ElasticSearch.
Begin by creating a new file called search.py in the elasticsearchapp directory. This is where the ElasticSearch code will live. The first thing you need to do here is create a connection from your Django application to ElasticSearch. You do this in search.py:
from elasticsearch_dsl.connections import connections

# Register a default, global connection (localhost:9200 when no hosts are given)
connections.create_connection()
Now that you have a global connection to your ElasticSearch set-up you need to define what you want to index into it. Write this code:
from elasticsearch_dsl.connections import connections
from elasticsearch_dsl import DocType, Text, Date

connections.create_connection()

class BlogPostIndex(DocType):
    author = Text()
    posted_date = Date()
    title = Text()
    text = Text()

    class Meta:
        index = 'blogpost-index'
It looks pretty similar to your model, right? DocType works as a wrapper that lets you write an index like a model, and Text and Date are field types, so each value gets the correct format when it is indexed.
Inside the Meta you tell ElasticSearch what you want the index to be named. This will be a point of reference for ElasticSearch so that it knows what index it’s dealing with when initializing it in the database and saving each new object instance created.
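To make the "write an index like a model" idea concrete, here is a rough sketch of the create-index body that a DocType like BlogPostIndex corresponds to in the 5.x era. It is only my approximation and not the library's own code. It assumes Text maps to "text" and Date maps to "date". It also assumes the doc type name is the class name in snake_case, which is consistent with the blog_post_index segment in the curl URL used later. The field spec and helper names are hypothetical.

```python
import json
import re

# Hypothetical field spec mirroring BlogPostIndex: field name -> ES field type.
BLOGPOST_FIELDS = {
    "author": "text",
    "posted_date": "date",
    "title": "text",
    "text": "text",
}


def snake_case(class_name: str) -> str:
    """BlogPostIndex -> blog_post_index (insert '_' before each inner capital)."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", class_name).lower()


def mapping_body(class_name: str, fields: dict) -> dict:
    """Build a 5.x-style create-index body: one doc type with typed properties."""
    return {
        "mappings": {
            snake_case(class_name): {
                "properties": {name: {"type": es_type} for name, es_type in fields.items()}
            }
        }
    }


if __name__ == "__main__":
    # A body of this shape could be sent with: PUT http://localhost:9200/blogpost-index
    print(json.dumps(mapping_body("BlogPostIndex", BLOGPOST_FIELDS), indent=2))
```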
Now you need to actually create the mapping of your newly created BlogPostIndex in ElasticSearch. You can do this and also create a way to do the bulk indexing at the same time. How convenient is that?
The bulk helper lives in elasticsearch.helpers. The elasticsearch package (elasticsearch-py) was installed along with elasticsearch_dsl, because elasticsearch_dsl is built on top of it. Add the following to search.py:
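...
from elasticsearch.helpers import bulk
from elasticsearch import Elasticsearch
from . import models
...

...
def bulk_indexing():
    # Create the index and its mapping in ElasticSearch
    BlogPostIndex.init()
    es = Elasticsearch()
    # Stream every BlogPost through the bulk helper without building a list first
    bulk(client=es, actions=(b.indexing() for b in models.BlogPost.objects.all().iterator()))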
“What is going on here?” you might be thinking. It’s not that complicated, actually.
You only need to run bulk indexing when you change something in your model. It starts by calling init() on BlogPostIndex, which creates the index and its mapping in ElasticSearch. Then you call bulk and pass it an instance of Elasticsearch(), which creates a connection to ElasticSearch. Finally, you pass a generator to actions=. The generator iterates over all the BlogPost objects in your regular database and calls the .indexing() method on each one. Why a generator? If you have a lot of objects to iterate over, a generator does not have to load them all into memory first.
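The bulk helper takes care of serialization and the HTTP calls for you. Under the hood, the _bulk API expects newline-delimited JSON, with one action line followed by one source line per document. The stdlib sketch below shows only that wire format. It assumes 5.x-style typed indices, and it assumes actions shaped like the output of to_dict(include_meta=True), meaning metadata keys plus a "_source" document. fake_actions is a hypothetical lazy source that stands in for the queryset iterator. For simplicity this sketch joins everything into one string. helpers.bulk instead sends the actions in chunks (500 per request by default), which is why feeding it a generator keeps memory use flat.

```python
import json
from typing import Iterable, Iterator

# Metadata keys that sit next to "_source" in each action dict.
META_KEYS = ("_index", "_type", "_id")


def bulk_lines(actions: Iterable[dict]) -> Iterator[str]:
    """Yield the action line and the source line of the _bulk API for each action."""
    for action in actions:
        meta = {key: action[key] for key in META_KEYS if key in action}
        yield json.dumps({"index": meta})
        yield json.dumps(action["_source"])


def bulk_body(actions: Iterable[dict]) -> str:
    """Newline-delimited body; _bulk requires a newline after the last line too."""
    return "".join(line + "\n" for line in bulk_lines(actions))


def fake_actions(n: int) -> Iterator[dict]:
    """Hypothetical lazy source standing in for BlogPost.objects.iterator()."""
    for i in range(1, n + 1):
        yield {
            "_index": "blogpost-index",
            "_type": "blog_post_index",
            "_id": i,
            "_source": {"title": f"Post {i}"},
        }
```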
There is just one problem with the above code: your model doesn't have an .indexing() method yet. Let's fix that in models.py:
...
from .search import BlogPostIndex
...

    ...
    # Add indexing method to BlogPost
    def indexing(self):
        obj = BlogPostIndex(
            meta={'id': self.id},
            author=self.author.username,
            posted_date=self.posted_date,
            title=self.title,
            text=self.text
        )
        # Save this single document to ElasticSearch right away
        obj.save()
        # Return the document with its metadata so bulk() can consume it as an action
        return obj.to_dict(include_meta=True)
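You add the indexing method to the BlogPost model. It builds a BlogPostIndex document from the post, saves it to ElasticSearch, and returns it as a dictionary (including its metadata) so it can also be passed to bulk.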
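Because .indexing() calls obj.save() and also returns the same document to bulk, every post is written to ElasticSearch twice during bulk_indexing(). One way around this is a method that only builds the action dict and skips obj.save(). In Django you could return BlogPostIndex(...).to_dict(include_meta=True) without the save. The stdlib sketch below illustrates the shape of such an action. FakeBlogPost and to_action are hypothetical stand-ins, not part of the original code, and the index and type names are taken from this guide.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class FakeBlogPost:
    """Hypothetical stand-in for the Django BlogPost model."""
    id: int
    author_username: str
    posted_date: date
    title: str
    text: str


def to_action(post: FakeBlogPost) -> dict:
    """Build a bulk action for one post without writing anything to ElasticSearch."""
    return {
        "_index": "blogpost-index",
        "_type": "blog_post_index",
        "_id": post.id,
        "_source": {
            "author": post.author_username,
            # JSON has no date type; ElasticSearch date fields accept ISO-8601 strings.
            "posted_date": post.posted_date.isoformat(),
            "title": post.title,
            "text": post.text,
        },
    }
```

Usage: bulk(client=es, actions=(to_action(p) for p in posts)) would then send each post exactly once. The signal handler added later can keep using .indexing() for single saves.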
Let's try this out now and see if you can bulk index the blog post you created earlier. Run python manage.py shell to open the Django shell, import your search.py with from elasticsearchapp.search import * and then run bulk_indexing() to index all the blog posts in your database. To see if it worked, run the following curl command:
curl -XGET 'localhost:9200/blogpost-index/blog_post_index/1?pretty'
You should get back your first blog post in the terminal.
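The same lookup can be done from Python with the standard library. This sketch assumes the index name, doc type and id used in the curl command above, and the helper names are my own. A successful GET returns JSON containing a "_source" key with your fields. If the document does not exist, ElasticSearch answers with HTTP 404, and urlopen raises urllib.error.HTTPError.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def doc_url(base: str, index: str, doc_type: str, doc_id) -> str:
    """Build the single-document GET URL used by the curl command above."""
    return f"{base.rstrip('/')}/{quote(index)}/{quote(doc_type)}/{quote(str(doc_id))}"


def get_document(base: str, index: str, doc_type: str, doc_id, timeout: float = 2.0) -> dict:
    """Fetch one document; raises urllib.error.HTTPError (404) if it is missing."""
    with urlopen(doc_url(base, index, doc_type, doc_id), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, get_document("http://localhost:9200", "blogpost-index", "blog_post_index", 1)["_source"]["title"] should give back the title of your first post.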
Next, you need to add a signal that fires .indexing() on the saved instance every time a user saves a blog post. In elasticsearchapp, create a new file called signals.py and add this code:
from .models import BlogPost
from django.db.models.signals import post_save
from django.dispatch import receiver

# Re-index a BlogPost in ElasticSearch every time it is saved
@receiver(post_save, sender=BlogPost)
def index_post(sender, instance, **kwargs):
    instance.indexing()
The post_save signal ensures that each instance gets indexed with the .indexing() method after it is saved.
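To see the mechanism without Django, here is a toy, framework-free model of the sender/receiver pattern. It is not Django's implementation, which also handles weak references, dispatch_uid and more. The point it shows is that post_save fires on updates as well as on creation, and Django passes a created flag that tells the two apart. Since .indexing() reuses the post's id as the document id, re-saving a post overwrites its document in ElasticSearch instead of duplicating it. ToySignal, Post and indexed are hypothetical names.

```python
from collections import defaultdict
from typing import Callable


class ToySignal:
    """Minimal stand-in for django.db.models.signals.post_save."""

    def __init__(self):
        self._receivers = defaultdict(list)

    def connect(self, receiver: Callable, sender: type) -> None:
        self._receivers[sender].append(receiver)

    def send(self, sender: type, **kwargs) -> None:
        for receiver in self._receivers[sender]:
            receiver(sender=sender, **kwargs)


post_save = ToySignal()


class Post:
    def __init__(self, title):
        self.title = title
        self.pk = None

    def save(self):
        created = self.pk is None
        if created:
            self.pk = id(self)  # pretend the database assigned a primary key
        post_save.send(sender=Post, instance=self, created=created)


indexed = []


def index_post(sender, instance, created, **kwargs):
    # Mirrors the receiver in signals.py: runs on creates AND updates.
    indexed.append((instance.title, created))


post_save.connect(index_post, sender=Post)
```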
For this to work, we also need to tell Django that we're using signals. We do this by opening apps.py and adding the following code:
from django.apps import AppConfig

class ElasticsearchappConfig(AppConfig):
    name = 'elasticsearchapp'

    def ready(self):
        # Importing the module connects the @receiver handlers
        import elasticsearchapp.signals
To complete this, we also need to tell Django that we're using this new configuration. We do this in the __init__.py inside our elasticsearchapp directory by adding:
default_app_config = 'elasticsearchapp.apps.ElasticsearchappConfig'
Now the post_save signal is registered with Django and is ready to listen for whenever a new blog post is saved.
Try it out by going into the Django admin again and saving a new blog post. Then check with a curl command whether it was successfully indexed into ElasticSearch.
Now let's make a simple search function in search.py to find all posts filtered by author:
...
from elasticsearch_dsl import DocType, Text, Date, Search
...

...
def search(author):
    # Filter (no scoring) on documents whose author field contains this exact term
    s = Search().filter('term', author=author)
    response = s.execute()
    return response
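A caveat on the term filter: author is mapped as Text, so the stored value is analyzed (lowercased and split into tokens), but a term filter compares your input verbatim. This works for a username like "home" but would miss "Adam" unless you search for "adam". Two common fixes in elasticsearch-dsl are Search().query('match', author=author), which analyzes the input too, or mapping author as Keyword() instead of Text(). The sketch below shows the request bodies involved, in the form I expect elasticsearch-dsl to produce. It uses a deliberately crude, hypothetical stand-in for the standard analyzer, so it only illustrates the difference and does not reproduce real analysis.

```python
import re


def author_filter_body(author: str) -> dict:
    """Roughly what Search().filter('term', author=author).to_dict() produces."""
    return {"query": {"bool": {"filter": [{"term": {"author": author}}]}}}


def author_match_body(author: str) -> dict:
    """Alternative: a match query runs the input through the field's analyzer."""
    return {"query": {"match": {"author": author}}}


def toy_analyze(text: str) -> list:
    """Crude stand-in for the standard analyzer: lowercase, split on non-word chars."""
    return re.findall(r"\w+", text.lower())


def term_hits(stored: str, term: str) -> bool:
    # term: the input is compared verbatim against the stored (analyzed) tokens.
    return term in toy_analyze(stored)


def match_hits(stored: str, query: str) -> bool:
    # match: the input is analyzed too; any shared token matches (default OR).
    stored_tokens = set(toy_analyze(stored))
    return any(token in stored_tokens for token in toy_analyze(query))
```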
Let's try the search out. In the shell, run from elasticsearchapp.search import * and then print(search(author="<author name>")):
>>> print(search(author="home"))
<Response: [<Result(blogpost-index/blog_post_index/1): {'text': 'Hello world, this is my first blog post', 'title':...}>]>
There you have it! You have now successfully indexed all your instances into ElasticSearch, created a post_save signal that indexes each newly saved instance, and created a function that searches your ElasticSearch database for your data.
This was quite a lengthy article, but I hope it is written simply enough for even a beginner to understand.
I explained how to connect a Django model to ElasticSearch for indexing and searching, but there is so much more that ElasticSearch can do. I recommend reading their website and exploring the other possibilities, such as spatial operations and full-text search with intelligent highlighting. It's a great tool and I will be sure to use it in future projects!
If you liked this article or have a comment or suggestion, please feel free to leave a message below. And stay tuned for more interesting stuff!
Originally published at: https://www.freecodecamp.org/news/elasticsearch-with-django-the-easy-way-909375bc16cb/
Copyright © 2003-2013 www.wpsshop.cn. All rights reserved.