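In Elasticsearch, every node is an ingest node by default and can preprocess documents before they are indexed, for example by changing a field's default value, adding a new field, or converting a string into an array. Elasticsearch supports two ways of preprocessing data: pipelines and Painless scripts. A pipeline passes each incoming document through an ordered series of processors, each of which performs one preprocessing step on its own; Elasticsearch ships with a set of built-in processors, and the set can be extended. To use a pipeline, first register it in Elasticsearch, then specify it when updating documents or reindexing. A Painless script can preprocess data when a document is updated, and at query time it can intercept the data and modify the values that are returned.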
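The idea of a pipeline as an ordered chain of processors can be sketched outside Elasticsearch. The following is a minimal plain-Python illustration, not Elasticsearch code: `split`, `set_value`, and `run_pipeline` are hypothetical helpers that only imitate the split and set processors used in the steps below.

```python
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]
Processor = Callable[[Doc], Doc]


def split(field: str, separator: str) -> Processor:
    """Toy stand-in for a processor that turns a delimited string into a list."""
    def apply(doc: Doc) -> Doc:
        out = dict(doc)
        out[field] = out[field].split(separator)
        return out
    return apply


def set_value(field: str, value: Any) -> Processor:
    """Toy stand-in for a processor that writes a fixed value into a field."""
    def apply(doc: Doc) -> Doc:
        out = dict(doc)
        out[field] = value
        return out
    return apply


def run_pipeline(processors: List[Processor], doc: Doc) -> Doc:
    """Apply each processor in order; the output of one feeds the next."""
    for processor in processors:
        doc = processor(doc)
    return doc


blog = {"title": "Introducing big data......", "tags": "hadoop,elasticsearch,spark"}
processed = run_pipeline([split("tags", ","), set_value("views", 0)], blog)
print(processed)
```

Each toy processor copies the document before changing it, so the original `blog` dictionary stays untouched, which makes the effect of the chain easy to compare.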
-
- 1 Initialize the data
- DELETE tech_blogs
-
- PUT tech_blogs/_doc/1
- {
- "title":"Introducing big data......",
- "tags":"hadoop,elasticsearch,spark",
- "content":"You know, for big data"
- }
-
-
-
- 2 Simulate a pipeline that splits the tags field into an array
- POST _ingest/pipeline/_simulate
- {
- "pipeline": {
- "description": "to split blog tags",
- "processors": [
- {
- "split": {
- "field": "tags",
- "separator": ","
- }
- }
- ]
- },
- "docs": [
- {
- "_index": "index",
- "_id": "id",
- "_source": {
- "title": "Introducing big data......",
- "tags": "hadoop,elasticsearch,spark",
- "content": "You know, for big data"
- }
- },
- {
- "_index": "index",
- "_id": "idxx",
- "_source": {
- "title": "Introducing cloud computing",
- "tags": "openstack,k8s",
- "content": "You know, for cloud"
- }
- }
- ]
- }
-
-
- 3 Also add a field to each document: the blog view count
- POST _ingest/pipeline/_simulate
- {
- "pipeline": {
- "description": "to split blog tags",
- "processors": [
- {
- "split": {
- "field": "tags",
- "separator": ","
- }
- },
-
- {
- "set":{
- "field": "views",
- "value": 0
- }
- }
- ]
- },
-
- "docs": [
- {
- "_index":"index",
- "_id":"id",
- "_source":{
- "title":"Introducing big data......",
- "tags":"hadoop,elasticsearch,spark",
- "content":"You know, for big data"
- }
- },
-
- {
- "_index":"index",
- "_id":"idxx",
- "_source":{
- "title":"Introducing cloud computing",
- "tags":"openstack,k8s",
- "content":"You know, for cloud"
- }
- }
-
- ]
- }
-
-
- 4 Register the pipeline in Elasticsearch
- PUT _ingest/pipeline/blog_pipeline
- {
- "description": "a blog pipeline",
- "processors": [
- {
- "split": {
- "field": "tags",
- "separator": ","
- }
- },
-
- {
- "set":{
- "field": "views",
- "value": 0
- }
- }
- ]
- }
-
- 5 View the pipeline
- GET _ingest/pipeline/blog_pipeline
-
- 6 Test the registered pipeline
- POST _ingest/pipeline/blog_pipeline/_simulate
- {
- "docs": [
- {
- "_source": {
- "title": "Introducing cloud computing",
- "tags": "openstack,k8s",
- "content": "You know, for cloud"
- }
- }
- ]
- }
-
- 7 Index a document without the pipeline
- PUT tech_blogs/_doc/1
- {
- "title":"Introducing big data......",
- "tags":"hadoop,elasticsearch,spark",
- "content":"You know, for big data"
- }
-
- 8 Index a document through the pipeline
- PUT tech_blogs/_doc/2?pipeline=blog_pipeline
- {
- "title": "Introducing cloud computing",
- "tags": "openstack,k8s",
- "content": "You know, for cloud"
- }
-
- 9 View both documents: one has been processed, the other has not
- POST tech_blogs/_search
- {}
-
-
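- 10 Reprocessing the whole index with _update_by_query causes an error, because the already-processed document's tags field is now an array that the split processor cannot split again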
- POST tech_blogs/_update_by_query?pipeline=blog_pipeline
- {
- }
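The error in step 10 can be reproduced locally. The sketch below is a plain-Python stand-in, not Elasticsearch code: `split_tags` is a hypothetical helper that assumes, as the built-in split processor appears to, that the field it receives is a string. The real error message from Elasticsearch will be worded differently.

```python
def split_tags(source: dict, field: str = "tags", separator: str = ",") -> dict:
    """Local stand-in for the split step: it only accepts a string value."""
    value = source[field]
    if not isinstance(value, str):
        raise TypeError(f"field [{field}] is {type(value).__name__}, expected str")
    return {**source, field: value.split(separator)}


# Like document 1, indexed without the pipeline: tags is still a string.
raw_doc = {"tags": "hadoop,elasticsearch,spark"}
# Like document 2, already processed: tags is a list and views exists.
done_doc = {"tags": ["openstack", "k8s"], "views": 0}

print(split_tags(raw_doc)["tags"])

try:
    split_tags(done_doc)
    second_pass_failed = False
except TypeError as err:
    second_pass_failed = True
    print("second pass failed:", err)
```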
-
-
- 11 Add a query to _update_by_query so that only documents that have not been preprocessed are updated
- POST tech_blogs/_update_by_query?pipeline=blog_pipeline
- {
- "query": {
- "bool": {
- "must_not": {
- "exists": {
- "field": "views"
- }
- }
- }
- }
- }
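The effect of the must_not/exists condition can be illustrated with a small local model. This is a sketch under assumptions, not Elasticsearch code: `index` is a hypothetical in-memory stand-in for tech_blogs, and `needs_processing` imitates the query by treating a missing or null views field as "not yet processed".

```python
def needs_processing(source: dict) -> bool:
    """Mirror of must_not/exists on views: true when the field is absent or null."""
    return source.get("views") is None


# Hypothetical in-memory copy of the two documents after steps 7 and 8.
index = {
    "1": {"tags": "hadoop,elasticsearch,spark"},
    "2": {"tags": ["openstack", "k8s"], "views": 0},
}

for doc_id, source in index.items():
    if needs_processing(source):
        # Same two steps as blog_pipeline: split tags, then set views to 0.
        source["tags"] = source["tags"].split(",")
        source["views"] = 0

print(index)
```

Because processed documents now carry a views field, running the loop a second time selects nothing, which is why the filtered request can be repeated safely.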
-
-
- 12 Add a content_length field with a Painless script
- POST _ingest/pipeline/_simulate
- {
- "pipeline": {
- "description": "to split blog tags",
- "processors": [
- {
- "split": {
- "field": "tags",
- "separator": ","
- }
- },
- {
- "script": {
- "source": """
- if(ctx.containsKey("content")){
- ctx.content_length = ctx.content.length();
- }else{
- ctx.content_length=0;
- }
- """
- }
- },
-
- {
- "set":{
- "field": "views",
- "value": 0
- }
- }
- ]
- },
-
- "docs": [
- {
- "_index":"index",
- "_id":"id",
- "_source":{
- "title":"Introducing big data......",
- "tags":"hadoop,elasticsearch,spark",
- "content":"You know, for big data"
- }
- },
-
- {
- "_index":"index",
- "_id":"idxx",
- "_source":{
- "title":"Introducing cloud computing",
- "tags":"openstack,k8s",
- "content":"You know, for cloud"
- }
- }
-
- ]
- }
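The branching in the script processor above can be checked with a local equivalent. This is a plain-Python sketch, not Painless: `add_content_length` is a hypothetical function, and `ctx` here is just a dictionary standing in for the document being ingested.

```python
def add_content_length(ctx: dict) -> dict:
    """Same branching as the Painless script: measure content, or fall back to 0."""
    if "content" in ctx:
        ctx["content_length"] = len(ctx["content"])
    else:
        ctx["content_length"] = 0
    return ctx


with_content = add_content_length({"title": "a", "content": "for cloud"})
without_content = add_content_length({"title": "b"})
print(with_content["content_length"], without_content["content_length"])
```

For plain ASCII text, Python's `len` and the `length()` call in the script agree; for text outside the Basic Multilingual Plane the two may count differently.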
-
- 13 Reinitialize the data
- DELETE tech_blogs
- PUT tech_blogs/_doc/1
- {
- "title":"Introducing big data......",
- "tags":"hadoop,elasticsearch,spark",
- "content":"You know, for big data",
- "views":0
- }
-
- 14 Add 100 to the views field with a Painless script
- POST tech_blogs/_update/1
- {
- "script": {
- "source": "ctx._source.views += params.new_views",
- "params": {
- "new_views":100
- }
- }
- }
-
-
- 15 Check the views count
- POST tech_blogs/_search
- {
-
- }
-
-
- 16 Store the Painless script in the cluster state
- POST _scripts/update_views
- {
- "script":{
- "lang": "painless",
- "source": "ctx._source.views += params.new_views"
- }
- }
-
- 17 Test the stored script by adding 1000 to views
- POST tech_blogs/_update/1
- {
- "script": {
- "id": "update_views",
- "params": {
- "new_views":1000
- }
- }
- }
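The point of a stored script is that the logic is saved once under an id and each call supplies only parameters. The sketch below models that locally; it is not Elasticsearch code, and `stored_scripts`, `put_script`, and `update` are hypothetical names for an in-memory registry standing in for the cluster state.

```python
stored_scripts = {}  # hypothetical registry standing in for the cluster state


def put_script(script_id: str, func) -> None:
    """Save a script under an id so later calls can refer to it by name."""
    stored_scripts[script_id] = func


def update(doc: dict, script_id: str, params: dict) -> None:
    """Look the script up by id and run it against the document with params."""
    stored_scripts[script_id](doc, params)


def update_views(doc: dict, params: dict) -> None:
    # Same arithmetic as: ctx._source.views += params.new_views
    doc["views"] += params["new_views"]


put_script("update_views", update_views)

doc = {"title": "Introducing big data......", "views": 0}
update(doc, "update_views", {"new_views": 100})   # the increment from step 14
update(doc, "update_views", {"new_views": 1000})  # the increment from step 17
print(doc["views"])
```

Starting from the views value of 0 set in step 13, the two increments leave the counter at 1100.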
-
-
- 18 At query time, use a Painless script to add a random value to the views field in the results
- GET tech_blogs/_search
- {
- "script_fields": {
- "rnd_views": {
- "script": {
- "lang": "painless",
- "source": """
- java.util.Random rnd = new Random();
- doc['views'].value+rnd.nextInt(1000);
- """
- }
- }
- },
- "query": {
- "match_all": {}
- }
- }
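A script field is computed for each hit when the query runs and does not change what is stored. The following plain-Python sketch illustrates that behaviour under assumptions: `search_with_rnd_views` is a hypothetical function, and the shape of each hit (a `fields` entry holding a list) is a simplified imitation of a search response.

```python
import random


def search_with_rnd_views(sources: list, rng: random.Random) -> list:
    """Return hits with a computed rnd_views field; stored sources are not modified."""
    hits = []
    for source in sources:
        # randrange(1000) yields 0..999, the same range as nextInt(1000) in the script.
        computed = source["views"] + rng.randrange(1000)
        hits.append({"fields": {"rnd_views": [computed]}})
    return hits


stored = [{"title": "Introducing big data......", "views": 1100}]
hits = search_with_rnd_views(stored, random.Random())
print(hits, stored)
```

Running the search twice will usually give different rnd_views values, while the stored views value stays the same.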