6. Elasticsearch 7.15: Index Field Data Types (Common Data Types)

Author: 酷酷是懒虫 | 2024-08-15 00:12:00

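Contents

Data type overview
String types
- keyword
- text
Numeric types
Date types
- Multiple date formats
Boolean type
Object type
nested type
- Limits on nested fields
alias
- Alias restrictions
Geospatial types
Other types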

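Data type overview

Every field has a field data type. The type indicates what kind of data the field holds (such as strings or boolean values) and its intended use. For example, you can index a string into both text and keyword fields. A text field is analyzed (tokenized) for full-text search, whereas a keyword string is left as-is for filtering and sorting.

Field types are grouped into families. Types in the same family support the same search functionality but may differ in storage footprint or performance characteristics.

For example, the keyword family consists of the keyword, constant_keyword, and wildcard types.
The boolean family, by contrast, consists of a single field type: boolean.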

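String types

keyword

The keyword family includes the following field types:

keyword, for structured content such as IDs, email addresses, hostnames, status codes, zip codes, or tags.
constant_keyword, for keyword fields that always contain the same (constant) value.
wildcard, for unstructured machine-generated content. The wildcard field type is optimized for fields with large values or high cardinality.

Keyword fields are typically used for filtering, sorting, aggregations, and term-level queries such as term (exact matching).

For example, to define a keyword field: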

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "tags": {
        "type": "keyword"
      }
    }
  }
}

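keyword fields accept the following parameters:

Parameter	Description
null_value	A string value that is substituted for any explicit null. Defaults to null, which means the field is treated as missing.
index	Whether the field is searchable. Accepts true (default) or false.
(and others)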

For example, to define a constant_keyword field:

PUT logs-debug
{
  "mappings": {
    "properties": {
      "@timestamp": {
        "type": "date"
      },
      "message": {
        "type": "text"
      },
      "level": {
        "type": "constant_keyword",
        "value": "debug"
      }
    }
  }
}

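Documents indexed here may omit the level field, in which case it is treated as debug; a document that supplies a different value is rejected. If the mapping does not set value at all, the field configures itself from the value contained in the first indexed document.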
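The acceptance rules for a constant_keyword field can be modelled in a few lines of plain Python. This is only a sketch of the behaviour described above, not Elasticsearch code; the class name `ConstantKeywordField` and its `index` method are hypothetical names chosen for illustration.

```python
class ConstantKeywordField:
    """Toy model of a constant_keyword field (hypothetical, for illustration)."""

    def __init__(self, value=None):
        # None means the mapping did not configure "value".
        self.value = value

    def index(self, doc_value=None):
        """Return the value searches would see, or raise if the document disagrees."""
        if doc_value is None:
            return self.value  # field omitted: the constant applies
        if self.value is None:
            self.value = doc_value  # first indexed value configures the field
        elif doc_value != self.value:
            raise ValueError(f"expected {self.value!r}, got {doc_value!r}")
        return self.value


level = ConstantKeywordField("debug")
print(level.index())         # document without a level field
print(level.index("debug"))  # document repeating the constant
```

Indexing "info" into `level` would raise, mirroring the rejection of a value that differs from the configured constant.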

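text

text is best suited to unstructured content that needs full-text search, such as the body of an email or a product description. text fields are analyzed: before indexing, an analyzer (tokenizer) converts the string into a list of individual terms. text fields are not used for sorting and are seldom used for aggregations.

For example, to define a text field: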

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "full_name": {
        "type": "text"
      }
    }
  }
}

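text fields are searchable by default, but by default they cannot be used for aggregations, sorting, or scripting. Attempting to sort on, aggregate on, or access the values of a text field from a script throws an exception.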
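To make the contrast between text and keyword concrete, here is a minimal Python sketch. The `analyze_text` function is a hypothetical stand-in that lowercases the input and splits on non-alphanumeric characters; it only approximates what a real analyzer does and is not how Elasticsearch is implemented.

```python
import re


def analyze_text(value: str) -> list[str]:
    """Rough stand-in for an analyzer: lowercase, then split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", value.lower()) if tok]


def index_keyword(value: str) -> list[str]:
    """A keyword field keeps the whole string as a single, unmodified term."""
    return [value]


full_name = "John Smith"
text_terms = analyze_text(full_name)
keyword_terms = index_keyword(full_name)

print(text_terms)     # ['john', 'smith']
print(keyword_terms)  # ['John Smith']
```

Under this model a search for the single word john finds the text field but not the keyword field, which only matches the exact string "John Smith".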

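Numeric types

byte, short, integer, long, float, double, unsigned_long, half_float
scaled_float: a floating-point number backed by a long, with precision fixed by a scaling factor. For example, with two decimal places and a scaling factor of 100, 3.14 is stored internally as 314 and divided by 100 when read back. This makes it a reasonable choice for monetary amounts.
For example: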

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "number_of_bytes": {
        "type": "integer"
      },
      "time_in_seconds": {
        "type": "float"
      },
      "price": {
        "type": "scaled_float",
        "scaling_factor": 100
      }
    }
  }
}
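The scaled_float arithmetic can be checked with a short Python sketch. The function names are hypothetical, and rounding to the nearest integer is the assumption made here for how the scaled value becomes a long.

```python
def encode_scaled_float(value: float, scaling_factor: int) -> int:
    """Multiply by the scaling factor and round to the nearest integer."""
    return round(value * scaling_factor)


def decode_scaled_float(stored: int, scaling_factor: int) -> float:
    """Divide the stored integer by the scaling factor to recover the value."""
    return stored / scaling_factor


stored = encode_scaled_float(3.14, 100)
print(stored)                            # 314
print(decode_scaled_float(stored, 100))  # 3.14
```

Anything beyond the chosen precision is lost: with a scaling factor of 100, a price of 3.149 is stored as 315 and read back as 3.15.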

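Date types

The epoch is 1970-01-01 00:00:00 UTC, the starting point of Unix time.

JSON has no date data type, so dates in Elasticsearch can be any of:

A string containing a formatted date, e.g. "2015-01-01" or "2015/01/01 12:10:30".
A number representing milliseconds since the epoch.
A number representing seconds since the epoch (requires configuring the epoch_second format).

Internally, dates are converted to UTC (if a time zone is specified) and stored as a long number representing milliseconds since the epoch.

Queries on date fields are converted internally to range queries on this long representation, and results are converted back to strings using the date format associated with the field.

Date formats can be customized. If no format is specified, the default is used: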

 "strict_date_optional_time||epoch_millis"

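strict_date_optional_time is an ISO datetime format in which the date is required and the time is optional. It accepts either of the following forms:

yyyy-MM-dd'T'HH:mm:ss.SSSZ
yyyy-MM-dd

epoch_millis is the number of milliseconds since the epoch, bounded by Java's Long.MIN_VALUE and Long.MAX_VALUE.

For example: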

# The date field uses the default format
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "date": {
        "type": "date"
      }
    }
  }
}

# This document contains a date only
PUT my-index-000001/_doc/1
{ "date": "2015-01-01" }

# This document includes a time
PUT my-index-000001/_doc/2
{ "date": "2015-01-01T12:10:30Z" }

# This document uses milliseconds since the epoch
PUT my-index-000001/_doc/3
{ "date": 1420070400001 }

# Note that the sort values returned are all in milliseconds since the epoch
GET my-index-000001/_search
{
  "sort": { "date": "asc" }
}
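The sort order of the three documents above follows from their millisecond values. The Python sketch below reproduces that conversion with the standard library; `to_epoch_millis` is a hypothetical helper, and treating a date without a time zone as UTC is an assumption of this sketch.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(value) -> int:
    """Convert an ISO date string or an epoch-millis number to epoch milliseconds."""
    if isinstance(value, int):
        return value  # already milliseconds since the epoch
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: no zone means UTC
    return (parsed - EPOCH) // timedelta(milliseconds=1)


docs = {1: "2015-01-01", 2: "2015-01-01T12:10:30Z", 3: 1420070400001}
sort_values = {doc_id: to_epoch_millis(value) for doc_id, value in docs.items()}
ascending = sorted(sort_values, key=sort_values.get)

print(sort_values)  # {1: 1420070400000, 2: 1420114230000, 3: 1420070400001}
print(ascending)    # [1, 3, 2]
```

Document 3 sorts between the other two because its value is one millisecond after midnight on 2015-01-01.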

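Multiple date formats

Multiple formats can be specified by separating them with ||. Each format is tried in turn until one matches. The first format listed is also used to convert the stored milliseconds-since-the-epoch value back to a string.

For example: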

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "date": {
        "type":   "date",
        "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
      }
    }
  }
}
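The "try each format in turn" rule for the mapping above can be sketched in Python. The `strptime` patterns are assumed equivalents of the two string formats in the mapping, `parse_date` and `render_date` are hypothetical names, and dates are treated as UTC; none of this is the Elasticsearch implementation.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Assumed strptime equivalents of "yyyy-MM-dd HH:mm:ss" and "yyyy-MM-dd".
STRING_FORMATS = ["%Y-%m-%d %H:%M:%S", "%Y-%m-%d"]


def parse_date(value) -> int:
    """Try each configured format in order and return epoch milliseconds."""
    if isinstance(value, int):
        return value  # epoch_millis: already in the stored representation
    for fmt in STRING_FORMATS:
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue  # this format did not match, try the next one
        return (parsed.replace(tzinfo=timezone.utc) - EPOCH) // timedelta(milliseconds=1)
    raise ValueError(f"no configured format matches {value!r}")


def render_date(millis: int) -> str:
    """Convert stored milliseconds back to a string using the first format."""
    return (EPOCH + timedelta(milliseconds=millis)).strftime(STRING_FORMATS[0])


print(parse_date("2015-01-01 12:10:30"))  # 1420114230000
print(parse_date("2015-01-01"))           # 1420070400000
print(render_date(1420070400000))         # 2015-01-01 00:00:00
```

Note that a value indexed as "2015-01-01" is rendered back in the first format, with the time portion filled in.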

Boolean type

Boolean fields accept the JSON values true and false, and also strings that are interpreted as true or false:

Type	Values
False values	false, "false", "" (empty string)
True values	true, "true"

For example:

# Create an index with a boolean field
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "is_published": {
        "type": "boolean"
      }
    }
  }
}

# Index the value as a string
POST my-index-000001/_doc/1?refresh
{
  "is_published": "true"
}

# Query with a JSON boolean
GET my-index-000001/_search
{
  "query": {
    "term": {
      "is_published": true
    }
  }
}
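The table of accepted values translates directly into a small Python function. `parse_boolean` is a hypothetical name; the sketch accepts exactly the values listed in the table and rejects everything else.

```python
def parse_boolean(value) -> bool:
    """Interpret a value using the true/false table for boolean fields."""
    if value is True or value == "true":
        return True
    if value is False or value in ("false", ""):
        return False
    raise ValueError(f"cannot interpret {value!r} as a boolean")


print(parse_boolean("true"))  # True
print(parse_boolean(""))      # False
```

This is why the document indexed with the string "true" is found by the term query on the JSON value true.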

Object type

JSON documents are hierarchical by nature: a document may contain inner objects, which may themselves contain inner objects.

For example:

PUT my-index-000001/_doc/1
{ 
  "region": "US",
  "manager": { 
    "age": 30,
    "name": { 
      "first": "John",
      "last":  "Smith"
    }
  }
}

Internally, this document is indexed as a simple, flat list of key-value pairs, along these lines:

{
  "region":             "US",
  "manager.age":        30,
  "manager.name.first": "John",
  "manager.name.last":  "Smith"
}
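The flattening step is easy to reproduce. The following Python sketch (with a hypothetical `flatten` helper) turns nested dictionaries into dotted keys; it illustrates the idea only and handles plain objects, not arrays of objects.

```python
def flatten(obj: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into a single dict keyed by dotted paths."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, path + "."))  # recurse into inner objects
        else:
            flat[path] = value
    return flat


doc = {
    "region": "US",
    "manager": {"age": 30, "name": {"first": "John", "last": "Smith"}},
}
print(flatten(doc))
```

The printed dictionary has the keys region, manager.age, manager.name.first, and manager.name.last, matching the flat form shown above.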

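You do not need to set the field type to object explicitly, since that is the default. An explicit mapping for the document above looks like this: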

PUT my-index-000001
{
  "mappings": {
    "properties": { 
      "region": {
        "type": "keyword"
      },
      "manager": { 
        "properties": {
          "age":  { "type": "integer" },
          "name": { 
            "properties": {
              "first": { "type": "text" },
              "last":  { "type": "text" }
            }
          }
        }
      }
    }
  }
}

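nested type

The nested type is a specialized version of the object data type that allows arrays of objects to be indexed in such a way that they can be queried independently of one another.

Elasticsearch has no concept of inner objects, so it flattens the object hierarchy into a simple list of field names and values.

For example (a counterexample, using the object type):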

PUT my-index-000001/_doc/1
{
  "group" : "fans",
  "user" : [ 
    {
      "first" : "John",
      "last" :  "Smith"
    },
    {
      "first" : "Alice",
      "last" :  "White"
    }
  ]
}

Internally, the document above is converted into a document that looks like this:

{
  "group" :        "fans",
  "user.first" : [ "alice", "john" ],
  "user.last" :  [ "smith", "white" ]
}

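A query for alice AND smith incorrectly matches this document. alice and smith come from different objects in the array, but that association is lost after flattening, so the document matches even though no single user is named Alice Smith: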

GET my-index-000001/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "user.first": "Alice" }},
        { "match": { "user.last":  "Smith" }}
      ]
    }
  }
}

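If you need to index arrays of objects and keep each object in the array independent, use the nested data type instead of the object data type.

For example (using the nested type):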

# The user field is mapped as nested
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "user": {
        "type": "nested" 
      }
    }
  }
}

# Index a document
PUT my-index-000001/_doc/1
{
  "group" : "fans",
  "user" : [
    {
      "first" : "John",
      "last" :  "Smith"
    },
    {
      "first" : "Alice",
      "last" :  "White"
    }
  ]
}

# Search for a user named Alice Smith: no hits, because the two words are not in the same nested object
GET my-index-000001/_search
{
  "query": {
    "nested": {
      "path": "user",
      "query": {
        "bool": {
          "must": [
            { "match": { "user.first": "Alice" }},
            { "match": { "user.last":  "Smith" }} 
          ]
        }
      }
    }
  }
}

# This query matches; inner_hits lets us highlight the matching nested documents
GET my-index-000001/_search
{
  "query": {
    "nested": {
      "path": "user",
      "query": {
        "bool": {
          "must": [
            { "match": { "user.first": "Alice" }},
            { "match": { "user.last":  "White" }} 
          ]
        }
      },
      "inner_hits": { 
        "highlight": {
          "fields": {
            "user.first": {}
          }
        }
      }
    }
  }
}
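The difference between the object and nested behaviours can be modelled in plain Python. In this sketch, `matches_flattened` mimics the flattened object mapping, where first and last names are pooled across the array, and `matches_nested` checks each array element on its own. Both function names are hypothetical, and the lowercase comparison is a simplified stand-in for analyzed matching.

```python
users = [
    {"first": "John", "last": "Smith"},
    {"first": "Alice", "last": "White"},
]


def matches_flattened(users: list, first: str, last: str) -> bool:
    """Object mapping: values are pooled per field, so pairings are lost."""
    firsts = {u["first"].lower() for u in users}
    lasts = {u["last"].lower() for u in users}
    return first.lower() in firsts and last.lower() in lasts


def matches_nested(users: list, first: str, last: str) -> bool:
    """Nested mapping: both conditions must hold within the same object."""
    return any(
        u["first"].lower() == first.lower() and u["last"].lower() == last.lower()
        for u in users
    )


print(matches_flattened(users, "Alice", "Smith"))  # True (the false positive)
print(matches_nested(users, "Alice", "Smith"))     # False
print(matches_nested(users, "Alice", "White"))     # True
```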

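Limits on nested fields

Because nested mappings carry overhead, Elasticsearch provides the following settings to guard against performance problems:

Setting	Description
index.mapping.nested_fields.limit	The maximum number of distinct nested fields in an index. Defaults to 50.
index.mapping.nested_objects.limit	The maximum number of nested JSON objects that a single document can contain across all nested fields. Defaults to 10000.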

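alias

An alias mapping defines an alternate name for a field in the index. The alias can be used in place of the target field in search requests and in selected other APIs, such as field capabilities.

For example: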

# path must be the full path of the target field, including any parent objects, e.g. object1.object2.field
PUT trips
{
  "mappings": {
    "properties": {
      "distance": {
        "type": "long"
      },
      "route_length_miles": {
        "type": "alias",
        "path": "distance" 
      },
      "transit_mode": {
        "type": "keyword"
      }
    }
  }
}

GET _search
{
  "query": {
    "range" : {
      "route_length_miles" : {
        "gte" : 39
      }
    }
  }
}

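Alias restrictions

There are a few restrictions on the target of an alias:

The target must be a concrete field, not an object or another field alias.
The target field must exist at the time the alias is created.
If nested objects are defined, a field alias must have the same nested scope as its target.
A field alias can have only one target field.
If an alias is later remapped to a new target, any stored percolator queries that contain the field alias will still refer to its original target.
Writes to field aliases are not supported: attempting to use an alias in an index or update request results in a failure.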
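A few of these rules can be expressed as a toy Python model over the trips mapping. The functions `resolve_field` and `check_write` are hypothetical, the mapping is assumed to be flat (no parent objects), and the model covers only the target-must-exist, target-must-be-concrete, and no-writes rules; it is not how Elasticsearch validates mappings.

```python
mapping = {
    "distance": {"type": "long"},
    "route_length_miles": {"type": "alias", "path": "distance"},
    "transit_mode": {"type": "keyword"},
}


def resolve_field(mapping: dict, name: str) -> str:
    """Return the concrete field that a search on `name` would target."""
    field = mapping[name]
    if field["type"] != "alias":
        return name
    target = mapping.get(field["path"])
    if target is None:
        raise ValueError("alias target must exist")
    if target["type"] in ("alias", "object", "nested"):
        raise ValueError("alias target must be a concrete field")
    return field["path"]


def check_write(mapping: dict, doc: dict) -> None:
    """Reject documents that try to write through an alias."""
    for name in doc:
        if mapping.get(name, {}).get("type") == "alias":
            raise ValueError(f"cannot write to field alias {name!r}")


print(resolve_field(mapping, "route_length_miles"))  # distance
check_write(mapping, {"distance": 42})                # accepted
```

Calling `check_write(mapping, {"route_length_miles": 42})` raises, mirroring the failure of index and update requests that use an alias.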

Geospatial types

point, shape, geo_point, geo_shape

Other types

binary, version, ip, and so on.

See the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html
