Contents
Date range aggregation
Multi Terms aggregation
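Goal
Learn the syntax of bucket aggregations (roughly the equivalent of GROUP BY in MySQL) through a series of worked examples that you can adapt to your own cases. The article covers:
- Multi Terms aggregation (grouping by several fields);
- sorting buckets;
- filtering documents before bucketing;
- range aggregation;
- Histogram aggregation;
- nested (sub-)aggregations;
- date range aggregation;
- Filter (single filter) and Filters (multiple filters) aggregations;
- Missing aggregation.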
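Elasticsearch version
7.17.5
Official documentation
Hands-on
Create test data
# The default analyzer set below, ik_max_word, is provided by the IK analysis plugin, which must be installed first.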
PUT /library_db
{
"settings": {
"index": {
"analysis.analyzer.default.type": "ik_max_word"
}
}
}
PUT /library_db/_bulk
{"index":{"_id":"1"}}
{"id":1,"type":"玄幻","name":"诛仙","words_num":120,"chapter_num":600,"completion_time":"2000-09-01","author":"萧鼎","prices":32.12}
{"index":{"_id":"2"}}
{"id":2,"type":"玄幻","name":"诛仙前传:蛮荒行","words_num":30,"chapter_num":67,"completion_time":"2020-09-01","author":"萧鼎","prices":23.12}
{"index":{"_id":"3"}}
{"id":3,"type":"武侠","name":"天龙八部","words_num":80,"chapter_num":120,"completion_time":"1995-09-01","author":"金庸","prices":52.1}
{"index":{"_id":"4"}}
{"id":4,"type":"武侠","name":"射雕英雄传","words_num":67,"chapter_num":95,"completion_time":"1998-01-01","author":"金庸","prices":4.12}
{"index":{"_id":"5"}}
{"id":5,"type":"武侠","name":"神雕侠侣","words_num":75,"chapter_num":76,"completion_time":"2000-01-01","author":"金庸","prices":32.8}
{"index":{"_id":"6"}}
{"id":6,"type":"武侠","name":"倚天屠龙记","words_num":83,"chapter_num":130,"completion_time":"2003-01-01","author":"金庸","prices":100.12}
{"index":{"_id":"7"}}
{"id":7,"type":"玄幻","name":"凡人修仙传","words_num":600,"chapter_num":3000,"completion_time":"2018-01-01","author":"忘语","prices":120.12}
{"index":{"_id":"8"}}
{"id":8,"type":"玄幻","name":"魔天记","words_num":159,"chapter_num":400,"completion_time":"2019-01-01","author":"忘语","prices":11.12}
{"index":{"_id":"9"}}
{"id":9,"type":"都市异能","name":"黄金瞳","words_num":220,"chapter_num":400,"completion_time":"2019-01-01","author":"打眼","prices":74.5}
{"index":{"_id":"10"}}
{"id":10,"type":"玄幻","name":"将夜","words_num":210,"chapter_num":600,"completion_time":"2014-01-01","author":"血红","prices":32.0}
{"index":{"_id":"11"}}
{"id":11,"type":"军事","name":"亮剑","words_num":120,"chapter_num":100,"completion_time":"2012-01-01","author":"都梁","prices":15.0}
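Basic syntax
Requirement 1: count the novels of each type in the library.
# "size": 0 at the top level suppresses the search hits; "size": 10 inside terms returns at most 10 buckets.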
GET /library_db/_search
{
"size": 0,
"aggs": {
"type_count": {
"terms": {
"field": "type.keyword",
"size": 10
}
}
}
}
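To check the response against the sample data, the grouping can be reproduced locally. The following is a minimal sketch in plain Python, not Elasticsearch code; `TYPES` and `terms_buckets` are illustrative names, and the list simply repeats the `type` values of the 11 documents indexed above.

```python
from collections import Counter

# "type" of each of the 11 sample documents, in _id order.
TYPES = ["玄幻", "玄幻", "武侠", "武侠", "武侠", "武侠",
         "玄幻", "玄幻", "都市异能", "玄幻", "军事"]


def terms_buckets(values, size=10):
    """Return up to `size` (key, doc_count) pairs, largest count first."""
    return Counter(values).most_common(size)


buckets = terms_buckets(TYPES)
for key, doc_count in buckets:
    print(key, doc_count)
```

The counts should match the doc_count values in the aggregation response; the order of buckets with equal counts in this sketch is not meant to mirror Elasticsearch.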
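Requirement 2: count the novels of each type in the library, sorted by count in ascending order.
# For descending order use desc, as with ORDER BY in MySQL.
# Note: the Elasticsearch documentation advises against ascending _count ordering on multi-shard indices, because the counts can be inaccurate.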
GET /library_db/_search
{
"size": 0,
"aggs": {
"type_count": {
"terms": {
"field": "type.keyword",
"size": 10,
"order": {
"_count": "asc"
}
}
}
}
}
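The effect of the order clause can be sketched locally as a sort on the bucket counts. This is a plain Python illustration under the assumption that only the count order matters; `terms_buckets_sorted` is a hypothetical helper, and the order of equal counts is left to Python's stable sort rather than to Elasticsearch's tie-breaking.

```python
from collections import Counter

# "type" of each of the 11 sample documents, in _id order.
TYPES = ["玄幻", "玄幻", "武侠", "武侠", "武侠", "武侠",
         "玄幻", "玄幻", "都市异能", "玄幻", "军事"]


def terms_buckets_sorted(values, direction="asc", size=10):
    """Group values, then sort the buckets by doc_count in the given direction."""
    counts = Counter(values)
    ordered = sorted(counts.items(), key=lambda kv: kv[1],
                     reverse=(direction == "desc"))
    return ordered[:size]


ascending = terms_buckets_sorted(TYPES, "asc")
descending = terms_buckets_sorted(TYPES, "desc")
print(ascending)
print(descending)
```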
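Filter first, then bucket
Requirement: count the novels of each type in the library, considering only novels with at least 1,000,000 characters (words_num is stored in units of 10,000, so the threshold is 100).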
GET /library_db/_search
{
"size": 0,
"query": {
"range": {
"words_num": {
"gte": 100
}
}
},
"aggs": {
"type_count": {
"terms": {
"field": "type.keyword",
"size": 10
}
}
}
}
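The request runs in two steps: the query narrows the document set, and the aggregation only sees what is left. A minimal local sketch of that order of operations, with `BOOKS` and `count_types_with_min_words` as illustrative names:

```python
from collections import Counter

# (type, words_num) of each sample document; words_num is in units of 10,000.
BOOKS = [("玄幻", 120), ("玄幻", 30), ("武侠", 80), ("武侠", 67),
         ("武侠", 75), ("武侠", 83), ("玄幻", 600), ("玄幻", 159),
         ("都市异能", 220), ("玄幻", 210), ("军事", 120)]


def count_types_with_min_words(books, min_words):
    # Step 1, the "query": keep only documents with words_num >= min_words.
    matching = [book_type for book_type, words in books if words >= min_words]
    # Step 2, the "aggs": bucket the remaining documents by type.
    return Counter(matching)


result = count_types_with_min_words(BOOKS, 100)
print(dict(result))
```

With the sample data no 武侠 novel reaches the threshold, so that type produces no bucket at all.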
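Range aggregation
Requirement: count the novels whose length falls into each of these intervals: 0 to 300,000; 300,000 to 500,000; 500,000 to 1,000,000; 1,000,000 to 2,000,000; and 2,000,000 or more characters. Each range includes its from value and excludes its to value.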
GET /library_db/_search
{
"size": 0,
"aggs": {
"words_num_count": {
"range": {
"field": "words_num",
"ranges": [
{
"from": 0,
"to": 30
},
{
"to": 50,
"from": 30
},
{
"to": 100,
"from": 50
},
{
"to": 200,
"from": 100
},
{
"key": ">=200",
"from": 200
}
]
}
}
}
}
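Because from is inclusive and to is exclusive, a value on a boundary always lands in the higher range. The sketch below reproduces the five ranges in plain Python to make this visible on the sample data; `RANGES` and `range_buckets` are illustrative names, and `None` stands for an open upper bound.

```python
# words_num of each sample document (units of 10,000 characters).
WORDS_NUM = [120, 30, 80, 67, 75, 83, 600, 159, 220, 210, 120]

# (from, to) pairs; None means "no upper bound".
RANGES = [(0, 30), (30, 50), (50, 100), (100, 200), (200, None)]


def range_buckets(values, ranges):
    """Count values per range, with from inclusive and to exclusive."""
    buckets = []
    for lower, upper in ranges:
        count = sum(1 for v in values
                    if v >= lower and (upper is None or v < upper))
        buckets.append(((lower, upper), count))
    return buckets


buckets = range_buckets(WORDS_NUM, RANGES)
for bounds, count in buckets:
    print(bounds, count)
```

The novel with words_num 30 is counted in the 30 to 50 range, not in 0 to 30, which is why the first bucket stays empty.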
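Histogram aggregation
Requirement: count the novels in each word-count interval, where every interval is 500,000 characters wide.
# By default the empty intervals are returned as well; "min_doc_count": 1 drops the buckets that contain no documents.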
GET /library_db/_search
{
"size": 0,
"aggs": {
"type_count": {
"histogram": {
"field": "words_num",
"interval": 50,
"min_doc_count": 1
}
}
}
}
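Each value is assigned to the bucket whose key is the value rounded down to a multiple of the interval. The following plain Python sketch applies that rule to the sample data and shows what min_doc_count changes; `histogram_buckets` is a hypothetical helper, and it assumes that gaps between the lowest and highest bucket are filled with empty buckets when min_doc_count is 0.

```python
import math
from collections import Counter

# words_num of each sample document (units of 10,000 characters).
WORDS_NUM = [120, 30, 80, 67, 75, 83, 600, 159, 220, 210, 120]


def histogram_buckets(values, interval, min_doc_count=0):
    """Bucket key = floor(value / interval) * interval."""
    keys = [math.floor(v / interval) * interval for v in values]
    counts = Counter(keys)
    buckets = []
    key = min(keys)
    while key <= max(keys):
        if counts[key] >= min_doc_count:
            buckets.append((key, counts[key]))
        key += interval
    return buckets


non_empty = histogram_buckets(WORDS_NUM, 50, min_doc_count=1)
all_buckets = histogram_buckets(WORDS_NUM, 50)
print(non_empty)
print(len(all_buckets))
```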
Nested bucket aggregation
Requirement: compute the average price of each type of novel in the library.
GET /library_db/_search
{
"size": 0,
"aggs": {
"type_group": {
"terms": {
"field": "type.keyword"
},
"aggs": {
"prices_avg": {
"avg": {
"field": "prices"
}
}
}
}
}
}
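The sub-aggregation is evaluated once per parent bucket. A minimal local sketch of the same computation, useful for checking the averages by hand; `BOOKS` and `avg_price_by_type` are illustrative names and the prices are copied from the sample data.

```python
from collections import defaultdict

# (type, prices) of each sample document.
BOOKS = [("玄幻", 32.12), ("玄幻", 23.12), ("武侠", 52.1), ("武侠", 4.12),
         ("武侠", 32.8), ("武侠", 100.12), ("玄幻", 120.12), ("玄幻", 11.12),
         ("都市异能", 74.5), ("玄幻", 32.0), ("军事", 15.0)]


def avg_price_by_type(books):
    groups = defaultdict(list)
    for book_type, price in books:      # outer "terms" bucket
        groups[book_type].append(price)
    # inner "avg" metric, computed per bucket
    return {t: sum(prices) / len(prices) for t, prices in groups.items()}


averages = avg_price_by_type(BOOKS)
for book_type, value in averages.items():
    print(book_type, round(value, 3))
```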
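Date range aggregation
Analysis
- from and to are the start and end of each range;
- from is inclusive (>=);
- to is exclusive (<);
- missing supplies a substitute value for documents that lack the field.
Requirement 1: count the novels completed in the one-year window that starts two years ago and ends one year ago. Note: the +8h in the date math shifts now to UTC+8 (Beijing time) before /d rounds down to the start of the day. The date_range aggregation also accepts a time_zone parameter as an alternative to the manual offset.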
GET /library_db/_search
{
"size": 0,
"aggs": {
"range": {
"date_range": {
"field": "completion_time",
"ranges": [
{
"from": "now+8h-2y/d",
"to": "now+8h-1y/d"
}
]
}
}
}
}
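The date math expressions are easier to follow when resolved by hand. The sketch below is a plain Python approximation, assuming the operations are applied left to right and that /d rounds down to midnight; `resolve_date_math` and the fixed `NOW` value are hypothetical and only serve to make the result reproducible.

```python
from datetime import datetime, timedelta


def resolve_date_math(now, hours=0, years=0):
    """Approximate "now+<hours>h+<years>y/d": shift, then round down to the day."""
    shifted = now + timedelta(hours=hours)
    try:
        shifted = shifted.replace(year=shifted.year + years)
    except ValueError:
        # Feb 29 shifted into a non-leap year: fall back to Feb 28.
        shifted = shifted.replace(year=shifted.year + years, day=28)
    return shifted.replace(hour=0, minute=0, second=0, microsecond=0)


# completion_time of each sample document.
COMPLETION = [datetime.strptime(s, "%Y-%m-%d") for s in [
    "2000-09-01", "2020-09-01", "1995-09-01", "1998-01-01", "2000-01-01",
    "2003-01-01", "2018-01-01", "2019-01-01", "2019-01-01", "2014-01-01",
    "2012-01-01"]]

NOW = datetime(2021, 10, 1, 0, 0)                  # hypothetical "now" in UTC
start = resolve_date_math(NOW, hours=8, years=-2)  # now+8h-2y/d
end = resolve_date_math(NOW, hours=8, years=-1)    # now+8h-1y/d
doc_count = sum(1 for d in COMPLETION if start <= d < end)
print(start, end, doc_count)
```

The offset only matters near midnight: for a UTC time of 20:00, adding 8 hours moves the rounded day forward by one.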
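Requirement 2: count the novels completed before 2019-01-01 and those completed from 2019-01-01 up to today. Documents without a completion_time are treated as if the value were 1976-11-30, so they fall into the Older bucket.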
GET /library_db/_search
{
"size": 0,
"aggs": {
"range": {
"date_range": {
"field": "completion_time",
"missing": "1976-11-30",
"ranges": [
{
"key": "Older",
"to": "2019-01-01"
},
{
"key": "Newer",
"from": "2019-01-01",
"to": "now+8h/d"
}
]
}
}
}
}
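The role of missing becomes visible once a document without the field is added. The sketch below is a local illustration in plain Python; `older_newer`, the fixed `today` value and the extra document without a date are assumptions made for the example, not part of the sample index.

```python
from datetime import date

MISSING = date(1976, 11, 30)   # substitute for documents without the field
CUTOFF = date(2019, 1, 1)

# completion_time of each sample document.
COMPLETION = [date(2000, 9, 1), date(2020, 9, 1), date(1995, 9, 1),
              date(1998, 1, 1), date(2000, 1, 1), date(2003, 1, 1),
              date(2018, 1, 1), date(2019, 1, 1), date(2019, 1, 1),
              date(2014, 1, 1), date(2012, 1, 1)]


def older_newer(dates, today):
    counts = {"Older": 0, "Newer": 0}
    for d in dates:
        d = MISSING if d is None else d
        if d < CUTOFF:                 # "to" is exclusive
            counts["Older"] += 1
        elif d < today:                # CUTOFF <= d < today
            counts["Newer"] += 1
    return counts


today = date(2024, 1, 1)               # hypothetical value of now+8h/d
sample_counts = older_newer(COMPLETION, today)
with_undated = older_newer(COMPLETION + [None], today)
print(sample_counts, with_undated)
```

Both novels dated exactly 2019-01-01 count as Newer, because from is inclusive.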
Filter aggregation
Requirement: compute the average price of all novels and the average price of wuxia (武侠) novels.
POST /library_db/_search?size=0&filter_path=aggregations
{
"aggs": {
"avg_price": {
"avg": {
"field": "prices"
}
},
"wx": {
"filter": {
"term": {
"type.keyword": "武侠"
}
},
"aggs": {
"wx_avg_prices": {
"avg": {
"field": "prices"
}
}
}
}
}
}
# If you only need the average price of wuxia (武侠) novels, a top-level query is more efficient than a filter aggregation.
POST /library_db/_search?size=0&filter_path=aggregations
{
"query": { "term": { "type.keyword": "武侠" } },
"aggs": {
"wx": { "avg": { "field": "prices" } }
}
}
# Not recommended: the same result through a single filter aggregation.
POST /library_db/_search?size=0&filter_path=aggregations
{
"aggs": {
"wx": {
"filter": { "term": { "type.keyword": "武侠" } },
"aggs": {
"wx_avg_prices": { "avg": { "field": "prices" } }
}
}
}
}
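What distinguishes the filter aggregation from a top-level query is scope: the filter narrows only its own bucket, so an unfiltered metric can sit next to it in the same request. A minimal plain Python sketch of the two averages from the first request, with `BOOKS` and `average` as illustrative names:

```python
# (type, prices) of each sample document.
BOOKS = [("玄幻", 32.12), ("玄幻", 23.12), ("武侠", 52.1), ("武侠", 4.12),
         ("武侠", 32.8), ("武侠", 100.12), ("玄幻", 120.12), ("玄幻", 11.12),
         ("都市异能", 74.5), ("玄幻", 32.0), ("军事", 15.0)]


def average(values):
    return sum(values) / len(values)


# "avg_price": computed over every document.
avg_price = average([price for _, price in BOOKS])
# "wx" filter bucket, then "wx_avg_prices" inside it.
wx_avg_prices = average([price for t, price in BOOKS if t == "武侠"])
print(round(avg_price, 3), round(wx_avg_prices, 3))
```

When only the filtered average is needed, the top-level query variant yields the same number as `wx_avg_prices` here.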
Filters aggregation
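Analysis
- When grouping documents by several filter conditions, use one filters aggregation rather than several separate filter aggregations; it is more efficient.
- other_bucket_key names the bucket that collects the documents matching none of the filters, much like else in Java.
Requirement 1: count the wuxia (武侠), xuanhuan (玄幻), military (军事) and urban (都市) novels.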
GET library_db/_search
{
"size": 0,
"aggs": {
"type_avg": {
"filters": {
"filters": {
"wx": {
"term": {
"type.keyword": "武侠"
}
},
"xh": {
"term": {
"type.keyword": "玄幻"
}
},
"js": {
"term": {
"type.keyword": "军事"
}
},
"ds": {
"term": {
"type.keyword": "都市异能"
}
}
}
}
}
}
}
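A term filter on a keyword field matches the stored value exactly, so the bucket counts depend on the precise strings used. The sketch below reproduces the named filters in plain Python against the sample types; `filters_buckets` is a hypothetical helper.

```python
# "type" of each of the 11 sample documents, in _id order.
TYPES = ["玄幻", "玄幻", "武侠", "武侠", "武侠", "武侠",
         "玄幻", "玄幻", "都市异能", "玄幻", "军事"]


def filters_buckets(values, named_terms):
    """One bucket per named filter; each filter is an exact match."""
    return {name: sum(1 for v in values if v == term)
            for name, term in named_terms.items()}


partial_value = filters_buckets(
    TYPES, {"wx": "武侠", "xh": "玄幻", "js": "军事", "ds": "都市"})
full_value = filters_buckets(
    TYPES, {"wx": "武侠", "xh": "玄幻", "js": "军事", "ds": "都市异能"})
print(partial_value)
print(full_value)
```

In the sample data the urban novel is stored with the type 都市异能, so a filter on 都市 alone yields an empty ds bucket.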
Requirement 2: count the wuxia (武侠) novels, the xuanhuan (玄幻) novels and all other novels. "Other" here means every type except 玄幻 and 武侠.
GET library_db/_search
{
"size": 0,
"aggs": {
"type_avg": {
"filters": {
"filters": {
"wx": {
"term": {
"type.keyword": "武侠"
}
},
"xh": {
"term": {
"type.keyword": "玄幻"
}
}
},
"other_bucket_key": "other_type"
}
}
}
}
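The other bucket behaves like a final else branch: a document is counted there only if no named filter matched it. A minimal plain Python sketch under that reading, with `filters_with_other` as an illustrative name:

```python
# "type" of each of the 11 sample documents, in _id order.
TYPES = ["玄幻", "玄幻", "武侠", "武侠", "武侠", "武侠",
         "玄幻", "玄幻", "都市异能", "玄幻", "军事"]


def filters_with_other(values, named_terms, other_bucket_key):
    counts = {name: 0 for name in named_terms}
    counts[other_bucket_key] = 0
    for value in values:
        matched = False
        for name, term in named_terms.items():
            if value == term:
                counts[name] += 1
                matched = True
        if not matched:                 # the "else" branch
            counts[other_bucket_key] += 1
    return counts


counts = filters_with_other(TYPES, {"wx": "武侠", "xh": "玄幻"}, "other_type")
print(counts)
```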
Missing aggregation
Requirement: count the novels that have no prices field.
POST /library_db/_search?size=0
{
"aggs": {
"without_prices": {
"missing": { "field": "prices" }
}
}
}
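Every sample document has a price, so the request above returns a count of 0 on this index. The sketch below shows the counting rule on a small hypothetical list instead, assuming that a document counts as missing when the field is absent or null; `missing_count` and the three documents are invented for the illustration.

```python
def missing_count(docs, field):
    """Count documents where `field` is absent or explicitly null."""
    return sum(1 for doc in docs if doc.get(field) is None)


# Hypothetical documents, not part of the sample index.
docs = [
    {"name": "诛仙", "prices": 32.12},
    {"name": "untitled draft"},               # field absent
    {"name": "unpriced book", "prices": None} # field explicitly null
]

without_prices = missing_count(docs, "prices")
print(without_prices)
```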
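Multi Terms aggregation
Requirement: group the novels by type and author, and count the novels in each group.
Analysis: this corresponds to grouping by several columns in MySQL. Multi Terms aggregation is similar to terms aggregation but is slower in most cases, so the official documentation recommends: if the same set of fields is used constantly, index a combined key for those fields as a separate field and run a terms aggregation on that field.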
GET /library_db/_search
{
"size": 0,
"aggs": {
"genres_and_products": {
"multi_terms": {
"terms": [{
"field": "type.keyword"
}, {
"field": "author.keyword"
}]
}
}
}
}
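Grouping by several fields amounts to grouping by a tuple of their values, and the combined-key recommendation amounts to storing that tuple as one string. The sketch below shows both in plain Python on the sample data; `multi_terms_buckets`, the `|` separator and the combined key format are assumptions made for the illustration.

```python
from collections import Counter

# (type, author) of each sample document.
BOOKS = [("玄幻", "萧鼎"), ("玄幻", "萧鼎"), ("武侠", "金庸"), ("武侠", "金庸"),
         ("武侠", "金庸"), ("武侠", "金庸"), ("玄幻", "忘语"), ("玄幻", "忘语"),
         ("都市异能", "打眼"), ("玄幻", "血红"), ("军事", "都梁")]


def multi_terms_buckets(docs):
    """One bucket per distinct (type, author) combination."""
    return Counter(docs)


buckets = multi_terms_buckets(BOOKS)

# Alternative: a precomputed combined key, grouped like a single field.
combined = Counter(f"{book_type}|{author}" for book_type, author in BOOKS)

for key, doc_count in buckets.most_common():
    print(key, doc_count)
```

Both approaches produce the same six groups; the second one moves the cost of building the key to indexing time.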