ES Aggregations: Bucket Aggregation Syntax Explained

Published: 2022-12-26

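Contents

Goal

ES Version

Official Documentation

Hands-On

Adding Test Data

Basic Syntax

Filtering Before Bucketing

Aggregating by Range

Histogram

Nested Bucket Aggregations

Date range aggregation

Filter aggregation

Filters aggregation

Missing aggregation

Multi Terms aggregation (aggregating on multiple fields)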


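Goal

Learn the syntax of bucket aggregations (roughly the counterpart of GROUP BY aggregation in MySQL) and generalize from the examples in this article. The following topics are covered:

  • Multi Terms aggregation (aggregating on multiple fields);
  • sorting buckets;
  • filtering data before bucketing;
  • bucketing by range;
  • Histogram;
  • nested bucket aggregations;
  • bucketing by date;
  • single filter and multiple filters;
  • missing aggregation.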

ES Version

7.17.5


Official Documentation

Bucket aggregations: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/search-aggregations-bucket.html


Hands-On

Adding Test Data

PUT /library_db
{
  "settings": {
    "index": {
      "analysis.analyzer.default.type": "ik_max_word"
    }
  }
}
 
PUT /library_db/_bulk
{"index":{"_id":"1"}}
{"id":1,"type":"玄幻","name":"诛仙","words_num":120,"chapter_num":600,"completion_time":"2000-09-01","author":"萧鼎","prices":32.12}
{"index":{"_id":"2"}}
{"id":2,"type":"玄幻","name":"诛仙前传:蛮荒行","words_num":30,"chapter_num":67,"completion_time":"2020-09-01","author":"萧鼎","prices":23.12}
{"index":{"_id":"3"}}
{"id":3,"type":"武侠","name":"天龙八部","words_num":80,"chapter_num":120,"completion_time":"1995-09-01","author":"金庸","prices":52.1}
{"index":{"_id":"4"}}
{"id":4,"type":"武侠","name":"射雕英雄传","words_num":67,"chapter_num":95,"completion_time":"1998-01-01","author":"金庸","prices":4.12}
{"index":{"_id":"5"}}
{"id":5,"type":"武侠","name":"神雕侠侣","words_num":75,"chapter_num":76,"completion_time":"2000-01-01","author":"金庸","prices":32.8}
{"index":{"_id":"6"}}
{"id":6,"type":"武侠","name":"倚天屠龙记","words_num":83,"chapter_num":130,"completion_time":"2003-01-01","author":"金庸","prices":100.12}
{"index":{"_id":"7"}}
{"id":7,"type":"玄幻","name":"凡人修仙传","words_num":600,"chapter_num":3000,"completion_time":"2018-01-01","author":"忘语","prices":120.12}
{"index":{"_id":"8"}}
{"id":8,"type":"玄幻","name":"魔天记","words_num":159,"chapter_num":400,"completion_time":"2019-01-01","author":"忘语","prices":11.12}
{"index":{"_id":"9"}}
{"id":9,"type":"都市异能","name":"黄金瞳","words_num":220,"chapter_num":400,"completion_time":"2019-01-01","author":"打眼","prices":74.5}
{"index":{"_id":"10"}}
{"id":10,"type":"玄幻","name":"将夜","words_num":210,"chapter_num":600,"completion_time":"2014-01-01","author":"血红","prices":32.0}
{"index":{"_id":"11"}}
{"id":11,"type":"军事","name":"亮剑","words_num":120,"chapter_num":100,"completion_time":"2012-01-01","author":"都梁","prices":15.0}

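Basic Syntax

Requirement 1: count the novels of each type in the library.

# "size": 10 inside terms returns at most 10 buckets; the top-level "size": 0 omits the search hits.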
GET /library_db/_search
{
  "size": 0,
  "aggs": {
    "type_count": {
     "terms": {
       "field": "type.keyword",
       "size": 10
     }
    }
  }
}
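
As a sanity check on what the request above should return, here is a minimal plain-Python sketch of the counting a terms aggregation performs. It does not call Elasticsearch; `TYPES` and `terms_buckets` are illustrative names, and the list simply repeats the `type` values of the 11 test documents.

```python
from collections import Counter

# "type" values of the 11 sample documents, in _id order (copied from the bulk request).
TYPES = ["玄幻", "玄幻", "武侠", "武侠", "武侠", "武侠",
         "玄幻", "玄幻", "都市异能", "玄幻", "军事"]


def terms_buckets(values, size=10):
    """Count documents per distinct value and keep the `size` largest buckets."""
    return Counter(values).most_common(size)


buckets = terms_buckets(TYPES)
for key, doc_count in buckets:
    print(key, doc_count)
```

The largest bucket comes first, which mirrors the default ordering of a terms aggregation (descending doc_count).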

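Requirement 2: count the novels of each type in the library, sorted by count in ascending order.

# Likewise, use desc for descending order; the keywords are the same as in MySQL's ORDER BY.
# Note: the official terms aggregation documentation discourages sorting by ascending _count, because it increases the error on document counts.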
GET /library_db/_search
{
  "size": 0,
  "aggs": {
    "type_count": {
      "terms": {
        "field": "type.keyword",
        "size": 10,
        "order": {
          "_count": "asc"
        }
      }
    }
  }
}

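Filtering Before Bucketing

Requirement: count the novels of each type, considering only novels with at least 1 million words (words_num is stored in units of 10,000 words, hence the value 100).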

GET /library_db/_search
{
  "size": 0,
  "query": {
    "range": {
      "words_num": {
        "gte": 100
      }
    }
  }, 
  "aggs": {
    "type_count": {
      "terms": {
        "field": "type.keyword",
        "size": 10
      }
    }
  }
}
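
The effect of putting a query in front of the aggregation can be sketched in plain Python: the filter runs first, and only the surviving documents are bucketed. This is an illustration on the test data, not an Elasticsearch call; `BOOKS` and `filtered_type_count` are names made up for the sketch.

```python
from collections import Counter

# (type, words_num) of the 11 sample documents, in _id order.
BOOKS = [("玄幻", 120), ("玄幻", 30), ("武侠", 80), ("武侠", 67),
         ("武侠", 75), ("武侠", 83), ("玄幻", 600), ("玄幻", 159),
         ("都市异能", 220), ("玄幻", 210), ("军事", 120)]


def filtered_type_count(books, min_words):
    """Keep books with words_num >= min_words (the range query), then count per type."""
    matching_types = [book_type for book_type, words in books if words >= min_words]
    return Counter(matching_types)


type_count = filtered_type_count(BOOKS, 100)
print(dict(type_count))
```

No wuxia (武侠) novel in the test data reaches 100, so that type produces no bucket at all rather than a bucket with a count of 0.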

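Aggregating by Range

Requirement: count the novels whose word count falls in each of these intervals: 0 to 300,000, 300,000 to 500,000, 500,000 to 1 million, 1 million to 2 million, and 2 million or more. In each range, from is inclusive and to is exclusive.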

 
GET /library_db/_search
{
  "size": 0,
  "aggs": {
    "words_num_count": {
      "range": {
        "field": "words_num",
        "ranges": [
          {
            "from": 0, 
            "to": 30
          },
          {
            "to": 50,
            "from": 30
          },
          {
            "to": 100,
            "from": 50
          },
          {
            "to": 200,
            "from": 100
          },
          {
            "key": ">=200",
            "from": 200
          }
        ]
      }
    }
  }
}
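
Because from is inclusive and to is exclusive, a value that sits exactly on a boundary lands in the upper bucket. The plain-Python sketch below reproduces that rule on the test data; it is not an Elasticsearch call, `range_buckets` is an illustrative name, and the bucket keys are simplified (Elasticsearch generates keys such as "0.0-30.0" when no key is given).

```python
# words_num of the 11 sample documents, in _id order.
WORDS_NUM = [120, 30, 80, 67, 75, 83, 600, 159, 220, 210, 120]

# (from, to) pairs; None means the range is open-ended on that side.
RANGES = [(0, 30), (30, 50), (50, 100), (100, 200), (200, None)]


def range_buckets(values, ranges):
    """Count values per range, with `from` inclusive and `to` exclusive."""
    counts = {}
    for lower, upper in ranges:
        key = f"{lower}-{upper}" if upper is not None else f">={lower}"
        counts[key] = sum(
            1 for v in values if v >= lower and (upper is None or v < upper)
        )
    return counts


words_num_count = range_buckets(WORDS_NUM, RANGES)
print(words_num_count)
```

The novel with words_num 30 is counted in 30-50, not in 0-30, which is why the first bucket is empty.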

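Histogram

Requirement: count the novels in each word-count interval, with every interval 500,000 words wide.

# Without min_doc_count this query returns many empty intervals. Setting min_doc_count to 1 omits the empty buckets.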
GET /library_db/_search
{
  "size": 0,
  "aggs": {
    "type_count": {
      "histogram": {
        "field": "words_num",
        "interval": 50,
        "min_doc_count": 1
      }
    }
  }
}
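
The official documentation gives the bucket key of a histogram as floor((value - offset) / interval) * interval + offset. The sketch below applies that formula with offset 0 to the test data in plain Python, to show which buckets min_doc_count removes. It assumes a non-empty list of values and an integer interval; `histogram_buckets` is an illustrative name, not an Elasticsearch API.

```python
import math
from collections import Counter

# words_num of the 11 sample documents, in _id order.
WORDS_NUM = [120, 30, 80, 67, 75, 83, 600, 159, 220, 210, 120]


def histogram_buckets(values, interval, min_doc_count=0):
    """Bucket values by floor(value / interval) * interval (offset 0).

    Gaps between the lowest and highest bucket are filled with empty
    buckets, which are then dropped if they fall below min_doc_count.
    """
    counts = Counter(math.floor(v / interval) * interval for v in values)
    keys = range(min(counts), max(counts) + interval, interval)
    return {key: counts[key] for key in keys if counts[key] >= min_doc_count}


with_gaps = histogram_buckets(WORDS_NUM, 50, min_doc_count=0)
without_gaps = histogram_buckets(WORDS_NUM, 50, min_doc_count=1)
print(len(with_gaps), without_gaps)
```

On this data the gap-filled result has 13 buckets (keys 0 through 600), of which only 6 contain documents.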

Nested Bucket Aggregations

Requirement: compute the average price of each type of novel in the library.

GET /library_db/_search
{
  "size": 0,
  "aggs": {
    "type_group": {
      "terms": {
        "field": "type.keyword"
      },
      "aggs": {
        "prices_avg": {
          "avg": {
            "field": "prices"
          }
        }
      }
    }
  }
}
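
A nested aggregation first groups the documents and then runs the inner metric once per bucket. The following plain-Python sketch does the same on the test data so the expected averages can be checked by hand; it does not talk to Elasticsearch, and `avg_price_by_type` is an illustrative name.

```python
from collections import defaultdict

# (type, prices) of the 11 sample documents, in _id order.
BOOKS = [("玄幻", 32.12), ("玄幻", 23.12), ("武侠", 52.1), ("武侠", 4.12),
         ("武侠", 32.8), ("武侠", 100.12), ("玄幻", 120.12), ("玄幻", 11.12),
         ("都市异能", 74.5), ("玄幻", 32.0), ("军事", 15.0)]


def avg_price_by_type(books):
    """Group prices by type (the terms bucket), then average each group (the avg metric)."""
    groups = defaultdict(list)
    for book_type, price in books:
        groups[book_type].append(price)
    return {book_type: sum(prices) / len(prices) for book_type, prices in groups.items()}


prices_avg = avg_price_by_type(BOOKS)
for book_type, value in prices_avg.items():
    print(book_type, round(value, 3))
```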

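Date range aggregation

Analysis

  • from and to are the start and end of the range;
  • from is equivalent to >=;
  • to is equivalent to <;
  • missing: supplies a substitute value for documents that lack the field.

Requirement 1: count the novels completed within a one-year window that lies in the past relative to the current time; the request below covers the period from two years ago up to one year ago. Note: the calculation is done in UTC+8 (Beijing time), which is why the date math adds +8h. The date_range aggregation also accepts a time_zone parameter, which may be a cleaner way to achieve the same thing.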

GET /library_db/_search
{
  "size": 0,
  "aggs": {
    "range": {
      "date_range": {
        "field": "completion_time",
        "ranges": [
          {
            "from": "now+8h-2y/d",
            "to": "now+8h-1y/d"
          }
        ]
      }
    }
  }
}

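Requirement 2: count the completed novels on either side of 2019-01-01. Documents without the date field are treated as having the value 1976-11-30, so they fall into the Older bucket.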

GET /library_db/_search
{
  "size": 0,
  "aggs": {
    "range": {
      "date_range": {
        "field": "completion_time",
        "missing": "1976-11-30",
        "ranges": [
          {
            "key": "Older",
            "to": "2019-01-01"
          },
          {
            "key": "Newer",
            "from": "2019-01-01",
            "to": "now+8h/d"
          }
        ]
      }
    }
  }
}
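
The rules above (from is >=, to is <, missing substitutes a value) can be sketched in plain Python. This is an illustration only: "now" is pinned to an assumed date (2022-12-26) so the result is reproducible, `date_range_buckets` is a made-up helper, and the document without a date is hypothetical because every test document has a completion_time.

```python
from datetime import date

# completion_time of the 11 sample documents, in _id order.
COMPLETION = ["2000-09-01", "2020-09-01", "1995-09-01", "1998-01-01",
              "2000-01-01", "2003-01-01", "2018-01-01", "2019-01-01",
              "2019-01-01", "2014-01-01", "2012-01-01"]

TODAY = date(2022, 12, 26)  # assumption: a fixed stand-in for "now+8h/d"

# (key, from, to); None means open-ended on that side.
RANGES = [("Older", None, date(2019, 1, 1)),
          ("Newer", date(2019, 1, 1), TODAY)]


def date_range_buckets(values, ranges, missing=None):
    """Count ISO date strings per range; None values use `missing` or are skipped."""
    counts = {key: 0 for key, _, _ in ranges}
    for raw in values:
        raw = raw if raw is not None else missing
        if raw is None:
            continue  # no value and no substitute: the document is ignored
        day = date.fromisoformat(raw)
        for key, start, end in ranges:
            if (start is None or day >= start) and (end is None or day < end):
                counts[key] += 1
    return counts


sample_result = date_range_buckets(COMPLETION, RANGES, missing="1976-11-30")
# Hypothetical extra document with no completion_time.
with_missing = date_range_buckets(COMPLETION + [None], RANGES, missing="1976-11-30")
print(sample_result, with_missing)
```

The two novels dated exactly 2019-01-01 are counted as Newer, since to is exclusive and from is inclusive.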

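Filter aggregation

Requirement: compute the average price of all novels and the average price of wuxia (武侠) novels. The term filter targets type.keyword, as in the other examples, because term queries on an analyzed text field are unreliable.

POST /library_db/_search?size=0&filter_path=aggregations
{
  "aggs": {
    "avg_price": {
      "avg": {
        "field": "prices"
      }
    },
    "wx": {
      "filter": {
        "term": {
          "type.keyword": "武侠"
        }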
      },
      "aggs": {
        "wx_avg_prices": {
          "avg": {
            "field": "prices"
          }
        }
      }
    }
  }
}

# If you only need the average price of wuxia novels, a top-level query is more efficient than a filter aggregation.
POST /library_db/_search?size=0&filter_path=aggregations
{
  "query": { "term": { "type.keyword": "武侠" } },
  "aggs": {
    "wx": { "avg": { "field": "prices" } }
  }
}

# Not recommended
POST /library_db/_search?size=0&filter_path=aggregations
{
  "aggs": {
    "wx": {
      "filter": { "term": { "type.keyword": "武侠" } },
      "aggs": {
        "wx_avg_prices": { "avg": { "field": "prices" } }
      }
    }
  }
}
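
The point of a filter aggregation is that it narrows the document set only for its own sub-aggregations, while sibling aggregations still see every document. A minimal plain-Python sketch of the first request in this section, on the test data, looks like this (no Elasticsearch call is made; `avg` is a small helper written for the sketch):

```python
# (type, prices) of the 11 sample documents, in _id order.
BOOKS = [("玄幻", 32.12), ("玄幻", 23.12), ("武侠", 52.1), ("武侠", 4.12),
         ("武侠", 32.8), ("武侠", 100.12), ("玄幻", 120.12), ("玄幻", 11.12),
         ("都市异能", 74.5), ("玄幻", 32.0), ("军事", 15.0)]


def avg(values):
    """Arithmetic mean, or None when there is nothing to average."""
    values = list(values)
    return sum(values) / len(values) if values else None


# Sibling aggregation: runs over all documents.
avg_price = avg(price for _, price in BOOKS)

# Filter aggregation with a nested avg: runs over the wuxia documents only.
wx_avg_prices = avg(price for book_type, price in BOOKS if book_type == "武侠")

print(round(avg_price, 3), round(wx_avg_prices, 3))
```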

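Filters aggregation

Analysis

  • When there are several filter conditions, prefer one filters aggregation over multiple single filter aggregations, for efficiency.
  • other_bucket_key names the bucket that collects everything else, similar to else in Java.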

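Requirement 1: count the wuxia (武侠), fantasy (玄幻), military (军事), and urban-fantasy (都市异能) novels. The term filters match type.keyword exactly, so the value must be the full type name stored in the test data (都市异能, not 都市).

GET library_db/_search
{
  "size": 0,
  "aggs": {
    "type_avg": {
      "filters": {
        "filters": {
          "wx": {
            "term": {
              "type.keyword": "武侠"
            }
          },
          "xh": {
            "term": {
              "type.keyword": "玄幻"
            }
          },
          "js": {
            "term": {
              "type.keyword": "军事"
            }
          },
          "ds": {
            "term": {
              "type.keyword": "都市异能"
            }
          }
        }
      }
    }
  }
}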

Requirement 2: count the wuxia novels, the fantasy novels, and all other novels. Here "other" means every type except fantasy and wuxia.

GET library_db/_search
{
  "size": 0,
  "aggs": {
    "type_avg": {
      "filters": {
        "filters": {
          "wx": {
            "term": {
              "type.keyword": "武侠"
            }
          },
          "xh": {
            "term": {
              "type.keyword": "玄幻"
            }
          }
        },
        "other_bucket_key": "other_type"
      }
    }
  }
}
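
How the named filters and the other bucket divide the documents can be sketched in plain Python. This only mimics the counting on the test data and is not an Elasticsearch call; `filters_buckets` is an illustrative helper, and each filter is reduced to an exact match on the type value.

```python
# "type" values of the 11 sample documents, in _id order.
TYPES = ["玄幻", "玄幻", "武侠", "武侠", "武侠", "武侠",
         "玄幻", "玄幻", "都市异能", "玄幻", "军事"]


def filters_buckets(values, filters, other_bucket_key=None):
    """Count values per named filter; unmatched values go to the other bucket, if any."""
    counts = {name: 0 for name in filters}
    if other_bucket_key is not None:
        counts[other_bucket_key] = 0
    for value in values:
        matched = False
        # Filters are independent: a document may be counted by more than one.
        for name, wanted in filters.items():
            if value == wanted:
                counts[name] += 1
                matched = True
        if not matched and other_bucket_key is not None:
            counts[other_bucket_key] += 1
    return counts


type_counts = filters_buckets(TYPES, {"wx": "武侠", "xh": "玄幻"},
                              other_bucket_key="other_type")
print(type_counts)
```

Without other_bucket_key the two unmatched documents (都市异能 and 军事) would simply not appear in the response.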

Missing aggregation

Requirement: count the novels that have no price field.

POST /library_db/_search?size=0
{
  "aggs": {
    "without_prices": {
      "missing": { "field": "prices" }
    }
  }
}
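
Every test document carries a prices value, so the request above should report a doc_count of 0. The plain-Python sketch below shows the counting rule, including that an explicit null is treated like an absent field. The two extra documents are hypothetical and exist only to make the count non-zero; `missing_count` is an illustrative name.

```python
# prices of the 11 sample documents, in _id order.
PRICES = [32.12, 23.12, 52.1, 4.12, 32.8, 100.12, 120.12, 11.12, 74.5, 32.0, 15.0]

docs = [{"_id": str(i + 1), "prices": price} for i, price in enumerate(PRICES)]


def missing_count(documents, field):
    """Count documents where the field is absent or explicitly null."""
    return sum(1 for doc in documents if doc.get(field) is None)


without_prices = missing_count(docs, "prices")

# Hypothetical documents, not part of the test data.
extra_docs = docs + [{"_id": "12"}, {"_id": "13", "prices": None}]
without_prices_extra = missing_count(extra_docs, "prices")

print(without_prices, without_prices_extra)
```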

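Multi Terms aggregation (aggregating on multiple fields)

Requirement: group by novel type and author, and count the novels in each group.

Analysis: this is the counterpart of grouping by several columns in MySQL. The multi_terms aggregation resembles the terms aggregation but is less efficient, so the official documentation recommends that, if the same set of fields is used constantly, you index a combined key for those fields as a separate field and run a terms aggregation on that field.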

GET /library_db/_search
{
  "size": 0, 
  "aggs": {
    "genres_and_products": {
      "multi_terms": {
        "terms": [{
          "field": "type.keyword" 
        }, {
          "field": "author.keyword"
        }]
      }
    }
  }
}
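
In plain Python, a multi_terms bucket corresponds to counting tuples of field values. The sketch below does that for the test data and shapes each bucket like the response, where key is an array and key_as_string joins the parts with "|". It is an illustration, not an Elasticsearch call; `multi_terms_buckets` is a made-up name, and the order among buckets with equal counts may differ from what Elasticsearch returns.

```python
from collections import Counter

# (type, author) of the 11 sample documents, in _id order.
BOOKS = [("玄幻", "萧鼎"), ("玄幻", "萧鼎"), ("武侠", "金庸"), ("武侠", "金庸"),
         ("武侠", "金庸"), ("武侠", "金庸"), ("玄幻", "忘语"), ("玄幻", "忘语"),
         ("都市异能", "打眼"), ("玄幻", "血红"), ("军事", "都梁")]


def multi_terms_buckets(rows, size=10):
    """Count rows per distinct (type, author) tuple, largest bucket first."""
    return [
        {"key": list(key), "key_as_string": "|".join(key), "doc_count": count}
        for key, count in Counter(rows).most_common(size)
    ]


buckets = multi_terms_buckets(BOOKS)
for bucket in buckets:
    print(bucket["key_as_string"], bucket["doc_count"])
```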
