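Core Concepts
I. Inverted Index Principles
An inverted index is also called a reverse index. Where there is an inverted index, there is also its counterpart, the forward index.
Forward index: each document ID maps to the words that the document contains. When a user sends a query (assume a single keyword), the search engine has to scan every document in the index and check, one document at a time, whether it contains the keyword. Looking up keywords by walking through documents like this is the forward-index approach, and it gets slower as the collection grows.
Doc 1 ID → info for word 1; info for word 2; info for word 3…
Doc 2 ID → info for word 3; info for word 2; info for word 4…
Inverted index: the search engine converts the forward index into an inverted index, turning the "document → words" mapping into a "word → documents" mapping.
Word 1 → Doc 1 ID; Doc 2 ID; Doc 3 ID…
Word 2 → Doc 1 ID; Doc 4 ID; Doc 7 ID…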
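The conversion from "document → words" to "word → documents" can be sketched in a few lines of Python. This is only an illustration: the `invert` function and the `word1`…`word4` sample data are hypothetical names chosen for the sketch, not part of Elasticsearch.

```python
from collections import defaultdict

def invert(forward_index):
    """Turn a document -> words mapping into a word -> document IDs mapping."""
    inverted = defaultdict(list)
    for doc_id in sorted(forward_index):
        # dict.fromkeys drops repeated words while keeping first-seen order
        for word in dict.fromkeys(forward_index[doc_id]):
            inverted[word].append(doc_id)
    return dict(inverted)

# Hypothetical forward index: document ID -> words found in that document
forward = {
    1: ["word1", "word2", "word3"],
    2: ["word3", "word2", "word4"],
}
inverted = invert(forward)
print(inverted["word2"])
```

With the inverted form, answering "which documents contain word2" is a single dictionary lookup instead of a scan over every document.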
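Word-document matrix (sample documents):
D1: 乔布斯去了中国。 (Jobs went to China.)
D2: 苹果今年仍能占据大多数触摸屏产能。 (Apple will still hold most touchscreen production capacity this year.)
D3: 苹果公司首席执行官史蒂夫·乔布斯宣布,iPad2将于3月11日在美国上市。 (Apple CEO Steve Jobs announced that the iPad2 will go on sale in the United States on March 11.)
D4: 乔布斯推动了世界,iPhone、iPad、iPad2,一款一款接连不断。 (Jobs moved the world forward: iPhone, iPad, iPad2, one product after another.)
D5: 乔布斯吃了一个苹果。 (Jobs ate an apple.)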
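Inverted index
1. Concept: an inverted index is one concrete storage form of the word-document matrix. Given a word, it quickly returns the list of documents that contain that word. An inverted index has two main parts: the lexicon and the inverted file.
Lexicon: the usual indexing unit of a search engine is the word. The lexicon is the set of strings formed by all words that appear in the document collection. Each lexicon entry records some information about the word itself and a pointer to its posting list.
Posting list: a posting list records every document in which a word appears, together with the positions of the word inside each document. Each record is called a posting. From the posting list you can tell which documents contain a given word.
Inverted file: the posting lists of all words are usually stored sequentially in a file on disk. This file is called the inverted file; it is the physical file that stores the inverted index.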
2. A simple inverted index example
Doc1:乔布斯去了中国。
Doc2:苹果今年仍能占据大多数触摸屏产能。
Doc3:苹果公司首席执行官史蒂夫·乔布斯宣布,iPad2将于3月11日在美国上市。
Doc4:乔布斯推动了世界,iPhone、iPad、iPad2,一款一款接连不断。
Doc5:乔布斯吃了一个苹果。
Build a simple inverted index for these five documents:
Assume the number in each document name is its document ID, for example the "1" in "Doc1".
WordID | Word | Posting list (DocID) |
---|---|---|
1 | 乔布斯 | 1,3,4,5 |
2 | 苹果 | 2,3,5 |
3 | iPad2 | 3,4 |
4 | 宣布 | 3 |
5 | 了 | 1,4,5 |
… | … | … |
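First, a word segmentation (tokenization) system splits each document into a sequence of words, so that every document becomes a stream of words. Each distinct word is assigned a unique word ID (WordID) and has a posting list of the documents that contain it. A more useful index also records, for each document, the term frequency (TF) and the positions at which the word occurs: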
WordID | Word | Posting list (DocID; TF; positions) |
---|---|---|
1 | 乔布斯 | (1;1;<1>),(3;1;<6>),(4;1;<1>),(5;1;<1>) |
2 | 苹果 | (2;1;<1>),(3;1;<1>),(5;1;<5>) |
3 | iPad2 | (3;1;<8>),(4;1;<7>) |
4 | 宣布 | (3;1;<7>) |
5 | 了 | (1;1;<3>),(4;1;<3>),(5;1;<3>) |
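The table above can be reproduced for a small subset of the documents with the following sketch. It assumes the documents are already segmented into words; the segmentation of Doc1 and Doc5 shown here is an assumption made for illustration (a real analyzer may split the text differently), and `build_postings` is a hypothetical helper name.

```python
from collections import defaultdict

def build_postings(tokenized_docs):
    """Map each word to a list of (doc_id, term_frequency, positions)."""
    index = defaultdict(list)
    for doc_id in sorted(tokenized_docs):
        positions = defaultdict(list)
        # Positions are 1-based, matching the table above
        for pos, word in enumerate(tokenized_docs[doc_id], start=1):
            positions[word].append(pos)
        for word, pos_list in positions.items():
            index[word].append((doc_id, len(pos_list), pos_list))
    return dict(index)

# Assumed segmentation of Doc1 and Doc5; a real analyzer may split differently
docs = {
    1: ["乔布斯", "去", "了", "中国"],
    5: ["乔布斯", "吃", "了", "一个", "苹果"],
}
postings = build_postings(docs)
print(postings["了"])
```

Storing positions is what later makes phrase queries possible, since a phrase match has to check that the words appear next to each other.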
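II. Differences Between Documents, Indices, and Types
Concept | Description | Version changes |
---|---|---|
Document | The basic unit of data, stored as JSON (for example, one user record) | Always present |
Index | A collection of documents, including field definitions (mappings) and configuration (settings) | Core concept, still present |
Type | Used in older versions for logical grouping (loosely comparable to table partitions in a database); deprecated as of Elasticsearch 7.x | Typeless APIs are the default in 7.x; types are removed entirely in 8.x (use separate indices instead) |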
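III. The Roles of Shards and Replicas
Concept | Purpose | Design goal |
---|---|---|
Shard | 1. Horizontal scaling: splits an index into subsets that are distributed across nodes 2. Better write performance: data is processed in parallel | Support large data volumes and high concurrency |
Replica | 1. High availability: a copy of a primary shard that protects against data loss 2. Better query performance: read requests are load-balanced | Provide fault tolerance and query throughput |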
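How a document ends up on a particular shard can be sketched as follows. In simplified form, Elasticsearch picks the shard as a hash of the routing value (by default the document `_id`) modulo the number of primary shards. The sketch below uses `zlib.crc32` purely as a stand-in hash (Elasticsearch uses its own Murmur3-based hash), and `pick_shard` is a hypothetical name.

```python
import zlib

def pick_shard(routing_key: str, num_primary_shards: int) -> int:
    """Choose a primary shard for a document from its routing key."""
    # zlib.crc32 is a stand-in; Elasticsearch uses its own Murmur3-based hash
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards

# Route ten hypothetical document IDs across 3 primary shards
shards = [pick_shard(str(doc_id), 3) for doc_id in range(1, 11)]
print(shards)
```

Because the shard number depends on the primary shard count, changing that count would send existing IDs to different shards, which is one reason the number of primary shards is fixed when the index is created.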
IV. CRUD Operations
CRUD operations and the REST API
1. Create a document
Syntax: PUT /index_name/_doc/document_id (explicit ID) or POST /index_name/_doc (auto-generated ID).
PUT /blogs/_doc/1
{
"title": "Elasticsearch实战应用",
"author": "张三",
"content": "Elasticsearch是一个分布式搜索引擎..."
}
PUT /blogs/_doc/2
{
"title": "Elasticsearch理论基础",
"author": "李四",
"content": "Elasticsearch是一个分布式搜索引擎..."
}
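Response to the second request (document 2 already existed in this run, so result is "updated" and _version is 2):
{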
"_index" : "blogs",
"_type" : "_doc",
"_id" : "2",
"_version" : 2,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 2,
"_primary_term" : 1
}
2. Query documents
Syntax: GET /index_name/_doc/document_id, or GET /index_name/_search (full-text search)
GET /blogs/_search
{
"query": {"match": {"content": "Elasticsearch"}}
}
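query: a top-level field that specifies the search criteria.
match: a query type, the "match query". It analyzes the search text and returns documents whose field content matches.
content: the name of the field to search, here the content field.
Elasticsearch: the text to search for, meaning documents whose content field contains the keyword Elasticsearch.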
{
"took" : 340,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 0.13353139,
"hits" : [
{
"_index" : "blogs",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.13353139,
"_source" : {
"title" : "Elasticsearch实战应用",
"author" : "张三",
"content" : "Elasticsearch是一个分布式搜索引擎..."
}
},
{
"_index" : "blogs",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.13353139,
"_source" : {
"title" : "Elasticsearch理论基础",
"author" : "李四",
"content" : "Elasticsearch是一个分布式搜索引擎..."
}
}
]
}
}
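3. Update a document
Full update: PUT /index_name/_doc/document_id overwrites the existing content
Partial update: POST /index_name/_update/document_id modifies fields through the doc object (the older POST /index_name/_doc/document_id/_update form is deprecated in 7.x and removed in 8.x)
POST /blogs/_update/1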
{
"doc": {"title": "Elasticsearch实战"}
}
{
"_index" : "blogs",
"_type" : "_doc",
"_id" : "1",
"_version" : 2,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 3,
"_primary_term" : 1
}
4. Delete a document
Syntax: DELETE /index_name/_doc/document_id
DELETE /blogs/_doc/2
{
"_index" : "blogs",
"_type" : "_doc",
"_id" : "2",
"_version" : 6,
"result" : "deleted",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 7,
"_primary_term" : 1
}
Bulk API batch operations
Purpose: process multiple create, update, and delete operations in a single request
POST _bulk
{"index": {"_index": "logs", "_id": "1"}}
{"message": "日志1"}
{"index": {"_index": "logs", "_id": "2"}}
{"message": "日志2"}
{"delete": {"_index": "logs", "_id": "3"}}
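index: an index action, which adds a document to the given index (or replaces it if the ID already exists).
_index: "logs": the target index is logs.
_id: "1": the document ID is 1.
{"message": "日志1"}: the document body, here a log entry with a message field.
delete: a delete action, which removes the document with the given ID from the given index.
_index: "logs": the target index is logs.
_id: "3": the ID of the document to delete is 3.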
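When the bulk request is sent from a program rather than from the Kibana Console, the body has to be newline-delimited JSON: one action line, optionally followed by one document line, and a final newline at the end. The sketch below only builds that body as a string. `build_bulk_body` is a hypothetical helper, and actually sending the body (typically with the `application/x-ndjson` content type) is left out.

```python
import json

def build_bulk_body(operations):
    """Serialize (action, metadata, source) tuples into an NDJSON bulk body."""
    lines = []
    for action, meta, source in operations:
        lines.append(json.dumps({action: meta}, ensure_ascii=False))
        if source is not None:  # delete actions carry no document line
            lines.append(json.dumps(source, ensure_ascii=False))
    # The bulk body must end with a newline character
    return "\n".join(lines) + "\n"

body = build_bulk_body([
    ("index", {"_index": "logs", "_id": "1"}, {"message": "日志1"}),
    ("index", {"_index": "logs", "_id": "2"}, {"message": "日志2"}),
    ("delete", {"_index": "logs", "_id": "3"}, None),
])
print(body, end="")
```

Each JSON object must stay on a single line, which is why the sketch serializes line by line instead of pretty-printing.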
GET /logs/_search
{
"query": {"match_all": {}}
}
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "logs",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"message" : "日志1"
}
},
{
"_index" : "logs",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"message" : "日志2"
}
}
]
}
}
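Search and Queries
1. Query DSL structure
Leaf queries
Query a field directly, for example with match, term, or range. Suited to searching a single field, with either full-text or exact matching.
{"match": {"title": "Elasticsearch"}}
Search the content field for documents that contain the keyword Elasticsearch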
GET /blogs/_search
{
"query": {
"match": {
"content": "Elasticsearch"
}
}
}
In the logs index, find documents whose timestamp field falls between January 1, 2023 and December 31, 2023
GET /logs/_search
{
"query": {
"range": {
"timestamp": {
"gte": "2023-01-01",
"lte": "2023-12-31"
}
}
}
}
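Find an exact value in a specific field. A term query does not analyze the search value, so it suits exact matching on keyword, numeric, or date fields rather than on analyzed text fields.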
GET /products/_search
{
"query": {
"term": {
"category": "electronics"
}
}
}
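The difference between term and match comes down to whether the search text is analyzed before the lookup. The following local simulation illustrates that idea under a strong simplification: `analyze` is a toy analyzer that only lowercases and splits on whitespace, and all three function names are hypothetical.

```python
def analyze(text):
    """Toy analyzer: lowercase and split on whitespace (a simplification)."""
    return text.lower().split()

def term_query(indexed_tokens, value):
    # term: the search value is compared as-is, with no analysis
    return value in indexed_tokens

def match_query(indexed_tokens, text):
    # match: the search text is analyzed first; any token may match (OR)
    return any(token in indexed_tokens for token in analyze(text))

# Tokens that an analyzed text field would store for this value
tokens = set(analyze("Electronics Store"))

print(term_query(tokens, "Electronics"))   # the capitalized value is not a stored token
print(match_query(tokens, "Electronics"))  # analysis lowercases the search text first
```

This is why a term query against an analyzed text field can unexpectedly return nothing, while the same value matches on a keyword field that stores the original string.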
Find documents that contain a specific field.
In the users index, find documents that have an email field
GET /users/_search
{
"query": {
"exists": {
"field": "email"
}
}
}
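Matches multiple terms in a field, treating the last term as a prefix.
In the title field, find documents that contain 生产 or a term starting with 实战 (the terms are combined with OR by default).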
GET /blogs/_search
{
"query": {"match_bool_prefix": {
"title": "生产 实战"
}}
}
Response:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 2.5442266,
"hits" : [
{
"_index" : "blogs",
"_type" : "_doc",
"_id" : "1",
"_score" : 2.5442266,
"_source" : {
"title" : "Elasticsearch实战",
"author" : "张三",
"content" : "Elasticsearch是一个分布式搜索引擎..."
}
},
{
"_index" : "blogs",
"_type" : "_doc",
"_id" : "WEzTrJUBumy3Y6iJ6MEO",
"_score" : 2.5153382,
"_source" : {
"title" : "Elasticsearch生产测试",
"author" : "李四五",
"content" : "Elasticsearch是一个分布式搜索引擎..."
}
}
]
}
}
Find documents whose title field contains the phrase "开发"
GET /blogs/_search
{
"query": {"match_phrase": {
"title": "开发"
}}
}
Find values in a specific field that start with the given prefix.
GET /blogs/_search
{
"query": {
"prefix": {
"title.keyword": "生产"
}
}
}
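Compound queries
Combine multiple query clauses with logical operators, for example bool and dis_max. The bool query supports must (AND), should (OR), must_not (NOT), and other logical clauses.
Boolean logic:
bool query: a compound query that lets us combine multiple queries using Boolean logic.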
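The matching rules of a bool query can be illustrated with a small local simulation. This sketch only decides whether a document matches and ignores scoring entirely. `bool_matches` and the predicate names are hypothetical, and clauses are modeled as plain Python functions rather than real queries.

```python
def bool_matches(doc, must=(), should=(), must_not=(), filters=(),
                 minimum_should_match=None):
    """Decide whether doc satisfies a bool query built from predicates."""
    if not all(clause(doc) for clause in list(must) + list(filters)):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    if minimum_should_match is None:
        # Default: should is required only when no must/filter clause exists
        minimum_should_match = 0 if (must or filters) else (1 if should else 0)
    return sum(1 for clause in should if clause(doc)) >= minimum_should_match

book = {"title": "Elasticsearch优化实践", "author": "吴十一", "price": 107.0}
has_es = lambda d: "Elasticsearch" in d["title"]
by_wu = lambda d: d["author"] == "吴十一"
cheap = lambda d: d["price"] < 100

print(bool_matches(book, must=[has_es, by_wu]))
print(bool_matches(book, must=[has_es], must_not=[cheap]))
print(bool_matches(book, should=[cheap]))
```

The last call shows the default behavior when only should clauses are present: at least one of them has to match, so this book is rejected.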
must (AND) clause:
GET /books/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"title": "Elasticsearch" // the title must match "Elasticsearch"
}
},
{
"term": {
"author.keyword": "吴十一" // exact match on the author 吴十一
}
}
]
}
}
}
Response:
{
-----
"hits" : [
{
"_index" : "books",
"_type" : "_doc",
"_id" : "9",
"_score" : 2.6632528,
"_source" : {
"title" : "Elasticsearch优化实践",
"author" : "吴十一",
"price" : 107.0,
"publish_date" : "2023-09-09"
}
}
]
}
}
GET /books/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"title": "Elasticsearch"
}
},
{
"term": {
"price": 99.9
}
}
]
}
}
}
{
----
"hits" : [
{
"_index" : "books",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.023975,
"_source" : {
"title" : "Elasticsearch实战",
"author" : "张三",
"price" : 99.9,
"publish_date" : "2023-01-01"
}
},
{
"_index" : "books",
"_type" : "_doc",
"_id" : "21",
"_score" : 1.023975,
"_source" : {
"title" : "Elasticsearch实战",
"author" : "张三",
"price" : 99.9,
"publish_date" : "2023-01-01"
}
}
]
}
}
must_not (NOT) clause:
GET /books/_search
{
"query": {
"bool": {
"must_not": [
{
// exclude documents priced from 10 to 129
"range": {
"price": {
"gte": 10,
"lte": 129
}
}
}
]
}
}
}
{
---
"hits" : [
{
"_index" : "books",
"_type" : "_doc",
"_id" : "13",
"_score" : 0.0,
"_source" : {
"title" : "Elasticsearch分布式架构",
"author" : "赵十五",
"price" : 130.0,
"publish_date" : "2024-01-01"
}
}
]
}
}
should clause:
Documents that satisfy a should clause receive a higher relevance score, but matching it is not mandatory.
GET /books/_search
{
"query": {
"bool": {
"should": [
{
"term": {
"price": {
"value": 120,
"boost": 2
}
}
},
{
"match_phrase": {
"title": {
"query": "核心原理",
"boost": 1 // boost 1 is the default and leaves the clause weight unchanged
}
}
}
],
"minimum_should_match": 1 // at least one should clause must match
}
}
}
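match_phrase: the terms must not only be present in the field, they must also appear in the given order.
Note that should does not always behave like a plain OR: when the same bool query also has must or filter clauses and minimum_should_match is not set, the should clauses become optional and only affect the score.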
{
----
"hits" : [
{
"_index" : "books",
"_type" : "_doc",
"_id" : "3",
"_score" : 9.41843,
"_source" : {
"title" : "Elasticsearch核心原理",
"author" : "王五",
"price" : 105.0,
"publish_date" : "2023-03-03"
}
},
{
"_index" : "books",
"_type" : "_doc",
"_id" : "2",
"_score" : 2.0,
"_source" : {
"title" : "深入理解Elasticsearch",
"author" : "李四",
"price" : 120.0,
"publish_date" : "2023-02-02"
}
},
{
"_index" : "books",
"_type" : "_doc",
"_id" : "22",
"_score" : 2.0,
"_source" : {
"title" : "深入理解Elasticsearch",
"author" : "李四",
"price" : 120.0,
"publish_date" : "2023-02-02"
}
}
]
}
}
filter clause:
A filter clause is not scored; it only checks whether a document matches, which makes the query more efficient.
GET /books/_search
{
"query": {
"bool": {
"filter": {
"terms": {
"price": [
120,
90
]
}
}
}
}
}
"hits" : [
{
"_index" : "books",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.0,
"_source" : {
"title" : "深入理解Elasticsearch",
"author" : "李四",
"price" : 120.0,
"publish_date" : "2023-02-02"
}
},
{
"_index" : "books",
"_type" : "_doc",
"_id" : "6",
"_score" : 0.0,
"_source" : {
"title" : "从零开始学Elasticsearch",
"author" : "刘八",
"price" : 90.0,
"publish_date" : "2023-06-06"
}
},
{
"_index" : "books",
"_type" : "_doc",
"_id" : "22",
"_score" : 0.0,
"_source" : {
"title" : "深入理解Elasticsearch",
"author" : "李四",
"price" : 120.0,
"publish_date" : "2023-02-02"
}
}
]
GET /books/_search
{
"query": {
"bool": {
"filter": {
"range": {
"price": {
"gte": 85,
"lte": 90
}
}
}
}
}
}
"hits" : [
{
"_index" : "books",
"_type" : "_doc",
"_id" : "6",
"_score" : 0.0,
"_source" : {
"title" : "从零开始学Elasticsearch",
"author" : "刘八",
"price" : 90.0,
"publish_date" : "2023-06-06"
}
},
{
"_index" : "books",
"_type" : "_doc",
"_id" : "19",
"_score" : 0.0,
"_source" : {
"title" : "Elasticsearch快速入门",
"author" : "郑二十一",
"price" : 85.0,
"publish_date" : "2024-07-07"
}
}
]
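2. Pagination and sorting
Pagination:
Shallow pagination:
// Suited to paging through results, typically for page-by-page browsing in a user interface.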
GET /books/_search
{
"query": {"match_all": {}},
"from": 1,
"size": 2
}
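In my tests, the further back the page, the slower the query: elapsed time grows as from increases, and the effect is more pronounced with larger data sets. A from + size query is still acceptable within roughly 10,000 to 50,000 documents (pages 1,000 to 5,000), but with more data the deep pagination problem appears. Note that by default Elasticsearch rejects requests where from + size exceeds 10,000 (the index.max_result_window setting).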
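The reason deep pages get slower can be made concrete with a rough estimate. To serve one page, each shard has to produce its own top from + size hits, and the coordinating node merges all of them before discarding the first from results. The function below is a back-of-the-envelope sketch with a hypothetical name, not a measurement.

```python
def hits_collected(from_, size, num_shards):
    """Estimate hits the coordinating node must gather for one page."""
    per_shard = from_ + size          # each shard returns its top from+size hits
    return per_shard * num_shards     # all are merged, then the first `from_` are dropped

for offset in (0, 1_000, 9_990):
    print(offset, hits_collected(offset, 10, num_shards=5))
```

Under these assumptions (5 shards, page size 10), the first page needs 50 candidate hits while the page at offset 9,990 needs 50,000, even though both return only 10 documents.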
Deep pagination:
// Suited to retrieving large amounts of data batch by batch, for example data export or batch processing
GET /books/_search?scroll=5m
{
"query": {
"match_all": {}
},
"size": 2
}
GET _search/scroll
{
"scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAD2iEWTC1ENkdnWHpSUGFFVU1Cc2Q2SmhqQQ",
"scroll": "5m"
}
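The first response contains a _scroll_id. scroll=5m keeps that search context alive for 5 minutes. Call GET _search/scroll with the _scroll_id to fetch the next page, and keep repeating the request to keep paging.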
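Method | Performance | Pros | Cons | Use case |
---|---|---|---|---|
from + size | Low | Flexible and simple to implement | Deep pagination problem | Small data sets where the deep pagination problem is tolerable |
scroll | Medium | Solves the deep pagination problem | Does not reflect real-time changes (works on a snapshot); higher maintenance cost because a scroll_id has to be tracked | Exporting massive data sets; queries that need a very large result set |
search_after | High | Best performance; no deep pagination problem; reflects real-time data changes | More complex to implement: it needs a globally unique field as a tiebreaker, and continuous paging is harder because every request depends on the results of the previous one | Paginating massive data sets |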
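The idea behind search_after is cursor-based paging: instead of an offset, each request passes the sort values of the last hit from the previous page. The sketch below simulates this locally on a plain list. It is not a client for Elasticsearch; `search_after_page` is a hypothetical function, and the (price, id) sort key stands in for a sort with a unique tiebreaker field.

```python
def search_after_page(docs, size, after=None):
    """Return one page sorted by (price, id) that starts after the given sort key."""
    ordered = sorted(docs, key=lambda d: (d["price"], d["id"]))
    if after is not None:
        ordered = [d for d in ordered if (d["price"], d["id"]) > after]
    page = ordered[:size]
    # The sort values of the last hit become the cursor for the next request
    cursor = (page[-1]["price"], page[-1]["id"]) if page else None
    return page, cursor

books = [
    {"id": 1, "price": 99.9}, {"id": 2, "price": 120.0},
    {"id": 3, "price": 105.0}, {"id": 6, "price": 90.0},
    {"id": 19, "price": 85.0},
]
cursor, seen = None, []
while True:
    page, cursor = search_after_page(books, size=2, after=cursor)
    if not page:
        break
    seen.extend(d["id"] for d in page)
print(seen)
```

The unique id in the sort key matters: without a tiebreaker, two documents with the same price could be skipped or repeated at a page boundary.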
Sorting:
GET /books/_search
{
"query": {
"match_all": {}
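},
"from": 0,
"size": 200, // Elasticsearch returns 10 hits by default, so a larger size is set here to show all documents
// sort by price ascending, then by publish_date descending within the same price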
"sort": [
{
"price": "asc"
},
{
"publish_date": "desc"
}
]
}
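The precedence of the two sort keys (price first, publish_date only as a tiebreaker) can be checked with a small local sketch. The four sample records with titles A to D are made up for illustration; ISO-formatted date strings are used because they compare correctly as plain strings.

```python
books = [
    {"title": "A", "price": 110.0, "publish_date": "2023-04-04"},
    {"title": "B", "price": 100.0, "publish_date": "2024-06-06"},
    {"title": "C", "price": 110.0, "publish_date": "2023-12-12"},
    {"title": "D", "price": 100.0, "publish_date": "2023-05-05"},
]
# Python's sort is stable, so sort by the secondary key first, then the primary
by_date_desc = sorted(books, key=lambda b: b["publish_date"], reverse=True)
result = sorted(by_date_desc, key=lambda b: b["price"])
print([b["title"] for b in result])
```

Both books priced 100.0 come first, newest first within that price, followed by the two books priced 110.0 in the same newest-first order.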
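V. Indices
Create an index
PUT /my_index
{
"settings": {
"number_of_shards": 3, // number of primary shards (fixed at creation time)
"number_of_replicas": 2 // number of replicas per primary shard (improves fault tolerance)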
}
}
PUT /my_index_1
{
"mappings": {
"properties": {
"title": {
"type": "text"
},
"price": {
"type": "float"
},
"created_at": {
"type": "date",
"format": "yyyy-MM-dd"
}
}
}
}
GET /my_index_1
Modify an index
// Update settings
PUT /my_index_1/_settings
{
"number_of_replicas": 2
}
// Add a field
PUT /my_index_1/_mapping
{
"properties": {
"address1": { "type": "text" }
}
}
Migrate an index
// Create an index
PUT /test
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 1
},
"mappings": {
"properties": {
"field1": { "type": "text" },
"field2": { "type": "keyword" }
}
}
}
// Add documents
POST _bulk
{"index": {"_index": "test"}}
{"field1": "日志1","field2":23}
{"index": {"_index": "test"}}
{"field1": "日志2","field2":24}
{"index": {"_index": "test"}}
{"field1": "日志3","field2":25}
{"index": {"_index": "test"}}
{"field1": "日志4","field2":26}
// Create a new index
PUT /test2
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 1
},
"mappings": {
"properties": {
"field1": { "type": "text" },
"field2": { "type": "keyword" }
}
}
}
// Migrate the data (this is effectively a copy)
POST /_reindex
{
"source": { "index": "test" },
"dest": { "index": "test2" }
}
DELETE /test
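VI. Aggregations
Metric aggregations
Definition: perform a mathematical computation over a numeric field and output a single statistic (or, for stats, a small set of statistics).
Sum: total of a field; Avg: average; Min/Max: extreme values; Stats: combined statistics (count, sum, min, max, avg)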
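What a stats aggregation computes can be mirrored locally in a few lines. This sketch assumes the five prices below, which are the books sample prices that fall between 110 and 119; `stats` is a hypothetical helper and the empty-input result is a choice made for this sketch.

```python
def stats(values):
    """Compute the same five figures as a stats aggregation."""
    values = list(values)
    if not values:
        return {"count": 0, "min": None, "max": None, "avg": None, "sum": 0.0}
    total = sum(values)
    return {"count": len(values), "min": min(values), "max": max(values),
            "avg": total / len(values), "sum": total}

prices = [110.0, 115.0, 110.0, 112.0, 118.0]
print(stats(prices))
```

The sum, avg, min, and max aggregations each return one of these figures on its own, while stats returns all five in one pass.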
GET /books/_search
{
"query": {
"range": {
"price": {
"gte": 110,
"lte": 119
}
}
},
"aggs": {
"price_sum": {
"sum": {
"field": "price"
}
},
"price_avg":{
"avg": {
"field": "price"
}
},
"price_max":{
"max": {
"field": "price"
}
},
"price_min":{
"min": {
"field": "price"
}
},
"price_stats":{
"stats": {
"field": "price"
}
}
}
}
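Bucket aggregations
Definition: group documents into buckets by field value or by condition, similar to GROUP BY in SQL.
Terms: group by field value; Date Histogram: group by time interval; Range: group by numeric range.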
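Range buckets use half-open intervals: the from value is included and the to value is excluded. The sketch below reproduces that rule locally for the price ranges used in this section, again assuming the five sample prices between 110 and 119. `range_buckets` is a hypothetical helper.

```python
def range_buckets(values, ranges):
    """Count values per range; `from` is inclusive and `to` is exclusive."""
    counts = {}
    for low, high in ranges:
        key = f"{low:.1f}-{high:.1f}"
        counts[key] = sum(1 for v in values if low <= v < high)
    return counts

prices = [110.0, 115.0, 110.0, 112.0, 118.0]
buckets = range_buckets(prices, [(110, 113), (113, 115), (115, 119)])
print(buckets)
```

The price 115.0 lands in the 115.0-119.0 bucket and not in 113.0-115.0, because the upper bound of a range is exclusive.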
// range buckets include the from value and exclude the to value: [from, to)
GET /books/_search
{
"query": {
"range": {
"price": {
"gte": 110,
"lte": 119
}
}
},
"sort": [
{
"price": {
"order": "asc"
}
}
],
"aggs": {
"按标题分组": {
"terms": {
"field": "title.keyword",
"size": 100
}
},
"按年份分组": {
"date_histogram": {
"field": "publish_date",
"interval": "year"
},
"aggs": {
"按数值范围分组": {
"range": {
"field": "price",
"ranges": [
{
"from": 110,
"to": 113
},
{
"from": 113,
"to": 115
},
{
"from": 115,
"to": 119
}
]
},
"aggs": {
"按价格分组": {
"terms": {
"field": "price",
"size": 100
}
}
}
}
}
}
}
}
{
---
"hits" : {
"total" : {
"value" : 5,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
---
]
},
"aggregations" : {
"按年份分组" : {
"buckets" : [
{
"key_as_string" : "2023-01-01T00:00:00.000Z",
"key" : 1672531200000,
"doc_count" : 3,
"按数值范围分组" : {
"buckets" : [
{
"key" : "110.0-113.0",
"from" : 110.0,
"to" : 113.0,
"doc_count" : 2,
"按价格分组" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 110.0,
"doc_count" : 2
}
]
}
},
{
"key" : "113.0-115.0",
"from" : 113.0,
"to" : 115.0,
"doc_count" : 0,
"按价格分组" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ ]
}
},
{
"key" : "115.0-119.0",
"from" : 115.0,
"to" : 119.0,
"doc_count" : 1,
"按价格分组" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 115.0,
"doc_count" : 1
}
]
}
}
]
}
},
{
"key_as_string" : "2024-01-01T00:00:00.000Z",
"key" : 1704067200000,
"doc_count" : 2,
"按数值范围分组" : {
"buckets" : [
{
"key" : "110.0-113.0",
"from" : 110.0,
"to" : 113.0,
"doc_count" : 1,
"按价格分组" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 112.0,
"doc_count" : 1
}
]
}
},
{
"key" : "113.0-115.0",
"from" : 113.0,
"to" : 115.0,
"doc_count" : 0,
"按价格分组" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ ]
}
},
{
"key" : "115.0-119.0",
"from" : 115.0,
"to" : 119.0,
"doc_count" : 1,
"按价格分组" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 118.0,
"doc_count" : 1
}
]
}
}
]
}
}
]
},
"按标题分组" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Elasticsearch与云计算",
"doc_count" : 1
},
{
"key" : "Elasticsearch性能调优",
"doc_count" : 1
},
{
"key" : "Elasticsearch数据管理",
"doc_count" : 1
},
{
"key" : "Elasticsearch日志分析",
"doc_count" : 1
},
{
"key" : "Elasticsearch高级搜索",
"doc_count" : 1
}
]
}
}
}
Nested aggregations and ordering
Order by a metric aggregation:
GET /books/_search
{
"query": {
"range": {
"price": {
"gte": 100,
"lte": 120
}
}
},
"aggs": {
"year_grouy": {
"date_histogram": {
"field": "publish_date",
"interval": "year",
"order": {
"price_sum": "asc"
}
},
"aggs": {
"price_sum": {
"sum": {
"field": "price"
}
}
}
}
}
}
Compound ordering:
GET /books/_search
{
"aggs": {
"year_grouy": {
"date_histogram": {
"field": "publish_date",
"interval": "year",
"order": [
{
"price_sum": "asc"
},
{
"price_avg": "asc"
}
]
},
"aggs": {
"price_sum": {
"sum": {
"field": "price"
}
},
"price_avg": {
"avg": {
"field": "price"
}
}
}
}
}
}
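VII. Mapping and Modeling
Dynamic mapping vs explicit mapping
Type | Core characteristics | Typical use |
---|---|---|
Dynamic mapping | Field types are inferred automatically, and new fields are added automatically (the behavior is controlled by the dynamic parameter) | Quickly indexing data of unknown structure with little manual work |
Explicit mapping | Field types and attributes (such as analyzers and index options) are defined by hand, which ensures data consistency | Scenarios that need strict control of the data model (such as financial data) |
Dynamic mapping can lead to field type conflicts (for example, a string value being detected as a number or a date), so restrict it with the dynamic parameter where needed.
Explicit mapping requires planning the field structure in advance, but it can optimize storage and query performance.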
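The kind of guess that dynamic mapping makes can be approximated with the sketch below. It is a deliberate simplification: `infer_field_type` is a hypothetical function, `date.fromisoformat` is only a stand-in for Elasticsearch date detection, and real dynamic mapping has more rules (arrays, numeric detection, dynamic templates) that are not modeled here.

```python
from datetime import date

def infer_field_type(value):
    """Guess a field type roughly the way dynamic mapping does (simplified)."""
    if isinstance(value, bool):      # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, str):
        try:
            date.fromisoformat(value)   # stand-in for Elasticsearch date detection
            return "date"
        except ValueError:
            return "text"               # plus a keyword sub-field by default
    return None                          # e.g. null: no mapping is created

doc = {"title": "Elasticsearch实战", "price": 99.9,
       "publish_date": "2023-01-01", "stock": "120"}
print({name: infer_field_type(v) for name, v in doc.items()})
```

The type is decided by the first value seen for a field, so a later document with an incompatible value for the same field is where conflicts typically show up; an explicit mapping avoids relying on that first value.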
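Field data types (text, keyword, date)
Elasticsearch offers many field types. The core ones include:
Text
A full-text search field. Values are tokenized automatically by the configured analyzer (the standard analyzer by default).
Example: "title": {"type": "text"}
Keyword
A field for exact matching and aggregations. Values are not tokenized.
Example: "category": {"type": "keyword"}
Date
Accepts ISO 8601 formats and timestamps, parsed automatically.
Example: "created_at": {"type": "date"}
Complex types:
Object: a nested JSON object (not indexed independently).
Nested: indexes each array element independently, which supports more complex queries.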
Complete code
PUT /blogs/_doc/1
{
"title": "Elasticsearch实战应用",
"author": "张三",
"content": "Elasticsearch是一个分布式搜索引擎..."
}
PUT /blogs/_doc/2
{
"title": "Elasticsearch理论基础",
"author": "李四五",
"content": "Elasticsearch是一个分布式搜索引擎..."
}
POST /blogs/_doc
{
"title": "生产测试",
"author": "李四五",
"content": "Elasticsearch是一个分布式搜索引擎..."
}
POST /blogs/_doc
{
"title": "生产测试Elasticsearch",
"author": "李四五",
"content": "Elasticsearch是一个分布式搜索引擎..."
}
GET /blogs/_search
{
"query": {"match": {"title": "Elasticsearch"}}
}
GET /blogs/_search
{
"query": {"match_all": {}}
}
POST /blogs/_update/1
{
"doc": {"title": "Elasticsearch实战"}
}
DELETE /blogs/_doc/2
POST _bulk
{"index": {"_index": "logs", "_id": "1"}}
{"message": "日志1"}
{"index": {"_index": "logs", "_id": "2"}}
{"message": "日志2"}
{"delete": {"_index": "logs", "_id": "3"}}
GET /logs/_search
{
"query": {"match_all": {}}
}
GET /logs/_search
{
"query": {"match": {
"message": "日志"
}}
}
GET /blogs/_search
{
"query": {"match_bool_prefix": {
"title": "生产 实战"
}}
}
GET /blogs/_search
{
"query": {"match_phrase": {
"title": "开发"
}}
}
GET /blogs/_search
{
"query": {
"prefix": {
"title.keyword": "Elasticsearch"
}
}
}
POST _bulk
{"index": {"_index": "books", "_id": "1"}}
{"title": "Elasticsearch实战", "author": "张三", "price": 99.9, "publish_date": "2023-01-01"}
{"index": {"_index": "books", "_id": "2"}}
{"title": "深入理解Elasticsearch", "author": "李四", "price": 120.0, "publish_date": "2023-02-02"}
{"index": {"_index": "books", "_id": "3"}}
{"title": "Elasticsearch核心原理", "author": "王五", "price": 105.0, "publish_date": "2023-03-03"}
{"index": {"_index": "books", "_id": "4"}}
{"title": "Elasticsearch高级搜索", "author": "赵六", "price": 110.0, "publish_date": "2023-04-04"}
{"index": {"_index": "books", "_id": "5"}}
{"title": "Elasticsearch集群管理", "author": "陈七", "price": 100.0, "publish_date": "2023-05-05"}
{"index": {"_index": "books", "_id": "6"}}
{"title": "从零开始学Elasticsearch", "author": "刘八", "price": 90.0, "publish_date": "2023-06-06"}
{"index": {"_index": "books", "_id": "7"}}
{"title": "Elasticsearch日志分析", "author": "孙九", "price": 115.0, "publish_date": "2023-07-07"}
{"index": {"_index": "books", "_id": "8"}}
{"title": "Elasticsearch与大数据", "author": "周十", "price": 125.0, "publish_date": "2023-08-08"}
{"index": {"_index": "books", "_id": "9"}}
{"title": "Elasticsearch优化实践", "author": "吴十一", "price": 107.0, "publish_date": "2023-09-09"}
{"index": {"_index": "books", "_id": "10"}}
{"title": "Elasticsearch高效开发", "author": "郑十二", "price": 108.0, "publish_date": "2023-10-10"}
{"index": {"_index": "books", "_id": "11"}}
{"title": "Elasticsearch搜索引擎", "author": "王十三", "price": 95.0, "publish_date": "2023-11-11"}
{"index": {"_index": "books", "_id": "12"}}
{"title": "Elasticsearch数据管理", "author": "李十四", "price": 110.0, "publish_date": "2023-12-12"}
{"index": {"_index": "books", "_id": "13"}}
{"title": "Elasticsearch分布式架构", "author": "赵十五", "price": 130.0, "publish_date": "2024-01-01"}
{"index": {"_index": "books", "_id": "14"}}
{"title": "Elasticsearch性能调优", "author": "陈十六", "price": 112.0, "publish_date": "2024-02-02"}
{"index": {"_index": "books", "_id": "15"}}
{"title": "Elasticsearch与云计算", "author": "刘十七", "price": 118.0, "publish_date": "2024-03-03"}
{"index": {"_index": "books", "_id": "16"}}
{"title": "Elasticsearch安全管理", "author": "孙十八", "price": 122.0, "publish_date": "2024-04-04"}
{"index": {"_index": "books", "_id": "17"}}
{"title": "Elasticsearch扩展与插件", "author": "周十九", "price": 125.0, "publish_date": "2024-05-05"}
{"index": {"_index": "books", "_id": "18"}}
{"title": "Elasticsearch最佳实践", "author": "吴二十", "price": 100.0, "publish_date": "2024-06-06"}
{"index": {"_index": "books", "_id": "19"}}
{"title": "Elasticsearch快速入门", "author": "郑二十一", "price": 85.0, "publish_date": "2024-07-07"}
{"index": {"_index": "books", "_id": "20"}}
{"title": "Elasticsearch高级指南", "author": "王二十二", "price": 128.0, "publish_date": "2024-08-08"}
POST _bulk
{"index": {"_index": "books"}}
{"title": "Elasticsearch实战", "author": "张三", "price": 99.9, "publish_date": "2025-03-01"}
{"index": {"_index": "books"}}
{"title": "深入理解Elasticsearch", "author": "李四", "price": 120.0, "publish_date": "2025-01-02"}
GET /books/_search
GET /books/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"title": "Elasticsearch"
}
},
{
"term": {
"price": 99.9
}
}
]
}
}
}
GET /books/_search
{
"query": {
"bool": {
"must_not": [
{
"range": {
"price": {
"gte": 10,
"lte": 129
}
}
}
]
}
}
}
GET /books/_search
{
"query": {
"bool": {
"should": [
{
"term": {
"price": {
"value": 120,
"boost": 2
}
}
},
{
"match_phrase": {
"title": {
"query": "核心原理",
"boost": 1
}
}
}
],
"minimum_should_match": 1
}
}
}
GET /books/_search
{
"query": {
"bool": {
"filter": {
"range": {
"price": {
"gte": 85,
"lte": 90
}
}
}
}
}
}
GET /books/_search
{
"query": {
"function_score": {
"query": {
"match": {"title": "E"}
},
"functions": [
{
"filter": {
"term": {
"price": 120
}
}
}
]
}
}
}
GET /books/_search
{
"query": {"match_all": {}},
"from": 0,
"size": 2
}
GET /books/_search
{
"query": {"match_all": {}},
"from": 1,
"size": 2
}
GET /books/_search
{
"query": {"match_all": {}}
}
GET /books/_search
{
"query": {
"match_all": {}
},
"from": 0,
"size": 200,
"sort": [
{
"price": "asc"
},
{
"publish_date": "desc"
}
]
}
GET /books/_search?scroll=5m
{
"query": {
"match_all": {}
},
"size": 2
}
GET _search/scroll
{
"scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAD2iEWTC1ENkdnWHpSUGFFVU1Cc2Q2SmhqQQ",
"scroll": "5m"
}
GET /books/_search
{
"query": {
"range": {
"price": {
"gte": 110,
"lte": 119
}
}
},
"aggs": {
"price_sum": {
"sum": {
"field": "price"
}
},
"price_avg":{
"avg": {
"field": "price"
}
},
"price_max":{
"max": {
"field": "price"
}
},
"price_min":{
"min": {
"field": "price"
}
},
"price_stats":{
"stats": {
"field": "price"
}
}
}
}
GET /books/_search
{
"query": {
"range": {
"price": {
"gte": 110,
"lte": 119
}
}
},
"sort": [
{
"price": {
"order": "asc"
}
}
],
"aggs": {
"按标题分组": {
"terms": {
"field": "title.keyword",
"size": 100
}
},
"按年份分组": {
"date_histogram": {
"field": "publish_date",
"interval": "year"
},
"aggs": {
"按数值范围分组": {
"range": {
"field": "price",
"ranges": [
{
"from": 110,
"to": 113
},
{
"from": 113,
"to": 115
},
{
"from": 115,
"to": 119
}
]
},
"aggs": {
"按价格分组": {
"terms": {
"field": "price",
"size": 100
}
}
}
}
}
}
}
}
GET /books/_search
{
"aggs": {
"year_grouy": {
"date_histogram": {
"field": "publish_date",
"interval": "year",
"order": [
{
"price_sum": "asc"
},
{
"price_avg": "asc"
}
]
},
"aggs": {
"price_sum": {
"sum": {
"field": "price"
}
},
"price_avg": {
"avg": {
"field": "price"
}
}
}
}
}
}
PUT /ceshi1
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 1
}
}
PUT /my_index_1
{
"mappings": {
"properties": {
"title": {
"type": "text"
},
"price": {
"type": "float"
},
"created_at": {
"type": "date",
"format": "yyyy-MM-dd"
}
}
}
}
GET /my_index_1
GET /ceshi1/_mapping
PUT /my_index_1/_settings
{
"number_of_replicas": 2
}
PUT /my_index_1/_mapping
{
"properties": {
"address1": { "type": "text" }
}
}
PUT /test
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 1
},
"mappings": {
"properties": {
"field1": { "type": "text" },
"field2": { "type": "keyword" }
}
}
}
POST _bulk
{"index": {"_index": "test"}}
{"field1": "日志1","field2":23}
{"index": {"_index": "test"}}
{"field1": "日志2","field2":24}
{"index": {"_index": "test"}}
{"field1": "日志3","field2":25}
{"index": {"_index": "test"}}
{"field1": "日志4","field2":26}
PUT /test2
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 1
},
"mappings": {
"properties": {
"field1": { "type": "text" },
"field2": { "type": "keyword" }
}
}
}
POST /_reindex
{
"source": { "index": "test" },
"dest": { "index": "test2" }
}
DELETE /test
GET /test/_search