How to use reindex in Elasticsearch to migrate an index and split its shards

Published: 2024-05-24

1. Delete my test indices: old_index and new_index

curl -X DELETE "http://`hostname -i`:9200/old_index"
curl -X DELETE "http://`hostname -i`:9200/new_index"

2. Check the indices in the cluster

$ curl -X GET "http://`hostname -i`:9200/_cat/indices?v"
health status index            uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .geoip_databases ib6tlhzjTf-MQBu-XGIVWg   1   0         33            0     31.1mb         31.1mb

3. Create the test index: old_index

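# Notes
# 1. I only have one node, so for ease of testing number_of_replicas is set to 0
# 2. The source index is assumed to have a single shard, so number_of_shards is set to 1 for later comparison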
curl -X PUT "http://`hostname -i`:9200/old_index" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "description": { "type": "text" },
      "publish_date": { "type": "date" }
    }
  },
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}'
# Response; this indicates the index was created successfully
{"acknowledged":true,"shards_acknowledged":true,"index":"old_index"}

4. Insert a few test documents into old_index

curl -X POST "http://`hostname -i`:9200/old_index/_bulk" -H 'Content-Type: application/x-ndjson' --data-binary '
{ "index": { "_index": "old_index", "_id": "1" } }
{ "name": "可乐", "description": "大数据SRE工程师", "publish_date": "1991-05-20" }
{ "index": { "_index": "old_index", "_id": "2" } }
{ "name": "炎长", "description": "DBA工程师", "publish_date": "1992-11-23" }
'

# Response
{
	"took": 6,
	"errors": false,
	"items": [{
		"index": {
			"_index": "old_index",
			"_type": "_doc",
			"_id": "1",
			"_version": 1,
			"result": "created",
			"_shards": {
				"total": 1,
				"successful": 1,
				"failed": 0
			},
			"_seq_no": 0,
			"_primary_term": 1,
			"status": 201
		}
	}, {
		"index": {
			"_index": "old_index",
			"_type": "_doc",
			"_id": "2",
			"_version": 1,
			"result": "created",
			"_shards": {
				"total": 1,
				"successful": 1,
				"failed": 0
			},
			"_seq_no": 1,
			"_primary_term": 1,
			"status": 201
		}
	}]
}

5. Query the data in old_index

curl -X GET "http://`hostname -i`:9200/old_index/_search"

# Query result
{
	"took": 7,
	"timed_out": false,
	"_shards": {
		"total": 1,
		"successful": 1,
		"skipped": 0,
		"failed": 0
	},
	"hits": {
		"total": {
			"value": 2,
			"relation": "eq"
		},
		"max_score": 1.0,
		"hits": [{
			"_index": "old_index",
			"_type": "_doc",
			"_id": "1",
			"_score": 1.0,
			"_source": {
				"name": "可乐",
				"description": "大数据SRE工程师",
				"publish_date": "1991-05-20"
			}
		}, {
			"_index": "old_index",
			"_type": "_doc",
			"_id": "2",
			"_score": 1.0,
			"_source": {
				"name": "炎长",
				"description": "DBA工程师",
				"publish_date": "1992-11-23"
			}
		}]
	}
}

6. Create the destination index: new_index

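# Notes
# 1. The shard count is set to 2 this time to simulate splitting shards through reindex
# 2. It is advisable to set the replica count of the destination index to 0. Without replicas, writes to the destination index are faster, so the reindex task finishes sooner than it would with replicas. After the reindex completes, the replica count can be restored as needed.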

curl -X PUT "http://`hostname -i`:9200/new_index" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "description": { "type": "text" },
      "publish_date": { "type": "date" }
    }
  },
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 0
  }
}'

# Response
{"acknowledged":true,"shards_acknowledged":true,"index":"new_index"}
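
The note about restoring replicas after the reindex could look roughly like the sketch below, which uses the update index settings API. The `ES_URL` variable, the helper names `replica_settings` and `set_replicas`, and the target of 1 replica are assumptions for illustration. On a single-node cluster like the one in this test, a replica cannot be allocated, so the index would turn yellow.

```shell
# Assumption: ES_URL is the cluster address; adjust it to match your environment.
ES_URL="${ES_URL:-http://localhost:9200}"

# Build the settings body for a given replica count (hypothetical helper).
replica_settings() {
  printf '{"index":{"number_of_replicas":%d}}' "$1"
}

# Apply the replica count to an index through the update index settings API.
set_replicas() {
  local index="$1" count="$2"
  curl -s -X PUT "${ES_URL}/${index}/_settings" \
    -H 'Content-Type: application/json' \
    -d "$(replica_settings "$count")"
}

# Usage, once the reindex has finished:
#   set_replicas new_index 1
```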

7. Check the state of both indices

curl -X GET "http://`hostname -i`:9200/_cat/indices?v"
health status index            uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .geoip_databases ib6tlhzjTf-MQBu-XGIVWg   1   0         33            0     31.1mb         31.1mb
green  open   new_index        GrJiGswYRqCibszGIVjZhg   2   0          0            0       454b           454b
green  open   old_index        8k4beb7ETpu6Ki-LpOu_EQ   1   0          2            0        4kb            4kb

8. Use reindex to migrate the data from the source index old_index to the destination index new_index

curl -X POST "http://`hostname -i`:9200/_reindex" -H 'Content-Type: application/json' -d'
{
  "source": {
    "index": "old_index"
  },
  "dest": {
    "index": "new_index"
  }
}
'

# Response; both documents were created in new_index
{"took":8,"timed_out":false,"total":2,"updated":0,"created":2,"deleted":0,"batches":1,"version_conflicts":0,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0,"failures":[]}
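
The `"requests_per_second":-1.0` field in this response means the reindex ran without throttling. If the load on the cluster is a concern, the `requests_per_second` URL parameter can limit the rate. The following is only a sketch: `ES_URL`, the helper names, and the value 500 are assumptions chosen for illustration, not tuned recommendations.

```shell
# Assumption: ES_URL is the cluster address; adjust it to match your environment.
ES_URL="${ES_URL:-http://localhost:9200}"

# Build the _reindex URL with a throttle value (hypothetical helper).
reindex_url() {
  printf '%s/_reindex?requests_per_second=%s' "$ES_URL" "$1"
}

# Run a throttled reindex from a source index to a destination index.
throttled_reindex() {
  local src="$1" dst="$2" rps="$3"
  curl -s -X POST "$(reindex_url "$rps")" \
    -H 'Content-Type: application/json' \
    -d "{\"source\":{\"index\":\"${src}\"},\"dest\":{\"index\":\"${dst}\"}}"
}

# Usage:
#   throttled_reindex old_index new_index 500
```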

9. Check the progress of the migration

# The data set is tiny, so the task may finish too quickly for the reindex task to show up here

curl -X GET "http://`hostname -i`:9200/_tasks?detailed=true&actions=*reindex&human=true"
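
For a larger index, one way to make the task observable is to start the reindex with `wait_for_completion=false`, which returns a task id immediately instead of blocking, and then query that task through the task management API. The sketch below assumes the response has the form `{"task":"<node_id>:<task_number>"}`; `ES_URL` and the helper names are hypothetical.

```shell
# Assumption: ES_URL is the cluster address; adjust it to match your environment.
ES_URL="${ES_URL:-http://localhost:9200}"

# Extract the task id from a response such as {"task":"node:123"} (hypothetical helper).
extract_task_id() {
  sed -n 's/.*"task"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Start the reindex in the background and print the task id.
start_reindex() {
  local src="$1" dst="$2"
  curl -s -X POST "${ES_URL}/_reindex?wait_for_completion=false" \
    -H 'Content-Type: application/json' \
    -d "{\"source\":{\"index\":\"${src}\"},\"dest\":{\"index\":\"${dst}\"}}" \
    | extract_task_id
}

# Fetch the status of a single task by id.
task_status() {
  curl -s -X GET "${ES_URL}/_tasks/$1?human=true"
}

# Usage:
#   task_id=$(start_reindex old_index new_index)
#   task_status "$task_id"
```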

10. Check the two indices in the cluster again

curl -X GET "http://`hostname -i`:9200/_cat/indices?v"
health status index            uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .geoip_databases ib6tlhzjTf-MQBu-XGIVWg   1   0         33            0     31.1mb         31.1mb
green  open   new_index        aU3mztzXRXOSk9Q1oiP2RA   1   0          2            0      4.4kb          4.4kb
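green  open   old_index        g24b-XDfQZ6BO5zdcIOM0A   1   0          2            0      4.4kb          4.4kb
# Note: the UUIDs here differ from step 7 and new_index shows pri 1, so this output appears to come from a different run. With new_index created as in step 6 (number_of_shards: 2), the pri column would be expected to read 2.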
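
To confirm the result beyond the summary table, the document counts of both indices can be compared and the per-shard layout of new_index listed. This is a minimal sketch using the `_count` and `_cat/shards` APIs; `ES_URL` and the helper names are assumptions for illustration.

```shell
# Assumption: ES_URL is the cluster address; adjust it to match your environment.
ES_URL="${ES_URL:-http://localhost:9200}"

# Pull the "count" field out of a _count response (hypothetical helper).
extract_count() {
  sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Print the number of documents in an index.
doc_count() {
  curl -s -X GET "${ES_URL}/$1/_count" | extract_count
}

# List one row per shard of an index, including the per-shard document count.
show_shards() {
  curl -s -X GET "${ES_URL}/_cat/shards/$1?v"
}

# Usage:
#   [ "$(doc_count old_index)" = "$(doc_count new_index)" ] && echo "counts match"
#   show_shards new_index
```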

Summary

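In real production scenarios, reindexing within the same cluster has a very large impact on that cluster's performance, and I do not recommend using it this way. Reindex reads first and then writes: one full scan plus sustained writes, all landing on the same cluster, so the pressure is easy to imagine. If disk performance is poor and the cluster is already heavily loaded, the consequences can be severe. The better approach is to migrate the index to a new Elasticsearch cluster. The source cluster then only serves the reads, which keeps the impact to a minimum, and the new cluster carries no business traffic at first, so the writes add little burden.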
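
A migration to a new cluster can be done with reindex-from-remote: the request is sent to the new cluster, which pulls the data from the old one. The sketch below is written under stated assumptions: `NEW_ES_URL`, `OLD_ES_HOST`, the host name `old-es-host`, and the helper names are all hypothetical. The new cluster must also list the old cluster in `reindex.remote.whitelist` in its elasticsearch.yml.

```shell
# Assumptions: NEW_ES_URL is the new (destination) cluster and
# OLD_ES_HOST is the old (source) cluster; both values are placeholders.
# The new cluster's elasticsearch.yml needs, for example:
#   reindex.remote.whitelist: "old-es-host:9200"
NEW_ES_URL="${NEW_ES_URL:-http://localhost:9200}"
OLD_ES_HOST="${OLD_ES_HOST:-http://old-es-host:9200}"

# Build the request body: remote host, source index, destination index (hypothetical helper).
remote_reindex_body() {
  printf '{"source":{"remote":{"host":"%s"},"index":"%s"},"dest":{"index":"%s"}}' "$1" "$2" "$3"
}

# Send the reindex request to the new cluster, which reads from the old one.
remote_reindex() {
  local src="$1" dst="$2"
  curl -s -X POST "${NEW_ES_URL}/_reindex" \
    -H 'Content-Type: application/json' \
    -d "$(remote_reindex_body "$OLD_ES_HOST" "$src" "$dst")"
}

# Usage, after creating new_index on the new cluster with the desired shard count:
#   remote_reindex old_index new_index
```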

