Elasticsearch
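Elasticsearch is a distributed search and analytics engine written in Java by Israeli programmer Shay Banon and built on Apache Lucene. The first version was released in 2010 as open source under the Apache License 2.0;
In 2012, Banon founded the company Elasticsearch BV, incorporated in the Netherlands (later renamed Elastic NV), which acts as the official sponsor of Elasticsearch and its related projects and leads their ongoing development.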
Main features and changes by version
| Released | Versions | Main features & changes |
|---|---|---|
| 2024-03 | 8.x | - Enhanced Security: Built-in security, including advanced encryption and role-based access control; - Further improvements in cluster management and data ingestion; - Advanced Machine Learning; - Data streams for continuous data ingestion; |
| 2020-10 | 7.x | - Improved Scalability: Better horizontal scalability and performance; - Security by Default: Basic security features like TLS/SSL enabled by default; - Dense Vector Fields: Added support for machine learning and vector-based search; |
| 2019-03 | 6.x | - Sequence Numbers: Improved consistency and recovery; - Index Lifecycle Management; - Cross-Cluster Search: Enabled searching across multiple clusters; |
| 2017-06 | 5.x | - Ingest Nodes: Added ability to preprocess documents before indexing; - New Query DSL: Enhanced query capabilities and added new query types; - Rollup Jobs: Allowed summarizing data into more compact forms; |
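| - | 3-4 | Skipped: the version number jumped from 2.x straight to 5.x to align with the other Elastic Stack products (Kibana, Logstash, Beats); the architecture was heavily refactored along the way |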
| 2015-11 | 2.x | - Better Memory Management, improved heap memory usage; - Pipeline Aggregations; - Security Enhancements |
| 2014-02 | 1.x | - Distributed search, replication, and sharding; - Basic Query DSL; - Full-text Indexing and Search Capabilities; |
Installing ES & Kibana with containers
Unless otherwise noted, all ELK components are version 8.19.3.
Start a single-node cluster
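Test environment: Docker 27.2.0
Installing 7.x and 8.x as containers is much the same. 8.x enables the xpack.security* features by default; in a dev/test environment they can be turned off with the settings described later in this document.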
docker pull docker.elastic.co/elasticsearch/elasticsearch:8.19.3
export PATH_DATA=$HOME/v/data/es01
export PATH_LOG=$HOME/v/log/es01
export PATH_ETC=$HOME/v/etc/es01
mkdir -p $PATH_DATA
mkdir -p $PATH_LOG
mkdir -p $PATH_ETC
cat << EOF > $PATH_ETC/elasticsearch.yml
cluster.name: "docker-cluster"
network.host: 0.0.0.0
path.data: /usr/share/elasticsearch/data
path.logs: /usr/share/elasticsearch/logs
xpack.security.enabled: false
EOF
docker run \
--restart unless-stopped \
-d \
--name es01 \
-e "discovery.type=single-node" \
-p 9200:9200 \
-p 9300:9300 \
-v $PATH_DATA:/usr/share/elasticsearch/data \
-v $PATH_LOG:/usr/share/elasticsearch/logs \
-v $PATH_ETC/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
docker.elastic.co/elasticsearch/elasticsearch:8.19.3
Confirm it is up:
curl -XPOST -H 'Content-Type: application/json' localhost:9200/myidx/_doc -d '{"id":"1","title":"foo"}'
curl 'localhost:9200/myidx/_search?pretty'
ref: https://www.elastic.co/guide/en/elasticsearch/reference/8.17/docker.html
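The two smoke-test calls above can also be scripted. The following is a minimal sketch using only the Python standard library; it assumes the single-node container from this section is listening on http://localhost:9200 with security disabled, and the helper names build_request and send are made up for illustration. The requests are built eagerly but only sent when send is called.

```python
import json
import urllib.request

# Assumption: the single-node container above, with xpack.security.enabled: false.
ES_URL = "http://localhost:9200"


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request for the Elasticsearch REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(f"{ES_URL}/{path}", data=data, method=method)
    if data is not None:
        # A request body needs an explicit JSON content type, as with curl -H.
        req.add_header("Content-Type", "application/json")
    return req


def send(req, timeout=5):
    """Send a prepared request and return (status, decoded body)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8")


# Same calls as the curl commands above.
index_req = build_request("POST", "myidx/_doc", {"id": "1", "title": "foo"})
search_req = build_request("GET", "myidx/_search?pretty")

print(index_req.get_method(), index_req.full_url)
print(search_req.get_method(), search_req.full_url)
```

With the container running, send(index_req) followed by send(search_req) should return the indexed document in the search hits; newly indexed documents become searchable after the next index refresh, so an immediate search may briefly miss them.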
(OPTIONAL) Start a Kibana container
docker pull docker.elastic.co/kibana/kibana:8.19.3
export PATH_DATA=$HOME/v/data/kib01
export PATH_LOG=$HOME/v/log/kib01
export PATH_ETC=$HOME/v/etc/kib01
mkdir -p $PATH_DATA
mkdir -p $PATH_LOG
mkdir -p $PATH_ETC
cat << EOF > $PATH_ETC/kibana.yml
server.port: 5601
server.host: "0.0.0.0"
elasticsearch.hosts: ["http://10.0.0.1:9200"]
EOF
docker run \
--restart unless-stopped \
-d \
--name kib01 \
-p 5601:5601 \
-v $PATH_DATA:/usr/share/kibana/data \
-v $PATH_LOG:/usr/share/kibana/logs \
-v $PATH_ETC/kibana.yml:/usr/share/kibana/config/kibana.yml \
docker.elastic.co/kibana/kibana:8.19.3
Confirm it is up: curl http://localhost:5601/app/home
ref: https://www.elastic.co/guide/en/kibana/7.17/settings.html
Installing ES, Kibana & Filebeat v8.x from binaries
Download the binary archive from the official site and extract it
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.19.3-darwin-aarch64.tar.gz
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.19.3-darwin-aarch64.tar.gz.sha512
shasum -a 512 -c elasticsearch-8.19.3-darwin-aarch64.tar.gz.sha512
tar -xzf elasticsearch-8.19.3-darwin-aarch64.tar.gz
xattr -d -r com.apple.quarantine elasticsearch-8.19.3/
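cd elasticsearch-8.19.3/
For other platforms, download the matching archive from the official site.
Run Elasticsearch ./bin/elasticsearch
Note down the password of the elastic user and the Kibana enrollment token printed on first startup.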
Check that Elasticsearch is running
export ES_HOME=path/to/elasticsearch-8.19.3/
export ELASTIC_PASSWORD="your_password"
curl --cacert $ES_HOME/config/certs/http_ca.crt -u elastic:$ELASTIC_PASSWORD https://localhost:9200
wget https://artifacts.elastic.co/downloads/kibana/kibana-8.19.3-darwin-aarch64.tar.gz
wget https://artifacts.elastic.co/downloads/kibana/kibana-8.19.3-darwin-aarch64.tar.gz.sha512
shasum -a 512 -c kibana-8.19.3-darwin-aarch64.tar.gz.sha512
tar -xzf kibana-8.19.3-darwin-aarch64.tar.gz
xattr -d -r com.apple.quarantine kibana-8.19.3
cd kibana-8.19.3
Run Kibana ./bin/kibana
Configure Filebeat to talk to ES over HTTPS
output.elasticsearch:
  hosts: ["https://localhost:9200"]
  username: "elastic"
  password: "elastic"
  # "none" skips certificate verification; use it in development only
  ssl.verification_mode: "none"
  # or copy $ES_HOME/config/certs/http_ca.crt to path/to/filebeat/root
  # ssl.certificate_authorities: ["http_ca.crt"]
Copy the HTTP CA certificate out of the ES container instance: docker cp es00:/usr/share/elasticsearch/config/certs/http_ca.crt .
Test: curl --cacert http_ca.crt -u elastic:elastic https://localhost:9200
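Installing ES & Kibana v8.x from binaries on Windows
Test environment: Windows 11
Download https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.19.3-windows-x86_64.zip from the official site and extract it to c:\pkgs\es
Start it: c:\pkgs\es\bin\elasticsearch.bat
Watch the last few lines of output: they show the elastic account name and its password
Test from a new WSL session, which conveniently ships with cURL: curl --cacert config/certs/http_ca.crt -u elastic:elastic https://localhost:9200
Optional: download Kibana and extract it to c:\pkgs\kibana
Note: if HTTPS and authentication were disabled in the previous step, the ES address in Kibana's config/kibana.yml must be changed to match
Optional: reset the password of the built-in kibana_system account: c:\pkgs\es\bin\elasticsearch-reset-password -u kibana_system --url https://localhost:9200
Open Kibana at http://localhost:5601/
http://localhost:5601/app/dev_tools#/console can be used instead of cURL to debug queries
Accessing ES from the Python library: https://www.elastic.co/guide/en/elasticsearch/client/python-api/8.1/overview.html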
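Common terms
- document: the equivalent of a single row in a database table
- index (noun): roughly the database plus table of a relational database; a logical grouping of documents
- index (verb): to add a document to storage and make it searchable
- indices/indexes: the plural forms of index
- mapping: the definition of how a document and its fields are stored and indexed, comparable to a table schema
- type: a concept deprecated since v7.x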
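Common APIs and commands
All HTTP API requests follow a format like this:
curl \
--cacert http_ca.crt \
-u elastic:$ELASTIC_PASSWORD \
-X<VERB> '<PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>' -d '<BODY>'
- the path prefix _cat stands for the Compact and aligned text (CAT) APIs;
- query-string parameters:
  - pretty pretty-prints the JSON response;
  - format selects the response format; supported values are text, json, smile, yaml and cbor;
  - _cat APIs do not print column headers by default; add v=true to print them;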
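To make the request format above concrete, here is a small sketch that assembles the URL part from its pieces with the Python standard library. The function name build_url and the sample host values are illustrative assumptions, not part of any Elasticsearch client.

```python
from urllib.parse import urlencode


def build_url(protocol, host, port, path, **query):
    """Assemble <PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>."""
    url = f"{protocol}://{host}:{port}/{path.lstrip('/')}"
    if query:
        # urlencode joins the parameters as key=value pairs separated by "&".
        url += "?" + urlencode(query)
    return url


# A _cat call with column headers, and the same API family asked for JSON.
print(build_url("http", "localhost", 9200, "_cat/health", v="true"))
print(build_url("http", "localhost", 9200, "_cat/master", format="json"))
```

The verb, credentials and body stay outside the URL, exactly as in the curl template: they map to -X, -u and -d respectively.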
Check health status (CAT output): curl 'localhost:9200/_cat/health?v=true'
Check cluster health: curl localhost:9200/_cluster/health
Show the elected master node: curl 'localhost:9200/_cat/master?format=json' | jq
Count all documents
- in Kibana: GET _count?pretty
- equivalent cURL: curl 'localhost:9200/_count?pretty'
List the status of all indices: GET _cat/indices?pretty
Count the documents in one index: GET myidx/_count?pretty
Get a document by id: GET <index name>/_doc/<document id>, for example GET myidx/_doc/1?pretty
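Create a document with an explicit id
POST myidx/_doc/1?pretty
{"id":"1","title":"hello"}
Create a document without an id; ES generates a unique one automatically
POST myidx/_doc?pretty
{"title":"hello"}
Note that cURL differs here: it needs an explicit Content-Type header: curl -XPOST -H 'Content-Type: application/json' 'localhost:9200/myidx/_doc/1?pretty' -d '{"id":"1","title":"hello"}'
Update a document (overwrites the existing one)
POST myidx/_doc/1?pretty
{"title":"你好"}
Update a document partially (only some fields); note that the request body must be wrapped in doc
POST myidx/_update/1?pretty
{"doc":{"title":"你好"}}
Delete a document: DELETE myidx/_doc/1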
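The difference between overwriting through _doc and partially updating through _update is easy to miss. The sketch below is a plain in-memory model of the two behaviours, not Elasticsearch code: store, index_doc and update_doc are hypothetical names, and only flat documents are modelled.

```python
# Hypothetical stand-in for an index: document id -> document body.
store = {}


def index_doc(doc_id, body):
    """Model of POST <index>/_doc/<id>: the stored document is replaced wholesale."""
    store[doc_id] = dict(body)


def update_doc(doc_id, body):
    """Model of POST <index>/_update/<id>: fields under "doc" are merged in."""
    store[doc_id].update(body["doc"])


index_doc("1", {"id": "1", "title": "hello"})

# Partial update: "id" survives, only "title" changes.
update_doc("1", {"doc": {"title": "你好"}})
after_partial = dict(store["1"])

# Overwrite: the old body is discarded, so "id" is gone.
index_doc("1", {"title": "你好"})
after_overwrite = dict(store["1"])

print(after_partial)
print(after_overwrite)
```

In practice this means a client that only wants to change one field should use _update, or re-send every field it wants to keep.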
Bulk-delete documents matching a query range
POST .ds-logs-xxx-2049.05.28-000001/_delete_by_query?scroll_size=10000
{
"query": {
"range": {
"@timestamp": {
"lte": "2049-05-30T00:00:00.000Z"
}
}
}
}
POST .ds-logs-xxx-2049.05.28-000001/_count
{
"query": {
"range": {
"@timestamp": {
"lte": "2049-05-30T00:00:00.000Z"
}
}
}
}
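# Reclaim the storage held by deleted documents
POST .ds-logs-xxx-2049.05.28-000001/_forcemerge?only_expunge_deletes=true
# Inspect segments
GET .ds-logs-xxx-2049.05.28-000001/_segments
Note: on ES v8.x with SSD storage, bulk-deleting more than 30 million documents in a single request tends to stall or hang. Split the range into time slices and delete in batches instead.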
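The time-slice approach from the note above can be sketched as a generator of _delete_by_query bodies, one per slice. This is an illustration under stated assumptions: the function name slice_queries, the lower bound and the 12-hour step are all made up, and the slices are half-open (gte/lt) so that no document falls into two batches, which differs slightly from the single lte cutoff in the example.

```python
from datetime import datetime, timedelta, timezone

TS_FORMAT = "%Y-%m-%dT%H:%M:%S.000Z"


def slice_queries(start, end, step, field="@timestamp"):
    """Yield one _delete_by_query body per [lo, hi) time slice between start and end."""
    lo = start
    while lo < end:
        hi = min(lo + step, end)
        yield {
            "query": {
                "range": {
                    field: {
                        "gte": lo.strftime(TS_FORMAT),
                        "lt": hi.strftime(TS_FORMAT),
                    }
                }
            }
        }
        lo = hi


start = datetime(2049, 5, 28, tzinfo=timezone.utc)  # assumed lower bound
end = datetime(2049, 5, 30, tzinfo=timezone.utc)    # cutoff from the example above
bodies = list(slice_queries(start, end, timedelta(hours=12)))

print(len(bodies))
print(bodies[0]["query"]["range"]["@timestamp"])
```

Each body would be posted to the index's _delete_by_query endpoint in turn, waiting for one batch to finish before sending the next; a smaller step trades more requests for shorter individual deletes.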
Delete an index: DELETE myidx
Search documents; several ways to match: GET _search?q=A%20Tale%20of%20Two%20Cities&pretty&size=5
https://docs.aws.amazon.com/opensearch-service/latest/developerguide/searching.html#searching-uri
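The q= form above needs the search text percent-encoded. A minimal sketch with the Python standard library follows; uri_search_path is a hypothetical helper name and book is just the sample index used in this section.

```python
from urllib.parse import quote, urlencode


def uri_search_path(text, size=5, index=None):
    """Build the path and query string of a URI search (the q= form)."""
    prefix = f"{index}/" if index else ""
    # quote_via=quote encodes spaces as %20 instead of urlencode's default "+".
    params = urlencode({"q": text, "size": size}, quote_via=quote)
    return f"{prefix}_search?{params}"


print(uri_search_path("A Tale of Two Cities"))
print(uri_search_path("A Tale of Two Cities", size=10, index="book"))
```

URI search is handy for quick checks; anything beyond a simple term or phrase is usually clearer as a request body.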
GET book/_search
{
"query": {
"match": {"title": "A Tale of Two Cities"}
}
}
GET _search
{
"query": {
"multi_match": {
"query": "A Tale of Two Cities",
"fields": [
"alias",
"title"
]
}
}
}
GET book/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"title": "A Tale of Two Cities"
}
},
{
"match": {
"alias": "A Tale of Two Cities"
}
}
]
}
},
"from": 5,
"size": 10
}
Custom index fields
TBD.
Full-text indexing
TBD.
FAQ
HTTPS and HTTP basic authentication
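Since 8.x, ES enables HTTPS and HTTP basic authentication by default, which is cumbersome for local development; they can be turned off as follows. In config/elasticsearch.yml under the ES directory, change the options below from true to false, then restart the service.
Disable both HTTPS and HTTP basic authentication:
xpack.security.enabled: false
xpack.security.http.ssl.enabled: false
xpack.security.transport.ssl.enabled: false
Disable HTTPS only and keep HTTP basic authentication:
xpack.security.enabled: true
xpack.security.http.ssl.enabled: false
xpack.security.transport.ssl.enabled: true
Set up HTTP basic authentication
Edit elasticsearch.yml
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
After restarting the service, run bin/elasticsearch-reset-password -u elastic -i to set the password interactively (on 7.x, run bin/elasticsearch-setup-passwords interactive instead)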
Change the password: curl -H "Content-Type:application/json" -XPOST -u elastic:elastic 'http://127.0.0.1:9200/_security/user/elastic/_password' -d '{ "password" : "newPassword" }'
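For clients that do not have curl's -u flag, HTTP basic authentication boils down to one request header. The sketch below shows how that header is derived; basic_auth_header is an illustrative name, and elastic:elastic is the placeholder credential pair used in the examples, not a recommended password.

```python
import base64


def basic_auth_header(user, password):
    """Return the Authorization header value that curl sends for -u user:password."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


header = basic_auth_header("elastic", "elastic")  # placeholder credentials
print("Authorization:", header)
```

Base64 is an encoding, not encryption, so with HTTPS disabled the password travels in a trivially reversible form; that is acceptable on a local development machine only.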
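Estimating index size
A raw sample text dataset of 200 MB takes about 270 MB when stored in a single Postgres 16.x table (as reported by \d+). The same records imported into ES 7.x, with no custom mapping, the built-in dynamic mapping and the default compression codec, take about 290 MB.
Check storage usage with curl 'localhost:9203/_cat/indices?v=true'. In the output:
- pri.store.size counts the storage used by primary shards only
- store.size counts the storage used by primary and replica shards combined
The default compression, index.codec: default, uses LZ4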
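The measurement above can be turned into a rough sizing rule. The sketch below assumes the expansion ratio observed on the 200 MB sample scales linearly to larger inputs and that every replica is a full copy of the primaries; both are simplifying assumptions, and the function names are made up for illustration.

```python
def expansion_ratio(stored_mb, raw_mb):
    """Ratio of on-disk size to raw input size."""
    return stored_mb / raw_mb


def estimate_store_mb(raw_mb, ratio, replicas=1):
    """Estimate (pri.store.size, store.size) in MB for a given raw input size."""
    primary = raw_mb * ratio
    total = primary * (1 + replicas)  # each replica is a full copy of the primaries
    return primary, total


es_ratio = expansion_ratio(290, 200)  # ES 7.x measurement from the text
pg_ratio = expansion_ratio(270, 200)  # Postgres 16.x figure, for comparison

# Hypothetical 1000 MB of similar raw text with one replica per primary.
primary, total = estimate_store_mb(1000, es_ratio, replicas=1)
print(round(es_ratio, 2), round(pg_ratio, 2), round(primary), round(total))
```

On a single-node cluster replicas cannot be allocated, so there store.size stays equal to pri.store.size regardless of the replica setting.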
Limiting memory usage
https://www.elastic.co/guide/en/elasticsearch/reference/7.17/circuit-breaker.html
Automatically deleting expired documents and indices
https://www.elastic.co/guide/en/elasticsearch/reference/7.17/index-lifecycle-management.html
Snapshot backup and restore
https://www.elastic.co/guide/en/elasticsearch/reference/7.17/snapshot-restore.html
Shard routing settings for reads and writes
TBD.
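Separating hot and cold data storage
Upgrading v7.17 to v8.x
- If the v7.x cluster is not on the latest v7.17 release, upgrade it to v7.17 first
- On v7.17, set the path.repo option and restart
- In the v7.17 Kibana, register the repo, create a snapshot policy and take a data snapshot
- On v8, set the path.repo option, copy the v7.17 snapshot data into that directory, then restart
- In the v8 Kibana, register the repo and restore from the imported snapshot; if an index with the same name already exists, delete it first
- If a "field _anonymous xxx" error is reported, refresh the index in the v8 Kibana UI to fix it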
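The steps above go through the Kibana UI. As a hedged alternative, the same flow can be expressed as snapshot REST API calls; the sketch below only builds the method, path and body of each call. The repository name migrate_repo, the snapshot name snap_1 and the location /mnt/es_backup are hypothetical, and the location must sit under the directory configured in path.repo.

```python
import json


def register_repo(repo, location):
    """Request that registers a shared-filesystem snapshot repository."""
    body = {"type": "fs", "settings": {"location": location}}
    return "PUT", f"_snapshot/{repo}", body


def create_snapshot(repo, snapshot):
    """Request that takes a snapshot and waits for it to finish."""
    return "PUT", f"_snapshot/{repo}/{snapshot}?wait_for_completion=true", None


def restore_snapshot(repo, snapshot, indices):
    """Request that restores the named indices from a snapshot."""
    body = {"indices": ",".join(indices)}
    return "POST", f"_snapshot/{repo}/{snapshot}/_restore", body


steps = [
    register_repo("migrate_repo", "/mnt/es_backup"),       # on v7.17
    create_snapshot("migrate_repo", "snap_1"),              # on v7.17
    register_repo("migrate_repo", "/mnt/es_backup"),       # on v8, after copying the files
    restore_snapshot("migrate_repo", "snap_1", ["myidx"]),  # on v8
]

for method, path, body in steps:
    print(method, path, json.dumps(body) if body is not None else "")
```

Each printed line can be pasted into the Kibana Dev Tools console on the matching cluster; restoring over an open index with the same name fails, which is why the list above deletes such an index first.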
