Elasticsearch

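Elasticsearch is a distributed search and analytics engine written in Java by the Israeli programmer Shay Banon. It is built on Apache Lucene; the first version was released in 2010 as open source under the Apache License 2.0.
In 2012 Banon founded Elasticsearch BV, a company registered in the Netherlands (later renamed Elastic NV), which is the official sponsor of Elasticsearch and its related projects and leads their ongoing development.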

Main features and changes by version

| Released | Versions | Main features & changes |
| --- | --- | --- |
| 2024-03 | 8.x | Enhanced Security: built-in security, including advanced encryption and role-based access control; Further improvements in cluster management and data ingestion; Advanced Machine Learning; Data streams for continuous data ingestion |
| 2020-10 | 7.x | Improved Scalability: better horizontal scalability and performance; Security by Default: basic security features like TLS/SSL enabled by default; Dense Vector Fields: added support for machine learning and vector-based search |
| 2019-03 | 6.x | Sequence Numbers: improved consistency and recovery; Index Lifecycle Management; Cross-Cluster Search: enabled searching across multiple clusters |
| 2017-06 | 5.x | Ingest Nodes: added ability to preprocess documents before indexing; New Query DSL: enhanced query capabilities and added new query types; Rollup Jobs: allowed summarizing data into more compact forms |
| - | 3-4 | Skipped: the version number jumped to 5.0 to align with the rest of the Elastic Stack (Kibana, Logstash, Beats); the architecture was heavily refactored |
| 2015-11 | 2.x | Better Memory Management: improved heap memory usage; Pipeline Aggregations; Security Enhancements |
| 2011-02 | 1.x | Distributed search, replication, and sharding; Basic Query DSL; Full-text Indexing and Search Capabilities |

Install ES & Kibana with containers

Unless otherwise noted, all ELK components are version 8.19.3.

Start a single-node cluster

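Test environment: Docker 27.2.0

Installing 7.x and 8.x as containers is much the same. 8.x enables the xpack.security.* security features by default; in a development or test environment they can be turned off with the settings described later in this document.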

shell
docker pull docker.elastic.co/elasticsearch/elasticsearch:8.19.3

export PATH_DATA=$HOME/v/data/es01
export PATH_LOG=$HOME/v/log/es01
export PATH_ETC=$HOME/v/etc/es01

mkdir -p $PATH_DATA
mkdir -p $PATH_LOG
mkdir -p $PATH_ETC

cat << EOF > $PATH_ETC/elasticsearch.yml
cluster.name: "docker-cluster"

network.host: 0.0.0.0

path.data: /usr/share/elasticsearch/data
path.logs: /usr/share/elasticsearch/logs

xpack.security.enabled: false
EOF

docker run \
--restart unless-stopped \
-d \
--name es01 \
-e "discovery.type=single-node" \
-p 9200:9200 \
-p 9300:9300 \
-v $PATH_DATA:/usr/share/elasticsearch/data \
-v $PATH_LOG:/usr/share/elasticsearch/logs \
-v $PATH_ETC/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
docker.elastic.co/elasticsearch/elasticsearch:8.19.3

Confirm it is up:

curl -XPOST -H 'Content-Type: application/json' localhost:9200/myidx/_doc -d '{"id":"1","title":"foo"}'

curl 'localhost:9200/myidx/_search?pretty'

ref: https://www.elastic.co/guide/en/elasticsearch/reference/8.17/docker.html

(OPTIONAL) Start a Kibana container

shell
docker pull docker.elastic.co/kibana/kibana:8.19.3

export PATH_DATA=$HOME/v/data/kib01
export PATH_LOG=$HOME/v/log/kib01
export PATH_ETC=$HOME/v/etc/kib01

mkdir -p $PATH_DATA
mkdir -p $PATH_LOG
mkdir -p $PATH_ETC

cat << EOF > $PATH_ETC/kibana.yml
server.port: 5601
server.host: "0.0.0.0"
elasticsearch.hosts: ["http://10.0.0.1:9200"]
EOF


docker run \
--restart unless-stopped \
-d \
--name kib01 \
-p 5601:5601 \
-v $PATH_DATA:/usr/share/kibana/data \
-v $PATH_LOG:/usr/share/kibana/logs \
-v $PATH_ETC/kibana.yml:/usr/share/kibana/config/kibana.yml \
docker.elastic.co/kibana/kibana:8.19.3

Confirm it is up: curl http://localhost:5601/app/home

ref: https://www.elastic.co/guide/en/kibana/7.17/settings.html

Install ES, Kibana & Filebeat v8.x from binaries

Download the binary archive from the official site and extract it

shell
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.19.3-darwin-aarch64.tar.gz
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.19.3-darwin-aarch64.tar.gz.sha512
shasum -a 512 -c elasticsearch-8.19.3-darwin-aarch64.tar.gz.sha512
tar -xzf elasticsearch-8.19.3-darwin-aarch64.tar.gz

xattr -d -r com.apple.quarantine elasticsearch-8.19.3/
cd elasticsearch-8.19.3/

Other platforms

Run Elasticsearch ./bin/elasticsearch

Note down the password of the elastic user and the Kibana enrollment token printed on first startup.

Check that Elasticsearch is running

shell
export ES_HOME=path/to/elasticsearch-8.19.3/
export ELASTIC_PASSWORD="your_password"

curl --cacert $ES_HOME/config/certs/http_ca.crt -u elastic:$ELASTIC_PASSWORD https://localhost:9200

Download and extract Kibana:

shell
wget https://artifacts.elastic.co/downloads/kibana/kibana-8.19.3-darwin-aarch64.tar.gz
wget https://artifacts.elastic.co/downloads/kibana/kibana-8.19.3-darwin-aarch64.tar.gz.sha512

shasum -a 512 -c kibana-8.19.3-darwin-aarch64.tar.gz.sha512
tar -xzf kibana-8.19.3-darwin-aarch64.tar.gz

xattr -d -r com.apple.quarantine kibana-8.19.3
cd kibana-8.19.3

Run kibana ./bin/kibana

Configure Filebeat to communicate with ES over HTTPS

yaml
output.elasticsearch:
  hosts: ["https://localhost:9200"]
  username: "elastic"
  password: "elastic"
  # Development only: "none" disables TLS certificate verification.
  ssl.verification_mode: "none"
  # Alternatively, copy $ES_HOME/config/certs/http_ca.crt into the Filebeat root directory and use:
  # ssl.certificate_authorities: ["http_ca.crt"]

Copy the HTTP CA certificate out of the ES container instance: docker cp es00:/usr/share/elasticsearch/config/certs/http_ca.crt .

Test: curl --cacert http_ca.crt -u elastic:elastic https://localhost:9200

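Install ES & Kibana v8.x from binaries on Windows

Test environment: Windows 11

Download https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.19.3-windows-x86_64.zip from the official site and extract it to c:\pkgs\es

Start it: c:\pkgs\es\bin\elasticsearch.bat

Watch the last few lines of the output, which show the elastic username and its password.

Test from a new WSL session, which conveniently ships with cURL: curl --cacert config/certs/http_ca.crt -u elastic:elastic https://localhost:9200

Optional: download Kibana and extract it to c:\pkgs\kibana

Note: if HTTPS and authentication were turned off in the previous step, the ES address in Kibana's config/kibana.yml must be changed to match.

Optional: reset the password of the built-in Kibana account in ES: c:\pkgs\es\bin\elasticsearch-reset-password -u kibana_system --url https://localhost:9200

Open Kibana at http://localhost:5601/

http://localhost:5601/app/dev_tools#/console can be used in place of cURL to debug queries.

Python client library: https://www.elastic.co/guide/en/elasticsearch/client/python-api/8.1/overview.html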

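Common terms

  • document: comparable to a single row in a database table
  • index (noun): comparable to the database + table concept in a relational database; a logical grouping
  • index (verb): to add a document to storage and make it searchable
  • indices/indexes: plural of index
  • mapping: the definition of a document's fields and how each is stored and indexed, comparable to a table schema
  • type: a concept deprecated since v7.x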

Common APIs and commands

HTTP API requests all follow a format like this:

shell
curl \
--cacert http_ca.crt \
-u elastic:$ELASTIC_PASSWORD \
-X<VERB> '<PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>' -d '<BODY>'

  • The path prefix _cat refers to the Compact and aligned text (CAT) APIs;
  • query-string parameters
    • pretty pretty-prints the JSON output;
    • format sets the response format; text, json, smile, yaml and cbor are supported;
    • _cat APIs omit the column header row by default; pass v=true to print it;
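Check cluster health (compact text): curl 'localhost:9200/_cat/health?v=true'

Check cluster health (JSON): curl localhost:9200/_cluster/health

Show the elected master node: curl 'localhost:9200/_cat/master?format=json' | jq

Count all documents

  • in Kibana: GET _count?pretty
  • equivalent cURL: curl 'localhost:9200/_count?pretty'

Show the status of all indices: GET _cat/indices?v=true

Count the documents in one index: GET myidx/_count?pretty

Get a document by ID: GET <index-name>/_doc/<document-id>, for example GET myidx/_doc/1?pretty

Create a document with an explicit ID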

POST myidx/_doc/1?pretty
{"id":"1","title":"hello"}

Create a document without an ID; ES generates a unique one automatically

POST myidx/_doc?pretty
{"title":"hello"}

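Note the difference when using cURL, which needs an explicit Content-Type header: curl -XPOST -H 'Content-Type: application/json' 'localhost:9200/myidx/_doc/1?pretty' -d '{"id":"1","title":"hello"}'

Update a document (overwrites the existing one)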

POST myidx/_doc/1?pretty
{"title":"你好"}

Update a document (only some fields). Note that the request body must be wrapped in a doc object.

POST myidx/_update/1?pretty
{"doc":{"title":"你好"}}

Delete a document: DELETE myidx/_doc/1

Bulk-delete documents matching a query range

http
POST .ds-logs-xxx-2049.05.28-000001/_delete_by_query?scroll_size=10000
{
    "query": {
        "range": {
          "@timestamp": {
            "lte": "2049-05-30T00:00:00.000Z"
          }
        }
    }
}

POST .ds-logs-xxx-2049.05.28-000001/_count
{
    "query": {
        "range": {
          "@timestamp": {
            "lte": "2049-05-30T00:00:00.000Z"
          }
        }
    }
}

# Reclaim the disk space held by deleted documents
POST .ds-logs-xxx-2049.05.28-000001/_forcemerge?only_expunge_deletes=true

# Inspect segments
GET .ds-logs-xxx-2049.05.28-000001/_segments

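Note: on ES v8.x with SSD storage, bulk-deleting more than 30 million documents in one request tends to stall or appear hung. Deleting in batches, one time slice at a time, avoids this.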
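
Deleting by time slice can be sketched as a loop that issues one _delete_by_query per window. This is only an illustration under assumptions: the function names dbq_body and delete_in_slices and the variable ES_URL are hypothetical, the cluster is assumed to be reachable over plain HTTP without authentication, and the slice boundaries are passed in explicitly rather than computed.

```shell
# Hypothetical helpers for this sketch; adjust ES_URL and add --cacert / -u
# options if security is enabled on the cluster.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print a range query covering [$1, $2) on @timestamp.
dbq_body() {
  printf '{"query":{"range":{"@timestamp":{"gte":"%s","lt":"%s"}}}}\n' "$1" "$2"
}

# usage: delete_in_slices INDEX T0 T1 [T2 ...]
# Deletes [T0,T1), then [T1,T2), and so on; each request finishes before the
# next one starts because _delete_by_query waits for completion by default.
delete_in_slices() {
  dbq_index=$1
  shift
  while [ "$#" -ge 2 ]; do
    curl -sS -X POST -H 'Content-Type: application/json' \
      "${ES_URL}/${dbq_index}/_delete_by_query?scroll_size=10000" \
      -d "$(dbq_body "$1" "$2")"
    echo
    shift
  done
}
```

A call such as delete_in_slices .ds-logs-xxx-2049.05.28-000001 2049-05-28T00:00:00Z 2049-05-29T00:00:00Z 2049-05-30T00:00:00Z would run two delete requests, one per day. Half-open windows (gte/lt) are used so that adjacent slices do not overlap; documents older than the first boundary are not touched.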

Delete an index: DELETE myidx

Search documents; several matching styles are available. URI search: GET _search?q=A%20Tale%20of%20Two%20Cities&pretty&size=5

https://docs.aws.amazon.com/opensearch-service/latest/developerguide/searching.html#searching-uri

    GET book/_search
    {
      "query": {
        "match": {"title": "A Tale of Two Cities"}
      }
    }

    GET _search
    {
      "query": {
        "multi_match": {
          "query": "A Tale of Two Cities",
          "fields": ["alias", "title"]
        }
      }
    }

    GET book/_search
    {
      "query": {
        "bool": {
          "should": [
            {"match": {"title": "A Tale of Two Cities"}},
            {"match": {"alias": "A Tale of Two Cities"}}
          ]
        }
      },
      "from": 5,
      "size": 10
    }

Custom index fields

TBD.

Full-text indexing

TBD.

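FAQ

HTTPS and HTTP basic authentication

Starting with 8.x, ES enables HTTPS and HTTP basic authentication by default, which is cumbersome for local development. To turn them off, change the following options from true to false in config/elasticsearch.yml under the ES directory, then restart the service.

Disable both HTTPS and HTTP basic authentication: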

yaml
xpack.security.enabled: false
xpack.security.http.ssl.enabled: false
xpack.security.transport.ssl.enabled: false

Disable HTTPS only and keep HTTP basic authentication:

yaml
xpack.security.enabled: true
xpack.security.http.ssl.enabled: false
xpack.security.transport.ssl.enabled: true

Set up HTTP basic authentication

Edit elasticsearch.yml:

yaml
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true

After restarting the service, run bin/elasticsearch-reset-password -u elastic -i (on 7.x, run bin/elasticsearch-setup-passwords interactive instead)

Change the password: curl -H "Content-Type:application/json" -XPOST -u elastic:elastic 'http://127.0.0.1:9200/_security/user/elastic/_password' -d '{ "password" : "newPassword" }'

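Index size estimation

A raw text sample of 200 MB takes about 270 MB when stored in a single table in Postgres 16.x (as reported by \d+). The same records imported into ES 7.x, with no custom mapping (the built-in dynamic mapping) and the default compression codec, take 290 MB.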
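
The measurement above gives a ratio of 290 / 200 = 1.45 between index size and raw size. As a rough sketch only, that ratio can be extrapolated linearly to other data volumes, multiplying by the number of shard copies. The function name estimate_store_mb is hypothetical, and the assumption that the ratio stays constant for other datasets or mappings is untested.

```shell
# Rough linear extrapolation from the single sample above (200 MB raw -> 290 MB
# of primary store). Real ratios depend on the mapping and the data.
# usage: estimate_store_mb RAW_MB REPLICAS
estimate_store_mb() {
  awk -v raw="$1" -v replicas="$2" \
    'BEGIN { printf "%.0f\n", raw * (290 / 200) * (1 + replicas) }'
}

estimate_store_mb 1000 1   # 1000 MB raw data, one replica per primary shard
```

The result corresponds to store.size (primaries plus replicas); with zero replicas it corresponds to pri.store.size.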

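Check storage usage with curl 'localhost:9203/_cat/indices?v=true'. In the output:

  • pri.store.size counts the storage used by primary shards only
  • store.size counts the total storage used by primary and replica shards

Default data compression: index.codec: default (LZ4)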

Limiting memory usage

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/circuit-breaker.html

Automatically deleting expired indices

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/index-lifecycle-management.html
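
As a minimal sketch of what the linked ILM documentation describes, a policy with only a delete phase removes an index once it reaches a given age. The helper names ilm_delete_body, ilm_put_policy and ilm_attach and the variable ES_URL are hypothetical, and the cluster is assumed to accept unauthenticated HTTP.

```shell
ES_URL="${ES_URL:-http://localhost:9200}"

# Print an ILM policy body that deletes an index once it is $1 days old.
ilm_delete_body() {
  printf '{"policy":{"phases":{"delete":{"min_age":"%sd","actions":{"delete":{}}}}}}\n' "$1"
}

# usage: ilm_put_policy POLICY_NAME DAYS
ilm_put_policy() {
  curl -sS -X PUT -H 'Content-Type: application/json' \
    "${ES_URL}/_ilm/policy/$1" -d "$(ilm_delete_body "$2")"
}

# usage: ilm_attach INDEX POLICY_NAME
# Attaches the policy to an existing index via the index.lifecycle.name setting.
ilm_attach() {
  curl -sS -X PUT -H 'Content-Type: application/json' \
    "${ES_URL}/$1/_settings" -d "{\"index.lifecycle.name\":\"$2\"}"
}
```

For example, ilm_put_policy delete-after-30d 30 followed by ilm_attach myidx delete-after-30d. ILM removes whole indices, not individual documents, so this fits time-based indices best.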

Snapshot backup and restore

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/snapshot-restore.html

Shard routing settings for reads and writes

TBD.

Hot/cold data tiering

Elasticsearch ILM

https://www.elastic.co/blog/implementing-hot-warm-cold-in-elasticsearch-with-index-lifecycle-management

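Upgrading v7.17 to v8.x

  • If the v7.x cluster is not on the latest v7.17 release, upgrade it to v7.17 first
  • On v7.17, set the path.repo option and restart
  • In v7.17 Kibana, register the repository, create a snapshot policy, and take a data snapshot
  • On v8, set the path.repo option, copy the v7.17 snapshot data into that directory, and restart
  • In v8 Kibana, register the repository and restore from the snapshot; an existing index with the same name must be deleted first
  • If an error about field _anonymous xxx is reported, refresh the index in the v8 Kibana management UI to fix it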
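
The repository and snapshot steps above are done in Kibana; the same operations are also available through the snapshot REST API. The following is a sketch of that alternative, not the procedure used in these notes: the helper names, ES_URL, and any repository or snapshot names passed in are hypothetical, and unauthenticated HTTP is assumed.

```shell
ES_URL="${ES_URL:-http://localhost:9200}"

# Print the body that registers a shared-filesystem repository at path $1.
# The path must be located under a directory listed in path.repo.
fs_repo_body() {
  printf '{"type":"fs","settings":{"location":"%s"}}\n' "$1"
}

# usage: snapshot_register REPO LOCATION   (run against both v7.17 and v8)
snapshot_register() {
  curl -sS -X PUT -H 'Content-Type: application/json' \
    "${ES_URL}/_snapshot/$1" -d "$(fs_repo_body "$2")"
}

# usage: snapshot_create REPO SNAPSHOT     (run against v7.17)
snapshot_create() {
  curl -sS -X PUT "${ES_URL}/_snapshot/$1/$2?wait_for_completion=true"
}

# usage: snapshot_restore REPO SNAPSHOT    (run against v8)
snapshot_restore() {
  curl -sS -X POST "${ES_URL}/_snapshot/$1/$2/_restore?wait_for_completion=true"
}
```

Point ES_URL at the v7.17 node for snapshot_register and snapshot_create, then at the v8 node for snapshot_register and snapshot_restore. As in the Kibana flow, a restore fails for indices that already exist and are open on the target cluster.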

Released under the CC-BY-NC-4.0