Differences between revisions 1 and 2
Revision 1 as of 2017-06-05 20:46:05
Size: 8284
Editor: twotwo
Comment:
Revision 2 as of 2019-08-08 20:44:08
Size: 15062
Editor: twotwo
Comment: V7 knowledge roundup
Deletions are marked like this. Additions are marked like this.
Line 15: Line 15:

=== How to Learn ES? ===
[[https://github.com/elastic/elasticsearch-definitive-guide/|The Definitive Guide to Elasticsearch|target="_blank"]] and [[https://www.elastic.co/guide/en/elasticsearch/reference/7.x/index.html|Elasticsearch Reference(7.x)|target="_blank"]] are the two key reference documents I use to learn ES.

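Although the [[https://www.elastic.co/guide/cn/elasticsearch/guide/current/index.html|Definitive Guide (Chinese edition)|target="_blank"]] was written against ES V2, its treatment of the overall ES architecture, and the way it organizes the material, helped me a great deal; the specific concepts and hands-on details in the `Guide` can be verified against the `Reference`. Combined with hands-on practice, my study notes, and video tutorials, this lets the other members of the development team quickly pick up the ES knowledge needed for project development.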
Line 86: Line 92:
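ElasticSearch compared with relational databases
 * Index: equivalent to a database; defines the storage for document types. Within one index, a given field can have only one data type;
 * Document type (Type): equivalent to a table schema; describes the definition of each field in a document. Different document types can store different fields and serve different query requests;
Elasticsearch compared with relational databases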
Line 93: Line 97:
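 * Index: equivalent to a database; defines the storage for document types. Within one index, a given field can have only one data type;
  * Document type (Type): equivalent to a table schema; describes the definition of each field in a document. Different document types can store different fields and serve different query requests;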

||'''RDBMS''' ||'''Elasticsearch''' ||
||Table ||Index(Type) ||
||Row ||Document ||
||Column ||Field ||
||Schema ||Mapping ||
||SQL ||DSL ||
Line 96: Line 109:

=== The Search API ===
[[https://mindmajix.com/elasticsearch/curl-syntax-with-examples||target="_blank"]]
Document properties ([[https://www.elastic.co/guide/en/elasticsearch/reference/7.x/mapping.html|Mapping|target="_blank"]])

{{{#!highlight json numbers=disable
PUT my_index
{
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "name": { "type": "text" },
      "age": { "type": "integer" },
      "created": {
        "type": "date",
        "format": "strict_date_optional_time||epoch_millis"
      }
    }
  }
}
}}}

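The mapping defines properties shared by the whole document type; they apply to all fields of a document:
 * `dynamic_date_formats`: the list of date formats that can be recognized;
 * `dynamic`: defaults to true, which allows new fields to be added to the document type dynamically. Setting it to false is recommended, so that documents cannot add fields to the mapping: every field of the document type must then be defined explicitly under `properties` in the index mapping, and any field not defined there is ignored by Elasticsearch (it is not indexed, although it remains in `_source`).
  * `dynamic` set to true: the default; newly added fields are added to the index mapping;
  * `dynamic` set to false: newly added fields are ignored;
  * `dynamic` set to strict: Elasticsearch throws an exception when a document introduces a new field;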

[[https://www.elastic.co/guide/en/elasticsearch/reference/7.x/mapping-types.html|Field datatypes|target="_blank"]]


=== Aggregations ===
[[https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html||target="_blank"]] Aggregations

Metrics Aggregations
 * Avg
 * Weighted Avg
 * Cardinality
 * Extended Stats
 * Geo Bounds
 * Geo Centroid
 * Max
 * Min
 * Percentiles
 * Percentile Ranks
 * Scripted Metric
 * Stats
 * Sum
 * Top Hits
 * Value Count
 * Median Absolute Deviation
Bucket Aggregations

Pipeline Aggregations

Matrix Aggregations

=== Query DSL ===
[[https://www.elastic.co/guide/en/elasticsearch/reference/7.x/query-dsl.html||target="_blank"]] Query DSL
Line 104: Line 171:
=== The Query Language ===
{{{#!highlight bash numbers=disable
GET /test/_search
{
  "query": { "match_all": {} }
}

GET /test/_search
{
  "query": { "match_all": {} },
  "from": 10,
  "size": 10
}
}}}

=== Executing Filters ===
{{{#!highlight bash numbers=disable
==== Compound queries ====
[[https://www.elastic.co/guide/en/elasticsearch/reference/7.x/compound-queries.html||target="_blank"]] `Boolean/Boosting/Constant score/Disjunction max/Function score`

{{{#!highlight json numbers=disable
Line 139: Line 193:
== Programming ==

=== Elasticsearch Clients ===
[[https://www.elastic.co/guide/en/elasticsearch/client/index.html||target="_blank"]]
 * [[https://github.com/elastic/elasticsearch-py||target="_blank"]]

=== Wrapped with Spring ===
[[https://docs.spring.io/spring-data/elasticsearch/docs/current/reference/html/||target="_blank"]]

[[https://github.com/cooleo/spring-data-elasticsearch-sample.git||target="_blank"]]

== REST APIs ==
[[https://www.elastic.co/guide/en/elasticsearch/reference/current/api-conventions.html|API conventions|target="_blank"]]
==== Full text queries ====
[[https://www.elastic.co/guide/en/elasticsearch/reference/7.x/full-text-queries.html||target="_blank"]] `Intervals/Match/Match boolean prefix/Match phrase/Match phrase prefix/Multi-match/Common Terms Query/Query string/Simple query string`
 * [[https://www.elastic.co/guide/en/elasticsearch/reference/7.x/query-dsl-match-query.html|Match query|target="_blank"]]
 * [[https://www.elastic.co/guide/en/elasticsearch/reference/7.x/query-dsl-multi-match-query.html|Multi-match query|target="_blank"]]
{{{#!highlight json numbers=disable
GET /_search
{
    "query": {
        "match" : {
            "message" : "this is a test"
        }
    }
}

}}}

==== Joining queries ====
[[https://www.elastic.co/guide/en/elasticsearch/reference/7.x/joining-queries.html||target="_blank"]]
 * [[https://www.elastic.co/guide/en/elasticsearch/reference/7.x/query-dsl-nested-query.html|Nested|target="_blank"]]
 * [[https://www.elastic.co/guide/en/elasticsearch/reference/7.x/query-dsl-has-child-query.html|Has child|target="_blank"]]
 * [[https://www.elastic.co/guide/en/elasticsearch/reference/7.x/query-dsl-has-parent-query.html|Has parent|target="_blank"]]
 * [[https://www.elastic.co/guide/en/elasticsearch/reference/7.x/query-dsl-parent-id-query.html|Parent ID|target="_blank"]]
{{{#!highlight json numbers=disable
# index
PUT /my_index
{
    "mappings": {
        "properties" : {
            "obj1" : {
                "type" : "nested"
            }
        }
    }
}
# query
GET /my_index/_search
{
    "query": {
        "nested" : {
            "path" : "obj1",
            "query" : {
                "bool" : {
                    "must" : [
                    { "match" : {"obj1.name" : "blue"} },
                    { "range" : {"obj1.count" : {"gt" : 5}} }
                    ]
                }
            },
            "score_mode" : "avg"
        }
    }
}
}}}
==== Term-level queries ====
[[https://www.elastic.co/guide/en/elasticsearch/reference/7.x/term-level-queries.html||target="_blank"]] `Exists/Fuzzy/IDs/Prefix/Range/Regexp/Term/Terms/Terms set/Type Query/Wildcard`
 * [[https://www.elastic.co/guide/en/elasticsearch/reference/7.x/query-dsl-range-query.html|range query|target="_blank"]]
 * [[https://www.elastic.co/guide/en/elasticsearch/reference/7.x/query-dsl-term-query.html|term query|target="_blank"]]

{{{#!highlight json numbers=disable
GET /_search
{
    "query": {
        "bool": {
            "must": {
                "exists": {
                    "field": "user"
                }
            }
        }
    }
}
}}}

{{{#!highlight json numbers=disable
GET /_search
{
    "query": {
        "range" : {
            "Start" : {
                "time_zone": "+08:00",
                "gte": "2019-08-09T10:00:00",
                "lte": "now"
            }
        }
    }
}
}}}

=== REST APIs ===
==== API conventions ====
[[https://www.elastic.co/guide/en/elasticsearch/reference/current/api-conventions.html||target="_blank"]]
Line 156: Line 288:
[[https://www.elastic.co/guide/en/elasticsearch/reference/current/docs.html|Document APIs|target="_blank"]]
==== Document APIs ====
[[https://www.elastic.co/guide/en/elasticsearch/reference/current/docs.html||target="_blank"]]

All CRUD APIs are single-index APIs. The index parameter accepts a single index name, or an alias which points to a single index.
Line 161: Line 297:
Multi-document
Line 166: Line 303:
[[https://www.elastic.co/guide/en/elasticsearch/reference/current/search.html|Search APIs|target="_blank"]]

[[https://www.elastic.co/guide/en/elasticsearch/reference/current/indices.html|Indices APIs|target="_blank"]]

=== CRUD ===
All CRUD APIs are single-index APIs. The index parameter accepts a single index name, or an alias which points to a single index.

===== CRUD Samples =====
|| '''Command''' ||'''HTTP Method''' ||'''Samples''' ||
|| Index || PUT|| PUT my_index/_doc/1 //overwrite<<BR>>{...}||
|| Create || PUT / POST || PUT my_index/_create/1<<BR>>{...}<<BR>>POST my_index/_doc //auto generate id<<BR>>{...} ||
|| Read || GET || GET my_index/_doc/1 ||
|| Update || POST || POST my_index/_update/1<<BR>>{"doc":{...}}||
|| Delete || DELETE || DELETE my_index/_doc/1 ||
Line 189: Line 328:
===== Multi-document APIs =====
Reduce network overhead by batching several operations into one request
 * `GET _mget`
 * `POST _bulk` index/create/update/delete
 * `POST my-index/_msearch`
 * [[https://www.elastic.co/guide/en/elasticsearch/reference/7.x/docs-update-by-query.html|POST my-index/_update_by_query|target="_blank"]]
 * [[https://www.elastic.co/guide/en/elasticsearch/reference/7.x/docs-reindex.html|POST _reindex|target="_blank"]]
==== Search APIs ====
[[https://www.elastic.co/guide/en/elasticsearch/reference/current/search.html||target="_blank"]]
 * [[https://www.elastic.co/guide/en/elasticsearch/reference/7.x/search-request-body.html|Request Body Search|target="_blank"]]

{{{#!highlight bash numbers=disable

}}}

[[https://www.elastic.co/guide/en/elasticsearch/reference/current/indices.html|Indices APIs|target="_blank"]]


== Programming ==

=== Elasticsearch Clients ===
[[https://www.elastic.co/guide/en/elasticsearch/client/index.html||target="_blank"]]
 * [[https://github.com/elastic/elasticsearch-py||target="_blank"]]

=== Wrapped with Spring ===
[[https://docs.spring.io/spring-data/elasticsearch/docs/current/reference/html/||target="_blank"]]

[[https://github.com/cooleo/spring-data-elasticsearch-sample.git||target="_blank"]]
Line 193: Line 361:
 * [[https://github.com/elasticsearch-cn||target="_blank"]] Elastic Chinese community

Back to Elastic Stack

See Also Kibana, Logstash, SpringBoot, Python

Elasticsearch

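Elasticsearch is a search engine based on the Lucene library. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. Elasticsearch is developed in Java and released as open source under the Apache License. Official clients are available in Java, .NET (C#), PHP, Python, Apache Groovy, Ruby, and many other languages. According to the DB-Engines ranking, Elasticsearch is the most popular enterprise search engine, followed by Apache Solr, which is also based on Lucene.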

1. History

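Shay Banon created the precursor to Elasticsearch, called Compass, in 2004. While thinking about the third version of Compass, he realized that large parts of it would have to be rewritten in order to "create a scalable search solution". So he created "a solution built from the ground up to be distributed" and used a common interface, JSON over HTTP, that is also suitable for programming languages other than Java. Shay Banon released the first version of Elasticsearch in February 2010.

Elasticsearch BV was founded in 2012 to provide commercial services and products around Elasticsearch and related software. In June 2014, 18 months after the company was formed, it announced that it had raised $70 million in a Series C funding round. The round was led by New Enterprise Associates (NEA); other investors included Benchmark Capital and Index Ventures. The round brought total funding to $104 million.

In March 2015, the company Elasticsearch changed its name to Elastic.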

2. Quick Start

2.1. How to Learn ES?

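The Definitive Guide to Elasticsearch and Elasticsearch Reference(7.x) are the two key reference documents I use to learn ES.

Although the Definitive Guide (Chinese edition) was written against ES V2, its treatment of the overall ES architecture, and the way it organizes the material, helped me a great deal; the specific concepts and hands-on details in the Guide can be verified against the Reference. Combined with hands-on practice, my study notes, and video tutorials, this lets the other members of the development team quickly pick up the ES knowledge needed for project development.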

2.2. Installing on macOS

➜  ~ brew install elasticsearch
To have launchd start elasticsearch now and restart at login:
  brew services start elasticsearch
Or, if you don't want/need a background service you can just run:
  elasticsearch
➜  ~ mkdir logs && elasticsearch

2.3. Install with Docker

https://www.elastic.co/guide/en/elasticsearch/reference/5.0/docker.html#docker-prod-cluster-composefile Official documentation for installing with Docker is available from v5 onward

docker pull docker.elastic.co/elasticsearch/elasticsearch:5.0.2
# Development mode
docker run -p 9200:9200 -e "http.host=0.0.0.0" -e "transport.host=127.0.0.1" docker.elastic.co/elasticsearch/elasticsearch:5.0.2

docker-compose.yml

2.4. Installing from the RPM repository

Create a file called elasticsearch.repo in the /etc/yum.repos.d/

# vi /etc/yum.repos.d/elasticsearch.repo
[elasticsearch-5.x]
name=Elasticsearch repository for 5.x packages
baseurl=https://artifacts.elastic.co/packages/5.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md

yum install elasticsearch -y
chkconfig --add elasticsearch
sudo service elasticsearch start
curl http://localhost:9200/

2.5. System Configuration

https://www.elastic.co/guide/en/elasticsearch/reference/2.4/setup-configuration.html

# Virtual memory
sudo sysctl -w vm.max_map_count=262144
# Disable swap
sudo swapoff -a
sudo sysctl -w vm.swappiness=0

https://www.elastic.co/blog/why-am-i-seeing-bulk-rejections-in-my-elasticsearch-cluster bulk rejections issue

3. Exploring Your Data

https://www.elastic.co/guide/en/elasticsearch/reference/current/_exploring_your_data.html

$ curl "localhost:9200/_cat/indices?v"
health status index               pri rep docs.count docs.deleted store.size pri.store.size 
yellow open   test                    5   1          1            0      3.9kb          3.9kb

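Unlike SQL, there are two ways to send query requests to the Elasticsearch engine:

  • Use RESTful API requests to search or update data, which means search requests must be sent to the Elasticsearch engine through the Search API;

  • Use the Query DSL: express the query as a JSON structure that carries the query parameters, and send it to the Elasticsearch engine as the request body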
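
The two request styles can be sketched side by side. This is a minimal illustration that only builds the requests and does not send them; the base URL `http://localhost:9200` and the index name `test` are taken from the curl examples on this page, while the helper names `uri_search` and `body_search` are hypothetical.

```python
import json
from urllib.parse import urlencode

BASE = "http://localhost:9200"  # assumed local node, as in the curl examples


def uri_search(index, q):
    """URI search: the query string travels in the `q` URL parameter."""
    return f"{BASE}/{index}/_search?{urlencode({'q': q})}"


def body_search(index, query):
    """Request-body search: the Query DSL travels as a JSON body."""
    return f"{BASE}/{index}/_search", json.dumps({"query": query})


url = uri_search("test", "*")
endpoint, body = body_search("test", {"match_all": {}})
print(url)
print(endpoint, body)
```

The URI form is convenient for quick checks from a shell; the request-body form is the one the Query DSL sections of this page build on.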

3.1. Data Model

https://www.slideshare.net/fhopf/data-modeling-for-elasticsearch

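Elasticsearch compared with relational databases

  • Document: equivalent to a data row in a relational table; the unit that carries stored data, containing one or more fields that hold data;
    • Field: a key/value pair in a document;
    • Term: a single word in the text; // see term queries and full-text queries
    • Token: an occurrence of a term in a field, made up of the term's text, its offsets (start and end), and its type;
  • Index: equivalent to a database; defines the storage for document types. Within one index, a given field can have only one data type;
    • Document type (Type): equivalent to a table schema; describes the definition of each field in a document. Different document types can store different fields and serve different query requests;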

||'''RDBMS''' ||'''Elasticsearch''' ||
||Table ||Index(Type) ||
||Row ||Document ||
||Column ||Field ||
||Schema ||Mapping ||
||SQL ||DSL ||

Index storage

Document properties (Mapping)

PUT my_index 
{
  "mappings": {
    "properties": { 
      "title":    { "type": "text"  }, 
      "name":     { "type": "text"  }, 
      "age":      { "type": "integer" },  
      "created":  {
        "type":   "date", 
        "format": "strict_date_optional_time||epoch_millis"
      }
    }
  }
}

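The mapping defines properties shared by the whole document type; they apply to all fields of a document:

  • dynamic_date_formats: the list of date formats that can be recognized;
  • dynamic: defaults to true, which allows new fields to be added to the document type dynamically. Setting it to false is recommended, so that documents cannot add fields to the mapping: every field of the document type must then be defined explicitly under properties in the index mapping, and any field not defined there is ignored by Elasticsearch (it is not indexed, although it remains in _source).
    • dynamic set to true: the default; newly added fields are added to the index mapping;
    • dynamic set to false: newly added fields are ignored;
    • dynamic set to strict: Elasticsearch throws an exception when a document introduces a new field;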
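
The three `dynamic` settings can be compared with a toy model. This is only a sketch of the observable behaviour, not how Elasticsearch implements it: the function `index_document` and the `"<detected>"` placeholder type are hypothetical, and real type detection is left out.

```python
def index_document(properties, doc, dynamic="true"):
    """Return the mapping properties after indexing `doc` (toy model)."""
    new_fields = [name for name in doc if name not in properties]
    if dynamic == "strict" and new_fields:
        # strict: the whole document is rejected
        raise ValueError(f"unmapped fields not allowed: {new_fields}")
    updated = dict(properties)
    if dynamic == "true":
        # true: new fields are added to the mapping (type detection omitted)
        for name in new_fields:
            updated[name] = {"type": "<detected>"}
    # false: new fields stay out of the mapping, so they are not searchable
    return updated


mapping = {"title": {"type": "text"}, "age": {"type": "integer"}}
doc = {"title": "hello", "age": 3, "nickname": "es"}

print(sorted(index_document(mapping, doc, "true")))   # includes "nickname"
print(sorted(index_document(mapping, doc, "false")))  # mapping unchanged
```

In a real mapping the setting sits next to `properties`, for example `"mappings": {"dynamic": "strict", "properties": {...}}`.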

Field datatypes

3.2. Aggregations

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html Aggregations

Metrics Aggregations

  • Avg
  • Weighted Avg
  • Cardinality
  • Extended Stats
  • Geo Bounds
  • Geo Centroid
  • Max
  • Min
  • Percentiles
  • Percentile Ranks
  • Scripted Metric
  • Stats
  • Sum
  • Top Hits
  • Value Count
  • Median Absolute Deviation
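
As a rough illustration of what a metrics aggregation request looks like, the sketch below builds the body for an `avg` over the `age` field from the mapping example earlier on this page. The helper `metric_agg` and the aggregation name `avg_age` are hypothetical, and the shape only fits the simple single-field metrics (avg, max, min, sum, stats and similar); aggregations such as Weighted Avg, Scripted Metric, or Top Hits take different parameters.

```python
import json


def metric_agg(name, kind, field):
    """Build a search body that runs one single-field metrics aggregation."""
    return {
        "size": 0,  # skip the hits, return only the aggregation result
        "aggs": {name: {kind: {"field": field}}},
    }


body = metric_agg("avg_age", "avg", "age")
print(json.dumps(body, indent=2))
```

The printed JSON would be sent as the body of `GET my_index/_search`.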

Bucket Aggregations

Pipeline Aggregations

Matrix Aggregations

3.3. Query DSL

https://www.elastic.co/guide/en/elasticsearch/reference/7.x/query-dsl.html Query DSL

curl -XGET 'http://localhost:9200/test/_search?q=*&pretty'

3.3.1. Compound queries

https://www.elastic.co/guide/en/elasticsearch/reference/7.x/compound-queries.html Boolean/Boosting/Constant score/Disjunction max/Function score

GET /bank/_search
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "range": {
          "balance": {
            "gte": 20000,
            "lte": 30000
          }
        }
      }
    }
  }
}
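
The same bool query can be generated from code. The sketch below assumes nothing beyond the request shown above; the helper name `range_filter` is hypothetical. The point it illustrates is that the `filter` clause narrows the result set without contributing to the relevance score, while `must` carries the scored part of the query.

```python
import json


def range_filter(field, gte, lte):
    """Match every document, then keep those with gte <= field <= lte."""
    return {
        "query": {
            "bool": {
                "must": {"match_all": {}},
                # filter clauses restrict the result set without scoring
                "filter": {"range": {field: {"gte": gte, "lte": lte}}},
            }
        }
    }


body = range_filter("balance", 20000, 30000)
print(json.dumps(body))
```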

3.3.2. Full text queries

https://www.elastic.co/guide/en/elasticsearch/reference/7.x/full-text-queries.html Intervals/Match/Match boolean prefix/Match phrase/Match phrase prefix/Multi-match/Common Terms Query/Query string/Simple query string

GET /_search
{
    "query": {
        "match" : {
            "message" : "this is a test"
        }
    }
}

3.3.3. Joining queries

https://www.elastic.co/guide/en/elasticsearch/reference/7.x/joining-queries.html

# index
PUT /my_index
{
    "mappings": {
        "properties" : {
            "obj1" : {
                "type" : "nested"
            }
        }
    }
}
# query
GET /my_index/_search
{
    "query": {
        "nested" : {
            "path" : "obj1",
            "query" : {
                "bool" : {
                    "must" : [
                    { "match" : {"obj1.name" : "blue"} },
                    { "range" : {"obj1.count" : {"gt" : 5}} }
                    ]
                }
            },
            "score_mode" : "avg"
        }
    }
}
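
Why `obj1` needs the `nested` type can be shown with a toy matcher. This is a simplified model in plain Python, not Elasticsearch code: exact equality stands in for the analyzed `match`, and the sample document is made up for illustration.

```python
doc = {"obj1": [{"name": "blue", "count": 3}, {"name": "red", "count": 9}]}


def nested_match(objects):
    """Both conditions must hold inside the same nested object."""
    return any(o["name"] == "blue" and o["count"] > 5 for o in objects)


def flattened_match(objects):
    """Without `nested`, values from different objects can be combined."""
    names = {o["name"] for o in objects}
    counts = [o["count"] for o in objects]
    return "blue" in names and any(c > 5 for c in counts)


print(nested_match(doc["obj1"]))     # False: no single object satisfies both
print(flattened_match(doc["obj1"]))  # True: "blue" and 9 come from different objects
```

With a plain object field the array is flattened into per-field value lists, which is the behaviour `flattened_match` imitates; the `nested` type keeps each object separate, so the query above only matches when one object satisfies both clauses.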

3.3.4. Term-level queries

https://www.elastic.co/guide/en/elasticsearch/reference/7.x/term-level-queries.html Exists/Fuzzy/IDs/Prefix/Range/Regexp/Term/Terms/Terms set/Type Query/Wildcard

GET /_search
{
    "query": {
        "bool": {
            "must": {
                "exists": {
                    "field": "user"
                }
            }
        }
    }
}

GET /_search
{
    "query": {
        "range" : {
            "Start" : {
                "time_zone": "+08:00", 
                "gte": "2019-08-09T10:00:00", 
                "lte": "now" 
            }
        }
    }
}
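
The effect of `time_zone` in the range query above can be checked by hand: date values written without an offset are interpreted at that offset and converted to UTC, while `now` is always the current time in UTC and is not shifted. A small sketch of the conversion, with a hypothetical helper `to_utc`:

```python
from datetime import datetime, timedelta, timezone


def to_utc(local_iso, offset_hours):
    """Interpret a zone-less timestamp at the given UTC offset, return UTC."""
    zone = timezone(timedelta(hours=offset_hours))
    local = datetime.fromisoformat(local_iso).replace(tzinfo=zone)
    return local.astimezone(timezone.utc)


gte_utc = to_utc("2019-08-09T10:00:00", 8)
print(gte_utc.isoformat())  # 2019-08-09T02:00:00+00:00
```

So the lower bound `2019-08-09T10:00:00` with `"time_zone": "+08:00"` corresponds to 02:00 UTC on the same day.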

3.4. REST APIs

3.4.1. API conventions

https://www.elastic.co/guide/en/elasticsearch/reference/current/api-conventions.html

  • Multiple Indices
  • Date math support in index names
  • Common options
  • URL-based access control

3.4.2. Document APIs

https://www.elastic.co/guide/en/elasticsearch/reference/current/docs.html

All CRUD APIs are single-index APIs. The index parameter accepts a single index name, or an alias which points to a single index.

  • Index API
  • Get API
  • Delete API
  • Update API

Multi-document

  • Multi Get API(Multi-document)
  • Bulk API(Multi-document)
  • Delete By Query API(Multi-document)
  • Update By Query API(Multi-document)
  • Reindex API(Multi-document)

3.4.2.1. CRUD Samples

|| '''Command''' ||'''HTTP Method''' ||'''Samples''' ||
|| Index || PUT || PUT my_index/_doc/1 //overwrite<<BR>>{...} ||
|| Create || PUT / POST || PUT my_index/_create/1<<BR>>{...}<<BR>>POST my_index/_doc //auto generate id<<BR>>{...} ||
|| Read || GET || GET my_index/_doc/1 ||
|| Update || POST || POST my_index/_update/1<<BR>>{"doc":{...}} ||
|| Delete || DELETE || DELETE my_index/_doc/1 ||

# Create Content on my-index, refer to https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html
$ curl -XPUT -H'Content-Type:application/json' localhost:9200/my-index/_doc/1 -d '
{ "root" :
   [
      {"column1": "aaa"},
      {"column1": "bbb"}
   ]
}
'
{"_index":"my-index","_type":"_doc","_id":"1","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}

# Delete index
curl -XDELETE 'http://localhost:9200/feed-*'
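
The single-document operations above follow a regular pattern of HTTP method plus path, which the sketch below captures in one lookup. The function `crud_request` is hypothetical and only returns the method and path; the create-with-id route uses the `PUT <index>/_create/<id>` form from the 7.x reference.

```python
def crud_request(command, index, doc_id=None):
    """Return (HTTP method, path) for a single-document operation."""
    if command == "create" and doc_id is None:
        return "POST", f"{index}/_doc"  # Elasticsearch generates the id
    routes = {
        "index": ("PUT", f"{index}/_doc/{doc_id}"),       # create or overwrite
        "create": ("PUT", f"{index}/_create/{doc_id}"),   # fails if the id exists
        "read": ("GET", f"{index}/_doc/{doc_id}"),
        "update": ("POST", f"{index}/_update/{doc_id}"),  # partial update
        "delete": ("DELETE", f"{index}/_doc/{doc_id}"),
    }
    return routes[command]


for command in ("index", "create", "read", "update", "delete"):
    print(command, *crud_request(command, "my_index", 1))
```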

3.4.2.2. Multi-document APIs

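Reduce network overhead by batching several operations into one request:

  • GET _mget
  • POST _bulk (index/create/update/delete)
  • POST my-index/_msearch
  • POST my-index/_update_by_query
  • POST _reindex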
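
As an example of how one request can carry many operations, the sketch below assembles a `_bulk` body. The Bulk API expects newline-delimited JSON: one action line, followed by a source line for actions that carry a document, with a newline after every line including the last. The helper `bulk_body` and the sample documents are hypothetical; the index name `my-index` follows the examples on this page.

```python
import json


def bulk_body(operations):
    """Serialize (action, metadata, source) tuples as newline-delimited JSON."""
    lines = []
    for action, meta, source in operations:
        lines.append(json.dumps({action: meta}))
        if source is not None:  # delete actions have no source line
            lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the body must end with a newline


body = bulk_body([
    ("index", {"_index": "my-index", "_id": "1"}, {"column1": "aaa"}),
    ("index", {"_index": "my-index", "_id": "2"}, {"column1": "bbb"}),
    ("delete", {"_index": "my-index", "_id": "3"}, None),
])
print(body, end="")
```

The resulting text would be sent with `POST _bulk` and the header `Content-Type: application/x-ndjson`, replacing three separate round trips with one.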

3.4.3. Search APIs

https://www.elastic.co/guide/en/elasticsearch/reference/current/search.html


Indices APIs

4. Programming

4.1. Elasticsearch Clients

https://www.elastic.co/guide/en/elasticsearch/client/index.html

4.2. Wrapped with Spring

https://docs.spring.io/spring-data/elasticsearch/docs/current/reference/html/

https://github.com/cooleo/spring-data-elasticsearch-sample.git

5. Reference

MainWiki: Elasticsearch (last edited 2019-08-08 20:44:08 by twotwo)