See Also Elasticsearch

Inverted Index

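An inverted index (also called a postings file or inverted file) is an indexing method used in full-text search: it stores a mapping from each word to the locations where that word occurs in a document or a set of documents. It is the most commonly used data structure in document retrieval systems.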

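There are two variants of the inverted index:

  1. A record-level inverted index holds, for each word, a list of references to the documents that contain it.
  2. A word-level inverted index additionally records the position of each word within each document.

The latter supports more functionality (such as phrase search), but takes more time and space to build.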
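The difference between an index that stores only document IDs per term and one that also stores word positions can be sketched in a few lines of Python. This is a minimal illustration, not how any particular search engine implements it; the function names (`tokenize`, `build_record_level`, `build_word_level`, `phrase_search`) and the two sample documents are made up for this example.

```python
from collections import defaultdict

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()

def build_record_level(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def build_word_level(docs):
    """Map each term to {doc_id: [positions]} so word order is preserved."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(pos)
    return dict(index)

def phrase_search(word_index, phrase):
    """Return IDs of documents where the phrase terms occur consecutively."""
    terms = tokenize(phrase)
    if not terms or any(t not in word_index for t in terms):
        return []
    # Candidate documents must contain every term of the phrase.
    candidates = set(word_index[terms[0]])
    for t in terms[1:]:
        candidates &= set(word_index[t])
    hits = []
    for doc_id in sorted(candidates):
        for start in word_index[terms[0]][doc_id]:
            # The i-th phrase term must sit exactly i positions after the first.
            if all(start + i in word_index[t][doc_id] for i, t in enumerate(terms)):
                hits.append(doc_id)
                break
    return hits

# Hypothetical sample documents that contain the same words in a different order.
docs = {1: "the quick brown fox", 2: "the brown quick fox"}
record_index = build_record_level(docs)
word_index = build_word_level(docs)
```

With the record-level index, both documents match `quick` and `brown` and cannot be told apart. Only the word-level index can answer `phrase_search(word_index, "quick brown")`, because it keeps positions, and that extra information is what costs the additional build time and space.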

1. Getting Started

1.1. Book

1.2. Search Engine

1.3. Simple Demo

  1. The quick brown fox jumped over the lazy dog
  2. Quick brown foxes leap over lazy dogs in summer

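After lowercasing, stemming, and treating "leap" as a synonym of "jump", the inverted index for these two documents is:

Term      Doc_1  Doc_2
-------------------------
brown   |   X   |  X
dog     |   X   |  X
fox     |   X   |  X
in      |       |  X
jump    |   X   |  X
lazy    |   X   |  X
over    |   X   |  X
quick   |   X   |  X
summer  |       |  X
the     |   X   |
-------------------------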
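A term/document table of this kind could be built roughly as follows. This is a sketch under simplifying assumptions: the `NORMALIZE` mapping is a hand-written stand-in for real stemming and synonym handling, and `analyze`, `build_index`, and `search` are illustrative names rather than part of any library.

```python
DOCS = {
    "Doc_1": "The quick brown fox jumped over the lazy dog",
    "Doc_2": "Quick brown foxes leap over lazy dogs in summer",
}

# Hand-written stand-in for stemming and synonym handling (illustrative only).
NORMALIZE = {"jumped": "jump", "leap": "jump", "foxes": "fox", "dogs": "dog"}

def analyze(text):
    """Lowercase, split on whitespace, then map each word to its normalized term."""
    return [NORMALIZE.get(word, word) for word in text.lower().split()]

def build_index(docs):
    """Map each normalized term to the set of documents that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return index

def search(index, query):
    """Return the documents that contain every query term (AND semantics)."""
    result = None
    for term in analyze(query):
        postings = index.get(term, set())
        result = postings if result is None else result & postings
    return sorted(result or [])

index = build_index(DOCS)
for term in sorted(index):
    print(f"{term:<8}| {sorted(index[term])}")
```

The query goes through the same `analyze` step as the documents, which is why a search for `Quick foxes` can match both documents even though neither contains that exact string.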

1.4. Data Structure

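Term Dictionary - records every term that appears in the documents, together with the link from each term to its posting list

Posting List - records the set of documents that contain a term; it is made up of postings

Posting - a single entry in a posting list, typically a document ID, optionally with term frequency, positions, and offsets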
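The three pieces can be modeled as a small Python sketch. The class and field names (`Posting`, `InvertedIndex`, `term_freq`, `positions`) are assumptions made for illustration, and the sorted list with binary search is only a simple stand-in for the far more compact term dictionaries that production engines use. The sketch also assumes documents are added in increasing `doc_id` order.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Posting:
    """One entry of a posting list: a document plus per-document term statistics."""
    doc_id: int
    term_freq: int = 0
    positions: List[int] = field(default_factory=list)

class InvertedIndex:
    def __init__(self):
        self.terms: List[str] = []                    # term dictionary, kept sorted
        self.postings: Dict[str, List[Posting]] = {}  # term -> posting list

    def add_document(self, doc_id, tokens):
        """Index one document; assumes doc_ids arrive in increasing order."""
        for pos, term in enumerate(tokens):
            plist = self.postings.get(term)
            if plist is None:
                plist = self.postings[term] = []
                # Insert the new term at its sorted position in the dictionary.
                self.terms.insert(bisect_left(self.terms, term), term)
            if not plist or plist[-1].doc_id != doc_id:
                plist.append(Posting(doc_id))
            plist[-1].term_freq += 1
            plist[-1].positions.append(pos)

    def lookup(self, term):
        """Binary-search the term dictionary, then follow it to the posting list."""
        i = bisect_left(self.terms, term)
        if i < len(self.terms) and self.terms[i] == term:
            return self.postings[term]
        return []

idx = InvertedIndex()
idx.add_document(1, ["quick", "brown", "fox"])
idx.add_document(2, ["quick", "quick", "dog"])
```

Here `idx.lookup("quick")` returns two postings, one per document, and the posting for document 2 carries a term frequency of 2 with positions 0 and 1.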

1.5. Inside ES

POST _analyze
{
  "analyzer": "standard",
  "text": "Elasticsearch Essentials"
}
# Response
{
  "tokens" : [
    {
      "token" : "elasticsearch",
      "start_offset" : 0,
      "end_offset" : 13,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "essentials",
      "start_offset" : 14,
      "end_offset" : 24,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}
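To make the fields of such a response concrete, the following Python sketch produces output of the same shape. It is only a rough approximation and not the Elasticsearch implementation: the real standard analyzer tokenizes with Unicode text segmentation rules, whereas this sketch uses the regular expression `\w+` and hard-codes the `<ALPHANUM>` type. The function name `simple_analyze` is hypothetical.

```python
import re

def simple_analyze(text):
    """Split text into lowercase word tokens with character offsets and positions."""
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+", text)):
        tokens.append({
            "token": match.group().lower(),  # lowercased term as it would be indexed
            "start_offset": match.start(),   # character offset where the token begins
            "end_offset": match.end(),       # character offset just past the token
            "type": "<ALPHANUM>",            # hard-coded here; an assumption of this sketch
            "position": position,            # ordinal of the token within the text
        })
    return {"tokens": tokens}

result = simple_analyze("Elasticsearch Essentials")
```

The offsets point back into the original text, which is what features such as highlighting rely on, while `position` records the word order needed for phrase queries.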

2. Applications

3. See Also

3.1. Index (search engine)

3.2. Reverse index

3.3. Vector space model

4. Reference

MainWiki: Inverted_Index (last edited 2019-08-25 03:55:10 by twotwo)