what is an elasticsearch index?



What exactly is an index in Elasticsearch? Despite being a very basic question, the answer is surprisingly nuanced.

basic definition

An index is defined as:

An index is like a ‘database’ in a relational database. It has a mapping which defines multiple types.
An index is a logical namespace which maps to one or more primary shards and can have zero or more replica shards.

Ok. So there are two concepts in that definition. First, an index is some type of data organization mechanism, allowing the user to partition data a certain way. The second concept relates to replicas and shards, the mechanism Elasticsearch uses to distribute data around the cluster.

Let’s explore the first concept, using indices to organize data.

indices for relations

The easiest and most familiar layout clones what you would expect from a relational database. You can (very roughly) think of an index like a database.

  • MySQL => Databases => Tables => Columns/Rows
  • Elasticsearch => Indices => Types => Documents with Properties

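An Elasticsearch cluster can contain multiple Indices (databases), which in turn contain multiple Types (tables). These types hold multiple Documents (rows), and each document has Properties (columns). Note that this layout describes older Elasticsearch releases: mapping types were deprecated in 6.x and removed in later versions.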

So in your car manufacturing scenario, you may have a SubaruFactory index (named subarufactory in requests, since Elasticsearch index names must be lowercase). Within this index, you have three different types:

  • People
  • Cars
  • Spare_Parts

Each type then contains documents that correspond to that type (e.g. a Subaru Impreza doc lives inside of the Cars type. This doc contains all the details about that particular car).

Searching and querying take the format of: http://localhost:9200/[index]/[type]/[operation]

So to retrieve the Subaru document, I may do this:

$ curl -XGET localhost:9200/subarufactory/Cars/SubaruImpreza
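
The [index]/[type]/[operation] URL pattern above can be sketched in Python. This is only an illustration, not an official client: the helper name `document_url` and the default base address are assumptions made for this example. The lowercase check reflects the Elasticsearch rule that index names must be lowercase, which is why the index is written as `subarufactory` here.

```python
from urllib.parse import quote


def document_url(index, doc_type, doc_id, base="http://localhost:9200"):
    """Build the URL for fetching one document (hypothetical helper)."""
    # Elasticsearch rejects index names that contain uppercase letters.
    if index != index.lower():
        raise ValueError(f"index name must be lowercase: {index!r}")
    # Percent-encode each path segment so an unusual ID stays in one segment.
    parts = [quote(part, safe="") for part in (index, doc_type, doc_id)]
    return f"{base}/" + "/".join(parts)


print(document_url("subarufactory", "Cars", "SubaruImpreza"))
```

The printed URL can be passed to curl or any HTTP client; the function itself makes no network request.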

indices for logging

Now, the reality is that Indices/Types are much more flexible than the Database/Table abstractions we are used to in RDBMSs. They can be considered convenient data organization mechanisms, with added performance benefits depending on how you set up your data.

To demonstrate a radically different approach, a lot of people use Elasticsearch for logging. A standard format is to assign a new index for each day. Your list of indices may look like this:

  • logs-2013-02-22
  • logs-2013-02-21
  • logs-2013-02-20

Elasticsearch allows you to query multiple indices at the same time, so it isn’t a problem to do:

$ curl -XGET 'localhost:9200/logs-2013-02-22,logs-2013-02-21/Errors/_search?q=Error+Message'

Which searches the logs from the last two days at the same time. This format has advantages due to the nature of logs: most logs are never looked at, and they are organized in a linear flow of time. Making an index per day is more logical and can offer better search performance, since a query for recent entries only has to touch the indices for those days.
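
As a rough sketch of the index-per-day scheme, the snippet below generates the daily index names and the matching multi-index search URL. The helper names, the `logs-` prefix default, and the base address are assumptions for illustration; only the comma-separated index list and the `q` query-string parameter come from Elasticsearch itself.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def daily_indices(last_day, days, prefix="logs-"):
    """Return index names for `days` consecutive days ending at `last_day`, newest first."""
    return [
        f"{prefix}{(last_day - timedelta(days=offset)).isoformat()}"
        for offset in range(days)
    ]


def search_url(indices, doc_type, query, base="http://localhost:9200"):
    """Build a URI search that targets several indices at once."""
    # Multiple indices are joined with commas in the URL path.
    return f"{base}/{','.join(indices)}/{doc_type}/_search?{urlencode({'q': query})}"


indices = daily_indices(date(2013, 2, 22), days=2)
print(indices)
print(search_url(indices, "Errors", "Error Message"))
```

Widening the search to the last week is then just a matter of calling `daily_indices` with `days=7`.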

It is akin to partitioning an RDBMS table by time ranges, except we are creating new indices for each partition. This is a concept that RDBMS users would scoff at…a new database for each day? Crazy!

Indices are fairly lightweight data organization mechanisms, so Elasticsearch will happily let you create hundreds of indices.

indices for users

Another radically different approach is to create an index per user. Imagine you have some social networking site, and each user has a large amount of random data. You can create a single index for each user. Your structure may look like:

  • Zach’s Index
    • Hobbies Type
    • Friends Type
    • Pictures Type
  • Fred’s Index
    • Hobbies Type
    • Friends Type
    • Pictures Type

Notice how this setup could easily be done in a traditional RDBMS fashion (e.g. “Users” Index, with hobbies/friends/pictures as types). All users would then be thrown into a single, giant index.

Instead, it sometimes makes sense to split data apart for data organization and performance reasons. In this scenario, we are assuming each user has a lot of data, and we want them separate. Elasticsearch has no problem letting us create an index per user.
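
An index-per-user layout needs some rule for turning a user into a valid index name. The sketch below shows one possible rule; the `user-` prefix and the `user_index_name` helper are hypothetical, chosen only for this example. The one firm constraint it relies on is that Elasticsearch index names must be lowercase.

```python
import re


def user_index_name(username, prefix="user-"):
    """Derive a lowercase index name from a username (hypothetical naming scheme)."""
    # Lowercase first, then collapse any run of other characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", username.lower()).strip("-")
    if not slug:
        raise ValueError(f"cannot derive an index name from {username!r}")
    return prefix + slug


for name in ("Zach", "Fred"):
    print(user_index_name(name))
```

Different usernames can collapse to the same slug under this rule, so a real system would more likely key the index name on a unique user ID.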

indices for data distribution

The first three examples dealt entirely with how data should be logically separated, allowing it to be represented naturally and efficiently.

However, the definition of an Index also includes that bit about shards and replicas. Underneath all the indices and types and documents, Elasticsearch has to store the data somewhere. The data is stored in shards, each of which is either a primary or a replica.

Each index is configured for a certain number of primary and replica shards. So taking the “User” example above, if you created an index for every user, you are also creating a set of shards for each user.
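
To make the cost of an index per user concrete, here is a small sketch of the arithmetic. The figures (1000 users, 5 primary shards, 1 replica per primary) are assumptions for illustration, and the helper names are made up for this example. `number_of_shards` and `number_of_replicas` are the index settings that control the layout when an index is created.

```python
def index_settings(primaries, replicas):
    """Request body for creating an index with the given shard layout."""
    return {"settings": {"number_of_shards": primaries, "number_of_replicas": replicas}}


def total_shards(index_count, primaries, replicas):
    """Total shard copies in the cluster: every primary plus its replicas, per index."""
    return index_count * primaries * (1 + replicas)


print(index_settings(primaries=5, replicas=1))
print(total_shards(index_count=1000, primaries=5, replicas=1))  # 10000
```

Under these assumed numbers, 1000 per-user indices mean 10000 shards for the cluster to manage, which is the kind of figure worth checking before committing to a layout.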

This is neither good nor bad, simply a consideration when planning your cluster. Different performance requirements benefit from different shard layouts. I’m purposefully leaving this section short, since properly covering shards will require an article of its own.

So just remember, Indices organize data logically, but they also organize data physically through the underlying shards.


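Copyright notice: This is an original article by the blogger, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.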

