[GitHub] [carbondata] ShreelekhyaG opened a new pull request #3708: [WIP] Update index documents

GitBox

ShreelekhyaG opened a new pull request #3708: [WIP] Update index documents
URL: https://github.com/apache/carbondata/pull/3708

### Why is this PR needed?
Update the index documentation to comply with recent changes.

### What changes were proposed in this PR?

### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)

### Is any new testcase added?
- No
- Yes

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]

With regards,
Apache Git Services

GitBox

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3708: [WIP] Update index documents

CarbonDataQA1 commented on issue #3708: [WIP] Update index documents
URL: https://github.com/apache/carbondata/pull/3708#issuecomment-613439176

Can one of the admins verify this patch?

GitBox

[GitHub] [carbondata] ajantha-bhat commented on issue #3708: [WIP] Update index documents

In reply to this post by GitBox

ajantha-bhat commented on issue #3708: [WIP] Update index documents
URL: https://github.com/apache/carbondata/pull/3708#issuecomment-613445057

Add to whitelist

GitBox

[GitHub] [carbondata] ajantha-bhat commented on issue #3708: [WIP] Update index documents

In reply to this post by GitBox

ajantha-bhat commented on issue #3708: [WIP] Update index documents
URL: https://github.com/apache/carbondata/pull/3708#issuecomment-613445163

retest this please

GitBox

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3708: [WIP] Update index documents

In reply to this post by GitBox

CarbonDataQA1 commented on issue #3708: [WIP] Update index documents
URL: https://github.com/apache/carbondata/pull/3708#issuecomment-613532950

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1026/

GitBox

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3708: [WIP] Update index documents

In reply to this post by GitBox

CarbonDataQA1 commented on issue #3708: [WIP] Update index documents
URL: https://github.com/apache/carbondata/pull/3708#issuecomment-613535972

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2739/

GitBox

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3708: [WIP] Update index documents

In reply to this post by GitBox

Indhumathi27 commented on a change in pull request #3708: [WIP] Update index documents
URL: https://github.com/apache/carbondata/pull/3708#discussion_r408585482

##########
File path: docs/index/index-management.md
##########
@@ -18,51 +18,49 @@
# CarbonData Index Management

- [Overview](#overview)
-- [DataMap Management](#datamap-management)
+- [Index Management](#index-management)
- [Automatic Refresh](#automatic-refresh)
- [Manual Refresh](#manual-refresh)
-- [DataMap Catalog](#datamap-catalog)
-- [DataMap Related Commands](#datamap-related-commands)
+- [Index Catalog](#index-catalog)
+- [Index Related Commands](#index-related-commands)
- [Explain](#explain)
- - [Show DataMap](#show-datamap)
+ - [Show Index](#show-index)

## Overview

-DataMap can be created using following DDL
+Index can be created using following DDL

```
-CREATE DATAMAP [IF NOT EXISTS] datamap_name
-[ON TABLE main_table]
-USING "datamap_provider"
-[WITH DEFERRED REBUILD]
-DMPROPERTIES ('key'='value', ...)
-AS
- SELECT statement
+CREATE INDEX [IF NOT EXISTS] index_name
+ON TABLE [db_name.]table_name (column_name, ...)
+AS carbondata/bloomfilter/lucene
+[WITH DEFERRED REFRESH]
+[PROPERTIES ('key'='value')]
```

-Currently, there are 5 DataMap implementations in CarbonData.
+Currently, there are 3 Index implementations in CarbonData.

-| DataMap Provider | Description | DMPROPERTIES | Management |
+| Index Provider | Description | PROPERTIES | Management |
| ---------------- | ---------------------------------------- | ---------------------------------------- | ---------------- |
-| mv | multi-table pre-aggregate table | No DMPROPERTY is required | Manual/Automatic |
+| secondary-index | secondary-index tables to hold blocklets as indexes and managed as child tables | No PROPERTY is required | Automatic |

Review comment:
The Properties column can be removed: index_columns are now given in the CREATE statement and no longer need to be set in the properties.
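
To illustrate the point about index columns moving into the statement itself, here is a minimal Python sketch that renders a `CREATE INDEX` statement following the grammar quoted in the hunk above. The helper `build_create_index` and its parameter names are hypothetical and not part of CarbonData; quoting the provider name follows the Lucene guide in this PR and is an assumption for the other providers.

```python
def build_create_index(index_name, table, columns, provider,
                       properties=None, if_not_exists=False,
                       deferred_refresh=False):
    """Render a CREATE INDEX statement (hypothetical helper, illustration only)."""
    # Providers listed in the grammar quoted in the diff.
    if provider not in ("carbondata", "bloomfilter", "lucene"):
        raise ValueError(f"unknown index provider: {provider}")
    if not columns:
        raise ValueError("at least one index column is required")

    head = ["CREATE INDEX"]
    if if_not_exists:
        head.append("IF NOT EXISTS")
    head.append(index_name)

    lines = [
        " ".join(head),
        # Index columns are part of the statement, not a property.
        f"ON TABLE {table} ({', '.join(columns)})",
        f"AS '{provider}'",
    ]
    if deferred_refresh:
        lines.append("WITH DEFERRED REFRESH")
    if properties:
        props = ", ".join(f"'{k}'='{v}'" for k, v in properties.items())
        lines.append(f"PROPERTIES ({props})")
    return "\n".join(lines)


print(build_create_index("dm", "index_test", ["name", "country"], "lucene"))
```

With this shape, `PROPERTIES` only carries provider-specific tuning keys, which is why a dedicated properties column in the provider table adds little.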

GitBox

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3708: [WIP] Update index documents

In reply to this post by GitBox

Indhumathi27 commented on a change in pull request #3708: [WIP] Update index documents
URL: https://github.com/apache/carbondata/pull/3708#discussion_r408583049

##########
File path: docs/index/bloomfilter-index-guide.md
##########
@@ -109,40 +110,40 @@ User can create BloomFilter DataMap using the Create DataMap DDL:
## Loading Data
When loading data to main table, BloomFilter files will be generated for all the
index_columns given in DMProperties which contains the blockletId and a BloomFilter for each index column.
-These index files will be written inside a folder named with DataMap name
+These index files will be written inside a folder named with Index name

Review comment:
Please change the statement at line 112 to: `When loading data to main table, BloomFilter files will be generated for all the index_columns provided in the CREATE statement which contains the blockletId`

GitBox

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3708: [WIP] Update index documents

In reply to this post by GitBox

Indhumathi27 commented on a change in pull request #3708: [WIP] Update index documents
URL: https://github.com/apache/carbondata/pull/3708#discussion_r408584425

##########
File path: docs/index/bloomfilter-index-guide.md
##########
@@ -109,40 +110,40 @@ User can create BloomFilter DataMap using the Create DataMap DDL:
## Loading Data
When loading data to main table, BloomFilter files will be generated for all the
index_columns given in DMProperties which contains the blockletId and a BloomFilter for each index column.
-These index files will be written inside a folder named with DataMap name
+These index files will be written inside a folder named with Index name
inside each segment folders.

## Querying Data

-User can verify whether a query can leverage BloomFilter DataMap by executing `EXPLAIN` command,
-which will show the transformed logical plan, and thus user can check whether the BloomFilter DataMap can skip blocklets during the scan.
-If the DataMap does not prune blocklets well, you can try to increase the value of property `BLOOM_SIZE` and decrease the value of property `BLOOM_FPP`.
+User can verify whether a query can leverage BloomFilter Index by executing `EXPLAIN` command,
+which will show the transformed logical plan, and thus user can check whether the BloomFilter Index can skip blocklets during the scan.
+If the Index does not prune blocklets well, you can try to increase the value of property `BLOOM_SIZE` and decrease the value of property `BLOOM_FPP`.

-## Data Management With BloomFilter DataMap
-Data management with BloomFilter DataMap has no difference with that on Lucene DataMap.
-You can refer to the corresponding section in `CarbonData Lucene DataMap`.
+## Data Management With BloomFilter Index
+Data management with BloomFilter Index has no difference with that on Lucene Index.
+You can refer to the corresponding section in `CarbonData Lucene Index`.

Review comment:
The `CarbonData Lucene Index` link appears to be broken. Please fix it.
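
The tuning advice in the quoted Querying Data section (raise `BLOOM_SIZE`, lower `BLOOM_FPP`) matches the textbook Bloom filter trade-off. The sketch below uses the standard formulas only; how CarbonData maps `BLOOM_SIZE` and `BLOOM_FPP` onto these quantities is not stated in this thread, so the function and parameter names are illustrative assumptions.

```python
import math


def false_positive_rate(num_bits, num_items, num_hashes):
    """Textbook estimate: p = (1 - e^(-k*n/m))^k."""
    return (1.0 - math.exp(-num_hashes * num_items / num_bits)) ** num_hashes


def bits_for_target_fpp(num_items, fpp):
    """Bits needed for a target false-positive probability: m = -n*ln(p)/(ln 2)^2."""
    return math.ceil(-num_items * math.log(fpp) / (math.log(2) ** 2))


def optimal_num_hashes(num_bits, num_items):
    """Hash count that minimises the false-positive rate: k = (m/n)*ln 2."""
    return max(1, round(num_bits / num_items * math.log(2)))


# A lower target false-positive probability needs a larger filter.
for fpp in (0.01, 0.001):
    m = bits_for_target_fpp(1000, fpp)
    k = optimal_num_hashes(m, 1000)
    print(fpp, m, k, false_positive_rate(m, 1000, k))
```

A filter with fewer false positives lets the scan skip more blocklets, at the cost of larger index files.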

GitBox

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3708: [WIP] Update index documents

In reply to this post by GitBox

Indhumathi27 commented on a change in pull request #3708: [WIP] Update index documents
URL: https://github.com/apache/carbondata/pull/3708#discussion_r408587021

##########
File path: docs/index/index-management.md
##########
@@ -73,69 +71,69 @@ If user perform following command on the main table, system will return failure.
not, the operation is allowed, otherwise operation will be rejected by throwing exception.
3. Partition management command: `ALTER TABLE ADD/DROP PARTITION`.

-If user do want to perform above operations on the main table, user can first drop the datamap, perform the operation, and re-create the datamap again.
+If user do want to perform above operations on the main table, user can first drop the index, perform the operation, and re-create the index again.

-If user drop the main table, the datamap will be dropped immediately too.
+If user drop the main table, the index will be dropped immediately too.

-We do recommend you to use this management for index datamap.
+We do recommend you to use this management for index.

### Manual Refresh

-When user creates a datamap specifying manual refresh semantic, the datamap is created with status *disabled* and query will NOT use this datamap until user can issue REBUILD DATAMAP command to build the datamap. For every REBUILD DATAMAP command, system will trigger a full rebuild of the datamap. After rebuild is done, system will change datamap status to *enabled*, so that it can be used in query rewrite.
+When user creates a index specifying manual refresh semantic, the index is created with status *disabled* and query will NOT use this index until user can issue REFRESH INDEX command to build the index. For every REFRESH INDEX command, system will trigger a full rebuild of the index. After rebuild is done, system will change index status to *enabled*, so that it can be used in query rewrite.

-For every new data loading, data update, delete, the related datamap will be made *disabled*,
-which means that the following queries will not benefit from the datamap before it becomes *enabled* again.
+For every new data loading, data update, delete, the related index will be made *disabled*,
+which means that the following queries will not benefit from the index before it becomes *enabled* again.

-If the main table is dropped by user, the related datamap will be dropped immediately.
+If the main table is dropped by user, the related index will be dropped immediately.

**Note**:
-+ If you are creating a datamap on external table, you need to do manual management of the datamap.
-+ For index datamap such as BloomFilter datamap, there is no need to do manual refresh.
++ If you are creating a index on external table, you need to do manual management of the index.
++ For index such as BloomFilter index, there is no need to do manual refresh.
By default it is automatic refresh,
- which means its data will get refreshed immediately after the datamap is created or the main table is loaded.
- Manual refresh on this datamap will has no impact.
+ which means its data will get refreshed immediately after the index is created or the main table is loaded.
+ Manual refresh on this index will has no impact.

-## DataMap Catalog
+## Index Catalog

-Currently, when user creates a datamap, system will store the datamap metadata in a configurable *system* folder in HDFS or S3.
+Currently, when user creates a index, system will store the index metadata in a configurable *system* folder in HDFS or S3.

In this *system* folder, it contains:

-- DataMapSchema file. It is a json file containing schema for one datamap. Ses DataMapSchema class. If user creates 100 datamaps (on different tables), there will be 100 files in *system* folder.
-- DataMapStatus file. Only one file, it is in json format, and each entry in the file represents for one datamap. Ses DataMapStatusDetail class
+- IndexSchema file. It is a json file containing schema for one index. Ses IndexSchema class. If user creates 100 indexes (on different tables), there will be 100 files in *system* folder.
+- IndexStatus file. Only one file, it is in json format, and each entry in the file represents for one index. Ses IndexStatusDetail class

-There is a DataMapCatalog interface to retrieve schema of all datamap, it can be used in optimizer to get the metadata of datamap.
+There is a IndexCatalog interface to retrieve schema of all index, it can be used in optimizer to get the metadata of index.

-## DataMap Related Commands
+## Index Related Commands

### Explain

-How can user know whether datamap is used in the query?
+How can user know whether index is used in the query?

User can set enable.query.statistics = true and use EXPLAIN command to know, it will print out something like

```text
== CarbonData Profiler ==
-Hit mv DataMap: datamap1
-Scan Table: default.datamap1_table
+Hit mv Index: index1
+Scan Table: default.index1_table
+- filter:
-+- pruning by CG DataMap
++- pruning by CG Index
+- all blocklets: 1
skipped blocklets: 0
```

-### Show DataMap
+### Show Index

-There is a SHOW DATAMAPS command, when this is issued, system will read all datamap from *system* folder and print all information on screen. The current information includes:
+There is a SHOW INDEXES command, when this is issued, system will read all index from *system* folder and print all information on screen. The current information includes:

Review comment:
```suggestion
There is a SHOW INDEXES command, when this is issued, system will read all index from the carbon table and print all information on screen. The current information includes:
```
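
The manual-refresh lifecycle described in the hunk above can be modelled as a two-state toggle. This is a toy Python model of the documented behaviour, not CarbonData code; the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ManualRefreshIndex:
    """Toy model of an index created with manual (deferred) refresh."""
    name: str
    enabled: bool = False  # per the quoted text, such an index starts disabled

    def refresh(self):
        """Model REFRESH INDEX: a full rebuild, after which the index is enabled."""
        self.enabled = True

    def on_table_change(self):
        """Model a data load, update, or delete on the main table."""
        self.enabled = False

    def usable_in_query(self):
        """Queries only use the index while it is enabled."""
        return self.enabled


idx = ManualRefreshIndex("index1")
print(idx.usable_in_query())  # disabled right after creation
idx.refresh()
print(idx.usable_in_query())  # enabled after the rebuild
idx.on_table_change()
print(idx.usable_in_query())  # disabled again until the next refresh
```

The model makes the cost explicit: every change to the main table requires another full refresh before queries benefit from the index again.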

GitBox

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3708: [WIP] Update index documents

In reply to this post by GitBox

Indhumathi27 commented on a change in pull request #3708: [WIP] Update index documents
URL: https://github.com/apache/carbondata/pull/3708#discussion_r408586768

##########
File path: docs/index/index-management.md
##########
@@ -73,69 +71,69 @@ If user perform following command on the main table, system will return failure.
not, the operation is allowed, otherwise operation will be rejected by throwing exception.
3. Partition management command: `ALTER TABLE ADD/DROP PARTITION`.

-If user do want to perform above operations on the main table, user can first drop the datamap, perform the operation, and re-create the datamap again.
+If user do want to perform above operations on the main table, user can first drop the index, perform the operation, and re-create the index again.

-If user drop the main table, the datamap will be dropped immediately too.
+If user drop the main table, the index will be dropped immediately too.

-We do recommend you to use this management for index datamap.
+We do recommend you to use this management for index.

### Manual Refresh

-When user creates a datamap specifying manual refresh semantic, the datamap is created with status *disabled* and query will NOT use this datamap until user can issue REBUILD DATAMAP command to build the datamap. For every REBUILD DATAMAP command, system will trigger a full rebuild of the datamap. After rebuild is done, system will change datamap status to *enabled*, so that it can be used in query rewrite.
+When user creates a index specifying manual refresh semantic, the index is created with status *disabled* and query will NOT use this index until user can issue REFRESH INDEX command to build the index. For every REFRESH INDEX command, system will trigger a full rebuild of the index. After rebuild is done, system will change index status to *enabled*, so that it can be used in query rewrite.

-For every new data loading, data update, delete, the related datamap will be made *disabled*,
-which means that the following queries will not benefit from the datamap before it becomes *enabled* again.
+For every new data loading, data update, delete, the related index will be made *disabled*,
+which means that the following queries will not benefit from the index before it becomes *enabled* again.

-If the main table is dropped by user, the related datamap will be dropped immediately.
+If the main table is dropped by user, the related index will be dropped immediately.

**Note**:
-+ If you are creating a datamap on external table, you need to do manual management of the datamap.
-+ For index datamap such as BloomFilter datamap, there is no need to do manual refresh.
++ If you are creating a index on external table, you need to do manual management of the index.
++ For index such as BloomFilter index, there is no need to do manual refresh.
By default it is automatic refresh,
- which means its data will get refreshed immediately after the datamap is created or the main table is loaded.
- Manual refresh on this datamap will has no impact.
+ which means its data will get refreshed immediately after the index is created or the main table is loaded.
+ Manual refresh on this index will has no impact.

-## DataMap Catalog
+## Index Catalog

Review comment:
Please remove this section, as all indexes are now stored in the carbon table itself, in its table properties.

GitBox

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3708: [WIP] Update index documents

In reply to this post by GitBox

Indhumathi27 commented on a change in pull request #3708: [WIP] Update index documents
URL: https://github.com/apache/carbondata/pull/3708#discussion_r408585991

##########
File path: docs/index/index-management.md
##########
@@ -73,69 +71,69 @@ If user perform following command on the main table, system will return failure.
not, the operation is allowed, otherwise operation will be rejected by throwing exception.
3. Partition management command: `ALTER TABLE ADD/DROP PARTITION`.

-If user do want to perform above operations on the main table, user can first drop the datamap, perform the operation, and re-create the datamap again.
+If user do want to perform above operations on the main table, user can first drop the index, perform the operation, and re-create the index again.

Review comment:
Please change `pre-aggregate table` to `index` at line 70.

GitBox

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3708: [WIP] Update index documents

In reply to this post by GitBox

Indhumathi27 commented on a change in pull request #3708: [WIP] Update index documents
URL: https://github.com/apache/carbondata/pull/3708#discussion_r408587720

##########
File path: docs/index/lucene-index-guide.md
##########
@@ -15,75 +15,74 @@
limitations under the License.
-->

-# CarbonData Lucene DataMap (Alpha Feature)
+# CarbonData Lucene Index (Alpha Feature)

-* [DataMap Management](#datamap-management)
-* [Lucene Datamap](#lucene-datamap-introduction)
+* [Index Management](#index-management)
+* [Lucene Index](#lucene-index-introduction)
* [Loading Data](#loading-data)
* [Querying Data](#querying-data)
-* [Data Management](#data-management-with-lucene-datamap)
+* [Data Management](#data-management-with-lucene-index)

-#### DataMap Management
-Lucene DataMap can be created using following DDL
+#### Index Management
+Lucene Index can be created using following DDL
```
- CREATE DATAMAP [IF NOT EXISTS] datamap_name
- ON TABLE main_table
- USING 'lucene'
- DMPROPERTIES ('index_columns'='city, name', ...)
+ CREATE INDEX [IF NOT EXISTS] index_name
+ ON TABLE main_table (index_columns)
+ AS 'lucene'
+ [PROPERTIES ('key'='value')]
```
+index_columns is the list of string columns on which lucene creates indexes.

-DataMap can be dropped using following DDL:
+Index can be dropped using following DDL:
```
- DROP DATAMAP [IF EXISTS] datamap_name
+ DROP INDEX [IF EXISTS] index_name
ON TABLE main_table
```
-To show all DataMaps created, use:
+To show all Indexes created, use:
```
- SHOW DATAMAP
+ SHOW INDEXES
ON TABLE main_table
```
-It will show all DataMaps created on main table.
+It will show all Indexes created on main table.

-## Lucene DataMap Introduction
+## Lucene Index Introduction
Lucene is a high performance, full featured text search engine. Lucene is integrated to carbon as
- an index datamap and managed along with main tables by CarbonData. User can create lucene datamap
+ an index and managed along with main tables by CarbonData. User can create lucene index
to improve query performance on string columns which has content of more length. So, user can
search tokenized word or pattern of it using lucene query on text content.

- For instance, main table called **datamap_test** which is defined as:
+ For instance, main table called **index_test** which is defined as:

```
- CREATE TABLE datamap_test (
+ CREATE TABLE index_test (
name string,
age int,
city string,
country string)
STORED AS carbondata
```

- User can create Lucene datamap using the Create DataMap DDL:
+ User can create Lucene index using the Create Index DDL:

```
- CREATE DATAMAP dm
- ON TABLE datamap_test
- USING 'lucene'
- DMPROPERTIES ('INDEX_COLUMNS' = 'name, country',)
+ CREATE INDEX dm
+ ON TABLE index_test (name,country)
+ AS 'lucene'
```

-**DMProperties**
-1. INDEX_COLUMNS: The list of string columns on which lucene creates indexes.
-2. FLUSH_CACHE: size of the cache to maintain in Lucene writer, if specified then it tries to
+**Properties**
+1. FLUSH_CACHE: size of the cache to maintain in Lucene writer, if specified then it tries to
aggregate the unique data till the cache limit and flush to Lucene. It is best suitable for low
cardinality dimensions.
-3. SPLIT_BLOCKLET: when made as true then store the data in blocklet wise in lucene , it means new
+2. SPLIT_BLOCKLET: when made as true then store the data in blocklet wise in lucene , it means new
folder will be created for each blocklet, thus, it eliminates storing blockletid in lucene and
also it makes lucene small chunks of data.

## Loading data
When loading data to main table, lucene index files will be generated for all the
index_columns(String Columns) given in DMProperties which contains information about the data
-location of index_columns. These index files will be written inside a folder named with datamap name

Review comment:
```
When loading data to main table, lucene index files will be generated for all the
index_columns(String Columns) given in CREATE statement which contains information about the data
```

GitBox

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3708: [WIP] Update index documents

In reply to this post by GitBox

Indhumathi27 commented on a change in pull request #3708: [WIP] Update index documents
URL: https://github.com/apache/carbondata/pull/3708#discussion_r408587194

##########
File path: docs/index/index-management.md
##########
@@ -73,69 +71,69 @@ If user perform following command on the main table, system will return failure.
not, the operation is allowed, otherwise operation will be rejected by throwing exception.
3. Partition management command: `ALTER TABLE ADD/DROP PARTITION`.

-If user do want to perform above operations on the main table, user can first drop the datamap, perform the operation, and re-create the datamap again.
+If user do want to perform above operations on the main table, user can first drop the index, perform the operation, and re-create the index again.

-If user drop the main table, the datamap will be dropped immediately too.
+If user drop the main table, the index will be dropped immediately too.

-We do recommend you to use this management for index datamap.
+We do recommend you to use this management for index.

### Manual Refresh

-When user creates a datamap specifying manual refresh semantic, the datamap is created with status *disabled* and query will NOT use this datamap until user can issue REBUILD DATAMAP command to build the datamap. For every REBUILD DATAMAP command, system will trigger a full rebuild of the datamap. After rebuild is done, system will change datamap status to *enabled*, so that it can be used in query rewrite.
+When user creates a index specifying manual refresh semantic, the index is created with status *disabled* and query will NOT use this index until user can issue REFRESH INDEX command to build the index. For every REFRESH INDEX command, system will trigger a full rebuild of the index. After rebuild is done, system will change index status to *enabled*, so that it can be used in query rewrite.

-For every new data loading, data update, delete, the related datamap will be made *disabled*,
-which means that the following queries will not benefit from the datamap before it becomes *enabled* again.
+For every new data loading, data update, delete, the related index will be made *disabled*,
+which means that the following queries will not benefit from the index before it becomes *enabled* again.

-If the main table is dropped by user, the related datamap will be dropped immediately.
+If the main table is dropped by user, the related index will be dropped immediately.

**Note**:
-+ If you are creating a datamap on external table, you need to do manual management of the datamap.
-+ For index datamap such as BloomFilter datamap, there is no need to do manual refresh.
++ If you are creating a index on external table, you need to do manual management of the index.
++ For index such as BloomFilter index, there is no need to do manual refresh.
By default it is automatic refresh,
- which means its data will get refreshed immediately after the datamap is created or the main table is loaded.
- Manual refresh on this datamap will has no impact.
+ which means its data will get refreshed immediately after the index is created or the main table is loaded.
+ Manual refresh on this index will has no impact.

-## DataMap Catalog
+## Index Catalog

-Currently, when user creates a datamap, system will store the datamap metadata in a configurable *system* folder in HDFS or S3.
+Currently, when user creates a index, system will store the index metadata in a configurable *system* folder in HDFS or S3.

In this *system* folder, it contains:

-- DataMapSchema file. It is a json file containing schema for one datamap. Ses DataMapSchema class. If user creates 100 datamaps (on different tables), there will be 100 files in *system* folder.
-- DataMapStatus file. Only one file, it is in json format, and each entry in the file represents for one datamap. Ses DataMapStatusDetail class
+- IndexSchema file. It is a json file containing schema for one index. Ses IndexSchema class. If user creates 100 indexes (on different tables), there will be 100 files in *system* folder.
+- IndexStatus file. Only one file, it is in json format, and each entry in the file represents for one index. Ses IndexStatusDetail class

-There is a DataMapCatalog interface to retrieve schema of all datamap, it can be used in optimizer to get the metadata of datamap.
+There is a IndexCatalog interface to retrieve schema of all index, it can be used in optimizer to get the metadata of index.

-## DataMap Related Commands
+## Index Related Commands

### Explain

-How can user know whether datamap is used in the query?
+How can user know whether index is used in the query?

User can set enable.query.statistics = true and use EXPLAIN command to know, it will print out something like

```text
== CarbonData Profiler ==
-Hit mv DataMap: datamap1
-Scan Table: default.datamap1_table
+Hit mv Index: index1
+Scan Table: default.index1_table
+- filter:
-+- pruning by CG DataMap
++- pruning by CG Index
+- all blocklets: 1
skipped blocklets: 0
```

-### Show DataMap
+### Show Index

-There is a SHOW DATAMAPS command, when this is issued, system will read all datamap from *system* folder and print all information on screen. The current information includes:
+There is a SHOW INDEXES command, when this is issued, system will read all index from *system* folder and print all information on screen. The current information includes:

-- DataMapName
-- DataMapProviderName like mv
+- IndexName

Review comment:
Please check and update the Show Index information.

[GitHub] [carbondata] Indhumathi27 commented on issue #3708: [WIP] Update index documents

Indhumathi27 commented on issue #3708: [WIP] Update index documents
URL: https://github.com/apache/carbondata/pull/3708#issuecomment-613831678

@ShreelekhyaG Please check and update the links in the `README.md` file.

[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3708: [WIP] Update index documents

ShreelekhyaG commented on a change in pull request #3708: [WIP] Update index documents
URL: https://github.com/apache/carbondata/pull/3708#discussion_r408654056

##########
File path: docs/index/bloomfilter-index-guide.md
##########
@@ -109,40 +110,40 @@ User can create BloomFilter DataMap using the Create DataMap DDL:
## Loading Data
When loading data to main table, BloomFilter files will be generated for all the
index_columns given in DMProperties which contains the blockletId and a BloomFilter for each index column.
-These index files will be written inside a folder named with DataMap name
+These index files will be written inside a folder named with Index name

Review comment:
Ok

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3708: [WIP] Update index documents

CarbonDataQA1 commented on issue #3708: [WIP] Update index documents
URL: https://github.com/apache/carbondata/pull/3708#issuecomment-613949596

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1034/

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3708: [WIP] Update index documents

CarbonDataQA1 commented on issue #3708: [WIP] Update index documents
URL: https://github.com/apache/carbondata/pull/3708#issuecomment-613949848

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2747/

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3708: [WIP] Update index documents

Indhumathi27 commented on a change in pull request #3708: [WIP] Update index documents
URL: https://github.com/apache/carbondata/pull/3708#discussion_r408785067

##########
File path: README.md
##########
@@ -55,9 +55,9 @@ CarbonData is built using Apache Maven, to [build CarbonData](https://github.com
* [Configuring CarbonData](https://github.com/apache/carbondata/blob/master/docs/configuration-parameters.md)
* [DataMap Developer Guide](https://github.com/apache/carbondata/blob/master/docs/datamap-developer-guide.md)
* [Data Types](https://github.com/apache/carbondata/blob/master/docs/supported-data-types-in-carbondata.md)
-* [CarbonData DataMap Management](https://github.com/apache/carbondata/blob/master/docs/datamap/datamap-management.md)
- * [CarbonData BloomFilter DataMap](https://github.com/apache/carbondata/blob/master/docs/datamap/bloomfilter-datamap-guide.md)
- * [CarbonData Lucene DataMap](https://github.com/apache/carbondata/blob/master/docs/datamap/lucene-datamap-guide.md)
+* [CarbonData DataMap Management](https://github.com/apache/carbondata/blob/master/docs/index/index-management.md)
+ * [CarbonData BloomFilter DataMap](https://github.com/apache/carbondata/blob/master/docs/index/bloomfilter-index-guide.md)

Review comment:
Please also change "DataMap" to "Index" in the link text.

[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3708: [WIP] Update index documents

ShreelekhyaG commented on a change in pull request #3708: [WIP] Update index documents
URL: https://github.com/apache/carbondata/pull/3708#discussion_r408810115

##########
File path: README.md
##########
@@ -55,9 +55,9 @@ CarbonData is built using Apache Maven, to [build CarbonData](https://github.com
* [Configuring CarbonData](https://github.com/apache/carbondata/blob/master/docs/configuration-parameters.md)
* [DataMap Developer Guide](https://github.com/apache/carbondata/blob/master/docs/datamap-developer-guide.md)
* [Data Types](https://github.com/apache/carbondata/blob/master/docs/supported-data-types-in-carbondata.md)
-* [CarbonData DataMap Management](https://github.com/apache/carbondata/blob/master/docs/datamap/datamap-management.md)
- * [CarbonData BloomFilter DataMap](https://github.com/apache/carbondata/blob/master/docs/datamap/bloomfilter-datamap-guide.md)
- * [CarbonData Lucene DataMap](https://github.com/apache/carbondata/blob/master/docs/datamap/lucene-datamap-guide.md)
+* [CarbonData DataMap Management](https://github.com/apache/carbondata/blob/master/docs/index/index-management.md)
+ * [CarbonData BloomFilter DataMap](https://github.com/apache/carbondata/blob/master/docs/index/bloomfilter-index-guide.md)

Review comment:
ok
