Apache CarbonData Dev Mailing List archive

Re: question about dimension's sort order in blocklet level

Posted by Liang Chen on
URL: http://apache-carbondata-dev-mailing-list-archive.168.s1.nabble.com/Re-DISCUSSION-Initiating-Apache-CarbonData-1-1-0-incubating-Release-tp9672p9687.html

Hi

Please create a new mailing list discussion for your topic.
Please provide all columns' cardinality.

For high-cardinality columns, the system does not apply dictionary encoding
---------------------------------------------------------------------------
##threshold to identify whether high cardinality column
#high.cardinality.threshold=1000000
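
To make the threshold rule concrete, here is a minimal sketch in plain Scala of how a column's cardinality could be counted and compared with the threshold. It is not CarbonData's loader code: the object and method names are hypothetical, the sample is an in-memory list rather than a table, and treating a value equal to the threshold as "still dictionary-encoded" is an assumption.

```scala
// Hypothetical illustration only; CarbonData's real decision logic may differ.
object CardinalityCheck {
  // Same value as the high.cardinality.threshold setting quoted above.
  val HighCardinalityThreshold: Long = 1000000L

  /** Counts the distinct values in each column of an in-memory sample. */
  def cardinalities(columns: Seq[String], rows: Seq[Seq[String]]): Map[String, Long] =
    columns.zipWithIndex.map { case (name, i) =>
      name -> rows.map(_(i)).distinct.size.toLong
    }.toMap

  /** Assumed rule: a column at or below the threshold keeps dictionary encoding. */
  def usesDictionary(cardinality: Long,
                     threshold: Long = HighCardinalityThreshold): Boolean =
    cardinality <= threshold
}
```

For a real table, the per-column counts would come from something like a COUNT(DISTINCT ...) query, which is also the information requested above.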

Regards
Liang

simafengyun wrote

Hi DEV,

I created a table with the SQL below:

cc.sql("""
CREATE TABLE IF NOT EXISTS t3
(ID Int, date Timestamp, country String,
name String, phonetype String, serialname String, salary Int,
name1 String, name2 String, name3 String, name4 String, name5 String, name6 String, name7 String, name8 String
)
STORED BY 'carbondata'
""")

After loading data into this table, I found that the dimension columns "name" and "name7" both have no dictionary encoding.
Column "name" has no inverted index, but column "name7" does.
Questions:
1. Why do these columns have no dictionary encoding by default, and why do some have no inverted index?
2. Is there any documentation that describes these loading strategies?
3. The dimension column "name" has no inverted index. Is its data still ordered in the DataChunk2 blocklet?
4. As far as I know, dimension column data is usually sorted and stored in the DataChunk2 blocklet.
In which cases is dimension column data not sorted in the DataChunk2 blocklet, other than when the user specifies the column with no inverted index?

5. As far as I know, the first column of the MDK key is always sorted in the DataChunk2 blocklet. Why is isExplicitSorted not set to true for it?