Hi, does this version support update and delete with spark-2.1? It seems it does not; when is support for it planned?

------------------ Original ------------------
From: "ravipesala [via Apache CarbonData Mailing List archive]" <[hidden email]>
Date: Sun, Mar 26, 2017 01:16 PM
To: "恩爸" <[hidden email]>
Subject: [DISCUSSION] Initiating Apache CarbonData-1.1.0 incubating Release

As planned, we are going to release Apache CarbonData-1.1.0. Please discuss and vote for it to initiate the 1.1.0 release; I will start preparing the release after 3 days of discussion. It will have the following features.

1. Introduced a new data format called V3 (version 3). It improves sequential IO by keeping larger blocklets, so more data is read into memory at once. Introduced pages of 32000 values each for every column inside a blocklet, and min/max is maintained for each page to improve filter queries. Improved compression/decompression of row pages. Overall performance is improved by 50% compared to the old format, as per TPC-H benchmark results.

2. Alter table support in CarbonData (only for Spark 2.1): supports renaming an existing table, adding a new column, removing an existing column, and upcasting a datatype (e.g. from smallint to int). A sketch of the DDL follows after this message.

3. Supported batch sort to improve data-loading performance. It makes the sort step non-blocking: a whole batch is sorted in memory and converted to a carbondata file.

4. Improved single-pass load by upgrading to the latest netty framework and launching a dictionary client for each load.

5. Supported range filters: between-filters are combined into one filter to improve filter performance.

6. Apart from these features, many bugs and improvements are addressed in this release.

--
Thanks & Regards,
Ravindra
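For feature 2, the alter-table DDL looks roughly like the sketch below. This is a sketch based on the feature list above, not an authoritative reference — the exact syntax should be verified against the CarbonData 1.1.0 docs, and the table/column names here are hypothetical:

// Sketch of the alter-table operations listed in feature 2, assuming a
// context named cc as used elsewhere in this thread.
cc.sql("ALTER TABLE t3 RENAME TO t3_renamed")                 // rename an existing table
cc.sql("ALTER TABLE t3_renamed ADD COLUMNS (city String)")    // add a new column
cc.sql("ALTER TABLE t3_renamed DROP COLUMNS (city)")          // remove a column
cc.sql("ALTER TABLE t3_renamed CHANGE salary salary BIGINT")  // upcast a datatype (int -> bigint)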
Hi
Yes, the update and delete feature with spark-2.x will be supported after 1.1.0. As planned, 1.2 (or possibly earlier) will support it.

Regards
Liang
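For reference, the update and delete DML already documented for spark-1.x looks like the sketch below; this assumes the planned spark-2.x support keeps the same syntax, and t3 is a hypothetical table:

// CarbonData update/delete DML sketch (spark-1.x syntax; spark-2.x
// support is planned for 1.2 per the reply above).
cc.sql("UPDATE t3 SET (salary) = (salary + 1000) WHERE country = 'usa'")
cc.sql("DELETE FROM t3 WHERE salary < 500000")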
Hi, Liang:
Thanks for your reply.
The author has deleted this message.
Hi
Please create a new mailing list discussion for your topic, and please provide the cardinality of all columns. For a high-cardinality column, the system doesn't do dictionary encoding:

-------------------------------------------------------
## Threshold to identify a high-cardinality column
#high.cardinality.threshold=1000000
-------------------------------------------------------

Regards
Liang
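As a sketch, the threshold above can be adjusted in carbon.properties by uncommenting that line. A column can also be excluded from dictionary encoding explicitly at create time, regardless of its cardinality — a minimal sketch, with a hypothetical table and column:

// Columns whose cardinality exceeds high.cardinality.threshold skip
// dictionary encoding automatically; DICTIONARY_EXCLUDE forces it explicitly.
cc.sql("""
  CREATE TABLE IF NOT EXISTS t_example (id Int, name String)
  STORED BY 'carbondata'
  TBLPROPERTIES ('DICTIONARY_EXCLUDE' = 'name')
""")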
Hi DEV,
I created a table with the following SQL:

cc.sql("""
  CREATE TABLE IF NOT EXISTS t3
  (ID Int, date Timestamp, country String, name String,
   phonetype String, serialname String, salary Int,
   name1 String, name2 String, name3 String, name4 String,
   name5 String, name6 String, name7 String, name8 String)
  STORED BY 'carbondata'
""")

The data cardinality is as below:

| column      | name     | name1    | name2    | name3    | name4    | name5    | name6    | name7    | name8    |
| cardinality | 10000000 | 10000000 | 10000000 | 10000000 | 10000000 | 10000000 | 10000000 | 10000000 | 10000000 |

After I load data into this table, I found that the dimension columns "name" and "name7" both have no dictionary encoding, but column "name" has no inverted index while column "name7" has an inverted index.

Questions:
1. The dimension column "name" has no dictionary encoding and no inverted index; is its data still kept in order in the DataChunk2 blocklet?
2. Is there any document that introduces these loading strategies?
3. If a dimension column has no dictionary encoding and no inverted index, and the user also did not mark the column as no-inverted-index at create time, is its data still kept in order in the DataChunk2 blocklet?
4. As I understand it, by default all dimension column data are sorted and stored in the DataChunk2 blocklet unless the user marks the column as no-inverted-index, right?
5. As I understand it, the first dimension column of the MDK key is always sorted in the DataChunk2 blocklet, so why is isExplicitSorted not set to true?

Attached is the code used to generate data.csv:

package test;

import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.util.HashMap;
import java.util.Map;

public class CreateData {

    public static void main(String[] args) {
        FileOutputStream outStr = null;
        BufferedOutputStream buff = null;
        try {
            outStr = new FileOutputStream(new File("data.csv"));
            buff = new BufferedOutputStream(outStr);
            long begin0 = System.currentTimeMillis();
            // CSV header matching the t3 schema above
            buff.write(
                "ID,date,country,name,phonetype,serialname,salary,name1,name2,name3,name4,name5,name6,name7,name8\n"
                    .getBytes());

            int idcount = 10000000;       // number of rows to generate
            int datecount = 30;
            int countrycount = 5;
            int phonetypecount = 10000;
            int serialnamecount = 50000;
            Map<Integer, String> countryMap = new HashMap<Integer, String>();
            countryMap.put(0, "canada");
            countryMap.put(1, "usa");
            countryMap.put(2, "uk");
            countryMap.put(3, "china");
            countryMap.put(4, "indian");

            StringBuilder sb = null;
            for (int i = idcount; i >= 0; i--) {
                sb = new StringBuilder();
                sb.append(4000000 + i).append(",");                                   // ID
                sb.append("2015/8/" + (i % datecount + 1)).append(",");               // date
                sb.append(countryMap.get(i % countrycount)).append(",");              // country
                sb.append("name" + (1600000 - i)).append(",");                        // name
                sb.append("phone" + i % phonetypecount).append(",");                  // phonetype
                sb.append("serialname" + (100000 + i % serialnamecount)).append(","); // serialname
                sb.append(i + 500000).append(",");                                    // salary
                sb.append("name1" + (i + 100000)).append(",");
                sb.append("name2" + (i + 200000)).append(",");
                sb.append("name3" + (i + 300000)).append(",");
                sb.append("name4" + (i + 400000)).append(",");
                sb.append("name5" + (i + 500000)).append(",");
                sb.append("name6" + (i + 600000)).append(",");
                sb.append("name7" + (i + 700000)).append(",");
                sb.append("name8" + (i + 800000)).append('\n'); // name8 (no trailing comma)
                buff.write(sb.toString().getBytes());
            }
            buff.flush();
            System.out.println("sb.toString():" + sb.toString());
            long end0 = System.currentTimeMillis();
            System.out.println("BufferedOutputStream took " + (end0 - begin0) + " ms");
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                if (buff != null) buff.close();
                if (outStr != null) outStr.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}
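To reproduce: compile and run the generator (javac test/CreateData.java, then java test.CreateData), and load the resulting CSV — a minimal sketch, assuming data.csv sits on the driver's local filesystem and cc is the same context used for CREATE TABLE above:

// Load the generated CSV into the t3 table created above.
cc.sql("LOAD DATA LOCAL INPATH 'data.csv' INTO TABLE t3")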
In reply to this post by xm_zzc
+1
-Regards
Kumar Vishal
+1
Regards
Manish Gupta
In reply to this post by simafengyun
Hi
Could you present your info in a table? It is hard to read clearly. Columns with cardinality above the threshold (>1000000) are not dictionary-encoded.

--
Regards
Liang
In reply to this post by manishgupta88
Sure, let's do one more release.
+1