GitHub user ravipesala opened a pull request:
https://github.com/apache/carbondata/pull/2685

[WIP] Support backward compatability in fileformat and added tests for load with different sort orders

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done. Please provide details on:
  - Whether new unit test cases have been added or why no new tests are required?
  - How it is tested? Please attach test report.
  - Is it a performance related change? Please attach the performance test report.
  - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ravipesala/incubator-carbondata fileformat-back-compat

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2685.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #2685

----
commit c265f37caf6d12eb2158e7d5d1cbbe0d777710ed
Author: ravipesala <ravi.pesala@...>
Date: 2018-08-30T15:11:06Z

Support backward compatability in fileformat and added tests for load with different sort orders
----

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2685

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/167/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2685

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8238/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2685

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/182/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2685

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8253/

---
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2685#discussion_r214627216

--- Diff: core/src/main/java/org/apache/carbondata/core/util/AbstractDataFileFooterConverter.java ---
@@ -135,6 +135,18 @@ private static BitSet getPresenceMeta(
    * @throws IOException problem while reading the index file
    */
   public List<DataFileFooter> getIndexInfo(String filePath, byte[] fileData) throws IOException {
+    return getIndexInfo(filePath, fileData, true);
+  }
+
+  /**
+   * Below method will be used to get the index info from index file
+   *
+   * @param filePath file path of the index file
--- End diff --

These comments do not provide any useful information; better to remove them.

---
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2685#discussion_r214625972

--- Diff: integration/spark-datasource/src/test/scala/org/apache/spark/sql/carbondata/datasource/SparkCarbonDataSourceTest.scala ---
@@ -590,12 +596,60 @@ class SparkCarbonDataSourceTest extends FunSuite with BeforeAndAfterAll {
     }
   }

+  test("test read using old data") {
+    val store = new StoreCreator(new File(warehouse1).getAbsolutePath,
+      new File(warehouse1 + "../../../../../hadoop/src/test/resources/data.csv").getCanonicalPath,
+      false)
+    store.createCarbonStore()
+    FileFactory.deleteAllFilesOfDir(new File(warehouse1+"/testdb/testtable/Fact/Part0/Segment_0/0"))
+    val dfread = spark.read.format("carbon").load(warehouse1+"/testdb/testtable/Fact/Part0/Segment_0")
+    dfread.show(false)
+    spark.sql("drop table if exists parquet_table")
+  }
+
+  test("test read using different sort order data") {
+    spark.sql("drop table if exists old_comp")
+    FileFactory.deleteAllFilesOfDir(new File(warehouse1+"/testdb"))
+    val store = new StoreCreator(new File(warehouse1).getAbsolutePath,
+      new File(warehouse1 + "../../../../../hadoop/src/test/resources/data.csv").getCanonicalPath,
+      false)
+    store.setSortCOls(new util.ArrayList[String](Seq ("name").asJava))
+    var model = store.createTableAndLoadModel(false)
+    model.setSegmentId("0")
+    store.createCarbonStore(model)
+    FileFactory.deleteAllFilesOfDir(new File(warehouse1+"/testdb/testtable/Fact/Part0/Segment_0/0"))
+    store.setSortCOls(new util.ArrayList[String](Seq ("country,phonetype").asJava))
+    model = store.createTableAndLoadModel(false)
+    model.setSegmentId("1")
+    store.createCarbonStore(model)
+    FileFactory.deleteAllFilesOfDir(new File(warehouse1+"/testdb/testtable/Fact/Part0/Segment_1/0"))
+    store.setSortCOls(new util.ArrayList[String](Seq ("date").asJava))
+    model = store.createTableAndLoadModel(false)
+    model.setSegmentId("2")
+    store.createCarbonStore(model)
+    FileFactory.deleteAllFilesOfDir(new File(warehouse1+"/testdb/testtable/Fact/Part0/Segment_2/0"))
+    store.setSortCOls(new util.ArrayList[String](Seq ("serialname").asJava))
+    model = store.createTableAndLoadModel(false)
+    model.setSegmentId("3")
+    store.createCarbonStore(model)
+    FileFactory.deleteAllFilesOfDir(new File(warehouse1+"/testdb/testtable/Fact/Part0/Segment_3/0"))
+    spark.sql(s"create table old_comp(id int, date string, country string, name string, phonetype string, serialname string, salary int) using carbon options(path='$warehouse1/testdb/testtable/Fact/Part0/', 'sort_columns'='name')")
+    assert(spark.sql("select * from old_comp where country='china'").count() == 3396)
+    assert(spark.sql("select * from old_comp ").count() == 4000)
+    spark.sql("drop table if exists old_comp")
+
+    spark.sql(s"create table old_comp1 using carbon options(path='$warehouse1/testdb/testtable/Fact/Part0/')")
+    assert(spark.sql("select * from old_comp1 where country='china'").count() == 3396)
+    assert(spark.sql("select * from old_comp1 ").count() == 4000)
+    spark.sql("drop table if exists old_comp1")
+    FileFactory.deleteAllFilesOfDir(new File(warehouse1+"/testdb"))
+  }

   override protected def beforeAll(): Unit = {
     drop
   }

-  override def afterAll(): Unit = {
--- End diff --

unnecessary modification...

---
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2685#discussion_r214627971

--- Diff: core/src/main/java/org/apache/carbondata/core/util/AbstractDataFileFooterConverter.java ---
@@ -153,6 +165,14 @@ private static BitSet getPresenceMeta(
     for (int i = 0; i < table_columns.size(); i++) {
       columnSchemaList.add(thriftColumnSchemaToWrapperColumnSchema(table_columns.get(i)));
     }
+    // In case of non transactional table just set columnuniqueid as columnName to support
+    // backward compatabiity. non transactional tables column uniqueid is always equal to
+    // columnname
+    if (!isTransactionalTable) {
--- End diff --

Is the columnSchema only for runtime usage? If not, will it be a problem if a user changes a non-transactional table to a transactional table?

---
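[Editor's sketch] For context, a minimal sketch of the normalization the quoted code comment describes, assuming `columnSchemaList` is the list built in the loop above and that CarbonData's `ColumnSchema` exposes `getColumnName()`/`setColumnUniqueId()` accessors (neither the loop body nor the accessors appear in the quoted diff, so treat this as an illustration, not the PR's actual code):

// Sketch only: for non-transactional tables the column unique id is
// defined to equal the column name, so normalizing the deserialized
// schemas lets footers written by older versions resolve columns by name.
if (!isTransactionalTable) {
  for (ColumnSchema columnSchema : columnSchemaList) {
    columnSchema.setColumnUniqueId(columnSchema.getColumnName());
  }
}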
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2685#discussion_r214626402

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/testutil/StoreCreator.java ---
@@ -101,18 +101,32 @@
   private static LogService LOG =
       LogServiceFactory.getLogService(StoreCreator.class.getCanonicalName());

-  private static AbsoluteTableIdentifier absoluteTableIdentifier;
-  private static String storePath = null;
+  private AbsoluteTableIdentifier absoluteTableIdentifier;
+  private String storePath = null;
+  private String csvPath;
+  private boolean dictionary;
+  private List<String> sortCOls = new ArrayList<>();
--- End diff --

Typing error: `sortCOls` should presumably be `sortCols`.

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2685

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8270/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2685

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/199/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2685

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8272/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2685

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/201/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2685

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8279/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2685

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/208/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2685

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/212/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2685

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8283/

---
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/2685

1. I think there may be some bugs in the index datamap currently when it comes to supporting different sort columns per load.
2. What about a different dictionary_include per load?

---
Github user kevinjmh commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2685#discussion_r214886972

--- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockDataMap.java ---
@@ -630,6 +638,24 @@ private boolean useMinMaxForExecutorPruning(FilterResolverIntf filterResolverInt
     return useMinMaxForPruning;
   }

+  @Override
+  public List<Blocklet> prune(Expression expression, SegmentProperties segmentProperties,
+      List<PartitionSpec> partitions, AbsoluteTableIdentifier identifier) throws IOException {
+    FilterResolverIntf filterResolverIntf = null;
+    if (expression != null) {
+      SegmentProperties properties = getSegmentProperties();
+      QueryModel.FilterProcessVO processVO =
+          new QueryModel.FilterProcessVO(properties.getDimensions(), properties.getMeasures(),
+              new ArrayList<CarbonDimension>());
+      QueryModel.processFilterExpression(processVO, expression, null, null);
+      // Optimize Filter Expression and fit RANGE filters if conditions apply.
+      FilterOptimizer rangeFilterOptimizer = new RangeFilterOptmizer(expression);
+      rangeFilterOptimizer.optimizeFilter();
+      filterResolverIntf = CarbonTable.resolveFilter(expression, identifier);
--- End diff --

Can we pull up the transformation from expression to `FilterResolverIntf` so that we can reuse most of the code instead of adding another `prune` method everywhere?

---
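[Editor's sketch] One way to realize this suggestion, as a hedged sketch rather than the PR's actual fix: factor the conversion into a private helper inside BlockDataMap and have the expression-based overload delegate to the resolver-based one. The conversion steps are copied from the quoted diff; the helper name `toFilterResolver` and the delegation target `prune(FilterResolverIntf, SegmentProperties, List<PartitionSpec>)` are assumptions:

// Hypothetical helper centralizing the Expression -> FilterResolverIntf
// conversion so each DataMap implementation can reuse it.
private FilterResolverIntf toFilterResolver(Expression expression,
    AbsoluteTableIdentifier identifier) {
  if (expression == null) {
    return null;
  }
  SegmentProperties properties = getSegmentProperties();
  QueryModel.FilterProcessVO processVO =
      new QueryModel.FilterProcessVO(properties.getDimensions(),
          properties.getMeasures(), new ArrayList<CarbonDimension>());
  QueryModel.processFilterExpression(processVO, expression, null, null);
  // Optimize the filter expression and fit RANGE filters if conditions apply.
  new RangeFilterOptmizer(expression).optimizeFilter();
  return CarbonTable.resolveFilter(expression, identifier);
}

@Override
public List<Blocklet> prune(Expression expression, SegmentProperties segmentProperties,
    List<PartitionSpec> partitions, AbsoluteTableIdentifier identifier) throws IOException {
  // Delegate to the existing resolver-based prune overload (assumed signature).
  return prune(toFilterResolver(expression, identifier), segmentProperties, partitions);
}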
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2685

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/7/

---