[GitHub] carbondata pull request #2685: [WIP] Support backward compatability in filef...

classic Classic list List threaded Threaded
36 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2685: [WIP] Support backward compatability in filef...

qiuchenjian-2
GitHub user ravipesala opened a pull request:

    https://github.com/apache/carbondata/pull/2685

    [WIP] Support backward compatability in fileformat and added tests for load with different sort orders

    …
   
    Be sure to do all of the following checklist to help us incorporate
    your contribution quickly and easily:
   
     - [ ] Any interfaces changed?
     
     - [ ] Any backward compatibility impacted?
     
     - [ ] Document update required?
   
     - [ ] Testing done
            Please provide details on
            - Whether new unit test cases have been added or why no new tests are required?
            - How it is tested? Please attach test report.
            - Is it a performance related change? Please attach the performance test report.
            - Any additional information to help reviewers in testing this change.
           
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ravipesala/incubator-carbondata fileformat-back-compat

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2685.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2685
   
----
commit c265f37caf6d12eb2158e7d5d1cbbe0d777710ed
Author: ravipesala <ravi.pesala@...>
Date:   2018-08-30T15:11:06Z

    Support backward compatability in fileformat and added tests for load with different sort orders

----


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2685: [WIP] Support backward compatability in fileformat a...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2685
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/167/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2685: [WIP] Support backward compatability in fileformat a...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2685
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8238/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2685: [WIP] Support backward compatability in fileformat a...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2685
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/182/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2685: [WIP] Support backward compatability in fileformat a...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2685
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8253/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2685: [WIP] Support backward compatability in filef...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2685#discussion_r214627216
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/util/AbstractDataFileFooterConverter.java ---
    @@ -135,6 +135,18 @@ private static BitSet getPresenceMeta(
        * @throws IOException problem while reading the index file
        */
       public List<DataFileFooter> getIndexInfo(String filePath, byte[] fileData) throws IOException {
    +    return getIndexInfo(filePath, fileData, true);
    +  }
    +
    +  /**
    +   * Below method will be used to get the index info from index file
    +   *
    +   * @param filePath           file path of the index file
    --- End diff --
   
    These comments do no provide any useful information, better to remove them


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2685: [WIP] Support backward compatability in filef...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2685#discussion_r214625972
 
    --- Diff: integration/spark-datasource/src/test/scala/org/apache/spark/sql/carbondata/datasource/SparkCarbonDataSourceTest.scala ---
    @@ -590,12 +596,60 @@ class SparkCarbonDataSourceTest extends FunSuite with BeforeAndAfterAll {
         }
       }
     
    +  test("test read using old data") {
    +    val store = new StoreCreator(new File(warehouse1).getAbsolutePath,
    +      new File(warehouse1 + "../../../../../hadoop/src/test/resources/data.csv").getCanonicalPath,
    +      false)
    +    store.createCarbonStore()
    +    FileFactory.deleteAllFilesOfDir(new File(warehouse1+"/testdb/testtable/Fact/Part0/Segment_0/0"))
    +    val dfread = spark.read.format("carbon").load(warehouse1+"/testdb/testtable/Fact/Part0/Segment_0")
    +    dfread.show(false)
    +    spark.sql("drop table if exists parquet_table")
    +  }
    +
    +  test("test read using different sort order data") {
    +    spark.sql("drop table if exists old_comp")
    +    FileFactory.deleteAllFilesOfDir(new File(warehouse1+"/testdb"))
    +    val store = new StoreCreator(new File(warehouse1).getAbsolutePath,
    +      new File(warehouse1 + "../../../../../hadoop/src/test/resources/data.csv").getCanonicalPath,
    +      false)
    +    store.setSortCOls(new util.ArrayList[String](Seq ("name").asJava))
    +    var model = store.createTableAndLoadModel(false)
    +    model.setSegmentId("0")
    +    store.createCarbonStore(model)
    +    FileFactory.deleteAllFilesOfDir(new File(warehouse1+"/testdb/testtable/Fact/Part0/Segment_0/0"))
    +    store.setSortCOls(new util.ArrayList[String](Seq ("country,phonetype").asJava))
    +    model = store.createTableAndLoadModel(false)
    +    model.setSegmentId("1")
    +    store.createCarbonStore(model)
    +    FileFactory.deleteAllFilesOfDir(new File(warehouse1+"/testdb/testtable/Fact/Part0/Segment_1/0"))
    +    store.setSortCOls(new util.ArrayList[String](Seq ("date").asJava))
    +    model = store.createTableAndLoadModel(false)
    +    model.setSegmentId("2")
    +    store.createCarbonStore(model)
    +    FileFactory.deleteAllFilesOfDir(new File(warehouse1+"/testdb/testtable/Fact/Part0/Segment_2/0"))
    +    store.setSortCOls(new util.ArrayList[String](Seq ("serialname").asJava))
    +    model = store.createTableAndLoadModel(false)
    +    model.setSegmentId("3")
    +    store.createCarbonStore(model)
    +    FileFactory.deleteAllFilesOfDir(new File(warehouse1+"/testdb/testtable/Fact/Part0/Segment_3/0"))
    +    spark.sql(s"create table old_comp(id int, date string, country string, name string, phonetype string, serialname string, salary int) using carbon options(path='$warehouse1/testdb/testtable/Fact/Part0/', 'sort_columns'='name')")
    +    assert(spark.sql("select * from old_comp where country='china'").count() == 3396)
    +    assert(spark.sql("select * from old_comp ").count() == 4000)
    +    spark.sql("drop table if exists old_comp")
    +
    +    spark.sql(s"create table old_comp1 using carbon options(path='$warehouse1/testdb/testtable/Fact/Part0/')")
    +    assert(spark.sql("select * from old_comp1 where country='china'").count() == 3396)
    +    assert(spark.sql("select * from old_comp1 ").count() == 4000)
    +    spark.sql("drop table if exists old_comp1")
    +    FileFactory.deleteAllFilesOfDir(new File(warehouse1+"/testdb"))
    +  }
     
       override protected def beforeAll(): Unit = {
         drop
       }
     
    -  override def afterAll(): Unit = {
    --- End diff --
   
    unnecessary modification...


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2685: [WIP] Support backward compatability in filef...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2685#discussion_r214627971
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/util/AbstractDataFileFooterConverter.java ---
    @@ -153,6 +165,14 @@ private static BitSet getPresenceMeta(
           for (int i = 0; i < table_columns.size(); i++) {
             columnSchemaList.add(thriftColumnSchemaToWrapperColumnSchema(table_columns.get(i)));
           }
    +      // In case of non transactional table just set columnuniqueid as columnName to support
    +      // backward compatabiity. non transactional tables column uniqueid is always equal to
    +      // columnname
    +      if (!isTransactionalTable) {
    --- End diff --
   
    Is the columnSchema only for runtime usage?
    If not, Will it be problem if use change nonTransactionalTable to transactionalTable?


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2685: [WIP] Support backward compatability in filef...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2685#discussion_r214626402
 
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/testutil/StoreCreator.java ---
    @@ -101,18 +101,32 @@
     
       private static LogService LOG =
           LogServiceFactory.getLogService(StoreCreator.class.getCanonicalName());
    -  private static AbsoluteTableIdentifier absoluteTableIdentifier;
    -  private static String storePath = null;
    +  private AbsoluteTableIdentifier absoluteTableIdentifier;
    +  private String storePath = null;
    +  private String csvPath;
    +  private boolean dictionary;
    +  private List<String> sortCOls = new ArrayList<>();
    --- End diff --
   
    typing error


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2685: [CARBONDATA-2910] Support backward compatability in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2685
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8270/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2685: [CARBONDATA-2910] Support backward compatability in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2685
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/199/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2685: [CARBONDATA-2910] Support backward compatability in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2685
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8272/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2685: [CARBONDATA-2910] Support backward compatability in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2685
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/201/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2685: [CARBONDATA-2910] Support backward compatability in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2685
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8279/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2685: [CARBONDATA-2910] Support backward compatability in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2685
 
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/208/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2685: [CARBONDATA-2910] Support backward compatability in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2685
 
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/212/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2685: [CARBONDATA-2910] Support backward compatability in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2685
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8283/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2685: [CARBONDATA-2910] Support backward compatability in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/2685
 
    1. I think there may be some bugs in index datamap currently to support 'different sort columns per load'
    2. What about different dictionary_include per load?


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2685: [CARBONDATA-2910] Support backward compatabil...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user kevinjmh commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2685#discussion_r214886972
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockDataMap.java ---
    @@ -630,6 +638,24 @@ private boolean useMinMaxForExecutorPruning(FilterResolverIntf filterResolverInt
         return useMinMaxForPruning;
       }
     
    +  @Override
    +  public List<Blocklet> prune(Expression expression, SegmentProperties segmentProperties,
    +      List<PartitionSpec> partitions, AbsoluteTableIdentifier identifier) throws IOException {
    +    FilterResolverIntf filterResolverIntf = null;
    +    if (expression != null) {
    +      SegmentProperties properties = getSegmentProperties();
    +      QueryModel.FilterProcessVO processVO =
    +          new QueryModel.FilterProcessVO(properties.getDimensions(), properties.getMeasures(),
    +              new ArrayList<CarbonDimension>());
    +      QueryModel.processFilterExpression(processVO, expression, null, null);
    +      // Optimize Filter Expression and fit RANGE filters is conditions apply.
    +      FilterOptimizer rangeFilterOptimizer = new RangeFilterOptmizer(expression);
    +      rangeFilterOptimizer.optimizeFilter();
    +      filterResolverIntf = CarbonTable.resolveFilter(expression, identifier);
    --- End diff --
   
    can we pull up the transformation from expression to filterResolverIntf  so that we can reuse most code instead of adding another `prune` method everywhere


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2685: [CARBONDATA-2910] Support backward compatability in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2685
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/7/



---
12