[GitHub] carbondata pull request #1484: [CARBONDATA-1700][DataLoad] Add TableProperti...


qiuchenjian-2
GitHub user xuchuanyin opened a pull request:

    https://github.com/apache/carbondata/pull/1484

    [CARBONDATA-1700][DataLoad] Add TableProperties during (de)serialization of TableSchema

    Be sure to do all of the following checklist to help us incorporate
    your contribution quickly and easily:
   
     - [X] Any interfaces changed?
     `NO`
     - [X] Any backward compatibility impacted?
     `NO`
     - [X] Document update required?
    `NO`
     - [X] Testing done
            Please provide details on
            - Whether new unit test cases have been added or why no new tests are required?
            `UNABLE TO TEST IT IN TEST CASE, CAN BE VERIFIED MANUALLY`
            - How it is tested? Please attach test report.
            `TESTED MANUALLY`
            - Is it a performance related change? Please attach the performance test report.
            `NO`
            - Any additional information to help reviewers in testing this change.
            `NONE`
     - [X] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
            `NOT RELATED`
   
    COPY FROM JIRA
    ======
   
    # scenario
   
    Loading data into an existing carbondata table fails if the table is queried after the Spark session is restarted. I hit this failure in Spark local mode (found during local testing) and have not tested other scenarios.
   
    The problem can be reproduced by following steps:
   
    0. START: start a session;
    1. CREATE: create table `t1`;
    2. LOAD: create a dataframe and append it to `t1`;
    3. STOP: stop the current session;

    4. START: start a new session;
    5. QUERY: query table `t1`;  ---- This step is essential to reproduce the problem.
    6. LOAD: create a dataframe and append it to `t1`;  ---- This step fails.

    The error is thrown in step 6. The error message in the console looks like
   
    ```
    java.lang.NullPointerException was thrown.
    java.lang.NullPointerException
    at org.apache.spark.sql.execution.command.management.LoadTableCommand.processData(LoadTableCommand.scala:92)
    at org.apache.spark.sql.execution.command.management.LoadTableCommand.run(LoadTableCommand.scala:60)
    at org.apache.spark.sql.CarbonDataFrameWriter.loadDataFrame(CarbonDataFrameWriter.scala:141)
    at org.apache.spark.sql.CarbonDataFrameWriter.writeToCarbonFile(CarbonDataFrameWriter.scala:50)
    at org.apache.spark.sql.CarbonDataFrameWriter.appendToCarbonFile(CarbonDataFrameWriter.scala:42)
    at org.apache.spark.sql.CarbonSource.createRelation(CarbonSource.scala:110)
    at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:426)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215)
    ```
   
    The following code can be pasted into `TestLoadDataFrame.scala` to reproduce the problem. Keep in mind that you must run the first test and then the second manually, in separate runs, to make sure the SparkSession is restarted in between.
   
    ```
      test("prepare") {
        sql("drop table if exists carbon_stand_alone")
        sql( "create table if not exists carbon_stand_alone (c1 string, c2 string, c3 int)" +
        " stored by 'carbondata'").collect()
        sql("select * from carbon_stand_alone").show()
        df.write
          .format("carbondata")
          .option("tableName", "carbon_stand_alone")
          .option("tempCSV", "false")
          .mode(SaveMode.Append)
          .save()
      }
   
      test("test load dataframe after query") {
   
        sql("select * from carbon_stand_alone").show()
   
        // the following line will cause failure
        df.write
          .format("carbondata")
          .option("tableName", "carbon_stand_alone")
          .option("tempCSV", "false")
          .mode(SaveMode.Append)
          .save()
   
        // if it works fine, it should be true
        checkAnswer(
          sql("select count(*) from carbon_stand_alone where c3 > 500"), Row(31500 * 2)
        )
      }
    ```
   
    # ANALYSE
    I went through the code and found that the problem is caused by a null `tableProperties` in the table metadata: `tableMeta.carbonTable.getTableInfo.getFactTable.getTableProperties` (called `propertyInTableInfo` below for short) is null at line 89 of `LoadTableCommand.scala`.
   
    While debugging, I found that `propertyInTableInfo` had the correct value when it was set in `CarbonTableInputFormat.setTableInfo(...)`, but an incorrect value when read back through `CarbonTableInputFormat.getTableInfo(...)`. The setter serializes `TableInfo` and the getter deserializes it, which means something is wrong in the serialization/deserialization round trip.
   
    Digging further, I found that serialization and deserialization in `TableSchema`, a member of `TableInfo`, ignore the `tableProperties` member, so its value is lost after deserialization. Because the field is not initialized in the constructor either, it remains `NULL` and causes the NPE.
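
    A minimal sketch of this failure mode, for illustration only. `LossySchema` and `LossyRoundTripDemo` are hypothetical stand-ins, not the real `TableSchema`; the sketch only assumes that a field skipped by both the write and the read method keeps its default value after a round trip.

    ```java
    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.DataInput;
    import java.io.DataInputStream;
    import java.io.DataOutput;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.io.UncheckedIOException;
    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical stand-in for a schema class; NOT the real TableSchema.
    final class LossySchema {
      String tableName;
      Map<String, String> tableProperties; // not initialized in the constructor

      void write(DataOutput out) throws IOException {
        out.writeUTF(tableName);
        // tableProperties is never written
      }

      void readFields(DataInput in) throws IOException {
        tableName = in.readUTF();
        // tableProperties is never read, so it keeps its default value: null
      }
    }

    final class LossyRoundTripDemo {
      // Serializes the schema to bytes and reads it back into a fresh instance.
      static LossySchema roundTrip(LossySchema original) {
        try {
          ByteArrayOutputStream bytes = new ByteArrayOutputStream();
          try (DataOutputStream out = new DataOutputStream(bytes)) {
            original.write(out);
          }
          LossySchema copy = new LossySchema();
          try (DataInputStream in =
                   new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            copy.readFields(in);
          }
          return copy;
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      }

      public static void main(String[] args) {
        LossySchema original = new LossySchema();
        original.tableName = "t1";
        original.tableProperties = new HashMap<>();
        original.tableProperties.put("example_key", "example_value"); // illustrative entry

        LossySchema copy = roundTrip(original);
        System.out.println(copy.tableName);               // prints "t1"
        System.out.println(copy.tableProperties == null); // prints "true"
      }
    }
    ```

    Any caller that dereferences `copy.tableProperties` without a null check would then throw a `NullPointerException`, which matches the symptom reported above.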
   
    # RESOLVE
   
    1. Initialize `tableProperties` in `TableSchema`
    2. Include `tableProperties` in serialization-deserialization of `TableSchema`
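
    The two steps above can be sketched as follows. This is an illustration under assumptions, not the actual patch: `FixedSchema` and `FixedRoundTripDemo` are hypothetical names, and the wire layout (entry count followed by key/value pairs) is one plausible choice rather than the format the PR uses.

    ```java
    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.DataInput;
    import java.io.DataInputStream;
    import java.io.DataOutput;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.io.UncheckedIOException;
    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical stand-in for a schema class; NOT the real TableSchema.
    final class FixedSchema {
      String tableName;
      Map<String, String> tableProperties;

      FixedSchema() {
        // Step 1: initialize the map so it is never null.
        this.tableProperties = new HashMap<>();
      }

      void write(DataOutput out) throws IOException {
        out.writeUTF(tableName);
        // Step 2: write the entry count, then each key/value pair.
        out.writeInt(tableProperties.size());
        for (Map.Entry<String, String> entry : tableProperties.entrySet()) {
          out.writeUTF(entry.getKey());
          out.writeUTF(entry.getValue());
        }
      }

      void readFields(DataInput in) throws IOException {
        tableName = in.readUTF();
        // Step 2: read the entries back in the same order they were written.
        int size = in.readInt();
        tableProperties = new HashMap<>();
        for (int i = 0; i < size; i++) {
          String key = in.readUTF();
          String value = in.readUTF();
          tableProperties.put(key, value);
        }
      }
    }

    final class FixedRoundTripDemo {
      // Serializes the schema to bytes and reads it back into a fresh instance.
      static FixedSchema roundTrip(FixedSchema original) {
        try {
          ByteArrayOutputStream bytes = new ByteArrayOutputStream();
          try (DataOutputStream out = new DataOutputStream(bytes)) {
            original.write(out);
          }
          FixedSchema copy = new FixedSchema();
          try (DataInputStream in =
                   new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            copy.readFields(in);
          }
          return copy;
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      }

      public static void main(String[] args) {
        FixedSchema original = new FixedSchema();
        original.tableName = "t1";
        original.tableProperties.put("example_key", "example_value"); // illustrative entry

        FixedSchema copy = roundTrip(original);
        System.out.println(copy.tableProperties.get("example_key")); // prints "example_value"
      }
    }
    ```

    Note that `writeUTF` rejects null strings, so a real implementation would need to decide how to handle null keys or values.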
   
    # Notes
   
    Although the bug has been fixed, I still do not understand why the problem is triggered in the way described above.

    A test would need the SparkSession to be restarted, which is currently impossible, so no tests are added.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xuchuanyin/carbondata bug_table_property_NPE

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1484.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1484
   
----
commit 47f70c7c2b7c0363e982335817055ddfd9c8b84d
Author: xuchuanyin <[hidden email]>
Date:   2017-11-10T13:10:55Z

    Add TableProperties during (de)serialization of TableSchema

----


---

[GitHub] carbondata issue #1484: [CARBONDATA-1700][DataLoad] Add TableProperties duri...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1484
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/975/



---

[GitHub] carbondata issue #1484: [CARBONDATA-1700][DataLoad] Add TableProperties duri...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1484
 
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1595/



---

[GitHub] carbondata issue #1484: [CARBONDATA-1700][DataLoad] Add TableProperties duri...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/1484
 
    retest this please


---

[GitHub] carbondata issue #1484: [CARBONDATA-1700][DataLoad] Add TableProperties duri...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1484
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1024/



---

[GitHub] carbondata issue #1484: [CARBONDATA-1700][DataLoad] Add TableProperties duri...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1484
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1642/



---

[GitHub] carbondata pull request #1484: [CARBONDATA-1700][DataLoad] Add TableProperti...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1484#discussion_r150413184
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/TableSchema.java ---
    @@ -77,6 +78,7 @@
     
       public TableSchema() {
         this.listOfColumns = new ArrayList<ColumnSchema>(CarbonCommonConstants.DEFAULT_COLLECTION_SIZE);
    +    this.tableProperties = new HashMap<String, String>(5);
    --- End diff --
   
    I think it is OK to use the default constructor of HashMap; the 5 is not needed


---

[GitHub] carbondata issue #1484: [CARBONDATA-1700][DataLoad] Add TableProperties duri...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on the issue:

    https://github.com/apache/carbondata/pull/1484
 
    I think this problem may be caused by the relation cache in driver memory


---

[GitHub] carbondata pull request #1484: [CARBONDATA-1700][DataLoad] Add TableProperti...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1484#discussion_r150432165
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/TableSchema.java ---
    @@ -77,6 +78,7 @@
     
       public TableSchema() {
         this.listOfColumns = new ArrayList<ColumnSchema>(CarbonCommonConstants.DEFAULT_COLLECTION_SIZE);
    +    this.tableProperties = new HashMap<String, String>(5);
    --- End diff --
   
    Yeah, I just followed the previous line in setting an initial size, and thought 16 was too big. Sure, I will fix it.
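
    For context on the capacity argument: the `java.util.HashMap` Javadoc documents a default initial capacity of 16 and a load factor of 0.75. As an assumption about OpenJDK's implementation (not something the Javadoc guarantees), a requested capacity is rounded up to the next power of two, so `new HashMap<>(5)` would be backed by 8 buckets rather than 5. The helper below is a hypothetical re-implementation of that rounding for illustration; it does not inspect a real `HashMap`.

    ```java
    // Hypothetical helper mirroring the power-of-two rounding that OpenJDK's
    // HashMap is assumed to apply; it ignores the maximum-capacity cap.
    final class HashMapCapacitySketch {
      static int roundUpToPowerOfTwo(int requested) {
        if (requested <= 1) {
          return 1;
        }
        int highest = Integer.highestOneBit(requested);
        // Already a power of two: keep it; otherwise move to the next one.
        return highest == requested ? requested : highest << 1;
      }

      public static void main(String[] args) {
        System.out.println(roundUpToPowerOfTwo(5));  // prints 8
        System.out.println(roundUpToPowerOfTwo(16)); // prints 16
      }
    }
    ```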


---

[GitHub] carbondata issue #1484: [CARBONDATA-1700][DataLoad] Add TableProperties duri...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/1484
 
    retest this please


---

[GitHub] carbondata issue #1484: [CARBONDATA-1700][DataLoad] Add TableProperties duri...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1484
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1037/



---

[GitHub] carbondata issue #1484: [CARBONDATA-1700][DataLoad] Add TableProperties duri...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1484
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1655/



---

[GitHub] carbondata issue #1484: [CARBONDATA-1700][DataLoad] Add TableProperties duri...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/1484
 
    hi, @ravipesala can you review it?


---

[GitHub] carbondata issue #1484: [CARBONDATA-1700][DataLoad] Add TableProperties duri...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1484
 
    LGTM


---

[GitHub] carbondata pull request #1484: [CARBONDATA-1700][DataLoad] Add TableProperti...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user asfgit closed the pull request at:

    https://github.com/apache/carbondata/pull/1484


---