[GitHub] carbondata pull request #2255: [CARBONDATA-2416] Support DEFERRED REBUILD wh...

classic Classic list List threaded Threaded
64 messages Options
1234
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2255: [CARBONDATA-2416] Support DEFERRED REBUILD wh...

qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2255#discussion_r186294417
 
    --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/datamap/lucene/LuceneFineGrainDataMapSuite.scala ---
    @@ -695,7 +695,29 @@ class LuceneFineGrainDataMapSuite extends QueryTest with BeforeAndAfterAll {
           sql(s"select * from datamap_test5 where name='n10'"))
         checkAnswer(sql("SELECT * FROM datamap_test5 WHERE TEXT_MATCH('city:c020')"),
           sql(s"SELECT * FROM datamap_test5 WHERE city='c020'"))
    +    sql("DROP TABLE IF EXISTS datamap_test5")
    +  }
     
    +  test("test lucene fine grain datamap rebuild") {
    +    sql("DROP TABLE IF EXISTS datamap_test5")
    +    sql(
    +      """
    +        | CREATE TABLE datamap_test5(id INT, name STRING, city STRING, age INT)
    +        | STORED BY 'carbondata'
    +        | TBLPROPERTIES('SORT_COLUMNS'='city,name', 'SORT_SCOPE'='LOCAL_SORT')
    +      """.stripMargin)
    +    sql(
    +      s"""
    +         | CREATE DATAMAP dm ON TABLE datamap_test5
    +         | USING 'lucene'
    +         | WITH DEFERRED REBUILD
    +         | DMProperties('INDEX_COLUMNS'='city')
    +      """.stripMargin)
    +    sql(s"LOAD DATA LOCAL INPATH '$file2' INTO TABLE datamap_test5 OPTIONS('header'='false')")
    +    sql("REBUILD DATAMAP dm ON TABLE datamap_test5")
    --- End diff --
   
    Before this statement, what's the status of the datamap? Can we add a case to test it here?


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2255: [CARBONDATA-2416] Support DEFERRED REBUILD wh...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2255#discussion_r186294094
 
    --- Diff: datamap/lucene/src/main/java/org/apache/carbondata/datamap/lucene/LuceneDataMapFactoryBase.java ---
    @@ -153,8 +153,8 @@ public DataMapWriter createWriter(Segment segment, String shardName) {
       }
     
       @Override
    -  public DataMapRefresher createRefresher(Segment segment, String shardName) {
    -    return new LuceneDataMapRefresher(getCarbonTable().getTablePath(), dataMapName,
    +  public DataMapBuilder createBuilder(Segment segment, String shardName) {
    --- End diff --
   
    A question that does not relate to this PR: what's the relationship between a `segment` and a `shard`? (in traditional carbondata table storage and in other table storage, such as external table/ hive-partition-like table...)


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2255: [CARBONDATA-2416] Support DEFERRED REBUILD wh...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2255#discussion_r186294354
 
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
    @@ -431,7 +432,9 @@ protected Expression getFilterPredicates(Configuration configuration) {
         // Get the available CG datamaps and prune further.
         DataMapExprWrapper cgDataMapExprWrapper = DataMapChooser.get()
             .chooseCGDataMap(getOrCreateCarbonTable(job.getConfiguration()), resolver);
    -    if (cgDataMapExprWrapper != null) {
    +    if (cgDataMapExprWrapper != null &&
    --- End diff --
   
    I think we can do this judgement earlier.
    Currently, we
    1. iterate all the datamaps,
    2. then filter out the visible ones,
    3. then filter out the corse/fine ones,
    4. then apply the filter-expression to filter out the datamaps,
    5. then judge whether the datamap is enabled...
    We can migrate step5 with step2 and skip step3~4 if the datamap is disabled.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2255: [CARBONDATA-2416] Support DEFERRED REBUILD wh...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2255#discussion_r186294474
 
    --- Diff: integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala ---
    @@ -181,7 +181,8 @@ abstract class CarbonDDLSqlParser extends AbstractCarbonSparkSQLParser {
       protected val ON = carbonKeyWord("ON")
       protected val DMPROPERTIES = carbonKeyWord("DMPROPERTIES")
       protected val SELECT = carbonKeyWord("SELECT")
    -  protected val REFRESH = carbonKeyWord("REFRESH")
    +  protected val REBUILD = carbonKeyWord("REBUILD")
    +  protected val DEFERRED = carbonKeyWord("DEFERRED")
    --- End diff --
   
    Now, Carbondata has its own reserved keywords, so we'd better add an document to list them in one place as reference.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2255: [CARBONDATA-2416] Support DEFERRED REBUILD wh...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2255#discussion_r186294566
 
    --- Diff: integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ---
    @@ -837,8 +837,6 @@ object CarbonDataRDDFactory {
                        s"${ carbonLoadModel.getDatabaseName }.${ carbonLoadModel.getTableName }")
    --- End diff --
   
    Just come across the style problems. Delete the write spaces after '{' and before '}'.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2255: [CARBONDATA-2416] Support DEFERRED REBUILD wh...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2255#discussion_r186294368
 
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
    @@ -447,18 +450,22 @@ protected Expression getFilterPredicates(Configuration configuration) {
               cgDataMapExprWrapper.getDataMapSchema(), prunedBlocklets.size());
         }
         // Now try to prune with FG DataMap.
    -    dataMapExprWrapper = DataMapChooser.get()
    +    DataMapExprWrapper fgDataMapExprWrapper = DataMapChooser.get()
             .chooseFGDataMap(getOrCreateCarbonTable(job.getConfiguration()), resolver);
    -    if (dataMapExprWrapper != null && dataMapExprWrapper.getDataMapLevel() == DataMapLevel.FG
    -        && isFgDataMapPruningEnable(job.getConfiguration()) && dataMapJob != null) {
    +    if (fgDataMapExprWrapper != null &&
    --- End diff --
   
    Same as Line435


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2255: [CARBONDATA-2416] Support DEFERRED REBUILD wh...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2255#discussion_r186294696
 
    --- Diff: integration/spark2/src/test/scala/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMapSuite.scala ---
    @@ -41,25 +42,34 @@ class BloomCoarseGrainDataMapSuite extends QueryTest with BeforeAndAfterAll {
         sql(s"DROP TABLE IF EXISTS $bloomDMSampleTable")
       }
     
    -  private def checkQuery = {
    +  override def afterEach(): Unit = {
    +    sql(s"DROP TABLE IF EXISTS $normalTable")
    +    sql(s"DROP TABLE IF EXISTS $bloomDMSampleTable")
    +  }
    +
    +  private def dmSql(sqlText: String, dataMapName: String, shouldHit: Boolean): DataFrame = {
    --- End diff --
   
    better to optimize the method name


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2255: [CARBONDATA-2416] Support DEFERRED REBUILD wh...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2255#discussion_r186294632
 
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/CarbonSession.scala ---
    @@ -110,6 +110,14 @@ class CarbonSession(@transient val sc: SparkContext,
         )
       }
     
    +  /**
    +   * Return true if the specified sql statement will hit the datamap
    +   */
    +  def isDataMapHit(sqlStatement: String, dataMapName: String): Boolean = {
    --- End diff --
   
    Is this method only for tests? If so, better to limit its usage, such as adding annotation 'forTests' or other restictions


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2255: [CARBONDATA-2416] Support DEFERRED REBUILD wh...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2255#discussion_r186294517
 
    --- Diff: integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ---
    @@ -837,8 +837,6 @@ object CarbonDataRDDFactory {
                        s"${ carbonLoadModel.getDatabaseName }.${ carbonLoadModel.getTableName }")
           LOGGER.error("Dataload failed due to failure in table status updation.")
           throw new Exception(errorMessage)
    -    } else {
    -      DataMapStatusManager.disableDataMapsOfTable(carbonTable)
    --- End diff --
   
    Is it a bug in previous implementation?


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #2255: [CARBONDATA-2416] Support DEFERRED REBUILD when crea...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/2255
 
    Can you explain the meaning of `deferred rebuild`?
    The data of datamap will be updated only after we fire the `rebuild` statement?
    Do we need to do it only once or have to do it multiple times?
    For a datamap that is not deferred, what will happen if we call `rebuild` on it?


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2255: [CARBONDATA-2416] Support DEFERRED REBUILD wh...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2255#discussion_r186299489
 
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
    @@ -431,7 +432,9 @@ protected Expression getFilterPredicates(Configuration configuration) {
         // Get the available CG datamaps and prune further.
         DataMapExprWrapper cgDataMapExprWrapper = DataMapChooser.get()
             .chooseCGDataMap(getOrCreateCarbonTable(job.getConfiguration()), resolver);
    -    if (cgDataMapExprWrapper != null) {
    +    if (cgDataMapExprWrapper != null &&
    +        DataMapStatusManager.isDataMapEnabled(
    +            cgDataMapExprWrapper.getDataMapSchema().getDataMapName())) {
    --- End diff --
   
    This logic should be moved to DataMapStoreManager and DataMapChooser. There when it gets all datamaps it should only enabled datamaps. There should not be any changes here.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2255: [CARBONDATA-2416] Support DEFERRED REBUILD wh...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2255#discussion_r186299532
 
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/CarbonSession.scala ---
    @@ -110,6 +110,14 @@ class CarbonSession(@transient val sc: SparkContext,
         )
       }
     
    +  /**
    +   * Return true if the specified sql statement will hit the datamap
    +   */
    +  def isDataMapHit(sqlStatement: String, dataMapName: String): Boolean = {
    --- End diff --
   
    Test method move to testcase


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2255: [CARBONDATA-2416] Support DEFERRED REBUILD wh...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2255#discussion_r186299596
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/status/DataMapStatusManager.java ---
    @@ -51,6 +51,16 @@ private DataMapStatusManager() {
         return storageProvider.getDataMapStatusDetails();
       }
     
    +  public static boolean isDataMapEnabled(String dataMapName) throws IOException {
    --- End diff --
   
    By default all datamaps will be disabled when doing loading, but indexdatamap should be taken special case as it will be updated online along with load so  it shouldn't be disabled after load of table. But deffered build it should be disabled after load.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2255: [CARBONDATA-2416] Support DEFERRED REBUILD wh...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2255#discussion_r186311641
 
    --- Diff: datamap/examples/src/minmaxdatamap/main/java/org/apache/carbondata/datamap/examples/MinMaxIndexDataMapFactory.java ---
    @@ -96,7 +96,7 @@ public MinMaxIndexDataMapFactory(CarbonTable carbonTable) {
             dataMapMeta.getIndexedColumns());
       }
     
    -  @Override public DataMapRefresher createRefresher(Segment segment, String shardName)
    +  @Override public DataMapRefresher createBuilder(Segment segment, String shardName)
    --- End diff --
   
    fixed


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2255: [CARBONDATA-2416] Support DEFERRED REBUILD wh...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2255#discussion_r186312034
 
    --- Diff: datamap/lucene/src/main/java/org/apache/carbondata/datamap/lucene/LuceneDataMapFactoryBase.java ---
    @@ -153,8 +153,8 @@ public DataMapWriter createWriter(Segment segment, String shardName) {
       }
     
       @Override
    -  public DataMapRefresher createRefresher(Segment segment, String shardName) {
    -    return new LuceneDataMapRefresher(getCarbonTable().getTablePath(), dataMapName,
    +  public DataMapBuilder createBuilder(Segment segment, String shardName) {
    --- End diff --
   
    For a managed table (traditional carbondata table) a shard is an index generated by same write task while loading, inside one segment.
    For external table, shard concept is the same, there are segments and shards inside each partition folder. There is a segment file in meta folder to indicate the segment and shard path.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2255: [CARBONDATA-2416] Support DEFERRED REBUILD wh...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2255#discussion_r186348978
 
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
    @@ -431,7 +432,9 @@ protected Expression getFilterPredicates(Configuration configuration) {
         // Get the available CG datamaps and prune further.
         DataMapExprWrapper cgDataMapExprWrapper = DataMapChooser.get()
             .chooseCGDataMap(getOrCreateCarbonTable(job.getConfiguration()), resolver);
    -    if (cgDataMapExprWrapper != null) {
    +    if (cgDataMapExprWrapper != null &&
    --- End diff --
   
    ok, fixed


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2255: [CARBONDATA-2416] Support DEFERRED REBUILD wh...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2255#discussion_r186349004
 
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
    @@ -447,18 +450,22 @@ protected Expression getFilterPredicates(Configuration configuration) {
               cgDataMapExprWrapper.getDataMapSchema(), prunedBlocklets.size());
         }
         // Now try to prune with FG DataMap.
    -    dataMapExprWrapper = DataMapChooser.get()
    +    DataMapExprWrapper fgDataMapExprWrapper = DataMapChooser.get()
             .chooseFGDataMap(getOrCreateCarbonTable(job.getConfiguration()), resolver);
    -    if (dataMapExprWrapper != null && dataMapExprWrapper.getDataMapLevel() == DataMapLevel.FG
    -        && isFgDataMapPruningEnable(job.getConfiguration()) && dataMapJob != null) {
    +    if (fgDataMapExprWrapper != null &&
    --- End diff --
   
    ok, fixed


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2255: [CARBONDATA-2416] Support DEFERRED REBUILD wh...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2255#discussion_r186349611
 
    --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/datamap/lucene/LuceneFineGrainDataMapSuite.scala ---
    @@ -695,7 +695,29 @@ class LuceneFineGrainDataMapSuite extends QueryTest with BeforeAndAfterAll {
           sql(s"select * from datamap_test5 where name='n10'"))
         checkAnswer(sql("SELECT * FROM datamap_test5 WHERE TEXT_MATCH('city:c020')"),
           sql(s"SELECT * FROM datamap_test5 WHERE city='c020'"))
    +    sql("DROP TABLE IF EXISTS datamap_test5")
    +  }
     
    +  test("test lucene fine grain datamap rebuild") {
    +    sql("DROP TABLE IF EXISTS datamap_test5")
    +    sql(
    +      """
    +        | CREATE TABLE datamap_test5(id INT, name STRING, city STRING, age INT)
    +        | STORED BY 'carbondata'
    +        | TBLPROPERTIES('SORT_COLUMNS'='city,name', 'SORT_SCOPE'='LOCAL_SORT')
    +      """.stripMargin)
    +    sql(
    +      s"""
    +         | CREATE DATAMAP dm ON TABLE datamap_test5
    +         | USING 'lucene'
    +         | WITH DEFERRED REBUILD
    +         | DMProperties('INDEX_COLUMNS'='city')
    +      """.stripMargin)
    +    sql(s"LOAD DATA LOCAL INPATH '$file2' INTO TABLE datamap_test5 OPTIONS('header'='false')")
    +    sql("REBUILD DATAMAP dm ON TABLE datamap_test5")
    --- End diff --
   
    added an assert


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2255: [CARBONDATA-2416] Support DEFERRED REBUILD wh...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2255#discussion_r186349815
 
    --- Diff: integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ---
    @@ -837,8 +837,6 @@ object CarbonDataRDDFactory {
                        s"${ carbonLoadModel.getDatabaseName }.${ carbonLoadModel.getTableName }")
           LOGGER.error("Dataload failed due to failure in table status updation.")
           throw new Exception(errorMessage)
    -    } else {
    -      DataMapStatusManager.disableDataMapsOfTable(carbonTable)
    --- End diff --
   
    yes


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #2255: [CARBONDATA-2416] Support DEFERRED REBUILD wh...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2255#discussion_r186350055
 
    --- Diff: integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ---
    @@ -837,8 +837,6 @@ object CarbonDataRDDFactory {
                        s"${ carbonLoadModel.getDatabaseName }.${ carbonLoadModel.getTableName }")
    --- End diff --
   
    fixed


---
1234