[GitHub] carbondata pull request #1812: [CARBONDATA-2033]support user specified segme...

classic Classic list List threaded Threaded
59 messages Options
123
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1812
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4222/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1812
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4524/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1812
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5389/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1812
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4527/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1812: [CARBONDATA-2033]Support user specified segme...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1812#discussion_r184397997
 
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonAlterTableCompactionCommand.scala ---
    @@ -212,13 +224,18 @@ case class CarbonAlterTableCompactionCommand(
         carbonLoadModel.setFactTimeStamp(loadStartTime)
     
         val isCompactionTriggerByDDl = true
    +    var segmentIds: Option[List[String]] = None
    +    if (compactionType == CompactionType.CUSTOM && alterTableModel.customSegmentIds.isDefined) {
    +      segmentIds = alterTableModel.customSegmentIds
    +    }
    --- End diff --
   
    Modify the code to
    val segmentIds: Option[List[String]] = if (compactionType == CompactionType.CUSTOM && alterTableModel.customSegmentIds.isDefined) {
          alterTableModel.customSegmentIds
        } else {
          None
    }


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1812: [CARBONDATA-2033]Support user specified segme...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1812#discussion_r184398253
 
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonAlterTableCompactionCommand.scala ---
    @@ -212,13 +224,18 @@ case class CarbonAlterTableCompactionCommand(
         carbonLoadModel.setFactTimeStamp(loadStartTime)
     
         val isCompactionTriggerByDDl = true
    +    var segmentIds: Option[List[String]] = None
    +    if (compactionType == CompactionType.CUSTOM && alterTableModel.customSegmentIds.isDefined) {
    +      segmentIds = alterTableModel.customSegmentIds
    +    }
         val compactionModel = CompactionModel(compactionSize,
           compactionType,
           carbonTable,
           isCompactionTriggerByDDl,
           CarbonFilters.getCurrentPartitions(sqlContext.sparkSession,
             TableIdentifier(carbonTable.getTableName,
    -        Some(carbonTable.getDatabaseName)))
    +        Some(carbonTable.getDatabaseName))),
    +      segmentIds
    --- End diff --
   
    Please check for code alignment here. It does not seem to be proper


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1812: [CARBONDATA-2033]Support user specified segme...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1812#discussion_r184401040
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java ---
    @@ -444,6 +450,26 @@ public int compare(LoadMetadataDetails seg1, LoadMetadataDetails seg2) {
         });
       }
     
    +  /**
    +   * This method will return the list of loads which are specified by user in SQL.
    +   *
    +   * @param listOfSegments
    +   * @param segmentIds
    +   * @return
    +   */
    +  private static List<LoadMetadataDetails> identitySegmentsToBeMergedBasedOnSpecifiedSegments(
    +          List<LoadMetadataDetails> listOfSegments,
    +          Set<String> segmentIds) {
    +    List<LoadMetadataDetails> listOfSegmentsSpecified =
    +            new ArrayList<>(CarbonCommonConstants.DEFAULT_COLLECTION_SIZE);
    +    for (LoadMetadataDetails detail : listOfSegments) {
    +      if (isSegmentValid(detail) && segmentIds.contains(detail.getLoadName())) {
    +        listOfSegmentsSpecified.add(detail);
    +      }
    +    }
    +    return listOfSegmentsSpecified;
    --- End diff --
   
    In case of custom compaction user is completely aware of the segments provided in the Alter SQL. Therefore in the segment List provided by the user if any segment is found invalid we should throw exception and compaction process should be aborted


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1812: [CARBONDATA-2033]Support user specified segme...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1812#discussion_r184403180
 
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateListeners.scala ---
    @@ -566,6 +567,11 @@ object AlterPreAggregateTableCompactionPostListener extends OperationEventListen
         val compactionType = compactionEvent.carbonMergerMapping.campactionType
         val carbonLoadModel = compactionEvent.carbonLoadModel
         val sparkSession = compactionEvent.sparkSession
    +    val segmentIds = if (compactionType == CompactionType.CUSTOM) {
    +      Some(compactionEvent.carbonMergerMapping.validSegments.map(x => x.getSegmentNo).toList)
    --- End diff --
   
    Custom compaction should be done only for the given table in the Alter SQL. Custom compaction executed on main table should not be applied on child tables/datamaps as one to one mapping of segments might not be there between main table and its child tables.
    Custom compaction can be done for child tables/datamaps directly like main table by specifying the child table/datamap name in the alter SQL.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1812: [CARBONDATA-2033]Support user specified segme...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1812#discussion_r184386961
 
    --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/datacompaction/CompactionSupportSpecifiedSegmentsTest.scala ---
    @@ -0,0 +1,141 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the"License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an"AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.spark.testsuite.datacompaction
    +
    +import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +import org.apache.carbondata.core.util.CarbonProperties
    +import org.apache.spark.sql.test.util.QueryTest
    +import org.scalatest.{BeforeAndAfterAll, BeforeAndAfterEach}
    +
    +
    +class CompactionSupportSpecifiedSegmentsTest
    +  extends QueryTest with BeforeAndAfterEach with BeforeAndAfterAll {
    +
    +  val filePath: String = resourcesPath + "/globalsort/sample1.csv"
    +
    +  override def beforeAll(): Unit = {
    +    super.beforeAll()
    +  }
    +
    +  override def afterAll(): Unit = {
    +    super.afterAll()
    +  }
    +
    +  override def beforeEach {
    +    resetConf()
    +    sql("DROP TABLE IF EXISTS seg_compact")
    +    sql(
    +      """
    +        |CREATE TABLE seg_compact
    +        |(id INT, name STRING, city STRING, age INT)
    +        |STORED BY 'org.apache.carbondata.format'
    +        |TBLPROPERTIES('SORT_COLUMNS'='city,name')
    +      """.stripMargin)
    +  }
    +
    +  override def afterEach {
    +    sql("DROP TABLE IF EXISTS seg_compact")
    +  }
    +
    +  private def resetConf() = {
    +    CarbonProperties.getInstance()
    +      .addProperty(CarbonCommonConstants.ENABLE_AUTO_LOAD_MERGE,
    +        CarbonCommonConstants.DEFAULT_ENABLE_AUTO_LOAD_MERGE)
    +  }
    +
    +  test("custom compaction") {
    +    for (i <- 0 until 5) {
    +      sql(s"LOAD DATA LOCAL INPATH '$filePath' INTO TABLE seg_compact")
    +    }
    +    sql("ALTER TABLE seg_compact COMPACT 'CUSTOM' WHERE SEGMENT.ID IN (1,2,3)")
    +
    +    val segments = sql("SHOW SEGMENTS FOR TABLE seg_compact")
    +    val segInfos = segments.collect().map { each =>
    +      ((each.toSeq) (0).toString, (each.toSeq) (1).toString)
    +    }
    +    assert(segInfos.length == 6)
    +    assert(segInfos.contains(("0", "Success")))
    +    assert(segInfos.contains(("1", "Compacted")))
    +    assert(segInfos.contains(("2", "Compacted")))
    +    assert(segInfos.contains(("3", "Compacted")))
    +    assert(segInfos.contains(("1.1", "Success")))
    +    assert(segInfos.contains(("4", "Success")))
    +  }
    +
    +  test("custom compaction with preagg datamap"){
    +    sql(
    +      s"""create datamap preagg_sum on table seg_compact using 'preaggregate' as select id,sum(age) from seg_compact group by id"""
    +        .stripMargin)
    +    for (i <- 0 until 5) {
    +      sql(s"LOAD DATA LOCAL INPATH '$filePath' INTO TABLE seg_compact")
    +    }
    +    sql("ALTER TABLE seg_compact COMPACT 'CUSTOM' WHERE SEGMENT.ID IN (1,2,3)")
    +    val segments = sql("SHOW SEGMENTS FOR TABLE seg_compact_preagg_sum")
    +    val segInfos = segments.collect().map { each =>
    +      ((each.toSeq) (0).toString, (each.toSeq) (1).toString)
    +    }
    --- End diff --
   
    what will be the behavior of custom compaction when preaggregate datamap exists but segment No 1,2,3 does not exist in the preaggregate datamap?


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1812: [CARBONDATA-2033]Support user specified segme...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1812#discussion_r184398845
 
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/parser/CarbonSpark2SqlParser.scala ---
    @@ -124,11 +124,13 @@ class CarbonSpark2SqlParser extends CarbonDDLSqlParser {
     
     
       protected lazy val alterTable: Parser[LogicalPlan] =
    -    ALTER ~> TABLE ~> (ident <~ ".").? ~ ident ~ (COMPACT ~ stringLit) <~ opt(";")  ^^ {
    -      case dbName ~ table ~ (compact ~ compactType) =>
    +    ALTER ~> TABLE ~> (ident <~ ".").? ~ ident ~ (COMPACT ~ stringLit) ~
    +      (WHERE ~> (SEGMENT ~ "." ~ ID) ~> IN ~> "(" ~> repsep(segmentId, ",") <~ ")").? <~
    +      opt(";") ^^ {
    +      case dbName ~ table ~ (compact ~ compactType) ~ segs =>
             val altertablemodel =
               AlterTableModel(convertDbNameToLowerCase(dbName), table, None, compactType,
    -          Some(System.currentTimeMillis()), null)
    +            Some(System.currentTimeMillis()), null, segs)
    --- End diff --
   
    Check for code alignment


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user Xaprice commented on the issue:

    https://github.com/apache/carbondata/pull/1812
 
    @manishgupta88, I've submitted some changes, have a look please.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on the issue:

    https://github.com/apache/carbondata/pull/1812
 
    LGTM...can be merged once build is passed
    Please raise a sub-jira task under the same jira to track the Custom compaction implementation for child tables/datamaps and add the jira link link here as we need to implement custom compaction for child tables/datamaps also.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1812
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5460/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1812
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4296/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user Xaprice commented on the issue:

    https://github.com/apache/carbondata/pull/1812
 
    I've raised a sub-task for custom compaction for child tables/datamaps:
    https://issues.apache.org/jira/browse/CARBONDATA-2412


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on the issue:

    https://github.com/apache/carbondata/pull/1812
 
    retest sdv please


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1812
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4626/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on the issue:

    https://github.com/apache/carbondata/pull/1812
 
    LGTM


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1812: [CARBONDATA-2033]Support user specified segme...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user asfgit closed the pull request at:

    https://github.com/apache/carbondata/pull/1812


---
123