[GitHub] carbondata pull request #1471: [WIP] Datamap FineGrain implementation


[GitHub] carbondata pull request #1471: [CARBONDATA-1544][Datamap] Datamap FineGrain ...

qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1471#discussion_r151638182
 
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java ---
    @@ -755,7 +758,8 @@ private CarbonInputSplit convertToCarbonInputSplit(ExtendedBlocklet blocklet)
             org.apache.carbondata.hadoop.CarbonInputSplit.from(blocklet.getSegmentId(),
                 new FileSplit(new Path(blocklet.getPath()), 0, blocklet.getLength(),
                     blocklet.getLocations()),
    -            ColumnarFormatVersion.valueOf((short) blocklet.getDetailInfo().getVersionNumber()));
    +            ColumnarFormatVersion.valueOf((short) blocklet.getDetailInfo().getVersionNumber()),
    +            blocklet.getDataMapWriterPath());
    --- End diff --
   
    ok


---

[GitHub] carbondata pull request #1471: [CARBONDATA-1544][Datamap] Datamap FineGrain ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1471#discussion_r151639114
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/dev/AbstractDataMapWriter.java ---
    @@ -0,0 +1,110 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.core.datamap.dev;
    +
    +import java.io.IOException;
    +
    +import org.apache.carbondata.core.datastore.impl.FileFactory;
    +import org.apache.carbondata.core.datastore.page.ColumnPage;
    +import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier;
    +import org.apache.carbondata.core.util.CarbonUtil;
    +import org.apache.carbondata.core.util.path.CarbonTablePath;
    +
    +/**
    + * Data Map writer
    + */
    +public abstract class AbstractDataMapWriter {
    +
    +  protected AbsoluteTableIdentifier identifier;
    +
    +  protected String segmentId;
    +
    +  protected String writeDirectoryPath;
    +
    +  public AbstractDataMapWriter(AbsoluteTableIdentifier identifier, String segmentId,
    +      String writeDirectoryPath) {
    +    this.identifier = identifier;
    +    this.segmentId = segmentId;
    +    this.writeDirectoryPath = writeDirectoryPath;
    +  }
    +
    +  /**
    +   * Start of new block notification.
    +   *
    +   * @param blockId file name of the carbondata file
    +   */
    +  public abstract void onBlockStart(String blockId);
    +
    +  /**
    +   * End of block notification
    +   */
    +  public abstract void onBlockEnd(String blockId);
    +
    +  /**
    +   * Start of new blocklet notification.
    +   *
    +   * @param blockletId sequence number of blocklet in the block
    +   */
    +  public abstract void onBlockletStart(int blockletId);
    +
    +  /**
    +   * End of blocklet notification
    +   *
    +   * @param blockletId sequence number of blocklet in the block
    +   */
    +  public abstract void onBlockletEnd(int blockletId);
    +
    +  /**
    +   * Add the column pages row to the datamap, order of pages is same as `indexColumns` in
    +   * DataMapMeta returned in DataMapFactory.
    +   * Implementation should copy the content of `pages` as needed, because `pages` memory
    +   * may be freed after this method returns, if using unsafe column page.
    +   */
    +  public abstract void onPageAdded(int blockletId, int pageId, ColumnPage[] pages);
    +
    +  /**
    +   * This is called during closing of writer.So after this call no more data will be sent to this
    +   * class.
    +   */
    +  public abstract void finish();
    +
    +  /**
    +   * It copies the file from temp folder to actual folder
    +   *
    +   * @param dataMapFile
    +   * @throws IOException
    +   */
    +  protected void commitFile(String dataMapFile) throws IOException {
    --- End diff --
   
    Basically, this method should be used inside the DataMapWriter implementation to commit its files once it finishes writing them; it copies them from the temp location to the store. If an error occurs here, it is thrown to the DataMapWriter implementation, and the writer implementation should handle it, because if the error propagates to the fact writer the whole load fails.
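   
    To make that usage concrete, here is a hedged sketch (not code from this PR) of a minimal writer that commits its file in finish(). The class name, file name, and error handling are illustrative assumptions; AbstractDataMapWriter and commitFile come from the diff above:
   
    ```java
    import java.io.IOException;
   
    import org.apache.carbondata.core.datamap.dev.AbstractDataMapWriter;
    import org.apache.carbondata.core.datastore.page.ColumnPage;
    import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier;
   
    // Hypothetical example writer; only finish() matters for this discussion.
    public class ExampleDataMapWriter extends AbstractDataMapWriter {
   
      public ExampleDataMapWriter(AbsoluteTableIdentifier identifier, String segmentId,
          String writeDirectoryPath) {
        super(identifier, segmentId, writeDirectoryPath);
      }
   
      @Override public void onBlockStart(String blockId) { }
   
      @Override public void onBlockEnd(String blockId) { }
   
      @Override public void onBlockletStart(int blockletId) { }
   
      @Override public void onBlockletEnd(int blockletId) { }
   
      @Override public void onPageAdded(int blockletId, int pageId, ColumnPage[] pages) {
        // collect index content from the pages here (omitted)
      }
   
      @Override public void finish() {
        // assumed temp file name; the index content is written to it before committing
        String dataMapFile = writeDirectoryPath + "/example.datamap";
        try {
          commitFile(dataMapFile); // copies the file from the temp folder to the store
        } catch (IOException e) {
          // handle the failure inside the writer; letting it reach the fact writer
          // would fail the whole data load
          throw new RuntimeException("Failed to commit datamap file " + dataMapFile, e);
        }
      }
    }
    ```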


---

[GitHub] carbondata issue #1471: [CARBONDATA-1544][Datamap] Datamap FineGrain impleme...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1471
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1230/



---

[GitHub] carbondata issue #1471: [CARBONDATA-1544][Datamap] Datamap FineGrain impleme...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1471
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1250/



---

[GitHub] carbondata pull request #1471: [CARBONDATA-1544][Datamap] Datamap FineGrain ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1471#discussion_r151827164
 
    --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/datamap/FGDataMapTestCase.scala ---
    @@ -0,0 +1,434 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.spark.testsuite.datamap
    +
    +import java.io.{ByteArrayInputStream, DataOutputStream, ObjectInputStream, ObjectOutputStream}
    +
    +import scala.collection.JavaConverters._
    +import scala.collection.mutable.ArrayBuffer
    +
    +import com.sun.xml.internal.messaging.saaj.util.ByteOutputStream
    +import org.apache.spark.sql.test.util.QueryTest
    +import org.scalatest.BeforeAndAfterAll
    +
    +import org.apache.carbondata.core.datamap.dev.fgdatamap.{AbstractFineGrainDataMap, AbstractFineGrainDataMapFactory}
    +import org.apache.carbondata.core.datamap.dev.{AbstractDataMapWriter, DataMapModel}
    +import org.apache.carbondata.core.datamap.{DataMapDistributable, DataMapMeta, DataMapStoreManager}
    +import org.apache.carbondata.core.datastore.FileHolder
    +import org.apache.carbondata.core.datastore.block.SegmentProperties
    +import org.apache.carbondata.core.datastore.compression.SnappyCompressor
    +import org.apache.carbondata.core.datastore.filesystem.{CarbonFile, CarbonFileFilter}
    +import org.apache.carbondata.core.datastore.impl.FileFactory
    +import org.apache.carbondata.core.datastore.page.ColumnPage
    +import org.apache.carbondata.core.indexstore.FineGrainBlocklet
    +import org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMapDistributable
    +import org.apache.carbondata.core.metadata.{AbsoluteTableIdentifier, CarbonMetadata}
    +import org.apache.carbondata.core.scan.expression.Expression
    +import org.apache.carbondata.core.scan.expression.conditional.EqualToExpression
    +import org.apache.carbondata.core.scan.filter.intf.ExpressionType
    +import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf
    +import org.apache.carbondata.core.util.ByteUtil
    +import org.apache.carbondata.core.util.path.CarbonTablePath
    +import org.apache.carbondata.events.Event
    +import org.apache.carbondata.spark.testsuite.datacompaction.CompactionSupportGlobalSortBigFileTest
    +
    +class FGDataMapFactory extends AbstractFineGrainDataMapFactory {
    +  var identifier: AbsoluteTableIdentifier = _
    +  var dataMapName: String = _
    +
    +  /**
    +   * Initialization of Datamap factory with the identifier and datamap name
    +   */
    +  override def init(identifier: AbsoluteTableIdentifier,
    --- End diff --
   
    Please add return types to all functions in this file, and in CGDataMapTestCase.scala as well.


---

[GitHub] carbondata pull request #1471: [CARBONDATA-1544][Datamap] Datamap FineGrain ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1471#discussion_r151827225
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/writer/AbstractFactDataWriter.java ---
    @@ -463,20 +426,24 @@ protected void writeIndexFile() throws IOException, CarbonDataWriterException {
        * @throws CarbonDataWriterException
        */
       protected void closeExecutorService() throws CarbonDataWriterException {
    -    executorService.shutdown();
         try {
    -      executorService.awaitTermination(2, TimeUnit.HOURS);
    -    } catch (InterruptedException e) {
    -      throw new CarbonDataWriterException(e.getMessage());
    -    }
    -    for (int i = 0; i < executorServiceSubmitList.size(); i++) {
    +      listener.finish();
    +      executorService.shutdown();
           try {
    --- End diff --
   
    This try is not required; the catch at line 440 is enough.


---

[GitHub] carbondata pull request #1471: [CARBONDATA-1544][Datamap] Datamap FineGrain ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1471#discussion_r151827362
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/writer/AbstractFactDataWriter.java ---
    @@ -463,20 +426,24 @@ protected void writeIndexFile() throws IOException, CarbonDataWriterException {
        * @throws CarbonDataWriterException
        */
       protected void closeExecutorService() throws CarbonDataWriterException {
    -    executorService.shutdown();
         try {
    -      executorService.awaitTermination(2, TimeUnit.HOURS);
    -    } catch (InterruptedException e) {
    -      throw new CarbonDataWriterException(e.getMessage());
    -    }
    -    for (int i = 0; i < executorServiceSubmitList.size(); i++) {
    +      listener.finish();
    +      executorService.shutdown();
           try {
    -        executorServiceSubmitList.get(i).get();
    +        executorService.awaitTermination(2, TimeUnit.HOURS);
           } catch (InterruptedException e) {
             throw new CarbonDataWriterException(e.getMessage());
    -      } catch (ExecutionException e) {
    -        throw new CarbonDataWriterException(e.getMessage());
           }
    +      for (int i = 0; i < executorServiceSubmitList.size(); i++) {
    +        executorServiceSubmitList.get(i).get();
    +      }
    +    } catch (InterruptedException e) {
    +      throw new CarbonDataWriterException(e.getMessage());
    +    } catch (ExecutionException e) {
    --- End diff --
   
    merge with line 440


---

[GitHub] carbondata pull request #1471: [CARBONDATA-1544][Datamap] Datamap FineGrain ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1471#discussion_r151827371
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/writer/AbstractFactDataWriter.java ---
    @@ -463,20 +426,24 @@ protected void writeIndexFile() throws IOException, CarbonDataWriterException {
        * @throws CarbonDataWriterException
        */
       protected void closeExecutorService() throws CarbonDataWriterException {
    -    executorService.shutdown();
         try {
    -      executorService.awaitTermination(2, TimeUnit.HOURS);
    -    } catch (InterruptedException e) {
    -      throw new CarbonDataWriterException(e.getMessage());
    -    }
    -    for (int i = 0; i < executorServiceSubmitList.size(); i++) {
    +      listener.finish();
    +      executorService.shutdown();
           try {
    -        executorServiceSubmitList.get(i).get();
    +        executorService.awaitTermination(2, TimeUnit.HOURS);
           } catch (InterruptedException e) {
             throw new CarbonDataWriterException(e.getMessage());
    -      } catch (ExecutionException e) {
    -        throw new CarbonDataWriterException(e.getMessage());
           }
    +      for (int i = 0; i < executorServiceSubmitList.size(); i++) {
    +        executorServiceSubmitList.get(i).get();
    +      }
    +    } catch (InterruptedException e) {
    +      throw new CarbonDataWriterException(e.getMessage());
    +    } catch (ExecutionException e) {
    +      throw new CarbonDataWriterException(e.getMessage());
    +    } catch (IOException e) {
    +      LOGGER.error(e, "Error while writing datamap");
    +      throw new CarbonDataWriterException(e.getMessage());
    --- End diff --
   
    merge with line 440
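   
    For clarity, a hedged sketch of the consolidated method these comments are asking for (a single try with the catches merged; multi-catch assumes Java 7+, and the fields and LOGGER call come from the surrounding AbstractFactDataWriter shown in the diff):
   
    ```java
    protected void closeExecutorService() throws CarbonDataWriterException {
      try {
        listener.finish();
        executorService.shutdown();
        executorService.awaitTermination(2, TimeUnit.HOURS);
        for (int i = 0; i < executorServiceSubmitList.size(); i++) {
          // surface any failure from the submitted writer tasks
          executorServiceSubmitList.get(i).get();
        }
      } catch (InterruptedException | ExecutionException e) {
        throw new CarbonDataWriterException(e.getMessage());
      } catch (IOException e) {
        LOGGER.error(e, "Error while writing datamap");
        throw new CarbonDataWriterException(e.getMessage());
      }
    }
    ```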


---

[GitHub] carbondata pull request #1471: [CARBONDATA-1544][Datamap] Datamap FineGrain ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1471#discussion_r151827453
 
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java ---
    @@ -687,16 +689,17 @@ protected Expression getFilterPredicates(Configuration configuration) {
         // get tokens for all the required FileSystem for table path
         TokenCache.obtainTokensForNamenodes(job.getCredentials(),
             new Path[] { new Path(absoluteTableIdentifier.getTablePath()) }, job.getConfiguration());
    -
    -    TableDataMap blockletMap = DataMapStoreManager.getInstance()
    -        .getDataMap(absoluteTableIdentifier, BlockletDataMap.NAME,
    -            BlockletDataMapFactory.class.getName());
    +    boolean distributedCG = Boolean.parseBoolean(CarbonProperties.getInstance()
    +        .getProperty(CarbonCommonConstants.USE_DISTRIBUTED_DATAMAP,
    +            CarbonCommonConstants.USE_DISTRIBUTED_DATAMAP_DEFAULT));
    +    TableDataMap blockletMap =
    +        DataMapStoreManager.getInstance().chooseDataMap(absoluteTableIdentifier);
         DataMapJob dataMapJob = getDataMapJob(job.getConfiguration());
         List<ExtendedBlocklet> prunedBlocklets;
    -    if (dataMapJob != null) {
    +    if (distributedCG || blockletMap.getDataMapFactory().getDataMapType() == DataMapType.FG) {
    --- End diff --
   
    It seems distributedCG and FG behave the same, right?
    Earlier I thought the FG datamap would be called in ScanRDD.compute, but it seems not?
    If we collect all pruned blocklets in the driver, will that be too many for the driver?


---

[GitHub] carbondata pull request #1471: [CARBONDATA-1544][Datamap] Datamap FineGrain ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1471#discussion_r151827748
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java ---
    @@ -75,7 +79,12 @@ public TableDataMap(AbsoluteTableIdentifier identifier, String dataMapName,
         SegmentProperties segmentProperties;
         for (String segmentId : segmentIds) {
           List<Blocklet> pruneBlocklets = new ArrayList<>();
    -      List<DataMap> dataMaps = dataMapFactory.getDataMaps(segmentId);
    +      List<DataMap> dataMaps;
    +      if (blockletDetailsFetcher instanceof DataMapFactory && filterExp == null) {
    +        dataMaps = ((DataMapFactory)blockletDetailsFetcher).getDataMaps(segmentId);
    --- End diff --
   
    Please add the correct generic assignment; it is hard to check whether the type is correct or not.


---

[GitHub] carbondata pull request #1471: [CARBONDATA-1544][Datamap] Datamap FineGrain ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1471#discussion_r151828034
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/dev/BlockletSerializer.java ---
    @@ -0,0 +1,57 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.core.datamap.dev;
    +
    +import java.io.DataInputStream;
    +import java.io.DataOutputStream;
    +import java.io.IOException;
    +
    +import org.apache.carbondata.core.datastore.impl.FileFactory;
    +import org.apache.carbondata.core.indexstore.FineGrainBlocklet;
    +
    +public class BlockletSerializer {
    +
    +  /**
    +   * Serialize and write blocklet to the file.
    +   * @param grainBlocklet
    +   * @param writePath
    +   * @throws IOException
    +   */
    +  public void serializeBlocklet(FineGrainBlocklet grainBlocklet, String writePath)
    --- End diff --
   
    Can we move this to FineGrainBlocklet itself, so that we do not need to keep writePath in many places, like in TableBlockInfo?


---

[GitHub] carbondata pull request #1471: [CARBONDATA-1544][Datamap] Datamap FineGrain ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1471#discussion_r151830137
 
    --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/datamap/FGDataMapTestCase.scala ---
    @@ -0,0 +1,434 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.spark.testsuite.datamap
    +
    +import java.io.{ByteArrayInputStream, DataOutputStream, ObjectInputStream, ObjectOutputStream}
    +
    +import scala.collection.JavaConverters._
    +import scala.collection.mutable.ArrayBuffer
    +
    +import com.sun.xml.internal.messaging.saaj.util.ByteOutputStream
    +import org.apache.spark.sql.test.util.QueryTest
    +import org.scalatest.BeforeAndAfterAll
    +
    +import org.apache.carbondata.core.datamap.dev.fgdatamap.{AbstractFineGrainDataMap, AbstractFineGrainDataMapFactory}
    +import org.apache.carbondata.core.datamap.dev.{AbstractDataMapWriter, DataMapModel}
    +import org.apache.carbondata.core.datamap.{DataMapDistributable, DataMapMeta, DataMapStoreManager}
    +import org.apache.carbondata.core.datastore.FileHolder
    +import org.apache.carbondata.core.datastore.block.SegmentProperties
    +import org.apache.carbondata.core.datastore.compression.SnappyCompressor
    +import org.apache.carbondata.core.datastore.filesystem.{CarbonFile, CarbonFileFilter}
    +import org.apache.carbondata.core.datastore.impl.FileFactory
    +import org.apache.carbondata.core.datastore.page.ColumnPage
    +import org.apache.carbondata.core.indexstore.FineGrainBlocklet
    +import org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMapDistributable
    +import org.apache.carbondata.core.metadata.{AbsoluteTableIdentifier, CarbonMetadata}
    +import org.apache.carbondata.core.scan.expression.Expression
    +import org.apache.carbondata.core.scan.expression.conditional.EqualToExpression
    +import org.apache.carbondata.core.scan.filter.intf.ExpressionType
    +import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf
    +import org.apache.carbondata.core.util.ByteUtil
    +import org.apache.carbondata.core.util.path.CarbonTablePath
    +import org.apache.carbondata.events.Event
    +import org.apache.carbondata.spark.testsuite.datacompaction.CompactionSupportGlobalSortBigFileTest
    +
    +class FGDataMapFactory extends AbstractFineGrainDataMapFactory {
    +  var identifier: AbsoluteTableIdentifier = _
    +  var dataMapName: String = _
    +
    +  /**
    +   * Initialization of Datamap factory with the identifier and datamap name
    +   */
    +  override def init(identifier: AbsoluteTableIdentifier,
    --- End diff --
   
    ok


---

[GitHub] carbondata pull request #1471: [CARBONDATA-1544][Datamap] Datamap FineGrain ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1471#discussion_r151830168
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/writer/AbstractFactDataWriter.java ---
    @@ -463,20 +426,24 @@ protected void writeIndexFile() throws IOException, CarbonDataWriterException {
        * @throws CarbonDataWriterException
        */
       protected void closeExecutorService() throws CarbonDataWriterException {
    -    executorService.shutdown();
         try {
    -      executorService.awaitTermination(2, TimeUnit.HOURS);
    -    } catch (InterruptedException e) {
    -      throw new CarbonDataWriterException(e.getMessage());
    -    }
    -    for (int i = 0; i < executorServiceSubmitList.size(); i++) {
    +      listener.finish();
    +      executorService.shutdown();
           try {
    -        executorServiceSubmitList.get(i).get();
    +        executorService.awaitTermination(2, TimeUnit.HOURS);
           } catch (InterruptedException e) {
             throw new CarbonDataWriterException(e.getMessage());
    -      } catch (ExecutionException e) {
    -        throw new CarbonDataWriterException(e.getMessage());
           }
    +      for (int i = 0; i < executorServiceSubmitList.size(); i++) {
    +        executorServiceSubmitList.get(i).get();
    +      }
    +    } catch (InterruptedException e) {
    +      throw new CarbonDataWriterException(e.getMessage());
    +    } catch (ExecutionException e) {
    --- End diff --
   
    OK


---

[GitHub] carbondata pull request #1471: [CARBONDATA-1544][Datamap] Datamap FineGrain ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1471#discussion_r151830196
 
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/writer/AbstractFactDataWriter.java ---
    @@ -463,20 +426,24 @@ protected void writeIndexFile() throws IOException, CarbonDataWriterException {
        * @throws CarbonDataWriterException
        */
       protected void closeExecutorService() throws CarbonDataWriterException {
    -    executorService.shutdown();
         try {
    -      executorService.awaitTermination(2, TimeUnit.HOURS);
    -    } catch (InterruptedException e) {
    -      throw new CarbonDataWriterException(e.getMessage());
    -    }
    -    for (int i = 0; i < executorServiceSubmitList.size(); i++) {
    +      listener.finish();
    +      executorService.shutdown();
           try {
    -        executorServiceSubmitList.get(i).get();
    +        executorService.awaitTermination(2, TimeUnit.HOURS);
           } catch (InterruptedException e) {
             throw new CarbonDataWriterException(e.getMessage());
    -      } catch (ExecutionException e) {
    -        throw new CarbonDataWriterException(e.getMessage());
           }
    +      for (int i = 0; i < executorServiceSubmitList.size(); i++) {
    +        executorServiceSubmitList.get(i).get();
    +      }
    +    } catch (InterruptedException e) {
    +      throw new CarbonDataWriterException(e.getMessage());
    +    } catch (ExecutionException e) {
    +      throw new CarbonDataWriterException(e.getMessage());
    +    } catch (IOException e) {
    +      LOGGER.error(e, "Error while writing datamap");
    +      throw new CarbonDataWriterException(e.getMessage());
    --- End diff --
   
    ok


---

[GitHub] carbondata pull request #1471: [CARBONDATA-1544][Datamap] Datamap FineGrain ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1471#discussion_r151830437
 
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java ---
    @@ -687,16 +689,17 @@ protected Expression getFilterPredicates(Configuration configuration) {
         // get tokens for all the required FileSystem for table path
         TokenCache.obtainTokensForNamenodes(job.getCredentials(),
             new Path[] { new Path(absoluteTableIdentifier.getTablePath()) }, job.getConfiguration());
    -
    -    TableDataMap blockletMap = DataMapStoreManager.getInstance()
    -        .getDataMap(absoluteTableIdentifier, BlockletDataMap.NAME,
    -            BlockletDataMapFactory.class.getName());
    +    boolean distributedCG = Boolean.parseBoolean(CarbonProperties.getInstance()
    +        .getProperty(CarbonCommonConstants.USE_DISTRIBUTED_DATAMAP,
    +            CarbonCommonConstants.USE_DISTRIBUTED_DATAMAP_DEFAULT));
    +    TableDataMap blockletMap =
    +        DataMapStoreManager.getInstance().chooseDataMap(absoluteTableIdentifier);
         DataMapJob dataMapJob = getDataMapJob(job.getConfiguration());
         List<ExtendedBlocklet> prunedBlocklets;
    -    if (dataMapJob != null) {
    +    if (distributedCG || blockletMap.getDataMapFactory().getDataMapType() == DataMapType.FG) {
    --- End diff --
   
    In the case of an FG datamap, the rowid information is written to temp files and the write path is given to the FGBlocklet, so it does not contain any row information when it returns to the driver.
    While executing the filter query, the executor reads the rowids from disk and passes them to the scanner.
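   
    As a hedged illustration of that flow (not the actual pruning code; the helper class, path layout, and the executor-side deserialize are assumptions, while BlockletSerializer.serializeBlocklet is the API shown earlier in this PR):
   
    ```java
    import java.io.IOException;
   
    import org.apache.carbondata.core.datamap.dev.BlockletSerializer;
    import org.apache.carbondata.core.indexstore.FineGrainBlocklet;
   
    // Hypothetical helper used only to illustrate the flow described above.
    public final class FgBlockletSpillExample {
   
      /**
       * Pruning side: spill the row-id detail of a fine grain blocklet to a temp file
       * and return only the path. The path, not the row ids, travels to the driver.
       */
      public static String spill(FineGrainBlocklet blocklet, String tempDir, int blockletSeq)
          throws IOException {
        String writePath = tempDir + "/" + blockletSeq + ".fgblocklet"; // assumed layout
        new BlockletSerializer().serializeBlocklet(blocklet, writePath);
        return writePath;
      }
   
      // Executor side: while executing the filter query, the row ids are read back from
      // writePath (via a matching deserialize on BlockletSerializer, assumed here) and
      // handed to the scanner, so the driver never holds the row-level detail.
   
      private FgBlockletSpillExample() { }
    }
    ```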


---

[GitHub] carbondata pull request #1471: [CARBONDATA-1544][Datamap] Datamap FineGrain ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1471#discussion_r151831374
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java ---
    @@ -75,7 +79,12 @@ public TableDataMap(AbsoluteTableIdentifier identifier, String dataMapName,
         SegmentProperties segmentProperties;
         for (String segmentId : segmentIds) {
           List<Blocklet> pruneBlocklets = new ArrayList<>();
    -      List<DataMap> dataMaps = dataMapFactory.getDataMaps(segmentId);
    +      List<DataMap> dataMaps;
    +      if (blockletDetailsFetcher instanceof DataMapFactory && filterExp == null) {
    +        dataMaps = ((DataMapFactory)blockletDetailsFetcher).getDataMaps(segmentId);
    --- End diff --
   
    I have added it here, but placing the generic assignment everywhere is not possible because we derive two types of datamaps through the generic.


---

[GitHub] carbondata pull request #1471: [CARBONDATA-1544][Datamap] Datamap FineGrain ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1471#discussion_r151831837
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/dev/BlockletSerializer.java ---
    @@ -0,0 +1,57 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.core.datamap.dev;
    +
    +import java.io.DataInputStream;
    +import java.io.DataOutputStream;
    +import java.io.IOException;
    +
    +import org.apache.carbondata.core.datastore.impl.FileFactory;
    +import org.apache.carbondata.core.indexstore.FineGrainBlocklet;
    +
    +public class BlockletSerializer {
    +
    +  /**
    +   * Serialize and write blocklet to the file.
    +   * @param grainBlocklet
    +   * @param writePath
    +   * @throws IOException
    +   */
    +  public void serializeBlocklet(FineGrainBlocklet grainBlocklet, String writePath)
    --- End diff --
   
    writePath is part of ExtendedBlocklet, and FineGrainBlocklet is only used until the data is written in TableDataMap. We also need to add writePath to TableBlockInfo, as it does not have the FineGrainBlocklet.


---

[GitHub] carbondata issue #1471: [CARBONDATA-1544][Datamap] Datamap FineGrain impleme...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1471
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1264/



---

[GitHub] carbondata issue #1471: [CARBONDATA-1544][Datamap] Datamap FineGrain impleme...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on the issue:

    https://github.com/apache/carbondata/pull/1471
 
    LGTM


---

[GitHub] carbondata issue #1471: [CARBONDATA-1544][Datamap] Datamap FineGrain impleme...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on the issue:

    https://github.com/apache/carbondata/pull/1471
 
    Merged, please close this PR


---