[GitHub] carbondata pull request #2391: [CARBONDATA-2625] Optimize the performance of...


qiuchenjian-2
Github user xubo245 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2391#discussion_r199316994
 
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/test/Spark2TestQueryExecutor.scala ---
    @@ -71,8 +70,8 @@ object Spark2TestQueryExecutor {
         .getOrCreateCarbonSession(null, TestQueryExecutor.metastoredb)
       if (warehouse.startsWith("hdfs://")) {
         System.setProperty(CarbonCommonConstants.HDFS_TEMP_LOCATION, warehouse)
    -    CarbonProperties.getInstance().addProperty(CarbonCommonConstants.LOCK_TYPE,
    -      CarbonCommonConstants.CARBON_LOCK_TYPE_HDFS)
    +    CarbonProperties.getInstance()
    +      .addProperty(CarbonCommonConstants.LOCK_TYPE, CarbonCommonConstants.CARBON_LOCK_TYPE_HDFS)
    --- End diff --
   
    OK, done.


---

[GitHub] carbondata pull request #2391: [CARBONDATA-2625] Optimize the performance of...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xubo245 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2391#discussion_r199317048
 
    --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReaderBuilder.java ---
    @@ -207,6 +209,8 @@ public CarbonReaderBuilder setEndPoint(String value) {
               format.getSplits(new JobContextImpl(job.getConfiguration(), new JobID()));
     
           List<RecordReader<Void, T>> readers = new ArrayList<>(splits.size());
    +      CarbonProperties.getInstance()
    +          .addProperty(CarbonCommonConstants.ENABLE_SDK_QUERY_EXECUTOR, "true");
    --- End diff --
   
    Not always; it is set only for the SDK reader.


---

[GitHub] carbondata pull request #2391: [CARBONDATA-2625] Optimize the performance of...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xubo245 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2391#discussion_r199317072
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/executor/impl/SDKDetailQueryExecutor.java ---
    @@ -0,0 +1,87 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.core.scan.executor.impl;
    +
    +import java.io.IOException;
    +import java.util.List;
    +import java.util.concurrent.ExecutorService;
    +import java.util.concurrent.Executors;
    +
    +import org.apache.carbondata.common.CarbonIterator;
    +import org.apache.carbondata.common.logging.LogService;
    +import org.apache.carbondata.common.logging.LogServiceFactory;
    +import org.apache.carbondata.core.constants.CarbonCommonConstants;
    +import org.apache.carbondata.core.scan.executor.exception.QueryExecutionException;
    +import org.apache.carbondata.core.scan.executor.infos.BlockExecutionInfo;
    +import org.apache.carbondata.core.scan.model.QueryModel;
    +import org.apache.carbondata.core.scan.result.iterator.SearchModeResultIterator;
    +import org.apache.carbondata.core.util.CarbonProperties;
    +
    +/**
    + * It's for SDK carbon reader to execute the detail query
    + */
    +public class SDKDetailQueryExecutor extends AbstractQueryExecutor<Object> {
    --- End diff --
   
    There are some differences: the way nThread is obtained is different.


---

[GitHub] carbondata pull request #2391: [CARBONDATA-2625] Optimize the performance of...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xubo245 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2391#discussion_r199317103
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/BlockletDataMapIndexStore.java ---
    @@ -49,6 +50,7 @@
        */
       protected CarbonLRUCache lruCache;
     
    +  Map<String, Map<String, BlockMetaInfo>> segInfoCache;
    --- End diff --
   
    It is used to reduce S3 I/O. Previously it needed 70*140 I/O operations; now it needs only 140.
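
The idea behind this comment can be sketched as a memoized listing: the expensive per-segment lookup runs once and later requests for the same segment path reuse the result. This is a minimal sketch under stated assumptions, not the PR's code. `SegmentListingCache`, its `lister` function and the `Long` values (standing in for `BlockMetaInfo`) are hypothetical names introduced for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical stand-in for a per-segment listing cache; not CarbonData code. */
class SegmentListingCache {
  // segment path -> (data file path -> placeholder for block meta info)
  private final Map<String, Map<String, Long>> cache = new HashMap<>();
  // Assumed to be the expensive call, e.g. a remote directory listing.
  private final Function<String, Map<String, Long>> lister;
  private int listCalls = 0;

  SegmentListingCache(Function<String, Map<String, Long>> lister) {
    this.lister = lister;
  }

  /** Returns the listing for a segment, invoking the lister only on a miss. */
  Map<String, Long> get(String segmentPath) {
    Map<String, Long> cached = cache.get(segmentPath);
    if (cached == null) {
      listCalls++;
      cached = lister.apply(segmentPath);
      cache.put(segmentPath, cached);
    }
    return cached;
  }

  /** Number of times the expensive lister was actually invoked. */
  int listCalls() {
    return listCalls;
  }
}
```

With this shape, repeated lookups for index files in one segment cost a single listing instead of one listing per index file.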


---

[GitHub] carbondata issue #2391: [CARBONDATA-2625] Optimize the performance of Carbon...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2391
 
    SDV Build Fail, please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5539/



---

[GitHub] carbondata issue #2391: [CARBONDATA-2625] Optimize the performance of Carbon...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2391
 
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5513/



---

[GitHub] carbondata issue #2391: [CARBONDATA-2625] Optimize the performance of Carbon...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2391
 
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6688/



---

[GitHub] carbondata issue #2391: [CARBONDATA-2625] Optimize the performance of Carbon...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user xubo245 commented on the issue:

    https://github.com/apache/carbondata/pull/2391
 
    retest sdv please


---

[GitHub] carbondata issue #2391: [CARBONDATA-2625] Optimize the performance of Carbon...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2391
 
    SDV Build Fail, please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5546/



---

[GitHub] carbondata issue #2391: [CARBONDATA-2625] Optimize the performance of Carbon...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on the issue:

    https://github.com/apache/carbondata/pull/2391
 
    retest sdv please


---

[GitHub] carbondata issue #2391: [CARBONDATA-2625] Optimize the performance of Carbon...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2391
 
    SDV Build Fail, please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5548/



---

[GitHub] carbondata issue #2391: [CARBONDATA-2625] Optimize the performance of Carbon...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ajantha-bhat commented on the issue:

    https://github.com/apache/carbondata/pull/2391
 
    retest sdv please


---

[GitHub] carbondata issue #2391: [CARBONDATA-2625] Optimize the performance of Carbon...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2391
 
    SDV Build Fail, please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5554/



---

[GitHub] carbondata pull request #2391: [CARBONDATA-2625] Optimize the performance of...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user gvramana commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2391#discussion_r199847207
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/BlockletDataMapIndexStore.java ---
    @@ -81,8 +83,16 @@ public BlockletDataMapIndexWrapper get(TableBlockIndexUniqueIdentifierWrapper id
             SegmentIndexFileStore indexFileStore = new SegmentIndexFileStore();
             Set<String> filesRead = new HashSet<>();
             String segmentFilePath = identifier.getIndexFilePath();
    -        Map<String, BlockMetaInfo> carbonDataFileBlockMetaInfoMapping = BlockletDataMapUtil
    -            .createCarbonDataFileBlockMetaInfoMapping(segmentFilePath);
    +        if (segInfoCache == null) {
    +          segInfoCache = new HashMap<String, Map<String, BlockMetaInfo>>();
    --- End diff --
   
    S3 does not require BlockMetaInfo, as location is not valid for S3.


---

[GitHub] carbondata pull request #2391: [CARBONDATA-2625] Optimize the performance of...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user gvramana commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2391#discussion_r199848776
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/executor/impl/SDKDetailQueryExecutor.java ---
    @@ -0,0 +1,87 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.core.scan.executor.impl;
    +
    +import java.io.IOException;
    +import java.util.List;
    +import java.util.concurrent.ExecutorService;
    +import java.util.concurrent.Executors;
    +
    +import org.apache.carbondata.common.CarbonIterator;
    +import org.apache.carbondata.common.logging.LogService;
    +import org.apache.carbondata.common.logging.LogServiceFactory;
    +import org.apache.carbondata.core.constants.CarbonCommonConstants;
    +import org.apache.carbondata.core.scan.executor.exception.QueryExecutionException;
    +import org.apache.carbondata.core.scan.executor.infos.BlockExecutionInfo;
    +import org.apache.carbondata.core.scan.model.QueryModel;
    +import org.apache.carbondata.core.scan.result.iterator.SearchModeResultIterator;
    +import org.apache.carbondata.core.util.CarbonProperties;
    +
    +/**
    + * It's for SDK carbon reader to execute the detail query
    + */
    +public class SDKDetailQueryExecutor extends AbstractQueryExecutor<Object> {
    --- End diff --
   
    SDKDetailQueryExecutor is not required. The thread pool keeps growing because CarbonRecordReader.close is not clearing the VectorDetailQuery thread pool.
    Once CarbonRecordReader.close is called, all the corresponding resources should be released.
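
The point about releasing resources on close can be illustrated with a reader that owns its executor and shuts it down in `close()`. This is only a sketch of the general pattern. `PooledReader` is a hypothetical class, and the actual body of `CarbonRecordReader.close` is not shown in this thread.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical reader that owns its thread pool; illustrative only. */
class PooledReader implements AutoCloseable {
  // One pool per reader instance, so its lifetime matches the reader's.
  private final ExecutorService pool;

  PooledReader(int nThreads) {
    this.pool = Executors.newFixedThreadPool(nThreads);
  }

  /** Exposed so callers (and tests) can observe the pool state. */
  ExecutorService pool() {
    return pool;
  }

  /** Releases the pool; without this, every reader would leave threads behind. */
  @Override
  public void close() {
    pool.shutdownNow();
  }
}
```

Because the pool is an instance field, closing one reader cannot affect the threads of another reader in the same process.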


---

[GitHub] carbondata pull request #2391: [CARBONDATA-2625] Optimize the performance of...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user gvramana commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2391#discussion_r200014323
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/executor/impl/SDKDetailQueryExecutor.java ---
    @@ -0,0 +1,87 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.core.scan.executor.impl;
    +
    +import java.io.IOException;
    +import java.util.List;
    +import java.util.concurrent.ExecutorService;
    +import java.util.concurrent.Executors;
    +
    +import org.apache.carbondata.common.CarbonIterator;
    +import org.apache.carbondata.common.logging.LogService;
    +import org.apache.carbondata.common.logging.LogServiceFactory;
    +import org.apache.carbondata.core.constants.CarbonCommonConstants;
    +import org.apache.carbondata.core.scan.executor.exception.QueryExecutionException;
    +import org.apache.carbondata.core.scan.executor.infos.BlockExecutionInfo;
    +import org.apache.carbondata.core.scan.model.QueryModel;
    +import org.apache.carbondata.core.scan.result.iterator.SearchModeResultIterator;
    +import org.apache.carbondata.core.util.CarbonProperties;
    +
    +/**
    + * It's for SDK carbon reader to execute the detail query
    + */
    +public class SDKDetailQueryExecutor extends AbstractQueryExecutor<Object> {
    +  private static final LogService LOGGER =
    +          LogServiceFactory.getLogService(SDKDetailQueryExecutor.class.getName());
    +  private static ExecutorService executorService = null;
    +
    +  public SDKDetailQueryExecutor() {
    +    if (executorService == null) {
    +      initThreadPool();
    +    }
    +  }
    +
    +  private static synchronized void initThreadPool() {
    +    int defaultValue = Runtime.getRuntime().availableProcessors();
    +    int nThread;
    +    try {
    +      nThread = Integer.parseInt(CarbonProperties.getInstance()
    +          .getProperty(CarbonCommonConstants.CARBON_READER_THREAD,
    +              String.valueOf(defaultValue)));
    +    } catch (NumberFormatException e) {
    +      nThread = defaultValue;
    +      LOGGER.warn("The " + CarbonCommonConstants.CARBON_READER_THREAD
    +          + " is invalid. Using the default value " + nThread);
    +    }
    +    if (nThread > 0) {
    +      executorService = Executors.newFixedThreadPool(nThread);
    +    } else {
    +      executorService = Executors.newCachedThreadPool();
    +    }
    +  }
    +
    +  public static synchronized void shutdownThreadPool() {
    +    if (executorService != null) {
    +      executorService.shutdownNow();
    --- End diff --
   
    A static pool cannot be shut down, as another CarbonReader might be reading the same or a different table in the same process.
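
One possible way to reconcile a shared static pool with this concern is reference counting: the pool is shut down only when the last user releases it. The sketch below is an assumption offered for illustration, not something proposed in the thread. `SharedPool`, `acquire` and `release` are hypothetical names.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical reference-counted shared pool; not part of CarbonData. */
final class SharedPool {
  private static ExecutorService pool;
  private static int users = 0;

  private SharedPool() { }

  /** Creates the pool on first use and registers one more user. */
  static synchronized ExecutorService acquire(int nThreads) {
    if (pool == null) {
      pool = Executors.newFixedThreadPool(nThreads);
    }
    users++;
    return pool;
  }

  /** Unregisters one user; the pool is shut down only when none remain. */
  static synchronized void release() {
    if (users > 0 && --users == 0) {
      pool.shutdownNow();
      pool = null;
    }
  }
}
```

Each reader would call `acquire` when it opens and `release` when it closes, so one reader closing does not interrupt tasks submitted by another.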


---

[GitHub] carbondata pull request #2391: [CARBONDATA-2625] Optimize the performance of...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user gvramana commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2391#discussion_r200024653
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/BlockletDataMapIndexStore.java ---
    @@ -81,8 +83,16 @@ public BlockletDataMapIndexWrapper get(TableBlockIndexUniqueIdentifierWrapper id
             SegmentIndexFileStore indexFileStore = new SegmentIndexFileStore();
             Set<String> filesRead = new HashSet<>();
             String segmentFilePath = identifier.getIndexFilePath();
    -        Map<String, BlockMetaInfo> carbonDataFileBlockMetaInfoMapping = BlockletDataMapUtil
    -            .createCarbonDataFileBlockMetaInfoMapping(segmentFilePath);
    +        if (segInfoCache == null) {
    +          segInfoCache = new HashMap<String, Map<String, BlockMetaInfo>>();
    --- End diff --
   
    The cache cannot live across queries, as new files can be added under the same segment path. So move the cache to getAll, and let getDataMaps take a list of segments, so that the cache can work across segments.
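
The suggestion can be sketched as a cache that is a local variable of the batch call, so it is shared across the segments of one call and discarded afterwards. This is a minimal sketch under stated assumptions. `QueryScopedStore`, its `list` helper and the path handling are hypothetical and do not reflect the real `getAll` signature.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical store whose listing cache lives only for one getAll call. */
class QueryScopedStore {
  // Counts invocations of the (assumed expensive) listing.
  int listCalls = 0;

  // Stand-in for a directory listing of one segment path.
  private Map<String, Long> list(String segmentPath) {
    listCalls++;
    Map<String, Long> files = new HashMap<>();
    files.put(segmentPath + "/part-0", 0L);
    return files;
  }

  /** Resolves all index files of one query, listing each segment at most once. */
  List<Map<String, Long>> getAll(List<String> indexFiles) {
    // Local cache: shared within this call, never reused by a later call.
    Map<String, Map<String, Long>> perCallCache = new HashMap<>();
    List<Map<String, Long>> out = new ArrayList<>();
    for (String indexFile : indexFiles) {
      String segmentPath = indexFile.substring(0, indexFile.lastIndexOf('/'));
      Map<String, Long> files = perCallCache.get(segmentPath);
      if (files == null) {
        files = list(segmentPath);
        perCallCache.put(segmentPath, files);
      }
      out.add(files);
    }
    return out;
  }
}
```

A later call lists the segment again, so files added to the same segment path between queries are picked up.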


---

[GitHub] carbondata issue #2391: [CARBONDATA-2625] Optimize the performance of Carbon...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2391
 
    Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7109/



---

[GitHub] carbondata issue #2391: [CARBONDATA-2625] Optimize the performance of Carbon...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2391
 
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5885/



---

[GitHub] carbondata issue #2391: [CARBONDATA-2625] Optimize the performance of Carbon...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2391
 
    @xubo245 Please close it as it is handled in https://github.com/apache/carbondata/pull/2441


---