
[GitHub] carbondata pull request #2391: [CARBONDATA-2625] Optimize the performance of...


[GitHub] carbondata pull request #2391: [CARBONDATA-2625] Optimize the performance of...

Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2391#discussion_r199316994

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/test/Spark2TestQueryExecutor.scala ---
@@ -71,8 +70,8 @@ object Spark2TestQueryExecutor {
.getOrCreateCarbonSession(null, TestQueryExecutor.metastoredb)
if (warehouse.startsWith("hdfs://")) {
System.setProperty(CarbonCommonConstants.HDFS_TEMP_LOCATION, warehouse)
- CarbonProperties.getInstance().addProperty(CarbonCommonConstants.LOCK_TYPE,
- CarbonCommonConstants.CARBON_LOCK_TYPE_HDFS)
+ CarbonProperties.getInstance()
+ .addProperty(CarbonCommonConstants.LOCK_TYPE, CarbonCommonConstants.CARBON_LOCK_TYPE_HDFS)
--- End diff --

OK, done

---

[GitHub] carbondata pull request #2391: [CARBONDATA-2625] Optimize the performance of...

In reply to this post by qiuchenjian-2

Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2391#discussion_r199317048

--- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReaderBuilder.java ---
@@ -207,6 +209,8 @@ public CarbonReaderBuilder setEndPoint(String value) {
format.getSplits(new JobContextImpl(job.getConfiguration(), new JobID()));

List<RecordReader<Void, T>> readers = new ArrayList<>(splits.size());
+ CarbonProperties.getInstance()
+ .addProperty(CarbonCommonConstants.ENABLE_SDK_QUERY_EXECUTOR, "true");
--- End diff --

Not always, only for the SDK reader.

---

[GitHub] carbondata pull request #2391: [CARBONDATA-2625] Optimize the performance of...

In reply to this post by qiuchenjian-2

Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2391#discussion_r199317072

--- Diff: core/src/main/java/org/apache/carbondata/core/scan/executor/impl/SDKDetailQueryExecutor.java ---
@@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.core.scan.executor.impl;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+
+import org.apache.carbondata.common.CarbonIterator;
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.scan.executor.exception.QueryExecutionException;
+import org.apache.carbondata.core.scan.executor.infos.BlockExecutionInfo;
+import org.apache.carbondata.core.scan.model.QueryModel;
+import org.apache.carbondata.core.scan.result.iterator.SearchModeResultIterator;
+import org.apache.carbondata.core.util.CarbonProperties;
+
+/**
+ * It's for SDK carbon reader to execute the detail query
+ */
+public class SDKDetailQueryExecutor extends AbstractQueryExecutor<Object> {
--- End diff --

There are some differences: the method for getting nThread is different.

---

[GitHub] carbondata pull request #2391: [CARBONDATA-2625] Optimize the performance of...

In reply to this post by qiuchenjian-2

Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2391#discussion_r199317103

--- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/BlockletDataMapIndexStore.java ---
@@ -49,6 +50,7 @@
*/
protected CarbonLRUCache lruCache;

+ Map<String, Map<String, BlockMetaInfo>> segInfoCache;
--- End diff --

It's used to reduce the S3 IO. It needed 70*140 IO operations before; now it needs only 140.
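
A minimal sketch of the idea behind segInfoCache, assuming the saving comes from listing a segment once instead of once per index file. This is not the PR's code: SegmentListingCache and lister are hypothetical names, and String stands in for BlockMetaInfo.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical per-segment listing cache; not CarbonData's actual implementation. */
final class SegmentListingCache {
  // Segment path -> (data file path -> metadata). String stands in for BlockMetaInfo.
  private final Map<String, Map<String, String>> cache = new HashMap<>();
  // The expensive remote listing (for example an S3 call), supplied by the caller.
  private final Function<String, Map<String, String>> lister;
  private int listCalls = 0;

  SegmentListingCache(Function<String, Map<String, String>> lister) {
    this.lister = lister;
  }

  /** Lists a segment at most once, however many index files ask for it. */
  Map<String, String> get(String segmentPath) {
    Map<String, String> hit = cache.get(segmentPath);
    if (hit == null) {
      listCalls++;
      hit = lister.apply(segmentPath);
      cache.put(segmentPath, hit);
    }
    return hit;
  }

  /** Number of times the expensive listing was actually executed. */
  int listCalls() {
    return listCalls;
  }
}
```

With this shape, repeated lookups for the same segment path cost one listing in total; how long such a cache may safely live is a separate question.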

---

[GitHub] carbondata issue #2391: [CARBONDATA-2625] Optimize the performance of Carbon...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2391

SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5539/

---

[GitHub] carbondata issue #2391: [CARBONDATA-2625] Optimize the performance of Carbon...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2391

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5513/

---

[GitHub] carbondata issue #2391: [CARBONDATA-2625] Optimize the performance of Carbon...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2391

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6688/

---

[GitHub] carbondata issue #2391: [CARBONDATA-2625] Optimize the performance of Carbon...

In reply to this post by qiuchenjian-2

Github user xubo245 commented on the issue:

https://github.com/apache/carbondata/pull/2391

retest sdv please

---

[GitHub] carbondata issue #2391: [CARBONDATA-2625] Optimize the performance of Carbon...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2391

SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5546/

---

[GitHub] carbondata issue #2391: [CARBONDATA-2625] Optimize the performance of Carbon...

In reply to this post by qiuchenjian-2

Github user ajantha-bhat commented on the issue:

https://github.com/apache/carbondata/pull/2391

retest sdv please

---

[GitHub] carbondata issue #2391: [CARBONDATA-2625] Optimize the performance of Carbon...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2391

SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5548/

---

[GitHub] carbondata issue #2391: [CARBONDATA-2625] Optimize the performance of Carbon...

In reply to this post by qiuchenjian-2

Github user ajantha-bhat commented on the issue:

https://github.com/apache/carbondata/pull/2391

retest sdv please

---

[GitHub] carbondata issue #2391: [CARBONDATA-2625] Optimize the performance of Carbon...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2391

SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5554/

---

[GitHub] carbondata pull request #2391: [CARBONDATA-2625] Optimize the performance of...

In reply to this post by qiuchenjian-2

Github user gvramana commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2391#discussion_r199847207

--- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/BlockletDataMapIndexStore.java ---
@@ -81,8 +83,16 @@ public BlockletDataMapIndexWrapper get(TableBlockIndexUniqueIdentifierWrapper id
SegmentIndexFileStore indexFileStore = new SegmentIndexFileStore();
Set<String> filesRead = new HashSet<>();
String segmentFilePath = identifier.getIndexFilePath();
- Map<String, BlockMetaInfo> carbonDataFileBlockMetaInfoMapping = BlockletDataMapUtil
- .createCarbonDataFileBlockMetaInfoMapping(segmentFilePath);
+ if (segInfoCache == null) {
+ segInfoCache = new HashMap<String, Map<String, BlockMetaInfo>>();
--- End diff --

S3 does not require BlockMetaInfo, as the location is not valid for S3.

---

[GitHub] carbondata pull request #2391: [CARBONDATA-2625] Optimize the performance of...

In reply to this post by qiuchenjian-2

Github user gvramana commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2391#discussion_r199848776

--- Diff: core/src/main/java/org/apache/carbondata/core/scan/executor/impl/SDKDetailQueryExecutor.java ---
@@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.core.scan.executor.impl;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+
+import org.apache.carbondata.common.CarbonIterator;
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.scan.executor.exception.QueryExecutionException;
+import org.apache.carbondata.core.scan.executor.infos.BlockExecutionInfo;
+import org.apache.carbondata.core.scan.model.QueryModel;
+import org.apache.carbondata.core.scan.result.iterator.SearchModeResultIterator;
+import org.apache.carbondata.core.util.CarbonProperties;
+
+/**
+ * It's for SDK carbon reader to execute the detail query
+ */
+public class SDKDetailQueryExecutor extends AbstractQueryExecutor<Object> {
--- End diff --

SDKDetailQueryExecutor is not required. The thread pool keeps growing because CarbonRecordReader.close does not clear the VectorDetailQuery thread pool.
Once CarbonRecordReader.close is called, all the corresponding resources should be released.
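
A minimal sketch of the pattern described here, in which a reader owns its thread pool and releases it in close. PooledReader is a hypothetical class for illustration and is not CarbonRecordReader; the pool size of 2 is arbitrary.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical reader that owns its thread pool; names are illustrative only. */
final class PooledReader implements AutoCloseable {
  // One pool per reader instance, so closing the reader cannot affect other readers.
  private final ExecutorService pool = Executors.newFixedThreadPool(2);

  ExecutorService pool() {
    return pool;
  }

  @Override
  public void close() {
    // Release the threads together with the reader so pools do not pile up.
    pool.shutdownNow();
  }
}
```

Because the class implements AutoCloseable, callers can use try-with-resources and the pool is released even when reading fails.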

---

[GitHub] carbondata pull request #2391: [CARBONDATA-2625] Optimize the performance of...

In reply to this post by qiuchenjian-2

Github user gvramana commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2391#discussion_r200014323

--- Diff: core/src/main/java/org/apache/carbondata/core/scan/executor/impl/SDKDetailQueryExecutor.java ---
@@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.core.scan.executor.impl;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+
+import org.apache.carbondata.common.CarbonIterator;
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.scan.executor.exception.QueryExecutionException;
+import org.apache.carbondata.core.scan.executor.infos.BlockExecutionInfo;
+import org.apache.carbondata.core.scan.model.QueryModel;
+import org.apache.carbondata.core.scan.result.iterator.SearchModeResultIterator;
+import org.apache.carbondata.core.util.CarbonProperties;
+
+/**
+ * It's for SDK carbon reader to execute the detail query
+ */
+public class SDKDetailQueryExecutor extends AbstractQueryExecutor<Object> {
+ private static final LogService LOGGER =
+ LogServiceFactory.getLogService(SDKDetailQueryExecutor.class.getName());
+ private static ExecutorService executorService = null;
+
+ public SDKDetailQueryExecutor() {
+ if (executorService == null) {
+ initThreadPool();
+ }
+ }
+
+ private static synchronized void initThreadPool() {
+ int defaultValue = Runtime.getRuntime().availableProcessors();
+ int nThread;
+ try {
+ nThread = Integer.parseInt(CarbonProperties.getInstance()
+ .getProperty(CarbonCommonConstants.CARBON_READER_THREAD,
+ String.valueOf(defaultValue)));
+ } catch (NumberFormatException e) {
+ nThread = defaultValue;
+ LOGGER.warn("The " + CarbonCommonConstants.CARBON_READER_THREAD
+ + " is invalid. Using the default value " + nThread);
+ }
+ if (nThread > 0) {
+ executorService = Executors.newFixedThreadPool(nThread);
+ } else {
+ executorService = Executors.newCachedThreadPool();
+ }
+ }
+
+ public static synchronized void shutdownThreadPool() {
+ if (executorService != null) {
+ executorService.shutdownNow();
--- End diff --

We cannot shut down a static one, as another CarbonReader might be reading the same or a different table in the same process.
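
One possible way to address this concern, sketched under assumptions and not taken from the PR, is to reference-count the shared pool so that it is shut down only when the last user releases it. SharedPool, acquire and release are hypothetical names.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical reference-counted shared pool; not part of CarbonData. */
final class SharedPool {
  private static ExecutorService pool;
  private static int users = 0;

  private SharedPool() {
  }

  /** Returns the shared pool, creating it for the first user. */
  static synchronized ExecutorService acquire() {
    if (users == 0) {
      pool = Executors.newFixedThreadPool(2);
    }
    users++;
    return pool;
  }

  /** Shuts the pool down only when the last user has released it. */
  static synchronized void release() {
    if (users == 0) {
      throw new IllegalStateException("release() without matching acquire()");
    }
    users--;
    if (users == 0) {
      pool.shutdownNow();
      pool = null;
    }
  }
}
```

Each reader would call acquire when it opens and release when it closes, so one reader closing leaves the pool alive for the others.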

---

[GitHub] carbondata pull request #2391: [CARBONDATA-2625] Optimize the performance of...

In reply to this post by qiuchenjian-2

Github user gvramana commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2391#discussion_r200024653

--- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/BlockletDataMapIndexStore.java ---
@@ -81,8 +83,16 @@ public BlockletDataMapIndexWrapper get(TableBlockIndexUniqueIdentifierWrapper id
SegmentIndexFileStore indexFileStore = new SegmentIndexFileStore();
Set<String> filesRead = new HashSet<>();
String segmentFilePath = identifier.getIndexFilePath();
- Map<String, BlockMetaInfo> carbonDataFileBlockMetaInfoMapping = BlockletDataMapUtil
- .createCarbonDataFileBlockMetaInfoMapping(segmentFilePath);
+ if (segInfoCache == null) {
+ segInfoCache = new HashMap<String, Map<String, BlockMetaInfo>>();
--- End diff --

The cache cannot live across queries, as new files can be added to the same segment path. So move the cache to getAll, and let getDataMaps take a list of segments, so that the cache can work across segments.
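
A minimal sketch of a cache scoped to a single call, assuming index file paths of the form segmentDir/fileName. ScopedListing and lister are hypothetical names, the getAll here is not the real BlockletDataMapIndexStore.getAll, and String stands in for BlockMetaInfo.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical helper showing a per-call cache; not CarbonData's actual code. */
final class ScopedListing {
  private ScopedListing() {
  }

  /**
   * Resolves a segment listing for every index file path. The cache is a local
   * variable, so it is shared across segments within this call and discarded
   * afterwards; files added to a segment later are seen by the next call.
   */
  static List<Map<String, String>> getAll(
      List<String> indexFilePaths, Function<String, Map<String, String>> lister) {
    Map<String, Map<String, String>> perCallCache = new HashMap<>();
    List<Map<String, String>> out = new ArrayList<>();
    for (String indexFile : indexFilePaths) {
      int cut = indexFile.lastIndexOf('/');
      String segmentDir = cut < 0 ? "" : indexFile.substring(0, cut);
      // Each distinct segment directory is listed once per call.
      out.add(perCallCache.computeIfAbsent(segmentDir, lister));
    }
    return out;
  }
}
```

Two index files in the same segment trigger one listing within a call, while a second call lists the segment again and therefore picks up newly added files.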

---

[GitHub] carbondata issue #2391: [CARBONDATA-2625] Optimize the performance of Carbon...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2391

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7109/

---

[GitHub] carbondata issue #2391: [CARBONDATA-2625] Optimize the performance of Carbon...

In reply to this post by qiuchenjian-2

Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2391

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5885/

---

[GitHub] carbondata issue #2391: [CARBONDATA-2625] Optimize the performance of Carbon...

In reply to this post by qiuchenjian-2

Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2391

@xubo245 Please close it as it is handled in https://github.com/apache/carbondata/pull/2441

---
