GitHub user kumarvishal09 opened a pull request:
https://github.com/apache/carbondata/pull/1019

[CARBONDATA-1156] Improve IUD performance and fixed synchronization issue

Loading delete delta files is slow because they are read at the blocklet level. This change adds code to read them at the block level.

In the current IUD design, delete delta files are listed for each block at the executor level, so when a query and an IUD operation run in parallel the query may return a wrong result. The delete delta information is now passed from the driver to the executor.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kumarvishal09/incubator-carbondata IUDPerformanceImprovement

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1019.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1019

----
commit 60cfc66fe1f2de4cc3c2395a4dd479abb2a602f4
Author: kumarvishal <[hidden email]>
Date: 2017-06-12T10:36:24Z

    Fixed Syncronization issue and improve IUD performance

----

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at [hidden email] or file a JIRA ticket with INFRA.
---
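As an illustration of the driver-to-executor hand-off described in the pull request, the sketch below models a serializable snapshot that a driver could resolve once and ship with each task, so that executors do not list delete delta files themselves. The class name `DeleteDeltaSnapshot` and all of its methods are hypothetical and are not taken from the CarbonData code base; only the `>=` timestamp comparison mirrors the check that appears in the reviewed diff.

```java
import java.io.Serializable;
import java.util.Arrays;

/**
 * Hypothetical value object (an assumption for illustration, not CarbonData API):
 * the driver lists the delete delta files once and sends this snapshot to executors.
 */
final class DeleteDeltaSnapshot implements Serializable {
  private static final long serialVersionUID = 1L;

  private final String[] deltaFilePaths;
  private final long latestTimestamp;

  DeleteDeltaSnapshot(String[] deltaFilePaths, long latestTimestamp) {
    // defensive copy so the snapshot cannot change after the driver built it
    this.deltaFilePaths = deltaFilePaths.clone();
    this.latestTimestamp = latestTimestamp;
  }

  String[] getDeltaFilePaths() {
    return deltaFilePaths.clone();
  }

  long getLatestTimestamp() {
    return latestTimestamp;
  }

  /** True when rows cached at cachedTimestamp already cover this snapshot. */
  boolean isCoveredBy(long cachedTimestamp) {
    return cachedTimestamp >= latestTimestamp;
  }

  // equals and hashCode let the snapshot serve as a map key, for example
  // when looking up a per-snapshot lock object.
  @Override public boolean equals(Object other) {
    if (this == other) {
      return true;
    }
    if (!(other instanceof DeleteDeltaSnapshot)) {
      return false;
    }
    DeleteDeltaSnapshot that = (DeleteDeltaSnapshot) other;
    return latestTimestamp == that.latestTimestamp
        && Arrays.equals(deltaFilePaths, that.deltaFilePaths);
  }

  @Override public int hashCode() {
    return 31 * Arrays.hashCode(deltaFilePaths) + Long.hashCode(latestTimestamp);
  }
}
```

Because every task receives the same file list, a concurrent IUD operation that writes a new delete delta file after planning cannot make two tasks of one query see different sets of deleted rows, which is the inconsistency the description refers to.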
Github user asfgit commented on the issue:
https://github.com/apache/carbondata/pull/1019

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/carbondata-pr-spark-1.6/264/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1019

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/2385/
Github user asfgit commented on the issue:
https://github.com/apache/carbondata/pull/1019

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/carbondata-pr-spark-1.6/266/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1019

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/2387/
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1019#discussion_r121389622

--- Diff: core/src/main/java/org/apache/carbondata/core/mutate/DeleteDeltaVo.java ---
    @@ -0,0 +1,60 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements. See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License. You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.core.mutate;
    +
    +import java.util.BitSet;
    +import java.util.Iterator;
    +import java.util.Set;
    +
    +/**
    + * Class which keep the information about the rows
    + * while got deleted
    + */
    +public class DeleteDeltaVo {
    +
--- End diff --

Mo
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1019#discussion_r121390830

--- Diff: core/src/main/java/org/apache/carbondata/core/reader/CarbonDeleteFilesDataReader.java ---
    @@ -120,7 +122,53 @@ private void initThreadPoolSize() {
           }
         }
         return pageIdDeleteRowsMap;
    +  }
    +  /**
    +   * Below method will be used to read the delete delta files
    +   * and get the map of blockletid and page id mapping to deleted
    +   * rows
    +   *
    +   * @param deltaFiles delete delta files array
    +   * @return map of blockletid_pageid to deleted rows
    +   */
    +  public Map<String, DeleteDeltaVo> getDeletedRowsDataVo(String[] deltaFiles) {
    +    List<Future<DeleteDeltaBlockDetails>> taskSubmitList = new ArrayList<>();
    +    ExecutorService executorService = Executors.newFixedThreadPool(thread_pool_size);
    +    for (final String deltaFile : deltaFiles) {
    +      taskSubmitList.add(executorService.submit(new Callable<DeleteDeltaBlockDetails>() {
    +        @Override public DeleteDeltaBlockDetails call() throws IOException {
    +          CarbonDeleteDeltaFileReaderImpl deltaFileReader =
    +              new CarbonDeleteDeltaFileReaderImpl(deltaFile, FileFactory.getFileType(deltaFile));
    +          return deltaFileReader.readJson();
    +        }
    +      }));
    +    }
    +    try {
    +      executorService.shutdown();
    +      executorService.awaitTermination(30, TimeUnit.MINUTES);
    +    } catch (InterruptedException e) {
    +      LOGGER.error("Error while reading the delete delta files : " + e.getMessage());
    +    }
    +    Map<String, DeleteDeltaVo> pageIdToBlockLetVo = new HashMap<>();
    +    List<DeleteDeltaBlockletDetails> blockletDetails = null;
    +    for (int i = 0; i < taskSubmitList.size(); i++) {
    +      try {
    +        blockletDetails = taskSubmitList.get(i).get().getBlockletDetails();
    +      } catch (InterruptedException | ExecutionException e) {
    +        throw new RuntimeException(e);
    +      }
    +      for (DeleteDeltaBlockletDetails blockletDetail : blockletDetails) {
    +        DeleteDeltaVo deleteDeltaVo = pageIdToBlockLetVo.get(blockletDetail.getBlockletKey());
    +        if (null == deleteDeltaVo) {
    +          deleteDeltaVo = new DeleteDeltaVo();
    +          pageIdToBlockLetVo.put(blockletDetail.getBlockletKey(), deleteDeltaVo);
    +        }
    +        deleteDeltaVo.insertData(blockletDetail.getDeletedRows());
    +        ;
--- End diff --

remove semicolon
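The read-and-merge pattern in the diff above can be sketched in isolation as follows. This is a minimal illustration under stated assumptions, not the CarbonData implementation: `DeleteDeltaMergeSketch`, its `merge` method, and the fixed pool size of 4 are hypothetical, each `Callable` stands in for reading one delete delta file, and a plain `BitSet` takes the place of `DeleteDeltaVo`.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: read several delta sources in parallel, merge per blocklet key. */
final class DeleteDeltaMergeSketch {

  private DeleteDeltaMergeSketch() { }

  /**
   * @param readers one task per delete delta file; each returns key -> deleted row ids
   * @return key -> bit set with one bit set per deleted row id
   */
  static Map<String, BitSet> merge(List<Callable<Map<String, Set<Integer>>>> readers) {
    // pool size of at most 4 is an arbitrary assumption for this sketch
    ExecutorService pool =
        Executors.newFixedThreadPool(Math.max(1, Math.min(4, readers.size())));
    try {
      Map<String, BitSet> merged = new HashMap<>();
      // invokeAll blocks until every reader has completed or failed
      for (Future<Map<String, Set<Integer>>> result : pool.invokeAll(readers)) {
        for (Map.Entry<String, Set<Integer>> entry : result.get().entrySet()) {
          // create the bit set on first sight of a key, then reuse it
          BitSet rows = merged.get(entry.getKey());
          if (rows == null) {
            rows = new BitSet();
            merged.put(entry.getKey(), rows);
          }
          for (int row : entry.getValue()) {
            rows.set(row);
          }
        }
      }
      return merged;
    } catch (InterruptedException e) {
      // restore the interrupt flag before surfacing the failure
      Thread.currentThread().interrupt();
      throw new RuntimeException(e);
    } catch (ExecutionException e) {
      throw new RuntimeException(e);
    } finally {
      pool.shutdown();
    }
  }
}
```

Unlike the diff, this sketch relies on `invokeAll` instead of `shutdown` followed by `awaitTermination`, and it rethrows on interruption instead of only logging; both are design choices of the sketch and not changes requested in the review.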
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1019#discussion_r121395234

--- Diff: core/src/main/java/org/apache/carbondata/core/scan/result/iterator/AbstractDetailQueryResultIterator.java ---
    @@ -126,6 +144,82 @@ private void intialiseInfos() {
         }
       }
    +  /**
    +   * Below method will be used to get the delete delta rows for a block
    +   *
    +   * @param dataBlock data block
    +   * @param deleteDeltaInfo delete delta info
    +   * @return blockid+pageid to deleted row mapping
    +   */
    +  private Map<String, DeleteDeltaVo> getDeleteDeltaDetails(AbstractIndex dataBlock,
    +      DeleteDeltaInfo deleteDeltaInfo) {
    +    // if datablock deleted delta timestamp is more then the current delete delta files timestamp
    +    // then return the current deleted rows
    +    if (dataBlock.getDeleteDeltaTimestamp() >= deleteDeltaInfo
    +        .getLatestDeleteDeltaFileTimestamp()) {
    +      return dataBlock.getDeletedRowsMap();
    +    }
    +    CarbonDeleteFilesDataReader carbonDeleteDeltaFileReader = null;
    +    // get the lock object so in case of concurrent query only one task will read the delete delta
    +    // files other tasks will wait
    +    Object lockObject = deleteDeltaToLockObjectMap.get(deleteDeltaInfo);
    +    // if lock object is null then add a lock object
    +    if (null == lockObject) {
    +      synchronized (deleteDeltaToLockObjectMap) {
    +        // double checking
--- End diff --

Call `deleteDeltaToLockObjectMap.get(deleteDeltaInfo);` again here, inside the synchronized block, to avoid a null pointer exception.
Github user kumarvishal09 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1019#discussion_r121399194

--- Diff: core/src/main/java/org/apache/carbondata/core/scan/result/iterator/AbstractDetailQueryResultIterator.java ---
    @@ -126,6 +144,82 @@ private void intialiseInfos() {
         }
       }
    +  /**
    +   * Below method will be used to get the delete delta rows for a block
    +   *
    +   * @param dataBlock data block
    +   * @param deleteDeltaInfo delete delta info
    +   * @return blockid+pageid to deleted row mapping
    +   */
    +  private Map<String, DeleteDeltaVo> getDeleteDeltaDetails(AbstractIndex dataBlock,
    +      DeleteDeltaInfo deleteDeltaInfo) {
    +    // if datablock deleted delta timestamp is more then the current delete delta files timestamp
    +    // then return the current deleted rows
    +    if (dataBlock.getDeleteDeltaTimestamp() >= deleteDeltaInfo
    +        .getLatestDeleteDeltaFileTimestamp()) {
    +      return dataBlock.getDeletedRowsMap();
    +    }
    +    CarbonDeleteFilesDataReader carbonDeleteDeltaFileReader = null;
    +    // get the lock object so in case of concurrent query only one task will read the delete delta
    +    // files other tasks will wait
    +    Object lockObject = deleteDeltaToLockObjectMap.get(deleteDeltaInfo);
    +    // if lock object is null then add a lock object
    +    if (null == lockObject) {
    +      synchronized (deleteDeltaToLockObjectMap) {
    +        // double checking
--- End diff --

ok. I missed it:)
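The double check the reviewer asks for can be sketched as below. This is an illustrative sketch, not the code merged in the pull request: `LockObjectRegistry` and `lockFor` are hypothetical names, and the use of `ConcurrentHashMap` is an assumption made so that the first, unsynchronized read is safe (the type of `deleteDeltaToLockObjectMap` is not visible in the quoted diff).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical helper: hands out exactly one lock object per key, so that
 * concurrent tasks working on the same key serialize on the same monitor.
 */
final class LockObjectRegistry<K> {

  // assumption: a concurrent map, so reading it without the monitor is safe
  private final Map<K, Object> keyToLock = new ConcurrentHashMap<>();

  Object lockFor(K key) {
    // first check, without holding the monitor
    Object lock = keyToLock.get(key);
    if (lock == null) {
      synchronized (keyToLock) {
        // second check: another thread may have added the lock object
        // between the first read and entering this block
        lock = keyToLock.get(key);
        if (lock == null) {
          lock = new Object();
          keyToLock.put(key, lock);
        }
      }
    }
    // never null here, so a later synchronized (lock) cannot throw
    return lock;
  }
}
```

Without the second `get`, a thread that finds the entry already present inside the synchronized block would keep its stale `null` reference and fail when it later synchronizes on it. With a `ConcurrentHashMap`, `computeIfAbsent(key, k -> new Object())` achieves the same result in a single call.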
Github user asfgit commented on the issue:
https://github.com/apache/carbondata/pull/1019

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/carbondata-pr-spark-1.6/287/
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1019

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/2408/
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1019

LGTM
Github user asfgit closed the pull request at:
https://github.com/apache/carbondata/pull/1019