GitHub user manishgupta88 opened a pull request:
https://github.com/apache/carbondata/pull/3039 [WIP] Optimize implicit filter expression performance by removing extra serialization Fixed performance issue for Implicit filter column 1. Removed serialization all the implicit filter values in each task. Instead serialized values only for the blocks going to particular task 2. Removed 2 times deserialization of implicit filter values in executor for each task. 1 time is sufficient Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/manishgupta88/carbondata implicit_column_filter_serialization Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/3039.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3039 ---- commit 73390e08aad788698dd9f12e3513a4a4814afd73 Author: manishgupta88 <tomanishgupta18@...> Date: 2018-12-27T09:48:07Z Fixed performance issue for Implicit filter column 1. Removed serialization all the implicit filter values in each task. Instead serialized values only for the blocks going to particular task 2. Removed 2 times deserialization of implicit filter values in executor for each task. 1 time is sufficient ---- --- |
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3039 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2094/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3039 Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10348/ --- |
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/3039 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2299/ --- |
In reply to this post by qiuchenjian-2
Github user qiuchenjian commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/3039#discussion_r244654723 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/expression/conditional/ImplicitExpression.java --- @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.carbondata.core.scan.expression.conditional; + +import java.util.HashMap; +import java.util.HashSet; +import java.util.List; +import java.util.Map; +import java.util.Set; + +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.scan.expression.Expression; +import org.apache.carbondata.core.scan.expression.ExpressionResult; +import org.apache.carbondata.core.scan.expression.LiteralExpression; +import org.apache.carbondata.core.scan.expression.exception.FilterIllegalMemberException; +import org.apache.carbondata.core.scan.expression.exception.FilterUnsupportedException; +import org.apache.carbondata.core.scan.filter.intf.ExpressionType; +import org.apache.carbondata.core.scan.filter.intf.RowIntf; + +import org.apache.commons.lang.StringUtils; + +/** + * Custom class to handle filter values for Implicit filter + */ +public class ImplicitExpression extends Expression { + + /** + * map that contains the mapping of block id to the valid blocklets in that block which contain + * the data as per the applied filter + */ + private Map<String, Set<Integer>> blockIdToBlockletIdMapping; + + public ImplicitExpression(List<Expression> implicitFilterList) { + // initialize map with half the size of filter list as one block id can contain + // multiple blocklets + blockIdToBlockletIdMapping = new HashMap<>(implicitFilterList.size() / 2); + for (Expression value : implicitFilterList) { + String blockletPath = ((LiteralExpression) value).getLiteralExpValue().toString(); + addBlockEntry(blockletPath); + } + } + + public ImplicitExpression(Map<String, Set<Integer>> blockIdToBlockletIdMapping) { + this.blockIdToBlockletIdMapping = blockIdToBlockletIdMapping; + } + + private void addBlockEntry(String blockletPath) { + String blockId = + blockletPath.substring(0, blockletPath.lastIndexOf(CarbonCommonConstants.FILE_SEPARATOR)); + Set<Integer> blockletIds = blockIdToBlockletIdMapping.get(blockId); + if (null == blockletIds) { + blockletIds = new HashSet<>(); + blockIdToBlockletIdMapping.put(blockId, blockletIds); + } + blockletIds.add(Integer.parseInt(blockletPath.substring(blockId.length() + 1))); --- End diff -- Better to catch the NumberFormatException for Integer.parseInt --- |
In reply to this post by qiuchenjian-2
Github user qiuchenjian commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/3039#discussion_r244654748 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/expression/conditional/ImplicitExpression.java --- @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.carbondata.core.scan.expression.conditional; + +import java.util.HashMap; +import java.util.HashSet; +import java.util.List; +import java.util.Map; +import java.util.Set; + +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.scan.expression.Expression; +import org.apache.carbondata.core.scan.expression.ExpressionResult; +import org.apache.carbondata.core.scan.expression.LiteralExpression; +import org.apache.carbondata.core.scan.expression.exception.FilterIllegalMemberException; +import org.apache.carbondata.core.scan.expression.exception.FilterUnsupportedException; +import org.apache.carbondata.core.scan.filter.intf.ExpressionType; +import org.apache.carbondata.core.scan.filter.intf.RowIntf; + +import org.apache.commons.lang.StringUtils; + +/** + * Custom class to handle filter values for Implicit filter + */ +public class ImplicitExpression extends Expression { + + /** + * map that contains the mapping of block id to the valid blocklets in that block which contain + * the data as per the applied filter + */ + private Map<String, Set<Integer>> blockIdToBlockletIdMapping; + + public ImplicitExpression(List<Expression> implicitFilterList) { + // initialize map with half the size of filter list as one block id can contain + // multiple blocklets + blockIdToBlockletIdMapping = new HashMap<>(implicitFilterList.size() / 2); + for (Expression value : implicitFilterList) { + String blockletPath = ((LiteralExpression) value).getLiteralExpValue().toString(); + addBlockEntry(blockletPath); + } + } + + public ImplicitExpression(Map<String, Set<Integer>> blockIdToBlockletIdMapping) { + this.blockIdToBlockletIdMapping = blockIdToBlockletIdMapping; + } + + private void addBlockEntry(String blockletPath) { + String blockId = + blockletPath.substring(0, blockletPath.lastIndexOf(CarbonCommonConstants.FILE_SEPARATOR)); + Set<Integer> blockletIds = blockIdToBlockletIdMapping.get(blockId); + if (null == blockletIds) { + blockletIds = new HashSet<>(); + blockIdToBlockletIdMapping.put(blockId, blockletIds); + } + blockletIds.add(Integer.parseInt(blockletPath.substring(blockId.length() + 1))); + } + + @Override public ExpressionResult evaluate(RowIntf value) + throws FilterUnsupportedException, FilterIllegalMemberException { + throw new UnsupportedOperationException("Operation not supported for Implicit expression"); + } + + public Map<String, Set<Integer>> getBlockIdToBlockletIdMapping() { + return blockIdToBlockletIdMapping; + } + + @Override public ExpressionType getFilterExpressionType() { + return ExpressionType.IMPLICIT; + } + + @Override public void findAndSetChild(Expression oldExpr, Expression newExpr) { --- End diff -- Why is this method empty? --- |
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/3039#discussion_r244668632 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/expression/conditional/ImplicitExpression.java --- @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.carbondata.core.scan.expression.conditional; + +import java.util.HashMap; +import java.util.HashSet; +import java.util.List; +import java.util.Map; +import java.util.Set; + +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.scan.expression.Expression; +import org.apache.carbondata.core.scan.expression.ExpressionResult; +import org.apache.carbondata.core.scan.expression.LiteralExpression; +import org.apache.carbondata.core.scan.expression.exception.FilterIllegalMemberException; +import org.apache.carbondata.core.scan.expression.exception.FilterUnsupportedException; +import org.apache.carbondata.core.scan.filter.intf.ExpressionType; +import org.apache.carbondata.core.scan.filter.intf.RowIntf; + +import org.apache.commons.lang.StringUtils; + +/** + * Custom class to handle filter values for Implicit filter + */ +public class ImplicitExpression extends Expression { + + /** + * map that contains the mapping of block id to the valid blocklets in that block which contain + * the data as per the applied filter + */ + private Map<String, Set<Integer>> blockIdToBlockletIdMapping; + + public ImplicitExpression(List<Expression> implicitFilterList) { + // initialize map with half the size of filter list as one block id can contain + // multiple blocklets + blockIdToBlockletIdMapping = new HashMap<>(implicitFilterList.size() / 2); + for (Expression value : implicitFilterList) { + String blockletPath = ((LiteralExpression) value).getLiteralExpValue().toString(); + addBlockEntry(blockletPath); + } + } + + public ImplicitExpression(Map<String, Set<Integer>> blockIdToBlockletIdMapping) { + this.blockIdToBlockletIdMapping = blockIdToBlockletIdMapping; + } + + private void addBlockEntry(String blockletPath) { + String blockId = + blockletPath.substring(0, blockletPath.lastIndexOf(CarbonCommonConstants.FILE_SEPARATOR)); + Set<Integer> blockletIds = blockIdToBlockletIdMapping.get(blockId); + if (null == blockletIds) { + blockletIds = new HashSet<>(); + blockIdToBlockletIdMapping.put(blockId, blockletIds); + } + blockletIds.add(Integer.parseInt(blockletPath.substring(blockId.length() + 1))); + } + + @Override public ExpressionResult evaluate(RowIntf value) + throws FilterUnsupportedException, FilterIllegalMemberException { + throw new UnsupportedOperationException("Operation not supported for Implicit expression"); + } + + public Map<String, Set<Integer>> getBlockIdToBlockletIdMapping() { + return blockIdToBlockletIdMapping; + } + + @Override public ExpressionType getFilterExpressionType() { + return ExpressionType.IMPLICIT; + } + + @Override public void findAndSetChild(Expression oldExpr, Expression newExpr) { --- End diff -- This method is not required to be implemented in this class so its implementation is empty --- |
In reply to this post by qiuchenjian-2
Github user manishgupta88 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/3039#discussion_r244668804 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/expression/conditional/ImplicitExpression.java --- @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.carbondata.core.scan.expression.conditional; + +import java.util.HashMap; +import java.util.HashSet; +import java.util.List; +import java.util.Map; +import java.util.Set; + +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.scan.expression.Expression; +import org.apache.carbondata.core.scan.expression.ExpressionResult; +import org.apache.carbondata.core.scan.expression.LiteralExpression; +import org.apache.carbondata.core.scan.expression.exception.FilterIllegalMemberException; +import org.apache.carbondata.core.scan.expression.exception.FilterUnsupportedException; +import org.apache.carbondata.core.scan.filter.intf.ExpressionType; +import org.apache.carbondata.core.scan.filter.intf.RowIntf; + +import org.apache.commons.lang.StringUtils; + +/** + * Custom class to handle filter values for Implicit filter + */ +public class ImplicitExpression extends Expression { + + /** + * map that contains the mapping of block id to the valid blocklets in that block which contain + * the data as per the applied filter + */ + private Map<String, Set<Integer>> blockIdToBlockletIdMapping; + + public ImplicitExpression(List<Expression> implicitFilterList) { + // initialize map with half the size of filter list as one block id can contain + // multiple blocklets + blockIdToBlockletIdMapping = new HashMap<>(implicitFilterList.size() / 2); + for (Expression value : implicitFilterList) { + String blockletPath = ((LiteralExpression) value).getLiteralExpValue().toString(); + addBlockEntry(blockletPath); + } + } + + public ImplicitExpression(Map<String, Set<Integer>> blockIdToBlockletIdMapping) { + this.blockIdToBlockletIdMapping = blockIdToBlockletIdMapping; + } + + private void addBlockEntry(String blockletPath) { + String blockId = + blockletPath.substring(0, blockletPath.lastIndexOf(CarbonCommonConstants.FILE_SEPARATOR)); + Set<Integer> blockletIds = blockIdToBlockletIdMapping.get(blockId); + if (null == blockletIds) { + blockletIds = new HashSet<>(); + blockIdToBlockletIdMapping.put(blockId, blockletIds); + } + blockletIds.add(Integer.parseInt(blockletPath.substring(blockId.length() + 1))); --- End diff -- Not required to catch the NumberFormatException. Blocklet Id is always expected to be an integer number and in the complete flow it is used as an integer. So if there is any exception in the conversion then the code problem is not here but from the other part of code. So if we handle the exception here the actual cause will get suppressed here --- |
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/3039 LGTM --- |
In reply to this post by qiuchenjian-2
Github user kumarvishal09 commented on the issue:
https://github.com/apache/carbondata/pull/3039 LGTM --- |
In reply to this post by qiuchenjian-2
|
Free forum by Nabble | Edit this page |