VenuReddy2103 opened a new pull request #3723: URL: https://github.com/apache/carbondata/pull/3723

### Why is this PR needed?
BlockletIndexInputFormat object instantiation failed due to a mismatch between the parameters passed to the reflection-based constructor invocation and the actual parameters of the BlockletIndexInputFormat constructor.

### What changes were proposed in this PR?
1. Pass the correct parameters when instantiating BlockletIndexInputFormat through reflection.
2. Perform segment min-max based pruning when CARBON_LOAD_INDEXES_PARALLEL is enabled.

### Does this PR introduce any user interface change?
- No

### Is any new testcase added?
- No

----------------------------------------------------------------
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [hidden email]
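The class of failure described above can be illustrated with a minimal, hypothetical sketch (the `IndexFormat` class below is a stand-in for `BlockletIndexInputFormat`, not CarbonData code): reflective instantiation only succeeds when the parameter types handed to `getDeclaredConstructor` exactly match a declared constructor.

```java
import java.lang.reflect.Constructor;

public class ReflectionMismatchDemo {
    // Hypothetical stand-in for BlockletIndexInputFormat; its only
    // constructor takes (String, int).
    static class IndexFormat {
        final String tableName;
        final int segmentCount;
        IndexFormat(String tableName, int segmentCount) {
            this.tableName = tableName;
            this.segmentCount = segmentCount;
        }
    }

    public static void main(String[] args) throws Exception {
        // getDeclaredConstructor must be given the exact declared parameter
        // types; a mismatched list throws NoSuchMethodException at runtime --
        // the kind of instantiation failure this PR fixes.
        Constructor<IndexFormat> ctor =
            IndexFormat.class.getDeclaredConstructor(String.class, int.class);
        IndexFormat format = ctor.newInstance("parallel_index_load", 3);
        System.out.println(format.tableName + ":" + format.segmentCount);

        // A wrong parameter list fails fast:
        try {
            IndexFormat.class.getDeclaredConstructor(String.class);
        } catch (NoSuchMethodException e) {
            System.out.println("constructor lookup failed");
        }
    }
}
```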
CarbonDataQA1 commented on issue #3723: URL: https://github.com/apache/carbondata/pull/3723#issuecomment-617658724 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1100/
CarbonDataQA1 commented on issue #3723: URL: https://github.com/apache/carbondata/pull/3723#issuecomment-617669025 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2813/
Indhumathi27 commented on a change in pull request #3723: URL: https://github.com/apache/carbondata/pull/3723#discussion_r412931653

########## File path: core/src/main/java/org/apache/carbondata/core/index/IndexUtil.java ##########
@@ -198,25 +198,17 @@ public static void loadIndexes(CarbonTable carbonTable, IndexExprWrapper indexEx
     String clsName = "org.apache.spark.sql.secondaryindex.Jobs.SparkBlockletIndexLoaderJob";
     IndexJob indexJob = (IndexJob) createIndexJob(clsName);
     String className = "org.apache.spark.sql.secondaryindex.Jobs.BlockletIndexInputFormat";
-    SegmentStatusManager.ValidAndInvalidSegmentsInfo validAndInvalidSegmentsInfo =
-        getValidAndInvalidSegments(carbonTable, FileFactory.getConfiguration());
-    List<Segment> invalidSegments = validAndInvalidSegmentsInfo.getInvalidSegments();
     FileInputFormat indexFormat =
-        createIndexJob(carbonTable, indexExprWrapper, validSegments, invalidSegments,
-            partitionsToPrune, className, false);
+        createIndexJob(carbonTable, indexExprWrapper, validSegments, className);

Review comment: Please add a UT with Carbon load indexes parallel enabled, to avoid these issues in future
Indhumathi27 edited a comment on issue #3723: URL: https://github.com/apache/carbondata/pull/3723#issuecomment-617746551

@VenuReddy2103 Please also check and fix the retrieval of the load index property in `CarbonProperties.isIndexParallelLoadingEnabled` in this PR
akashrn5 commented on a change in pull request #3723: URL: https://github.com/apache/carbondata/pull/3723#discussion_r413625305

########## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/Jobs/BlockletIndexInputFormat.java ##########
@@ -123,8 +127,10 @@ public BlockletIndexInputFormat(CarbonTable table, IndexExprWrapper indexExprWra
   Cache<TableBlockIndexUniqueIdentifierWrapper, BlockletIndexWrapper> cache =
       CacheProvider.getInstance().createCache(CacheType.DRIVER_BLOCKLET_INDEX);
   private Iterator<TableBlockIndexUniqueIdentifier> iterator;
+  private Map<String, Map<String, BlockMetaInfo>> segInfoCache = new HashMap<>();

Review comment: please add a comment, stating why this map is required.
CarbonDataQA1 commented on pull request #3723: URL: https://github.com/apache/carbondata/pull/3723#issuecomment-618890989 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1117/
CarbonDataQA1 commented on pull request #3723: URL: https://github.com/apache/carbondata/pull/3723#issuecomment-618892748 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2830/
VenuReddy2103 commented on a change in pull request #3723: URL: https://github.com/apache/carbondata/pull/3723#discussion_r414650596

########## File path: core/src/main/java/org/apache/carbondata/core/index/IndexUtil.java ##########
@@ -198,25 +198,17 @@ public static void loadIndexes(CarbonTable carbonTable, IndexExprWrapper indexEx
     String clsName = "org.apache.spark.sql.secondaryindex.Jobs.SparkBlockletIndexLoaderJob";
     IndexJob indexJob = (IndexJob) createIndexJob(clsName);
     String className = "org.apache.spark.sql.secondaryindex.Jobs.BlockletIndexInputFormat";
-    SegmentStatusManager.ValidAndInvalidSegmentsInfo validAndInvalidSegmentsInfo =
-        getValidAndInvalidSegments(carbonTable, FileFactory.getConfiguration());
-    List<Segment> invalidSegments = validAndInvalidSegmentsInfo.getInvalidSegments();
     FileInputFormat indexFormat =
-        createIndexJob(carbonTable, indexExprWrapper, validSegments, invalidSegments,
-            partitionsToPrune, className, false);
+        createIndexJob(carbonTable, indexExprWrapper, validSegments, className);

Review comment: Added a testcase
VenuReddy2103 commented on a change in pull request #3723: URL: https://github.com/apache/carbondata/pull/3723#discussion_r414650854

########## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/Jobs/BlockletIndexInputFormat.java ##########
@@ -123,8 +127,10 @@ public BlockletIndexInputFormat(CarbonTable table, IndexExprWrapper indexExprWra
   Cache<TableBlockIndexUniqueIdentifierWrapper, BlockletIndexWrapper> cache =
       CacheProvider.getInstance().createCache(CacheType.DRIVER_BLOCKLET_INDEX);
   private Iterator<TableBlockIndexUniqueIdentifier> iterator;
+  private Map<String, Map<String, BlockMetaInfo>> segInfoCache = new HashMap<>();

Review comment: Added a comment
CarbonDataQA1 commented on pull request #3723: URL: https://github.com/apache/carbondata/pull/3723#issuecomment-619188674 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2832/
CarbonDataQA1 commented on pull request #3723: URL: https://github.com/apache/carbondata/pull/3723#issuecomment-619189998 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1120/
Indhumathi27 commented on a change in pull request #3723: URL: https://github.com/apache/carbondata/pull/3723#discussion_r415518496

########## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/allqueries/TestQueryWithColumnMetCacheAndCacheLevelProperty.scala ##########
@@ -340,5 +340,15 @@ class TestQueryWithColumnMetCacheAndCacheLevelProperty extends QueryTest with Be
     sql("DROP table IF EXISTS carbonCahe")
   }
-
+  test("Test query with parallel index load") {
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.CARBON_LOAD_INDEXES_PARALLEL, "true")
+    sql("drop table if exists parallel_index_load")
+    sql("CREATE table parallel_index_load (a STRING, b STRING, c INT) STORED AS carbondata")
+    sql("insert into parallel_index_load select 'aa', 'bb', 1")
+    sql("insert into parallel_index_load select 'cc', 'dd', 2")
+    sql("insert into parallel_index_load select 'ee', 'ff', 3")
+    sql("select a, b from parallel_index_load").show()

Review comment: can change it to `.collect()`
Indhumathi27 commented on a change in pull request #3723: URL: https://github.com/apache/carbondata/pull/3723#discussion_r415519312

########## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/allqueries/TestQueryWithColumnMetCacheAndCacheLevelProperty.scala ##########
@@ -340,5 +340,15 @@ class TestQueryWithColumnMetCacheAndCacheLevelProperty extends QueryTest with Be
     sql("DROP table IF EXISTS carbonCahe")
   }
-
+  test("Test query with parallel index load") {
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.CARBON_LOAD_INDEXES_PARALLEL, "true")
+    sql("drop table if exists parallel_index_load")

Review comment: Please also add the drop table to `afterAll`, and revert `CARBON_LOAD_INDEXES_PARALLEL` to its default at the end of the testcase
VenuReddy2103 commented on a change in pull request #3723: URL: https://github.com/apache/carbondata/pull/3723#discussion_r415575181

########## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/allqueries/TestQueryWithColumnMetCacheAndCacheLevelProperty.scala ##########
@@ -340,5 +340,15 @@ class TestQueryWithColumnMetCacheAndCacheLevelProperty extends QueryTest with Be
     sql("DROP table IF EXISTS carbonCahe")
   }
-
+  test("Test query with parallel index load") {
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.CARBON_LOAD_INDEXES_PARALLEL, "true")
+    sql("drop table if exists parallel_index_load")
+    sql("CREATE table parallel_index_load (a STRING, b STRING, c INT) STORED AS carbondata")
+    sql("insert into parallel_index_load select 'aa', 'bb', 1")
+    sql("insert into parallel_index_load select 'cc', 'dd', 2")
+    sql("insert into parallel_index_load select 'ee', 'ff', 3")
+    sql("select a, b from parallel_index_load").show()

Review comment: done
VenuReddy2103 commented on a change in pull request #3723: URL: https://github.com/apache/carbondata/pull/3723#discussion_r415579010

########## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/allqueries/TestQueryWithColumnMetCacheAndCacheLevelProperty.scala ##########
@@ -340,5 +340,15 @@ class TestQueryWithColumnMetCacheAndCacheLevelProperty extends QueryTest with Be
     sql("DROP table IF EXISTS carbonCahe")
   }
-
+  test("Test query with parallel index load") {
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.CARBON_LOAD_INDEXES_PARALLEL, "true")
+    sql("drop table if exists parallel_index_load")

Review comment: Done. Have added `afterEach` to ensure all the tables are dropped at the end of each testcase, and reverted `CARBON_LOAD_INDEXES_PARALLEL` to its default as well
Indhumathi27 commented on a change in pull request #3723: URL: https://github.com/apache/carbondata/pull/3723#discussion_r415633357

########## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/allqueries/TestQueryWithColumnMetCacheAndCacheLevelProperty.scala ##########
@@ -189,7 +195,6 @@ class TestQueryWithColumnMetCacheAndCacheLevelProperty extends QueryTest with Be
   }
   test("test queries with column_meta_cache and cache_level='BLOCKLET'") {
-    dropSchema

Review comment: removing this might cause testcase failure. Please revert
Indhumathi27 commented on a change in pull request #3723: URL: https://github.com/apache/carbondata/pull/3723#discussion_r415633999

########## File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ##########
@@ -1561,6 +1561,8 @@ private CarbonCommonConstants() {
   @CarbonProperty(dynamicConfigurable = true)
   public static final String CARBON_LOAD_INDEXES_PARALLEL = "carbon.load.indexes.parallel.";
+  public static final String CARBON_LOAD_INDEXES_PARALLEL_DEFAULT = "false";

Review comment: please add comment
VenuReddy2103 commented on a change in pull request #3723: URL: https://github.com/apache/carbondata/pull/3723#discussion_r415650431

########## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/allqueries/TestQueryWithColumnMetCacheAndCacheLevelProperty.scala ##########
@@ -189,7 +195,6 @@ class TestQueryWithColumnMetCacheAndCacheLevelProperty extends QueryTest with Be
   }
   test("test queries with column_meta_cache and cache_level='BLOCKLET'") {
-    dropSchema

Review comment: In practice, one testcase shouldn't depend on another testcase. `afterEach` ensures that the previous test's tables are removed. Have run all the tests in the file to ensure there are no failures.