GitHub user ravikiran23 opened a pull request:
https://github.com/apache/incubator-carbondata/pull/523

[CARBONDATA-440] Fixing the no-kettle issue for IUD.

IUD uses the data load flow, so the NO-KETTLE case also needs to be handled during data load. The load count / segment count should be a string because, in the compaction case, it can be a value such as 2.1.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ravikiran23/incubator-carbondata IUD-NO-KETTLE

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-carbondata/pull/523.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #523

----

commit 5dd98b38e332b08f11daeaa683950b90172e02a9
Author: ravikiran <[hidden email]>
Date: 2017-01-09T13:28:13Z

    fixing no kettle issue for IUD. load count/ segment count should be string because in compaction case it will be 2.1

----
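As a side note on the last point, here is a minimal Scala sketch (hypothetical helper names, not the actual CarbonData API) of why the load count / segment count has to be carried as a string rather than an integer:

import scala.Predef._

// Sketch only: compaction produces fractional segment ids such as "2.1",
// which an integer counter cannot represent.
object SegmentIdExample {
  def compactedSegmentId(parentSegmentId: String, compactionLevel: Int): String =
    s"$parentSegmentId.$compactionLevel"

  def main(args: Array[String]): Unit = {
    val normalLoad = "2"                       // a regular load creates segment "2"
    println(compactedSegmentId(normalLoad, 1)) // prints "2.1" after one compaction
  }
}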
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/523

Build Success with Spark 1.5.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/559/
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/523#discussion_r95704439

--- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ---

@@ -719,16 +720,51 @@ object CarbonDataRDDFactory {
       loadMetadataDetails.setLoadStatus(CarbonCommonConstants.STORE_LOADSTATUS_SUCCESS)
       val rddIteratorKey = CarbonCommonConstants.RDDUTIL_UPDATE_KEY + UUID.randomUUID().toString
+      if (useKettle) {
+        try {
+          RddInpututilsForUpdate.put(rddIteratorKey,
+            new RddIteratorForUpdate(iter, carbonLoadModel))
+          carbonLoadModel.setRddIteratorKey(rddIteratorKey)
+          CarbonDataLoadForUpdate
+            .run(carbonLoadModel, index, storePath, kettleHomePath,
+              segId, loadMetadataDetails, executionErrors)
+        } finally {
+          RddInpututilsForUpdate.remove(rddIteratorKey)
+        }
+      }
+      else {
--- End diff --

move to previous line
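For reference, a minimal sketch of the brace style being asked for, with `else` kept on the same line as the closing brace (placeholder code, not the real CarbonData load path):

object BraceStyleExample {
  def describe(useKettle: Boolean): String = {
    if (useKettle) {
      "kettle-based load path"
    } else {   // `else` cuddled with `}`, the form requested in the review
      "no-kettle load path"
    }
  }
}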
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/523#discussion_r95709745

--- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ---

@@ -719,16 +720,51 @@ object CarbonDataRDDFactory {
       loadMetadataDetails.setLoadStatus(CarbonCommonConstants.STORE_LOADSTATUS_SUCCESS)
       val rddIteratorKey = CarbonCommonConstants.RDDUTIL_UPDATE_KEY + UUID.randomUUID().toString
+      if (useKettle) {
--- End diff --

What about the carbon-spark2 module? Can you check the same in that module as well?
Github user jackylk commented on the issue:
https://github.com/apache/incubator-carbondata/pull/523

I verified with `mvn clean verify -Pno-kettle -Pspark-1.6` but it failed in test case `insert from hive-sum expression`
Github user ravikiran23 commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/523#discussion_r95751767

--- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ---

@@ -719,16 +720,51 @@ object CarbonDataRDDFactory {
       loadMetadataDetails.setLoadStatus(CarbonCommonConstants.STORE_LOADSTATUS_SUCCESS)
       val rddIteratorKey = CarbonCommonConstants.RDDUTIL_UPDATE_KEY + UUID.randomUUID().toString
+      if (useKettle) {
--- End diff --

As of now, IUD is supported only with Spark 1.6.2; support for Spark 2.1 is not there yet.
Github user ravikiran23 commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/523#discussion_r95752235

--- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ---

@@ -719,16 +720,51 @@ object CarbonDataRDDFactory {
       loadMetadataDetails.setLoadStatus(CarbonCommonConstants.STORE_LOADSTATUS_SUCCESS)
       val rddIteratorKey = CarbonCommonConstants.RDDUTIL_UPDATE_KEY + UUID.randomUUID().toString
+      if (useKettle) {
+        try {
+          RddInpututilsForUpdate.put(rddIteratorKey,
+            new RddIteratorForUpdate(iter, carbonLoadModel))
+          carbonLoadModel.setRddIteratorKey(rddIteratorKey)
+          CarbonDataLoadForUpdate
+            .run(carbonLoadModel, index, storePath, kettleHomePath,
+              segId, loadMetadataDetails, executionErrors)
+        } finally {
+          RddInpututilsForUpdate.remove(rddIteratorKey)
+        }
+      }
+      else {
--- End diff --

fixed
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/523

Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/565/
Github user ravikiran23 commented on the issue:
https://github.com/apache/incubator-carbondata/pull/523

@jackylk I verified the same test case on the new code without my fix, and it is still failing; this may be due to some other PR. My code does not impact the insert-into flow.
Github user ravipesala commented on the issue:
https://github.com/apache/incubator-carbondata/pull/523

@jackylk Please review and merge this PR. I will fix the test cases for the no-kettle flow and raise them in another PR.
Github user jackylk commented on the issue:
https://github.com/apache/incubator-carbondata/pull/523

LGTM
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/523#discussion_r95920624

--- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ---

@@ -719,16 +720,50 @@ object CarbonDataRDDFactory {
       loadMetadataDetails.setLoadStatus(CarbonCommonConstants.STORE_LOADSTATUS_SUCCESS)
       val rddIteratorKey = CarbonCommonConstants.RDDUTIL_UPDATE_KEY + UUID.randomUUID().toString
+      if (useKettle) {
+        try {
--- End diff --

Move the `try` so that it wraps only `CarbonDataLoadForUpdate.run`; we should limit the try scope. Do the same for the next `try` as well.
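A minimal sketch of what limiting the try scope could look like; the registry, key, and run() callback below are placeholders rather than the real CarbonData classes:

import scala.collection.mutable

// Sketch only: register the iterator before the try block, keep only the call that
// can actually fail inside it, and always unregister in finally.
object NarrowTryScopeExample {
  def loadWithCleanup(key: String,
      registry: mutable.Map[String, String],
      run: () => Unit): Unit = {
    registry.put(key, "rdd-iterator")   // registration is not expected to throw
    try {
      run()                             // only the load call sits inside the try
    } finally {
      registry.remove(key)              // cleanup runs even if run() fails
    }
  }
}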
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/523#discussion_r95920765

--- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ---

@@ -719,16 +720,50 @@ object CarbonDataRDDFactory {
       loadMetadataDetails.setLoadStatus(CarbonCommonConstants.STORE_LOADSTATUS_SUCCESS)
       val rddIteratorKey = CarbonCommonConstants.RDDUTIL_UPDATE_KEY + UUID.randomUUID().toString
+      if (useKettle) {
+        try {
+          RddInpututilsForUpdate.put(rddIteratorKey,
+            new RddIteratorForUpdate(iter, carbonLoadModel))
+          carbonLoadModel.setRddIteratorKey(rddIteratorKey)
+          CarbonDataLoadForUpdate
+            .run(carbonLoadModel, index, storePath, kettleHomePath,
+              segId, loadMetadataDetails, executionErrors)
+        } finally {
+          RddInpututilsForUpdate.remove(rddIteratorKey)
+        }
+      } else {
+        try {
+          val recordReaders = mutable.Buffer[CarbonIterator[Array[AnyRef]]]()
+          val serializer = SparkEnv.get.closureSerializer.newInstance()
+          var serializeBuffer: ByteBuffer = null
+          recordReaders += new CarbonIteratorImpl(
+            new NewRddIterator(iter,
+              carbonLoadModel,
+              TaskContext.get()))
+
+          val loader = new SparkPartitionLoader(carbonLoadModel,
+            index,
+            null,
+            null,
+            segId,
+            loadMetadataDetails)
+          // Intialize to set carbon properties
+          loader.initialize()
+
+          loadMetadataDetails.setLoadStatus(CarbonCommonConstants.STORE_LOADSTATUS_SUCCESS)
+          new DataLoadExecutor()
+            .execute(carbonLoadModel, loader.storeLocation, recordReaders.toArray)
--- End diff --

move to previous line, break the line at parameter list
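For reference, a minimal sketch of the requested line-breaking style, using placeholder names rather than the real CarbonData API:

// Sketch only: keep the receiver and the method name on one line and break inside
// the parameter list when the call gets too long.
object LineBreakStyleExample {
  class DataLoadRunner {
    def execute(model: String, storeLocation: String, readers: Array[String]): Int =
      readers.length
  }

  def run(model: String, storeLocation: String, readers: Array[String]): Int = {
    // Discouraged form from the review:
    //   new DataLoadRunner()
    //     .execute(model, storeLocation, readers)
    // Requested form, breaking at the parameter list instead:
    new DataLoadRunner().execute(model,
      storeLocation, readers)
  }
}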
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/523#discussion_r95920779

--- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ---

@@ -719,16 +720,50 @@ object CarbonDataRDDFactory {
       loadMetadataDetails.setLoadStatus(CarbonCommonConstants.STORE_LOADSTATUS_SUCCESS)
       val rddIteratorKey = CarbonCommonConstants.RDDUTIL_UPDATE_KEY + UUID.randomUUID().toString
+      if (useKettle) {
+        try {
+          RddInpututilsForUpdate.put(rddIteratorKey,
+            new RddIteratorForUpdate(iter, carbonLoadModel))
+          carbonLoadModel.setRddIteratorKey(rddIteratorKey)
+          CarbonDataLoadForUpdate
+            .run(carbonLoadModel, index, storePath, kettleHomePath,
+              segId, loadMetadataDetails, executionErrors)
+        } finally {
+          RddInpututilsForUpdate.remove(rddIteratorKey)
+        }
+      } else {
+        try {
+          val recordReaders = mutable.Buffer[CarbonIterator[Array[AnyRef]]]()
+          val serializer = SparkEnv.get.closureSerializer.newInstance()
+          var serializeBuffer: ByteBuffer = null
+          recordReaders += new CarbonIteratorImpl(
+            new NewRddIterator(iter,
+              carbonLoadModel,
+              TaskContext.get()))
+
+          val loader = new SparkPartitionLoader(carbonLoadModel,
+            index,
+            null,
+            null,
+            segId,
+            loadMetadataDetails)
+          // Intialize to set carbon properties
+          loader.initialize()
+
+          loadMetadataDetails.setLoadStatus(CarbonCommonConstants.STORE_LOADSTATUS_SUCCESS)
+          new DataLoadExecutor()
+            .execute(carbonLoadModel, loader.storeLocation, recordReaders.toArray)
+
+        } catch {
+          case e: BadRecordFoundException =>
+            loadMetadataDetails
+              .setLoadStatus(CarbonCommonConstants.STORE_LOADSTATUS_PARTIAL_SUCCESS)
--- End diff --

move to previous line, break the line at parameter list
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/523#discussion_r95920835

--- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ---

@@ -719,16 +720,50 @@ object CarbonDataRDDFactory {
       loadMetadataDetails.setLoadStatus(CarbonCommonConstants.STORE_LOADSTATUS_SUCCESS)
       val rddIteratorKey = CarbonCommonConstants.RDDUTIL_UPDATE_KEY + UUID.randomUUID().toString
+      if (useKettle) {
+        try {
+          RddInpututilsForUpdate.put(rddIteratorKey,
+            new RddIteratorForUpdate(iter, carbonLoadModel))
+          carbonLoadModel.setRddIteratorKey(rddIteratorKey)
+          CarbonDataLoadForUpdate
+            .run(carbonLoadModel, index, storePath, kettleHomePath,
--- End diff --

move to previous line, break the line at parameter list
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/523#discussion_r95920952

--- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ---

@@ -719,16 +720,50 @@ object CarbonDataRDDFactory {
       loadMetadataDetails.setLoadStatus(CarbonCommonConstants.STORE_LOADSTATUS_SUCCESS)
       val rddIteratorKey = CarbonCommonConstants.RDDUTIL_UPDATE_KEY + UUID.randomUUID().toString
+      if (useKettle) {
+        try {
+          RddInpututilsForUpdate.put(rddIteratorKey,
+            new RddIteratorForUpdate(iter, carbonLoadModel))
+          carbonLoadModel.setRddIteratorKey(rddIteratorKey)
+          CarbonDataLoadForUpdate
+            .run(carbonLoadModel, index, storePath, kettleHomePath,
+              segId, loadMetadataDetails, executionErrors)
+        } finally {
+          RddInpututilsForUpdate.remove(rddIteratorKey)
+        }
+      } else {
+        try {
+          val recordReaders = mutable.Buffer[CarbonIterator[Array[AnyRef]]]()
+          val serializer = SparkEnv.get.closureSerializer.newInstance()
+          var serializeBuffer: ByteBuffer = null
+          recordReaders += new CarbonIteratorImpl(
+            new NewRddIterator(iter,
+              carbonLoadModel,
+              TaskContext.get()))
+
+          val loader = new SparkPartitionLoader(carbonLoadModel,
+            index,
+            null,
--- End diff --

You are following a different code style here; can you make it consistent with the rest of the code?
Github user ravipesala commented on the issue:
https://github.com/apache/incubator-carbondata/pull/523

@ravikiran23 Please work on the comments given by @jackylk; we should merge this soon, otherwise it will block testing.
Github user ravikiran23 commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/523#discussion_r96169559

--- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ---

@@ -719,16 +720,50 @@ object CarbonDataRDDFactory {
       loadMetadataDetails.setLoadStatus(CarbonCommonConstants.STORE_LOADSTATUS_SUCCESS)
       val rddIteratorKey = CarbonCommonConstants.RDDUTIL_UPDATE_KEY + UUID.randomUUID().toString
+      if (useKettle) {
+        try {
--- End diff --

fixed
Github user ravikiran23 commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/523#discussion_r96169571

--- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ---

@@ -719,16 +720,50 @@ object CarbonDataRDDFactory {
       loadMetadataDetails.setLoadStatus(CarbonCommonConstants.STORE_LOADSTATUS_SUCCESS)
       val rddIteratorKey = CarbonCommonConstants.RDDUTIL_UPDATE_KEY + UUID.randomUUID().toString
+      if (useKettle) {
+        try {
+          RddInpututilsForUpdate.put(rddIteratorKey,
+            new RddIteratorForUpdate(iter, carbonLoadModel))
+          carbonLoadModel.setRddIteratorKey(rddIteratorKey)
+          CarbonDataLoadForUpdate
+            .run(carbonLoadModel, index, storePath, kettleHomePath,
+              segId, loadMetadataDetails, executionErrors)
+        } finally {
+          RddInpututilsForUpdate.remove(rddIteratorKey)
+        }
+      } else {
+        try {
+          val recordReaders = mutable.Buffer[CarbonIterator[Array[AnyRef]]]()
+          val serializer = SparkEnv.get.closureSerializer.newInstance()
+          var serializeBuffer: ByteBuffer = null
+          recordReaders += new CarbonIteratorImpl(
+            new NewRddIterator(iter,
+              carbonLoadModel,
+              TaskContext.get()))
+
+          val loader = new SparkPartitionLoader(carbonLoadModel,
+            index,
+            null,
+            null,
+            segId,
+            loadMetadataDetails)
+          // Intialize to set carbon properties
+          loader.initialize()
+
+          loadMetadataDetails.setLoadStatus(CarbonCommonConstants.STORE_LOADSTATUS_SUCCESS)
+          new DataLoadExecutor()
+            .execute(carbonLoadModel, loader.storeLocation, recordReaders.toArray)
--- End diff --

fixed
Github user ravikiran23 commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/523#discussion_r96169576

--- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ---

@@ -719,16 +720,50 @@ object CarbonDataRDDFactory {
       loadMetadataDetails.setLoadStatus(CarbonCommonConstants.STORE_LOADSTATUS_SUCCESS)
       val rddIteratorKey = CarbonCommonConstants.RDDUTIL_UPDATE_KEY + UUID.randomUUID().toString
+      if (useKettle) {
+        try {
+          RddInpututilsForUpdate.put(rddIteratorKey,
+            new RddIteratorForUpdate(iter, carbonLoadModel))
+          carbonLoadModel.setRddIteratorKey(rddIteratorKey)
+          CarbonDataLoadForUpdate
+            .run(carbonLoadModel, index, storePath, kettleHomePath,
+              segId, loadMetadataDetails, executionErrors)
+        } finally {
+          RddInpututilsForUpdate.remove(rddIteratorKey)
+        }
+      } else {
+        try {
+          val recordReaders = mutable.Buffer[CarbonIterator[Array[AnyRef]]]()
+          val serializer = SparkEnv.get.closureSerializer.newInstance()
+          var serializeBuffer: ByteBuffer = null
+          recordReaders += new CarbonIteratorImpl(
+            new NewRddIterator(iter,
+              carbonLoadModel,
+              TaskContext.get()))
+
+          val loader = new SparkPartitionLoader(carbonLoadModel,
+            index,
+            null,
+            null,
+            segId,
+            loadMetadataDetails)
+          // Intialize to set carbon properties
+          loader.initialize()
+
+          loadMetadataDetails.setLoadStatus(CarbonCommonConstants.STORE_LOADSTATUS_SUCCESS)
+          new DataLoadExecutor()
+            .execute(carbonLoadModel, loader.storeLocation, recordReaders.toArray)
+
+        } catch {
+          case e: BadRecordFoundException =>
+            loadMetadataDetails
+              .setLoadStatus(CarbonCommonConstants.STORE_LOADSTATUS_PARTIAL_SUCCESS)
--- End diff --

fixed