[GitHub] [carbondata] jackylk opened a new pull request #3643: [CARBONDATA-3726] Add CDC and Data Dedup example

GitBox
jackylk opened a new pull request #3643: [CARBONDATA-3726] Add CDC and Data Dedup example
URL: https://github.com/apache/carbondata/pull/3643
 
 
    ### Why is this PR needed?
    Change Data Capture (CDC) and data deduplication are common data management tasks, so we should add examples for users to reference.
    We can leverage CarbonData's Merge API to implement these use cases.
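
    To make the two use cases concrete, here is a minimal in-memory sketch in plain Java of what a CDC merge with deduplication does. It deliberately does not use CarbonData's Merge API; the class `CdcMergeSketch`, the `Change` type, and the `Op` values are hypothetical names chosen only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: not CarbonData's Merge API.
final class CdcMergeSketch {
  enum Op { UPSERT, DELETE }

  // One captured change record: an operation on a key, with the new value for upserts.
  static final class Change {
    final Op op;
    final String key;
    final String value;

    Change(Op op, String key, String value) {
      this.op = op;
      this.key = key;
      this.value = value;
    }
  }

  // Applies the changes in order to a copy of the target.
  // Because rows are keyed, repeated upserts of the same key collapse
  // into a single row, which is the deduplication effect.
  static Map<String, String> apply(Map<String, String> target, List<Change> changes) {
    Map<String, String> result = new LinkedHashMap<>(target);
    for (Change c : changes) {
      if (c.op == Op.DELETE) {
        result.remove(c.key);
      } else {
        result.put(c.key, c.value);
      }
    }
    return result;
  }
}
```

    Applying an update to an existing key, a delete, and two identical inserts of a new key leaves one updated row and one new row in the result; the input map itself is not modified.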
   
    ### What changes were proposed in this PR?
   1. Two examples are added.
   2. A performance improvement: previously CarbonData always used a 'full_outer' join for Merge; this PR chooses the join type based on the match conditions.
   3. The auditor can now be enabled or disabled dynamically, so users can turn it off.
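
   One plausible way to pick the join type from the match conditions is sketched below. This is an assumption about the approach, not the code in this PR: the class and parameter names are hypothetical, the target table is assumed to be the left side of the join, and the join-type strings follow Spark's naming.

```java
// Hypothetical sketch: choose the cheapest join that still yields every row
// the merge clauses need. Assumes target = left side, source = right side.
final class MergeJoinTypeSketch {

  // needsSourceOnlyRows: some clause acts on source rows with no match in the
  //                      target (for example, inserting new records).
  // needsTargetOnlyRows: some clause acts on target rows with no match in the
  //                      source (for example, deleting stale records).
  static String chooseJoinType(boolean needsSourceOnlyRows, boolean needsTargetOnlyRows) {
    if (needsSourceOnlyRows && needsTargetOnlyRows) {
      return "full_outer";   // unmatched rows from both sides are required
    }
    if (needsSourceOnlyRows) {
      return "right_outer";  // keep every source row, matched or not
    }
    if (needsTargetOnlyRows) {
      return "left_outer";   // keep every target row, matched or not
    }
    return "inner";          // only matched rows are acted on
  }
}
```

   Under these assumptions, a merge that only updates matched rows needs just an inner join, which avoids carrying unmatched rows from either side through the join.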
       
    ### Does this PR introduce any user interface change?
    - No
   
    ### Is any new testcase added?
    - No
   
       
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


With regards,
Apache Git Services

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3643: [CARBONDATA-3726] Add CDC and Data Dedup example

GitBox
CarbonDataQA1 commented on issue #3643: [CARBONDATA-3726] Add CDC and Data Dedup example
URL: https://github.com/apache/carbondata/pull/3643#issuecomment-592469513
 
 
   Build Failed with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/528/
   

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3643: [CARBONDATA-3726] Add CDC and Data Dedup example

GitBox
CarbonDataQA1 commented on issue #3643: [CARBONDATA-3726] Add CDC and Data Dedup example
URL: https://github.com/apache/carbondata/pull/3643#issuecomment-592476576
 
 
   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2228/
   

[GitHub] [carbondata] jackylk commented on issue #3643: [CARBONDATA-3726] Add CDC and Data Dedup example

GitBox
jackylk commented on issue #3643: [CARBONDATA-3726] Add CDC and Data Dedup example
URL: https://github.com/apache/carbondata/pull/3643#issuecomment-592900548
 
 
   retest this please

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3643: [CARBONDATA-3726] Add CDC and Data Dedup example

GitBox
CarbonDataQA1 commented on issue #3643: [CARBONDATA-3726] Add CDC and Data Dedup example
URL: https://github.com/apache/carbondata/pull/3643#issuecomment-592913627
 
 
   Build Failed with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/536/
   

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3643: [CARBONDATA-3726] Add CDC and Data Dedup example

GitBox
CarbonDataQA1 commented on issue #3643: [CARBONDATA-3726] Add CDC and Data Dedup example
URL: https://github.com/apache/carbondata/pull/3643#issuecomment-592916515
 
 
   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2236/
   

[GitHub] [carbondata] jackylk commented on issue #3643: [CARBONDATA-3726] Add CDC and Data Dedup example

GitBox
jackylk commented on issue #3643: [CARBONDATA-3726] Add CDC and Data Dedup example
URL: https://github.com/apache/carbondata/pull/3643#issuecomment-592926820
 
 
   retest this please

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3643: [CARBONDATA-3726] Add CDC and Data Dedup example

GitBox
CarbonDataQA1 commented on issue #3643: [CARBONDATA-3726] Add CDC and Data Dedup example
URL: https://github.com/apache/carbondata/pull/3643#issuecomment-592928253
 
 
   Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/541/
   

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3643: [CARBONDATA-3726] Add CDC and Data Dedup example

GitBox
CarbonDataQA1 commented on issue #3643: [CARBONDATA-3726] Add CDC and Data Dedup example
URL: https://github.com/apache/carbondata/pull/3643#issuecomment-592935246
 
 
   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2241/
   

[GitHub] [carbondata] niuge01 commented on a change in pull request #3643: [CARBONDATA-3726] Add CDC and Data Dedup example

GitBox
niuge01 commented on a change in pull request #3643: [CARBONDATA-3726] Add CDC and Data Dedup example
URL: https://github.com/apache/carbondata/pull/3643#discussion_r389453999
 
 

 ##########
 File path: processing/src/main/java/org/apache/carbondata/processing/util/Auditor.java
 ##########
 @@ -58,18 +58,26 @@
     }
   }
 
+  private static boolean enable = true;
+
+  public static void enable(boolean bool) {
 
 Review comment:
   Configuring it with a system property would be better.
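
   A minimal sketch of what this suggestion could look like, assuming a hypothetical property name `carbon.auditor.enable` (the thread does not name one) and a hypothetical helper class:

```java
// Hypothetical sketch: read the auditor switch from a JVM system property
// instead of a setter. The property name is an assumption for illustration.
final class AuditorToggleSketch {
  static final String ENABLE_PROPERTY = "carbon.auditor.enable";

  // Returns true unless the property is set to something other than "true"
  // (case-insensitive). The auditor stays enabled when the property is unset.
  static boolean isEnabled() {
    return Boolean.parseBoolean(System.getProperty(ENABLE_PROPERTY, "true"));
  }
}
```

   With this shape a user could pass `-Dcarbon.auditor.enable=false` on the JVM command line. Because the property is read on every call, it can also be changed at runtime with `System.setProperty`.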

[GitHub] [carbondata] niuge01 commented on a change in pull request #3643: [CARBONDATA-3726] Add CDC and Data Dedup example

GitBox
niuge01 commented on a change in pull request #3643: [CARBONDATA-3726] Add CDC and Data Dedup example
URL: https://github.com/apache/carbondata/pull/3643#discussion_r389454190
 
 

 ##########
 File path: processing/src/main/java/org/apache/carbondata/processing/util/Auditor.java
 ##########
 @@ -58,18 +58,26 @@
     }
   }
 
+  private static boolean enable = true;
 
 Review comment:
   Make this field final and use an uppercase field name.
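
   As a rough illustration of this suggestion, the field could become a constant that is initialized once when the class loads. The class name and the property name `carbon.auditor.enable` are hypothetical, used only for the sketch:

```java
// Hypothetical sketch: a final, upper-case constant fixed at class-load time.
final class AuditorConstantSketch {
  // Evaluated once, when the class is first initialized; later changes to
  // the system property do not affect it.
  static final boolean ENABLED =
      Boolean.parseBoolean(System.getProperty("carbon.auditor.enable", "true"));
}
```

   The trade-off is that a final field cannot be reassigned, so a setter such as the `enable(boolean)` method in the diff above could no longer flip it at runtime.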

[GitHub] [carbondata] niuge01 commented on issue #3643: [CARBONDATA-3726] Add CDC and Data Dedup example

GitBox
niuge01 commented on issue #3643: [CARBONDATA-3726] Add CDC and Data Dedup example
URL: https://github.com/apache/carbondata/pull/3643#issuecomment-596361719
 
 
   LGTM

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3643: [CARBONDATA-3726] Add CDC and Data Dedup example

GitBox
CarbonDataQA1 commented on issue #3643: [CARBONDATA-3726] Add CDC and Data Dedup example
URL: https://github.com/apache/carbondata/pull/3643#issuecomment-596395332
 
 
   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2394/
   

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3643: [CARBONDATA-3726] Add CDC and Data Dedup example

GitBox
CarbonDataQA1 commented on issue #3643: [CARBONDATA-3726] Add CDC and Data Dedup example
URL: https://github.com/apache/carbondata/pull/3643#issuecomment-596397641
 
 
   Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/688/
   

[GitHub] [carbondata] asfgit closed pull request #3643: [CARBONDATA-3726] Add CDC and Data Dedup example

GitBox
asfgit closed pull request #3643: [CARBONDATA-3726] Add CDC and Data Dedup example
URL: https://github.com/apache/carbondata/pull/3643
 
 
   
