[GitHub] carbondata pull request #2131: [WIP] Support unmanaged carbon table read and...


qiuchenjian-2
GitHub user ajantha-bhat opened a pull request:

    https://github.com/apache/carbondata/pull/2131

    [WIP] Support unmanaged carbon table read and write

    * The carbon SDK writer takes the input data and writes the carbondata and carbonindex files to the specified path.
    This output has no Metadata folder, so it is called an unmanaged carbon table.
   
    * It can be read by creating an external table at the SDK writer's output path.
    Please refer to **TestUnmanagedCarbonTable.scala** for an example scenario.
   
    * Load, insert, compaction, alter, IUD and similar features are blocked for unmanaged tables.
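
As a rough illustration of the layout described above, the following Java sketch treats a directory as unmanaged when it has no Metadata folder and holds at least one .carbonindex file. The class `UnmanagedLayoutSketch`, the method `looksUnmanaged`, and the sample file names are assumptions made for this example; they are not CarbonData APIs and are not the check this pull request implements.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper, not part of CarbonData.
final class UnmanagedLayoutSketch {

  // Returns true when the directory has no "Metadata" folder but holds
  // at least one .carbonindex file directly inside it.
  static boolean looksUnmanaged(Path tablePath) {
    if (Files.isDirectory(tablePath.resolve("Metadata"))) {
      return false;
    }
    try (Stream<Path> files = Files.list(tablePath)) {
      return files.anyMatch(p -> p.getFileName().toString().endsWith(".carbonindex"));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) throws IOException {
    // Placeholder file names; real writer output uses longer generated names.
    Path dir = Files.createTempDirectory("sdk_output");
    Files.createFile(dir.resolve("sample.carbondata"));
    Files.createFile(dir.resolve("sample.carbonindex"));
    System.out.println(looksUnmanaged(dir));
    Files.createDirectory(dir.resolve("Metadata"));
    System.out.println(looksUnmanaged(dir));
  }
}
```

The sketch only inspects the directory layout; it says nothing about whether the files themselves are valid.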
    Be sure to do all of the following checklist to help us incorporate
    your contribution quickly and easily:
   
     - [ ] Any interfaces changed?
     
     - [ ] Any backward compatibility impacted?
     
     - [ ] Document update required?
   
     - [ ] Testing done
            Please provide details on
            - Whether new unit test cases have been added or why no new tests are required?
            - How it is tested? Please attach test report.
            - Is it a performance related change? Please attach the performance test report.
            - Any additional information to help reviewers in testing this change.
           
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ajantha-bhat/carbondata unmanaged_table

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2131.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2131
   
----
commit 3075af0a47506509e7b6ba75352a90b1238a8d7e
Author: ajantha-bhat <ajanthabhat@...>
Date:   2018-03-27T08:30:48Z

    unmanaged table backup

commit 618ee43c0e03a25bd0036f02ff95cad4d8f896ef
Author: ajantha-bhat <ajanthabhat@...>
Date:   2018-03-27T08:32:24Z

    unmanaged table backup_1

commit 1bb8c4d328725a7abb6a3c1e450445162394dc27
Author: ajantha-bhat <ajanthabhat@...>
Date:   2018-03-27T11:16:23Z

    unmanaged working

commit c129cfe291c8403812ff04118292d98c9265eaa9
Author: ajantha-bhat <ajanthabhat@...>
Date:   2018-03-27T16:14:17Z

    unmanaged metadata folder issue

commit b1146c2b2e82e1af70ece61943f25a0e9571db4d
Author: sounakr <sounakr@...>
Date:   2018-03-27T11:32:09Z

    Writer And SDK Changes

commit c87788c40834b92226d1fc8ecd33d7d0fd2eaf21
Author: sounakr <sounakr@...>
Date:   2018-03-28T14:12:57Z

    Create External table fix

commit 59bc5423af6999d9ae38c621ebc9ae99bb714325
Author: ajantha-bhat <ajanthabhat@...>
Date:   2018-03-29T06:03:19Z

    findbugs fix

commit 2e4fd0532b1113daec85cdb67a1217aa2dae6791
Author: ajantha-bhat <ajanthabhat@...>
Date:   2018-03-29T07:19:28Z

    fixed metadata issue

commit a1fa8e4e19017c3ec8cd05f324ae291609c19d19
Author: ajantha-bhat <ajanthabhat@...>
Date:   2018-03-29T14:14:56Z

    Added the testcase for unmanaged table

commit c4ad34a2920e0e00e026aed153096a56238ed029
Author: sounakr <sounakr@...>
Date:   2018-03-29T14:19:32Z

    SDK changes Phase 1

commit e280f771ea6098f21b1f6fc365b5fcb8c467f1fc
Author: sounakr <sounakr@...>
Date:   2018-04-01T14:11:52Z

    Committer Reader Implementation

commit cd5d2b8260b37f1e866401fe2ec616ce3697208b
Author: sounakr <sounakr@...>
Date:   2018-04-02T07:12:49Z

    Rebase Changes

----


---

[GitHub] carbondata issue #2131: [WIP] Support unmanaged carbon table read and write

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2131
 
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3514/



---

[GitHub] carbondata issue #2131: [WIP] Support unmanaged carbon table read and write

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2131
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4741/



---

[GitHub] carbondata pull request #2131: [WIP] Support unmanaged carbon table read and...

qiuchenjian-2
Github user gvramana commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2131#discussion_r178505075
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java ---
    @@ -143,6 +143,16 @@
     
       private boolean hasDataMapSchema;
     
    +  /**
    +   * The boolean field which points if the data written for UnManaged Table
    +   * or Managed Table. The difference between managed and unManaged table is
    +   * unManaged Table will not contain any Metadata folder and subsequently
    +   * no TableStatus or Schema files.
    +   */
    +  private boolean isUnManagedTable;
    +
    +  private long UUID;
    --- End diff --
   
    The UUID cannot live in a table-level data structure, because it is unique to one load. Move it to LoadModel.
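
A minimal sketch of the separation this comment asks for, assuming simplified stand-in classes: the table object holds only state shared by every load, while each load model carries its own id. `TableSketch`, `LoadModelSketch`, and their accessors are hypothetical names for this example and do not mirror the real CarbonTable or load model classes.

```java
// Hypothetical, simplified stand-in for table-level state shared by all loads.
final class TableSketch {
  private final String tablePath;
  private final boolean unmanaged;

  TableSketch(String tablePath, boolean unmanaged) {
    this.tablePath = tablePath;
    this.unmanaged = unmanaged;
  }

  String getTablePath() {
    return tablePath;
  }

  boolean isUnmanaged() {
    return unmanaged;
  }
}

// Hypothetical stand-in for per-load state: the id belongs here because
// it changes with every load, while the table does not.
final class LoadModelSketch {
  private final TableSketch table;
  private final long uuid;

  LoadModelSketch(TableSketch table, long uuid) {
    this.table = table;
    this.uuid = uuid;
  }

  TableSketch getTable() {
    return table;
  }

  long getUuid() {
    return uuid;
  }
}

final class LoadIdDemo {
  public static void main(String[] args) {
    TableSketch table = new TableSketch("/tmp/sdk_output", true);
    // Two loads against the same table, each with its own id.
    LoadModelSketch first = new LoadModelSketch(table, 1L);
    LoadModelSketch second = new LoadModelSketch(table, 2L);
    System.out.println(first.getTable() == second.getTable());
    System.out.println(first.getUuid() != second.getUuid());
  }
}
```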


---

[GitHub] carbondata pull request #2131: [WIP] Support unmanaged carbon table read and...

qiuchenjian-2
Github user gvramana commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2131#discussion_r178505154
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTableBuilder.java ---
    @@ -48,23 +48,39 @@ public CarbonTableBuilder tablePath(String tablePath) {
         return this;
       }
     
    +
    +  public CarbonTableBuilder isUnManagedTable(boolean isUnManagedTable) {
    +    Objects.requireNonNull(isUnManagedTable, "UnManaged Table should not be null");
    +    this.unManagedTable = isUnManagedTable;
    +    return this;
    +  }
    +
       public CarbonTableBuilder tableSchema(TableSchema tableSchema) {
         Objects.requireNonNull(tableSchema, "tableSchema should not be null");
         this.tableSchema = tableSchema;
         return this;
       }
     
    +  public CarbonTableBuilder setUUID(long uuid) {
    --- End diff --
   
    The UUID cannot live in a table-level data structure, because it is unique to one load. Move it to LoadModel.


---

[GitHub] carbondata pull request #2131: [WIP] Support unmanaged carbon table read and...

qiuchenjian-2
Github user gvramana commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2131#discussion_r178505178
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/TableInfo.java ---
    @@ -77,6 +77,19 @@
        */
       private String tablePath;
     
    +  /**
    +   * The boolean field which points if the data written for UnManaged Table
    +   * or Managed Table. The difference between managed and unManaged table is
    +   * unManaged Table will not contain any Metadata folder and subsequently
    +   * no TableStatus or Schema files.
    +   */
    +  private boolean isUnManagedTable;
    +
    +  /**
    +   * Unique ID
    +   */
    +  private long UUID;
    --- End diff --
   
    The UUID cannot live in a table-level data structure, because it is unique to one load. Move it to LoadModel.


---

[GitHub] carbondata issue #2131: [WIP] Support unmanaged carbon table read and write

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2131
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4743/



---

[GitHub] carbondata pull request #2131: [WIP] Support unmanaged carbon table read and...

qiuchenjian-2
Github user gvramana commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2131#discussion_r178511046
 
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java ---
    @@ -224,22 +198,28 @@ protected CarbonTable getOrCreateCarbonTable(Configuration configuration) throws
     
         // do block filtering and get split
         List<InputSplit> splits =
    -        getSplits(job, filterInterface, filteredSegmentToAccess, matchedPartitions, partitionInfo,
    -            null, updateStatusManager);
    +         getSplits(job, filterInterface, filteredSegmentToAccess, matchedPartitions, partitionInfo,
    +            null, updateStatusManager, readCommitted);
    +
         // pass the invalid segment to task side in order to remove index entry in task side
    -    if (invalidSegments.size() > 0) {
    -      for (InputSplit split : splits) {
    -        ((org.apache.carbondata.hadoop.CarbonInputSplit) split).setInvalidSegments(invalidSegments);
    -        ((org.apache.carbondata.hadoop.CarbonInputSplit) split)
    -            .setInvalidTimestampRange(invalidTimestampsList);
    +    if (readCommitted instanceof TableStatusReadCommitted) {
    --- End diff --
   
    One input format reads a table according to a single readCommitted, so make it a member variable of the input format.
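
One way to read this suggestion, sketched with a hypothetical stand-in class: the committed snapshot is resolved once and kept as a field, so methods such as `getSplits` use it directly instead of receiving it as an extra parameter. `InputFormatFieldSketch`, the `Supplier`-based loader, and the `prune` helper are assumptions for this example, and the snapshot is reduced to a plain list of index file names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for an input format; not the CarbonTableInputFormat API.
final class InputFormatFieldSketch {
  // Loads the committed snapshot (here simply a list of index file names).
  private final Supplier<List<String>> snapshotLoader;
  // Member variable: resolved once, then shared by every method below.
  private List<String> readCommitted;

  InputFormatFieldSketch(Supplier<List<String>> snapshotLoader) {
    this.snapshotLoader = snapshotLoader;
  }

  private List<String> readCommitted() {
    if (readCommitted == null) {
      readCommitted = snapshotLoader.get();
    }
    return readCommitted;
  }

  // Builds one split per committed index file.
  List<String> getSplits() {
    List<String> splits = new ArrayList<>();
    for (String indexFile : readCommitted()) {
      splits.add("split:" + indexFile);
    }
    return splits;
  }

  // Keeps only the committed index files whose name starts with the prefix.
  List<String> prune(String prefix) {
    List<String> kept = new ArrayList<>();
    for (String indexFile : readCommitted()) {
      if (indexFile.startsWith(prefix)) {
        kept.add(indexFile);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    InputFormatFieldSketch format =
        new InputFormatFieldSketch(() -> Arrays.asList("0.carbonindex", "1.carbonindex"));
    System.out.println(format.getSplits());
    System.out.println(format.prune("1"));
  }
}
```

Because both methods go through the same field, they see the same snapshot for the lifetime of the instance.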


---

[GitHub] carbondata issue #2131: [WIP] Support unmanaged carbon table read and write

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2131
 
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3516/



---

[GitHub] carbondata pull request #2131: [WIP] Support unmanaged carbon table read and...

qiuchenjian-2
Github user gvramana commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2131#discussion_r178517359
 
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
    @@ -159,6 +162,11 @@ public static void setTablePath(Configuration configuration, String tablePath) {
         configuration.set(FileInputFormat.INPUT_DIR, tablePath);
       }
     
    +  public static void setCarbonUnmanagedTable(Configuration configuration,
    --- End diff --
   
    Use a boolean to set the value, not a string.
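
A minimal sketch of a boolean-typed setter, assuming a plain `Map<String, String>` as a stand-in for Hadoop's `Configuration` so the example runs without Hadoop on the classpath. The key value `"carbon.unmanaged.table"` and the getter name `isCarbonUnmanagedTable` are assumptions for this example. With a real `Configuration`, the existing `setBoolean(String, boolean)` and `getBoolean(String, boolean)` methods serve the same purpose.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in; the Map plays the role of a Hadoop Configuration.
final class UnmanagedFlagSketch {
  // Assumed key value, chosen only for this example.
  static final String CARBON_UNMANAGED_TABLE = "carbon.unmanaged.table";

  // Callers pass a boolean, so values such as "True" or "yes" cannot leak in.
  static void setCarbonUnmanagedTable(Map<String, String> configuration, boolean isUnmanaged) {
    configuration.put(CARBON_UNMANAGED_TABLE, Boolean.toString(isUnmanaged));
  }

  // Reads the flag back as a boolean; a missing key is treated as false.
  static boolean isCarbonUnmanagedTable(Map<String, String> configuration) {
    return Boolean.parseBoolean(configuration.get(CARBON_UNMANAGED_TABLE));
  }

  public static void main(String[] args) {
    Map<String, String> configuration = new HashMap<>();
    System.out.println(isCarbonUnmanagedTable(configuration));
    setCarbonUnmanagedTable(configuration, true);
    System.out.println(isCarbonUnmanagedTable(configuration));
  }
}
```

Keeping the string conversion inside the setter and getter also removes the need for `equalsIgnoreCase("true")` checks at call sites.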


---

[GitHub] carbondata pull request #2131: [WIP] Support unmanaged carbon table read and...

qiuchenjian-2
Github user gvramana commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2131#discussion_r178518466
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java ---
    @@ -83,28 +85,59 @@ public DataMapWriter createWriter(Segment segment, String writeDirectoryPath) {
       }
     
       @Override
    -  public List<CoarseGrainDataMap> getDataMaps(Segment segment) throws IOException {
    +  public List<CoarseGrainDataMap> getDataMaps(Segment segment, ReadCommitted readCommitted)
    +      throws IOException {
         List<TableBlockIndexUniqueIdentifier> tableBlockIndexUniqueIdentifiers =
    -        getTableBlockIndexUniqueIdentifiers(segment);
    +        getTableBlockIndexUniqueIdentifiers(segment, readCommitted);
         return cache.getAll(tableBlockIndexUniqueIdentifiers);
       }
     
    -  private List<TableBlockIndexUniqueIdentifier> getTableBlockIndexUniqueIdentifiers(
    -      Segment segment) throws IOException {
    +  private List<TableBlockIndexUniqueIdentifier> getTableBlockIndexUniqueIdentifiers(Segment segment,
    +      ReadCommitted readCommitted) throws IOException {
         List<TableBlockIndexUniqueIdentifier> tableBlockIndexUniqueIdentifiers =
             segmentMap.get(segment.getSegmentNo());
         if (tableBlockIndexUniqueIdentifiers == null) {
           tableBlockIndexUniqueIdentifiers = new ArrayList<>();
    +      // TODO: integrate with ReadCommitted
    +      //      ReadCommitted readCommitted;
    +      //      if (job.getConfiguration().get(CARBON_UNMANAGED_TABLE).equalsIgnoreCase("true")) {
    +      //        updateStatusManager = null;
    +      //        readCommitted = new LatestFilesReadCommitted(identifier.getTablePath());
    +      //      } else {
    +      //        loadMetadataDetails = SegmentStatusManager
    +      //         .readTableStatusFile(CarbonTablePath
    +      //          .getTableStatusFilePath(identifier.getTablePath()));
    +      //        updateStatusManager =
    +      //          new SegmentUpdateStatusManager(identifier, loadMetadataDetails);
    +      //        readCommitted =
    +      //          new TableStatusReadCommitted(job, this, loadMetadataDetails, updateStatusManager);
    +      //      }
    +      //            Map<String, String> indexFiles = readCommitted.getCommittedIndexList(segment);
           Map<String, String> indexFiles;
    -      if (segment.getSegmentFileName() == null) {
    -        String path =
    -            CarbonTablePath.getSegmentPath(identifier.getTablePath(), segment.getSegmentNo());
    -        indexFiles = new SegmentIndexFileStore().getIndexFilesFromSegment(path);
    +      if (CarbonUtil.isUnmanagedCarbonTable(identifier.getTablePath(), true)) {
    +        if (null != readCommitted) {
    +          indexFiles = readCommitted.getCommittedIndexMapSegments();
    +        } else {
    +          indexFiles =
    +              new SegmentIndexFileStore().getIndexFilesFromSegment(identifier.getTablePath());
    +        }
           } else {
    -        SegmentFileStore fileStore =
    -            new SegmentFileStore(identifier.getTablePath(), segment.getSegmentFileName());
    -        indexFiles = fileStore.getIndexFiles();
    +        if (segment.getSegmentFileName() == null) {
    +
    +          if (null != readCommitted) {
    +            indexFiles = readCommitted.getCommittedIndexMapPerSegment(segment);
    --- End diff --
   
    This logic should be common across managed and unmanaged tables.
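
One possible shape for the common path, sketched under the assumption that both table kinds can sit behind a single interface: the caller asks for the committed index files of a segment and never branches on whether the table is managed. `CommittedIndexSource`, `LatestFilesSource`, `TableStatusSource`, and `IndexLookupSketch` are hypothetical names for this example; they only loosely echo the `LatestFilesReadCommitted` and `TableStatusReadCommitted` classes named in the diff and do not reproduce their APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical common contract for managed and unmanaged tables.
interface CommittedIndexSource {
  List<String> committedIndexFiles(String segmentNo);
}

// Unmanaged case: every index file found under the table path is committed,
// so the segment number is ignored.
final class LatestFilesSource implements CommittedIndexSource {
  private final List<String> filesInTablePath;

  LatestFilesSource(List<String> filesInTablePath) {
    this.filesInTablePath = filesInTablePath;
  }

  @Override
  public List<String> committedIndexFiles(String segmentNo) {
    return filesInTablePath;
  }
}

// Managed case: committed files are tracked per segment.
final class TableStatusSource implements CommittedIndexSource {
  private final Map<String, List<String>> filesBySegment;

  TableStatusSource(Map<String, List<String>> filesBySegment) {
    this.filesBySegment = filesBySegment;
  }

  @Override
  public List<String> committedIndexFiles(String segmentNo) {
    return filesBySegment.getOrDefault(segmentNo, Collections.emptyList());
  }
}

final class IndexLookupSketch {
  // The lookup is identical for both table kinds: no managed/unmanaged branch.
  static List<String> indexFilesFor(String segmentNo, CommittedIndexSource source) {
    return new ArrayList<>(source.committedIndexFiles(segmentNo));
  }

  public static void main(String[] args) {
    CommittedIndexSource unmanaged = new LatestFilesSource(Arrays.asList("a.carbonindex"));
    CommittedIndexSource managed =
        new TableStatusSource(Collections.singletonMap("0", Arrays.asList("b.carbonindex")));
    System.out.println(indexFilesFor("0", unmanaged));
    System.out.println(indexFilesFor("0", managed));
  }
}
```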


---

[GitHub] carbondata issue #2131: [WIP] Support unmanaged carbon table read and write

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2131
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4748/



---

[GitHub] carbondata issue #2131: [WIP] Support unmanaged carbon table read and write

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2131
 
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3521/



---

[GitHub] carbondata issue #2131: [WIP] Support unmanaged carbon table read and write

qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2131
 
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4248/



---

[GitHub] carbondata issue #2131: [WIP] Support unmanaged carbon table read and write

qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2131
 
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4250/



---

[GitHub] carbondata issue #2131: [WIP] Support unmanaged carbon table read and write

qiuchenjian-2
Github user ajantha-bhat commented on the issue:

    https://github.com/apache/carbondata/pull/2131
 
    retest this please


---

[GitHub] carbondata issue #2131: [WIP] Support unmanaged carbon table read and write

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2131
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4757/



---

[GitHub] carbondata issue #2131: [WIP] Support unmanaged carbon table read and write

qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2131
 
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4254/



---

[GitHub] carbondata pull request #2131: [WIP] Support unmanaged carbon table read and...

qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2131#discussion_r178571705
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java ---
    @@ -79,26 +80,27 @@
        *
        * @param segments
        * @param filterExp
    +   * @param readCommitted
        * @return
        */
       public List<ExtendedBlocklet> prune(List<Segment> segments, FilterResolverIntf filterExp,
    -      List<PartitionSpec> partitions) throws IOException {
    +      List<PartitionSpec> partitions, ReadCommitted readCommitted) throws IOException {
    --- End diff --
   
    Can you explain what ReadCommitted is and why it is needed?


---

[GitHub] carbondata issue #2131: [WIP] Support unmanaged carbon table read and write

qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2131
 
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3529/



---