GitHub user ajantha-bhat opened a pull request:
https://github.com/apache/carbondata/pull/2131

[WIP] Support unmanaged carbon table read and write

* The carbon SDK writer takes the input data and writes the carbondata and carbonindex files to the specified path. This output has no Metadata folder, so it is called an unmanaged carbon table.
* It can be read by creating an external table at the location of the SDK writer's output path. Please refer to **TestUnmanagedCarbonTable.scala** for an example scenario.
* Load, insert, compaction, alter, IUD and similar features are blocked for unmanaged tables.

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done
      Please provide details on
      - Whether new unit test cases have been added or why no new tests are required?
      - How it is tested? Please attach test report.
      - Is it a performance related change? Please attach the performance test report.
      - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
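The managed/unmanaged distinction described above comes down to whether a Metadata folder (table status and schema files) exists under the table path. Below is a minimal stand-in sketch of that layout rule; the class and method names are hypothetical, not the actual CarbonData code (the PR itself routes similar decisions through CarbonUtil.isUnmanagedCarbonTable, visible in the BlockletDataMapFactory diff later in this thread):

```java
import java.io.File;

// Illustrative stand-in (hypothetical names): an unmanaged table directory
// holds only carbondata/carbonindex files written by the SDK, while a managed
// table additionally carries a "Metadata" folder with schema and table status.
class UnmanagedTableCheck {
    static boolean isUnmanaged(String tablePath) {
        // Managed tables keep schema/table-status under <tablePath>/Metadata.
        File metadataDir = new File(tablePath, "Metadata");
        return !metadataDir.isDirectory();
    }
}
```

An external table created over an SDK output path would fall on the unmanaged side of this check, which is why load/insert/compaction and similar managed-table operations are blocked for it.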
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ajantha-bhat/carbondata unmanaged_table

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2131.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2131

----

commit 3075af0a47506509e7b6ba75352a90b1238a8d7e
Author: ajantha-bhat <ajanthabhat@...>
Date:   2018-03-27T08:30:48Z

    unmanaged table backup

commit 618ee43c0e03a25bd0036f02ff95cad4d8f896ef
Author: ajantha-bhat <ajanthabhat@...>
Date:   2018-03-27T08:32:24Z

    unmanaged table backup_1

commit 1bb8c4d328725a7abb6a3c1e450445162394dc27
Author: ajantha-bhat <ajanthabhat@...>
Date:   2018-03-27T11:16:23Z

    unmanaged working

commit c129cfe291c8403812ff04118292d98c9265eaa9
Author: ajantha-bhat <ajanthabhat@...>
Date:   2018-03-27T16:14:17Z

    unmanaged metadata folder issue

commit b1146c2b2e82e1af70ece61943f25a0e9571db4d
Author: sounakr <sounakr@...>
Date:   2018-03-27T11:32:09Z

    Writer And SDK Changes

commit c87788c40834b92226d1fc8ecd33d7d0fd2eaf21
Author: sounakr <sounakr@...>
Date:   2018-03-28T14:12:57Z

    Create External table fix

commit 59bc5423af6999d9ae38c621ebc9ae99bb714325
Author: ajantha-bhat <ajanthabhat@...>
Date:   2018-03-29T06:03:19Z

    findbugs fix

commit 2e4fd0532b1113daec85cdb67a1217aa2dae6791
Author: ajantha-bhat <ajanthabhat@...>
Date:   2018-03-29T07:19:28Z

    fixed metadata issue

commit a1fa8e4e19017c3ec8cd05f324ae291609c19d19
Author: ajantha-bhat <ajanthabhat@...>
Date:   2018-03-29T14:14:56Z

    Added the testcase for unmanaged table

commit c4ad34a2920e0e00e026aed153096a56238ed029
Author: sounakr <sounakr@...>
Date:   2018-03-29T14:19:32Z

    SDK changes Phase 1

commit e280f771ea6098f21b1f6fc365b5fcb8c467f1fc
Author: sounakr <sounakr@...>
Date:   2018-04-01T14:11:52Z

    Committer Reader Implementation

commit cd5d2b8260b37f1e866401fe2ec616ce3697208b
Author: sounakr <sounakr@...>
Date:   2018-04-02T07:12:49Z

    Rebase Changes

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2131

Build Failed with Spark 2.2.1, Please check CI
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3514/

---
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2131

Build Failed with Spark 2.1.0, Please check CI
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4741/

---
Github user gvramana commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2131#discussion_r178505075

--- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java ---

    @@ -143,6 +143,16 @@
       private boolean hasDataMapSchema;

    +  /**
    +   * The boolean field which points if the data written for UnManaged Table
    +   * or Managed Table. The difference between managed and unManaged table is
    +   * unManaged Table will not contain any Metadata folder and subsequently
    +   * no TableStatus or Schema files.
    +   */
    +  private boolean isUnManagedTable;
    +
    +  private long UUID;

--- End diff --

UUID cannot be in a table-level data structure, as it is unique for one load. Move it to LoadModel.

---
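The review comment above is about scoping: a value that changes on every load belongs on the per-load model, not on the table schema. A minimal sketch of that separation, using hypothetical stand-in classes (not the actual CarbonData CarbonLoadModel/CarbonTable):

```java
// Hypothetical stand-ins illustrating the review comment: keep per-load state
// (a UUID generated for each write) on the load model, and only per-table
// state (path, unmanaged flag) on the table-level structure.
class LoadModelSketch {
    private final long loadUuid;        // unique per load/write operation
    LoadModelSketch(long loadUuid) { this.loadUuid = loadUuid; }
    long getLoadUuid() { return loadUuid; }
}

class TableSketch {
    private final String tablePath;     // stable across loads
    private final boolean unmanaged;    // stable across loads
    TableSketch(String tablePath, boolean unmanaged) {
        this.tablePath = tablePath;
        this.unmanaged = unmanaged;
    }
    String getTablePath() { return tablePath; }
    boolean isUnmanaged() { return unmanaged; }
}
```

With this split, two loads into the same table carry two different UUIDs while sharing a single table structure, which is exactly what a table-level UUID field cannot express.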
Github user gvramana commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2131#discussion_r178505154

--- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTableBuilder.java ---

    @@ -48,23 +48,39 @@
       public CarbonTableBuilder tablePath(String tablePath) {
         return this;
       }

    +  public CarbonTableBuilder isUnManagedTable(boolean isUnManagedTable) {
    +    Objects.requireNonNull(isUnManagedTable, "UnManaged Table should not be null");
    +    this.unManagedTable = isUnManagedTable;
    +    return this;
    +  }
    +
       public CarbonTableBuilder tableSchema(TableSchema tableSchema) {
         Objects.requireNonNull(tableSchema, "tableSchema should not be null");
         this.tableSchema = tableSchema;
         return this;
       }

    +  public CarbonTableBuilder setUUID(long uuid) {

--- End diff --

UUID cannot be in a table-level data structure, as it is unique for one load. Move it to LoadModel.

---
Github user gvramana commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2131#discussion_r178505178

--- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/TableInfo.java ---

    @@ -77,6 +77,19 @@
        */
       private String tablePath;

    +  /**
    +   * The boolean field which points if the data written for UnManaged Table
    +   * or Managed Table. The difference between managed and unManaged table is
    +   * unManaged Table will not contain any Metadata folder and subsequently
    +   * no TableStatus or Schema files.
    +   */
    +  private boolean isUnManagedTable;
    +
    +  /**
    +   * Unique ID
    +   */
    +  private long UUID;

--- End diff --

UUID cannot be in a table-level data structure, as it is unique for one load. Move it to LoadModel.

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2131

Build Failed with Spark 2.1.0, Please check CI
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4743/

---
Github user gvramana commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2131#discussion_r178511046

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java ---

    @@ -224,22 +198,28 @@ protected CarbonTable getOrCreateCarbonTable(Configuration configuration) throws
         // do block filtering and get split
         List<InputSplit> splits =
    -        getSplits(job, filterInterface, filteredSegmentToAccess, matchedPartitions, partitionInfo,
    -            null, updateStatusManager);
    +        getSplits(job, filterInterface, filteredSegmentToAccess, matchedPartitions, partitionInfo,
    +            null, updateStatusManager, readCommitted);
    +
         // pass the invalid segment to task side in order to remove index entry in task side
    -    if (invalidSegments.size() > 0) {
    -      for (InputSplit split : splits) {
    -        ((org.apache.carbondata.hadoop.CarbonInputSplit) split).setInvalidSegments(invalidSegments);
    -        ((org.apache.carbondata.hadoop.CarbonInputSplit) split)
    -            .setInvalidTimestampRange(invalidTimestampsList);
    +    if (readCommitted instanceof TableStatusReadCommitted) {

--- End diff --

One InputFormat will read a table according to one readCommitted, so make it a member variable of the InputFormat.

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2131

Build Failed with Spark 2.2.1, Please check CI
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3516/

---
Github user gvramana commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2131#discussion_r178517359

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---

    @@ -159,6 +162,11 @@
       public static void setTablePath(Configuration configuration, String tablePath) {
         configuration.set(FileInputFormat.INPUT_DIR, tablePath);
       }

    +  public static void setCarbonUnmanagedTable(Configuration configuration,

--- End diff --

Use a boolean to set the value, not a string.

---
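The review above asks for a boolean-typed setter rather than stringly-typed configuration (Hadoop's Configuration class already provides setBoolean/getBoolean for exactly this). As a self-contained illustration of the pattern, here is a minimal stand-in; the Conf class and the key name used below are hypothetical, not the PR's actual API:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal stand-in for a Hadoop-style Configuration, illustrating typed
// boolean accessors: callers pass and receive booleans, and the string
// encoding stays an internal detail instead of leaking "true"/"false"
// comparisons (e.g. equalsIgnoreCase checks) into every call site.
class Conf {
    private final Map<String, String> props = new HashMap<>();

    void setBoolean(String key, boolean value) {
        props.put(key, Boolean.toString(value));
    }

    boolean getBoolean(String key, boolean defaultValue) {
        String v = props.get(key);
        return v == null ? defaultValue : Boolean.parseBoolean(v);
    }
}
```

A caller would then write something like `conf.setBoolean("carbon.unmanaged.table", true)` (hypothetical key name), and readers get a boolean back with an explicit default instead of parsing strings themselves.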
Github user gvramana commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2131#discussion_r178518466

--- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java ---

    @@ -83,28 +85,59 @@ public DataMapWriter createWriter(Segment segment, String writeDirectoryPath) {
       }

       @Override
    -  public List<CoarseGrainDataMap> getDataMaps(Segment segment) throws IOException {
    +  public List<CoarseGrainDataMap> getDataMaps(Segment segment, ReadCommitted readCommitted)
    +      throws IOException {
         List<TableBlockIndexUniqueIdentifier> tableBlockIndexUniqueIdentifiers =
    -        getTableBlockIndexUniqueIdentifiers(segment);
    +        getTableBlockIndexUniqueIdentifiers(segment, readCommitted);
         return cache.getAll(tableBlockIndexUniqueIdentifiers);
       }

    -  private List<TableBlockIndexUniqueIdentifier> getTableBlockIndexUniqueIdentifiers(
    -      Segment segment) throws IOException {
    +  private List<TableBlockIndexUniqueIdentifier> getTableBlockIndexUniqueIdentifiers(Segment segment,
    +      ReadCommitted readCommitted) throws IOException {
         List<TableBlockIndexUniqueIdentifier> tableBlockIndexUniqueIdentifiers =
             segmentMap.get(segment.getSegmentNo());
         if (tableBlockIndexUniqueIdentifiers == null) {
           tableBlockIndexUniqueIdentifiers = new ArrayList<>();
    +      // TODO: integrate with ReadCommitted
    +      // ReadCommitted readCommitted;
    +      // if (job.getConfiguration().get(CARBON_UNMANAGED_TABLE).equalsIgnoreCase("true")) {
    +      //   updateStatusManager = null;
    +      //   readCommitted = new LatestFilesReadCommitted(identifier.getTablePath());
    +      // } else {
    +      //   loadMetadataDetails = SegmentStatusManager
    +      //       .readTableStatusFile(CarbonTablePath
    +      //           .getTableStatusFilePath(identifier.getTablePath()));
    +      //   updateStatusManager =
    +      //       new SegmentUpdateStatusManager(identifier, loadMetadataDetails);
    +      //   readCommitted =
    +      //       new TableStatusReadCommitted(job, this, loadMetadataDetails, updateStatusManager);
    +      // }
    +      // Map<String, String> indexFiles = readCommitted.getCommittedIndexList(segment);
           Map<String, String> indexFiles;
    -      if (segment.getSegmentFileName() == null) {
    -        String path =
    -            CarbonTablePath.getSegmentPath(identifier.getTablePath(), segment.getSegmentNo());
    -        indexFiles = new SegmentIndexFileStore().getIndexFilesFromSegment(path);
    +      if (CarbonUtil.isUnmanagedCarbonTable(identifier.getTablePath(), true)) {
    +        if (null != readCommitted) {
    +          indexFiles = readCommitted.getCommittedIndexMapSegments();
    +        } else {
    +          indexFiles =
    +              new SegmentIndexFileStore().getIndexFilesFromSegment(identifier.getTablePath());
    +        }
       } else {
    -        SegmentFileStore fileStore =
    -            new SegmentFileStore(identifier.getTablePath(), segment.getSegmentFileName());
    -        indexFiles = fileStore.getIndexFiles();
    +        if (segment.getSegmentFileName() == null) {
    +
    +          if (null != readCommitted) {
    +            indexFiles = readCommitted.getCommittedIndexMapPerSegment(segment);

--- End diff --

This logic should be common across managed and unmanaged tables.

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2131

Build Failed with Spark 2.1.0, Please check CI
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4748/

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2131

Build Failed with Spark 2.2.1, Please check CI
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3521/

---
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2131

SDV Build Fail, Please check CI
http://144.76.159.231:8080/job/ApacheSDVTests/4248/

---
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2131

SDV Build Fail, Please check CI
http://144.76.159.231:8080/job/ApacheSDVTests/4250/

---
Github user ajantha-bhat commented on the issue:
https://github.com/apache/carbondata/pull/2131

retest this please

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2131

Build Failed with Spark 2.1.0, Please check CI
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4757/

---
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2131

SDV Build Fail, Please check CI
http://144.76.159.231:8080/job/ApacheSDVTests/4254/

---
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2131#discussion_r178571705

--- Diff: core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java ---

    @@ -79,26 +80,27 @@
        *
        * @param segments
        * @param filterExp
    +   * @param readCommitted
        * @return
        */
       public List<ExtendedBlocklet> prune(List<Segment> segments, FilterResolverIntf filterExp,
    -      List<PartitionSpec> partitions) throws IOException {
    +      List<PartitionSpec> partitions, ReadCommitted readCommitted) throws IOException {

--- End diff --

Can you explain what ReadCommitted is and why it is needed?

---
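For context, two flavors of ReadCommitted are visible elsewhere in this thread (the BlockletDataMapFactory diff): TableStatusReadCommitted, driven by a managed table's status file, and LatestFilesReadCommitted, driven by listing the files of an unmanaged table. The sketch below illustrates that abstraction with simplified, hypothetical signatures; the real interface in the PR carries more methods and CarbonData-specific types:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only (simplified, hypothetical signatures inspired by class names in
// the PR diff): a ReadCommitted answers "which index files are committed and
// therefore visible to this read?"
interface ReadCommitted {
    // Maps index file name -> location, for files this read may see.
    Map<String, String> getCommittedIndexMapSegments();
}

// Unmanaged tables have no table status file, so whatever index files are
// currently on disk are treated as committed.
class LatestFilesReadCommitted implements ReadCommitted {
    private final Map<String, String> indexFilesOnDisk;

    LatestFilesReadCommitted(Map<String, String> indexFilesOnDisk) {
        this.indexFilesOnDisk = indexFilesOnDisk;
    }

    public Map<String, String> getCommittedIndexMapSegments() {
        return new HashMap<>(indexFilesOnDisk);
    }
}
```

Passing one ReadCommitted through prune() lets the same pruning code serve both table types, with only the source of the committed-file list differing.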
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2131

Build Failed with Spark 2.2.1, Please check CI
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3529/

---