Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1674#discussion_r157988312

--- Diff: core/src/main/java/org/apache/carbondata/core/metadata/PartitionMapFileStore.java ---
@@ -176,13 +179,87 @@ public PartitionMapper readPartitionMap(String partitionMapPath) {
   public void readAllPartitionsOfSegment(String segmentPath) {
     CarbonFile[] partitionFiles = getPartitionFiles(segmentPath);
     if (partitionFiles != null && partitionFiles.length > 0) {
+      partionedSegment = true;
       for (CarbonFile file : partitionFiles) {
         PartitionMapper partitionMapper = readPartitionMap(file.getAbsolutePath());
         partitionMap.putAll(partitionMapper.getPartitionMap());
       }
     }
   }

+  public boolean isPartionedSegment() {
+    return partionedSegment;
+  }
+
+  /**
+   * Drops the partitions from the partition mapper file of the segment and writes to a new file.
+   * @param segmentPath
+   * @param partitionsToDrop
+   * @param uniqueId
+   * @throws IOException
+   */
+  public void dropPartitions(String segmentPath, List<String> partitionsToDrop, String uniqueId)
+      throws IOException {
+    readAllPartitionsOfSegment(segmentPath);
+    List<String> indexesToDrop = new ArrayList<>();
+    for (Map.Entry<String, List<String>> entry : partitionMap.entrySet()) {
+      for (String partition : partitionsToDrop) {
+        if (entry.getValue().contains(partition)) {
+          indexesToDrop.add(entry.getKey());
+        }
+      }
+    }
+    if (indexesToDrop.size() > 0) {
+      // Remove the indexes from partition map
+      for (String indexToDrop : indexesToDrop) {
+        partitionMap.remove(indexToDrop);
+      }
+      PartitionMapper mapper = new PartitionMapper();
+      mapper.setPartitionMap(partitionMap);
+      String path = segmentPath + "/" + uniqueId + CarbonTablePath.PARTITION_MAP_EXT;
+      writePartitionFile(mapper, path);
+    }
+  }
+
+  /**
+   * It deletes the old partition mapper files in case of success. And in case of failure it
+   * removes the new file.
+   * @param segmentPath
+   * @param uniqueId
+   * @param success
+   */
+  public void commitPartitions(String segmentPath, final String uniqueId, boolean success) {
+    CarbonFile carbonFile = FileFactory.getCarbonFile(segmentPath);
--- End diff --

I think it is better not to take any lock inside processMetadata, as some users run this command on a different machine. So the lock is taken only in processData, and if anything goes wrong we add the partition back in the undoMetadata command.

---
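For readers following the hunk above, the commitPartitions javadoc describes a write-new-then-swap pattern: on success the old mapper files are deleted, on failure only the newly written file is removed. A minimal sketch of that idea, assuming a hypothetical <id>.partitionmap naming inside the segment directory (this is not the PR's actual implementation, whose body is cut off in the hunk):

import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch of the commit/rollback idea discussed above. File naming
// (<id>.partitionmap inside the segment directory) is an assumption,
// not the actual CarbonData layout.
public final class PartitionMapCommitSketch {

  private static final String EXT = ".partitionmap";

  /**
   * On success, delete every old mapper file except the newly written one;
   * on failure, delete only the new file so the old state stays intact.
   */
  public static void commit(String segmentPath, String uniqueId, boolean success)
      throws IOException {
    Path segment = Paths.get(segmentPath);
    Path newFile = segment.resolve(uniqueId + EXT);
    try (DirectoryStream<Path> files = Files.newDirectoryStream(segment, "*" + EXT)) {
      for (Path file : files) {
        boolean isNew = file.equals(newFile);
        // success: drop old files, keep the new one; failure: drop only the new one
        if (success != isNew) {
          Files.deleteIfExists(file);
        }
      }
    }
  }
}

Either way the segment is left with exactly one consistent mapper state, which is why the review below argues the lock can stay out of processMetadata.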
Github user gvramana commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1674#discussion_r158022717

--- Diff: core/src/main/java/org/apache/carbondata/core/metadata/PartitionMapFileStore.java ---
@@ -176,13 +179,87 @@
+  public void dropPartitions(String segmentPath, List<String> partitionsToDrop, String uniqueId)
--- End diff --

1) The clean files command needs to handle deleting dropped partitions.
2) Reading the partition map should adhere to the segUpdatetimestamp/transactionid from the tablestatus file, and drop partition should create the map file with the segUpdatetimestamp and update it later in the segment status. Old files can be deleted during cleanup after the max query time, as sketched below. This avoids reading a temporary state during queries.
3) A drop partition done in one driver should also drop the related partitions from the cache in other drivers, so there should be a way to drop using the segUpdatetimestamp/transactionid.
4) The same applies to the InsertOverwrite scenario.

---
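A hedged sketch of the cleanup rule from point 2, assuming a hypothetical <timestamp>.partitionmap naming and an assumed one-hour max query time (neither value is taken from the PR):

import java.io.File;
import java.util.concurrent.TimeUnit;

// Sketch: during cleanup, delete mapper files that are older than the latest
// committed one AND older than the max query time, so no running query can
// still need them. Naming and the threshold are assumptions.
public final class StaleMapCleanupSketch {

  private static final String EXT = ".partitionmap";
  private static final long MAX_QUERY_TIME_MS = TimeUnit.HOURS.toMillis(1); // assumed threshold

  static void cleanup(File segmentDir, long latestCommittedTimestamp) {
    File[] candidates = segmentDir.listFiles((dir, name) -> name.endsWith(EXT));
    if (candidates == null) {
      return;
    }
    long now = System.currentTimeMillis();
    for (File f : candidates) {
      long ts = timestampOf(f);
      // keep the committed file; drop superseded ones once the query window has passed
      if (ts < latestCommittedTimestamp && now - ts > MAX_QUERY_TIME_MS) {
        if (!f.delete()) {
          System.err.println("Could not delete stale partition map: " + f);
        }
      }
    }
  }

  private static long timestampOf(File f) {
    String name = f.getName();
    return Long.parseLong(name.substring(0, name.length() - EXT.length()));
  }
}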
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1674#discussion_r158055612

--- Diff: core/src/main/java/org/apache/carbondata/core/metadata/PartitionMapFileStore.java ---
@@ -176,13 +179,87 @@
+  public void dropPartitions(String segmentPath, List<String> partitionsToDrop, String uniqueId)
--- End diff --

1. For clean files, JIRA CARBONDATA-1863 has already been created as part of the design.
2. The partition map is now created using the load timestamp, and it is not deleted while dropping the partition. While reading, we make sure we read the latest file using the latest timestamp. The load status is also updated with the new timestamp, so all caches will be cleared from the driver.

---
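A sketch of the read-latest-by-timestamp behaviour described in point 2, under the same hypothetical assumption that the load timestamp is encoded in the mapper file name (this is an illustration, not the PR's actual code):

import java.io.File;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Optional;

// Sketch of selecting the latest partition map file by the load timestamp
// encoded in its name, e.g. "1513700000000.partitionmap". Hypothetical layout.
public final class LatestPartitionMapSketch {

  private static final String EXT = ".partitionmap";

  /** Returns the mapper file with the highest timestamp, if any exist. */
  public static Optional<File> latest(File segmentDir) {
    File[] candidates = segmentDir.listFiles((dir, name) -> name.endsWith(EXT));
    if (candidates == null) {
      return Optional.empty();
    }
    return Arrays.stream(candidates)
        .max(Comparator.comparingLong(LatestPartitionMapSketch::timestampOf));
  }

  private static long timestampOf(File f) {
    String name = f.getName();
    return Long.parseLong(name.substring(0, name.length() - EXT.length()));
  }
}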
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1674

retest sdv please

---
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1674

retest this please

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1674

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2188/

---
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1674

SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2460/

---
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1674

retest this please

---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1674

Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/966/

---
Github user gvramana commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1674#discussion_r158095586

--- Diff: core/src/main/java/org/apache/carbondata/core/metadata/PartitionMapFileStore.java ---
@@ -176,13 +179,87 @@
+  public void dropPartitions(String segmentPath, List<String> partitionsToDrop, String uniqueId)
--- End diff --

Reading the latest file can always cause a query to read an intermediate state, so a query should always stick to the previously committed timestamp. This is better handled in a separate issue: "support a committed-timestamp-based read interface for datamaps, making them read only committed content".

---
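To illustrate the committed-timestamp suggestion: the committed timestamp (which in this design would come from the tablestatus file) caps which mapper files a query may read, so an in-progress drop partition is never visible. A hedged sketch, with the same hypothetical file naming as above:

import java.io.File;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Optional;

// Sketch of the committed-read suggestion: queries ignore mapper files newer
// than the timestamp last committed in the table status. Naming and helpers
// are hypothetical, not the PR's code.
public final class CommittedReadSketch {

  private static final String EXT = ".partitionmap";

  /** Newest mapper file whose timestamp does not exceed the committed one. */
  static Optional<File> committedView(File segmentDir, long committedTimestamp) {
    File[] candidates = segmentDir.listFiles((dir, name) -> name.endsWith(EXT));
    if (candidates == null) {
      return Optional.empty();
    }
    return Arrays.stream(candidates)
        .filter(f -> timestampOf(f) <= committedTimestamp)
        .max(Comparator.comparingLong(CommittedReadSketch::timestampOf));
  }

  private static long timestampOf(File f) {
    String name = f.getName();
    return Long.parseLong(name.substring(0, name.length() - EXT.length()));
  }
}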
Github user gvramana commented on the issue:
https://github.com/apache/carbondata/pull/1674

LGTM

---