[GitHub] carbondata pull request #932: [CARBONDATA-1074] Add TablePage preparing for ...

[GitHub] carbondata pull request #932: [CARBONDATA-1074] Add TablePage preparing for ...

GitHub user jackylk opened a pull request:

https://github.com/apache/carbondata/pull/932

[CARBONDATA-1074] Add TablePage preparing for data load refactory

Add TablePage in preparation for the data load refactoring.
Unify the different steps to use ConvertedRow instead of Object[]. The steps include:
1. write step of normal sort table
2. write step of no sort table
3. compaction merging step
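
As a rough illustration of why a typed row can be easier to share between these steps than a bare Object[], here is a minimal sketch. `TypedRowSketch`, `TypedRowDemo` and the assumed slot layout are hypothetical and are not the ConvertedRow class from this PR; the sketch only shows how a wrapper can keep index lookups and casts in one place.

```java
/** Hypothetical stand-in for a typed row; NOT the actual ConvertedRow from this PR. */
final class TypedRowSketch {
  private final int[] dictionaryKeys; // assumed: dictionary-encoded dimension values
  private final Object[] measures;    // assumed: measure values

  TypedRowSketch(int[] dictionaryKeys, Object[] measures) {
    this.dictionaryKeys = dictionaryKeys;
    this.measures = measures;
  }

  /** Wraps a raw row whose layout is assumed to be {int[] keys, Object[] measures}. */
  static TypedRowSketch fromRaw(Object[] raw) {
    // The casts live here only, instead of being repeated in every step.
    return new TypedRowSketch((int[]) raw[0], (Object[]) raw[1]);
  }

  int dimensionCount() {
    return dictionaryKeys.length;
  }

  int dictionaryKey(int index) {
    return dictionaryKeys[index];
  }

  Object measure(int index) {
    return measures[index];
  }
}

final class TypedRowDemo {
  public static void main(String[] args) {
    Object[] raw = {new int[] {3, 7}, new Object[] {42L, 1.5d}};
    TypedRowSketch row = TypedRowSketch.fromRaw(raw);
    System.out.println(row.dictionaryKey(1) + " " + row.measure(0));
  }
}
```

With a wrapper like this, a write step or a merge step would call `dictionaryKey(i)` or `measure(i)` rather than agreeing separately on which array slot holds what.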

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jackylk/incubator-carbondata tablepage

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/932.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #932

----
commit 2f2a3019fddc101267e4c058dda26564e1fe6859
Author: jackylk <[hidden email]>
Date: 2017-05-21T16:24:38Z

add TablePage

----

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---

[GitHub] carbondata issue #932: [CARBONDATA-1074] Add TablePage preparing for data lo...

Github user jackylk commented on the issue:

https://github.com/apache/carbondata/pull/932

retest this please

[GitHub] carbondata pull request #932: [CARBONDATA-1074] Add TablePage and ConvertedR...

Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/932#discussion_r118011607

--- Diff: processing/src/main/java/org/apache/carbondata/processing/store/TablePage.java ---
@@ -0,0 +1,203 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.processing.store;
+
+import java.io.ByteArrayOutputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.math.BigDecimal;
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.carbondata.core.datastore.page.ComplexColumnPage;
+import org.apache.carbondata.core.datastore.page.FixLengthColumnPage;
+import org.apache.carbondata.core.datastore.page.KeyColumnPage;
+import org.apache.carbondata.core.datastore.page.VarLengthColumnPage;
+import org.apache.carbondata.core.keygenerator.KeyGenException;
+import org.apache.carbondata.core.metadata.datatype.DataType;
+import org.apache.carbondata.core.util.DataTypeUtil;
+import org.apache.carbondata.processing.datatypes.GenericDataType;
+import org.apache.carbondata.processing.newflow.row.ConvertedRow;
+import org.apache.carbondata.processing.store.writer.exception.CarbonDataWriterException;
+
+import org.apache.spark.sql.types.Decimal;
+
+/**
+ * Represent a page data for all columns, we store its data in columnar layout, so that
+ * all processing apply to TablePage can be done in vectorized fashion.
+ */
+class TablePage {
+
+ // For all dimension and measure columns, we store the column data directly in the page,
+ // the length of the page is the number of rows.
+
+ // TODO: we should have separate class for key columns so that keys are stored together in
+ // one vector to make it efficient for sorting
+ private KeyColumnPage keyColumnPage;
--- End diff --

@jackylk I have one question about this change; please correct me if I am wrong.
Can we use the ColumnVector interface for this? We already use it to store data when processing the vector batch during a query, so we could use the same interface while loading. That would give us the flexibility to store data either off-heap or on-heap.

[GitHub] carbondata pull request #932: [CARBONDATA-1074] Add TablePage and ConvertedR...

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/932#discussion_r118012761

--- Diff: processing/src/main/java/org/apache/carbondata/processing/store/TablePage.java ---
@@ -0,0 +1,203 @@
+class TablePage {
+
+ // For all dimension and measure columns, we store the column data directly in the page,
+ // the length of the page is the number of rows.
+
+ // TODO: we should have separate class for key columns so that keys are stored together in
+ // one vector to make it efficient for sorting
+ private KeyColumnPage keyColumnPage;
--- End diff --

Do you mean through ColumnarVectorWrapper? The processing package should not depend on Spark, so I think it is better to use our own data structure here. In a future PR, we can change KeyPage to have an off-heap implementation.
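
To make the on-heap/off-heap point concrete, here is a minimal sketch of a page abstraction that has no Spark dependency. All names (`IntPageSketch`, `OnHeapIntPage`, `OffHeapIntPage`, `PageSketchDemo`) are hypothetical and are not CarbonData or Spark classes; the off-heap variant simply uses a direct `java.nio.ByteBuffer` as one possible backing store.

```java
import java.nio.ByteBuffer;

/** Hypothetical page abstraction; illustrative only, not a CarbonData or Spark API. */
interface IntPageSketch {
  void put(int rowId, int value);

  int get(int rowId);

  int size();
}

/** On-heap variant backed by a plain int[]. */
final class OnHeapIntPage implements IntPageSketch {
  private final int[] data;

  OnHeapIntPage(int rows) {
    this.data = new int[rows];
  }

  @Override public void put(int rowId, int value) {
    data[rowId] = value;
  }

  @Override public int get(int rowId) {
    return data[rowId];
  }

  @Override public int size() {
    return data.length;
  }
}

/** Off-heap variant backed by a direct ByteBuffer, allocated outside the Java heap. */
final class OffHeapIntPage implements IntPageSketch {
  private final ByteBuffer buffer;
  private final int rows;

  OffHeapIntPage(int rows) {
    this.rows = rows;
    this.buffer = ByteBuffer.allocateDirect(rows * Integer.BYTES);
  }

  @Override public void put(int rowId, int value) {
    // Absolute put: writes at a byte offset without moving the buffer position.
    buffer.putInt(rowId * Integer.BYTES, value);
  }

  @Override public int get(int rowId) {
    return buffer.getInt(rowId * Integer.BYTES);
  }

  @Override public int size() {
    return rows;
  }
}

final class PageSketchDemo {
  /** Callers depend only on the interface, so the storage choice stays hidden. */
  static long sum(IntPageSketch page) {
    long total = 0;
    for (int i = 0; i < page.size(); i++) {
      total += page.get(i);
    }
    return total;
  }

  public static void main(String[] args) {
    IntPageSketch[] pages = {new OnHeapIntPage(4), new OffHeapIntPage(4)};
    for (IntPageSketch page : pages) {
      for (int i = 0; i < page.size(); i++) {
        page.put(i, i * 10);
      }
      System.out.println(page.getClass().getSimpleName() + " sum=" + sum(page));
    }
  }
}
```

Because callers see only the interface, an off-heap implementation could be introduced later without touching the code that fills or reads the page.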

[GitHub] carbondata pull request #932: [CARBONDATA-1074] Add TablePage and ConvertedR...

Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/932#discussion_r118190822

--- Diff: integration/spark2/src/main/java/org/apache/carbondata/spark/readsupport/SparkRowReadSupportImpl.java ---
@@ -36,7 +36,7 @@
isMeasure = new boolean[carbonColumns.length];
dataTypes = new DataType[carbonColumns.length];
for (int i = 0; i < carbonColumns.length; i++) {
- isMeasure[i] = !carbonColumns[i].isDimesion();
+ isMeasure[i] = !carbonColumns[i].isDimension();
--- End diff --

isMeasure() has now been added to CarbonColumn; you can use that method.

[GitHub] carbondata pull request #932: [CARBONDATA-1074] Add TablePage and ConvertedR...

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/932#discussion_r118392660

--- Diff: integration/spark2/src/main/java/org/apache/carbondata/spark/readsupport/SparkRowReadSupportImpl.java ---
@@ -36,7 +36,7 @@
isMeasure = new boolean[carbonColumns.length];
dataTypes = new DataType[carbonColumns.length];
for (int i = 0; i < carbonColumns.length; i++) {
- isMeasure[i] = !carbonColumns[i].isDimesion();
+ isMeasure[i] = !carbonColumns[i].isDimension();
--- End diff --

Actually, the initialization in this class is unnecessary, because `readRow` only returns a `new GenericInternalRow(data)`, so I will move the `isMeasure` variable.

[GitHub] carbondata issue #932: [CARBONDATA-1074] Add TablePage and ConvertedRow, pre...

Github user jackylk commented on the issue:

https://github.com/apache/carbondata/pull/932

retest this please

[GitHub] carbondata issue #932: [CARBONDATA-1074] Add TablePage and ConvertedRow, pre...

Github user chenliang613 commented on the issue:

https://github.com/apache/carbondata/pull/932

retest this please

[GitHub] carbondata issue #932: [CARBONDATA-1074] Add TablePage and ConvertedRow, pre...

Github user chenliang613 commented on the issue:

https://github.com/apache/carbondata/pull/932

retest this please

[GitHub] carbondata issue #932: [CARBONDATA-1074] Add TablePage and ConvertedRow, pre...

Github user chenliang613 commented on the issue:

https://github.com/apache/carbondata/pull/932

retest this please

[GitHub] carbondata pull request #932: [CARBONDATA-1074] Add TablePage and ConvertedR...

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/932#discussion_r118815732

--- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/row/ConvertedRow.java ---
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.processing.newflow.row;
+
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.keygenerator.KeyGenException;
+import org.apache.carbondata.core.keygenerator.KeyGenerator;
+import org.apache.carbondata.core.scan.wrappers.ByteArrayWrapper;
+
+public class ConvertedRow {
--- End diff --

Why should we create a new instance here for every row? All the methods here could be static: just pass a CarbonRow to the required method and get the converted data. Then even `CarbonFactDataHandlerColumnar.addDataToStore` would only need to take a `CarbonRow`.
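
The suggestion can be sketched roughly as follows. `RowSketch` is a hypothetical stand-in for CarbonRow, and `RowAccessSketch` with its assumed slot layout is invented for illustration; neither reflects the actual classes or method signatures in this PR.

```java
/** Hypothetical row holder; stands in for CarbonRow for illustration only. */
final class RowSketch {
  private final Object[] data;

  RowSketch(Object[] data) {
    this.data = data;
  }

  Object[] getData() {
    return data;
  }
}

/** Stateless helper: static accessors, so no wrapper object is allocated per row. */
final class RowAccessSketch {
  // Assumed layout, for illustration: slot 0 = int[] dictionary keys, slot 1 = Object[] measures.
  private static final int DICTIONARY_INDEX = 0;
  private static final int MEASURE_INDEX = 1;

  private RowAccessSketch() {
    // Not instantiable: the class only groups static methods.
  }

  static int[] dictionaryKeys(RowSketch row) {
    return (int[]) row.getData()[DICTIONARY_INDEX];
  }

  static Object[] measures(RowSketch row) {
    return (Object[]) row.getData()[MEASURE_INDEX];
  }
}

final class RowAccessDemo {
  public static void main(String[] args) {
    RowSketch row = new RowSketch(new Object[] {new int[] {1, 2, 3}, new Object[] {10L}});
    // The same row object is passed through; only static helpers interpret it.
    System.out.println(RowAccessSketch.dictionaryKeys(row).length);
    System.out.println(RowAccessSketch.measures(row)[0]);
  }
}
```

The trade-off is that static helpers avoid one allocation per row on the load path, while an instance wrapper gives callers a typed object to pass around.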

[GitHub] carbondata pull request #932: [CARBONDATA-1074] Add TablePage and ConvertedR...

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/932#discussion_r118818487

--- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/row/WriteStepRow.java ---
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.processing.newflow.row;
+
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.keygenerator.KeyGenException;
+import org.apache.carbondata.core.keygenerator.KeyGenerator;
+import org.apache.carbondata.core.scan.wrappers.ByteArrayWrapper;
+
+public class WriteStepRow {
--- End diff --

Since all the methods here are static, it would be better to name this class as a utility.

[GitHub] carbondata issue #932: [CARBONDATA-1074] Add TablePage and ConvertedRow, pre...

Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/932

There seem to be some checkstyle issues; please fix them.

[GitHub] carbondata issue #932: [CARBONDATA-1074] Add TablePage and ConvertedRow, pre...

Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/932

LGTM

[GitHub] carbondata pull request #932: [CARBONDATA-1074] Add TablePage and ConvertedR...

Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/932
