GitHub user xubo245 opened a pull request:
https://github.com/apache/carbondata/pull/2691

[CARBONDATA-2912] Support CSV table load csv data with spark2.2

In branch-1.3, a CSV table can't load CSV data with Spark 2.2. Carbon needs to upgrade the commons-lang3 version.

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done. Please provide details on:
  - Whether new unit test cases have been added, or why no new tests are required?
  - How is it tested? Please attach the test report.
  - Is it a performance-related change? Please attach the performance test report.
  - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xubo245/carbondata CARBONDATA-2912_twoInsert1.3.2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2691.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #2691

----

commit c055c8f33123bfb6e1103456bea23a0ff8c944ca
Author: ravipesala <ravi.pesala@...>
Date: 2018-02-03T20:31:00Z

[maven-release-plugin] prepare release apache-carbondata-1.3.0-rc2

commit 607b4cef646b2b9a3c2a8fc687dc40342165979a
Author: ravipesala <ravi.pesala@...>
Date: 2018-02-03T20:31:53Z

[maven-release-plugin] prepare for next development iteration

commit 449668ad9cda869b14f31dcc2c6df6454701cddc
Author: dhatchayani <dhatcha.official@...>
Date: 2018-02-05T10:51:09Z

[CARBONDATA-2131] Alter table adding long datatype is failing but create table with long type is successful, in Spark 2.1

Modified code to make the "create table" supported data types and the "alter add columns" supported data types consistent.

This closes #1932

commit a3b97f38412cf96ee041b6ebfbd7c39af54e391d
Author: kumarvishal <kumarvishal.1802@...>
Date: 2018-02-05T09:47:02Z

[CARBONDATA-2142] Fixed Pre-Aggregate datamap creation issue

Fixed the issue where reverting changes fails when pre-aggregate datamap creation fails. Removed the look-up while creating the pre-aggregate datamap. Removed unused code.

This closes #1943

commit 2c5ecfbfe5ce3357d041207cad8edcf587e4115f
Author: akashrn5 <akashnilugal@...>
Date: 2018-02-07T13:14:33Z

[CARBONDATA-2119] Deserialization issue for carbonloadmodel

Problem: The load model was not getting de-serialized in the executor, due to which 2 different carbon table objects were being created.
Solution: Reconstruct carbonTable from tableInfo if not already created.

This closes #1947

commit 8b105a1e1f6e7e7e3b0bc13d44c1bf93fd821e31
Author: m00258959 <manish.gupta@...>
Date: 2018-02-07T06:37:33Z

[CARBONDATA-2143] Fixed query memory leak issue for task failure during initialization of record reader

Problem: Whenever a query is executed, the record reader is initialized in the internalCompute method of the CarbonScanRdd class. A task completion listener is attached to each task after initialization of the record reader. During record reader initialization, queryResultIterator is initialized and one blocklet is processed. The blocklet processed will use the available unsafe memory. Say there are 100 columns and 80 columns get space, but there is no space left for the remaining columns to be stored in unsafe memory. This will result in a memory exception, and record reader initialization will fail, leading to query failure. In this case the unsafe memory allocated for the 80 columns will not be freed and will remain occupied as long as the JVM process persists.
Impact: It is a memory leak in the system and can lead to failures for queries executed after one query fails due to the above reason.
Solution: Attach the task completion listener before record reader initialization, so that if the query fails at the very first instance after using unsafe memory, that memory is still cleared.

This closes #1948

commit 9f73f0e60611c52278d2d475a89d42adebf32f60
Author: m00258959 <manish.gupta@...>
Date: 2018-02-05T11:40:18Z

[CARBONDATA-2134] Prevent implicit column filter list from getting serialized while submitting task to executor

Problem: In the current store, blocklet pruning happens in the driver and no further pruning takes place on the executor side, but the implicit column filter list is still being sent to the executor. As the size of the list grows, the cost of serializing and deserializing it increases, which can impact query performance.
Solution: Remove the list from the filter expression before submitting the task to the executor.

This closes #1935

commit 1137c285f55dfdc0de24bdebf81d78187df93f8a
Author: kunal642 <kunalkapoor642@...>
Date: 2018-02-08T06:20:23Z

[CARBONDATA-1763] Drop table if an exception is thrown during creation

A pre-aggregate table is not getting dropped when creation fails, because exceptions from undo metadata are not handled. If the pre-aggregate table is not registered with the main table (main table updation fails), then it is not dropped from the metastore.

This closes #1951

commit 6e435de5e04ace63fe5b105e2f180ef0932d80d3
Author: rahulforallp <rahul.kumar@...>
Date: 2018-02-06T13:11:35Z

[CARBONDATA-2137] Delete query performance improved

The following configuration was used: SPARK_EXECUTOR_MEMORY: 200G, SPARK_DRIVER_MEMORY: 20G, SPARK_EXECUTOR_CORES: 32, SPARK_EXECUTOR_INSTANCES: 3. Earlier it was taking 20 minutes; now it takes approximately 5 minutes.

This closes #1937

commit bc3f825107517ad1e39a385c488beadd6022ab8e
Author: akashrn5 <akashnilugal@...>
Date: 2018-02-08T17:40:43Z

[CARBONDATA-2150] Unwanted updatetable status files are being generated for delete operations where no records are deleted

Problem: Unwanted updatetable status files are being generated for delete operations where no records are deleted.
Analysis: When the filter value for the delete operation is less than the maximum value in that column, getSplits() will return the block, and hence the delete logic was creating an update table status file even though no delete operation was done. Also added spark context to the create database event.

This closes #1957

commit 15cc7fa97722d055ad5627b3a915ee6d2b6817d6
Author: akashrn5 <akashnilugal@...>
Date: 2018-02-14T13:37:15Z

[CARBONDATA-2182] Added one more param called extraParams in SessionParams and added carbonSessionInfo to CarbonEnvInitPreEvent

Add one more param called extraParams in SessionParams for session-level operations, and pass carbonSessionInfo to the event so that the user can save session-level information in carbonSessionInfo.

This closes #1978

commit 27634deee82d7a1560e75f8dfc09333eb8df51db
Author: anubhav100 <anubhav.tarar@...>
Date: 2018-02-06T08:03:39Z

[CARBONDATA-2133] Fixed exception displayed after performing select query on newly added Boolean type

Problem: In RestructureUtil and RestructureBasedVectorResultCollector, the case for the boolean data type was missing when getting the default value of a measure type, and in DataTypeUtil the case for storing the boolean default value in bytes was missing.
Solution: Add the Boolean data type case.

This closes #1934

commit aff3b39efd772a881590432816369a05d0cb5855
Author: akashrn5 <akashnilugal@...>
Date: 2018-02-15T13:30:26Z

[CARBONDATA-2103] Optimize show tables for filtering datamaps

Problem: Show tables was taking more time, as the lookup happened twice to filter out the datamaps.
Solution: Add a hive table property which is true for all tables and false for datamaps (such as pre-aggregate tables), and have show tables filter these tables out based on the property.

This closes #1980

commit 7beef112b59c9ccfe14baca87ae841cfe77e4dce
Author: akashrn5 <akashnilugal@...>
Date: 2018-02-14T10:15:04Z

[CARBONDATA-2183] Fix compaction when segment is deleted during compaction, and remove unnecessary parameters in functions

Problem: When compaction has started and the job is running, and in parallel a segment involved in the compaction is deleted using DeleteSegmentByID, the compaction succeeds.
Solution: In that case the compaction should be aborted and fail, and a proper error message should be thrown to the user. This PR also removes unnecessary parameters in functions.

This closes #1979

commit 39ac94e462e6571414dee8f58c174e44a79f8ad4
Author: kunal642 <kunalkapoor642@...>
Date: 2018-02-12T19:23:31Z

[CARBONDATA-2142] [CARBONDATA-1763] Fixed issues while creating concurrent datamaps

Analysis:
1. GenerateTableSchemaString in CarbonMetastore did not have any specific implementation for hive metastore, due to which carbon tables were being cached in MetaData. As there is no way to refresh a table in hive metastore, this is wrong: all queries should get the latest carbon table from the metastore and not from the cache.
2. If updating the main table status fails, the revertMainTableChanges method is called to revert the changes. The revert logic was wrong, which led to the wrong entry getting deleted from the schema.
3. Moved the force-remove logic before taking locks, as deletion from the metastore should happen even if the lock is not present, because the table is in a stale state (the entry is not in the parent but is available in the metastore).

This closes #1975

commit c2785b352f7b7cb2dd524811b0696fb18c12d5b0
Author: BJangir <babulaljangir111@...>
Date: 2018-02-11T19:32:30Z

[CARBONDATA-2161] Update mergeTo column for compacted segment of streaming table

This closes #1971

commit f8a62a9bd8ba39cd6bc247c587a7a3e1afd99254
Author: QiangCai <qiangcai@...>
Date: 2018-02-11T08:06:01Z

[CARBONDATA-2151][Streaming] Fix filter query issue on streaming table

1. Fix filter query issue for timestamp, date, decimal.
2. Add more test cases. Data types: int, string, float, double, decimal, timestamp, date, complex. Operations: =, <>, >=, >, <, <=, in, like, between, is null, is not null.

This closes #1969

commit 4bbbd4b1df444163cfb72cf74a05c1a9d09e1200
Author: BJangir <babulaljangir111@...>
Date: 2018-02-19T17:01:00Z

[CARBONDATA-2185] Add InputMetrics for Streaming Reader

This closes #1985

commit 6f9016db52dd3f9c31ba20e585debfc283e2594e
Author: Zhang Zhichao <441586683@...>
Date: 2018-02-09T09:32:54Z

[CARBONDATA-2149] Fix complex type data displaying error when using DataFrame to write complex type data

The default values of 'complex_delimiter_level_1' and 'complex_delimiter_level_2' are wrong; they must be '$' and ':', not '$' and '\:'. The escape character '\' needs to be added only when using the delimiters in ArrayParserImpl or StructParserImpl.

This closes #1962

commit b0a2fabcc8584dfba24ad0ea135948f5365a7335
Author: QiangCai <qiangcai@...>
Date: 2018-02-25T10:53:41Z

[CARBONDATA-2200] Fix bug of LIKE operation on streaming table

A LIKE operation is converted to a StartsWith / EndsWith / Contains expression, and Carbon uses RowLevelFilterExecuterImpl to evaluate this expression, so the streaming table should also implement RowLevelFilterExecuterImpl.

This closes #1996

commit e363dd1a68e2138591a930055dd1701a1245825f
Author: rahulforallp <rahul.kumar@...>
Date: 2018-02-25T09:55:26Z

[CARBONDATA-2201] NPE fixed while triggering the LoadTablePreExecutionEvent before streaming

While triggering the LoadTablePreExecutionEvent we require the options provided by the user as well as the final options. In the case of streaming both are the same, and passing null may cause an NPE.

This closes #1997

commit 0f210c86ca3ee9f0fa845cdeaef418ed9253b6f8
Author: Zhang Zhichao <441586683@...>
Date: 2018-02-04T04:54:24Z

[MINOR] Remove dependency on Java 1.8

This closes #1928

commit 758d03e783e324f70b6599be7feb1951b1034f51
Author: ravipesala <ravi.pesala@...>
Date: 2018-02-09T04:07:02Z

[CARBONDATA-2168] Support global sort for standard hive partitioning

This closes #1972

commit 1997ca235f90b5746262c9654b685b9b6bd3f16a
Author: ravipesala <ravi.pesala@...>
Date: 2018-02-14T19:01:56Z

[CARBONDATA-2187][PARTITION] Partition restructure for new folder structure and supporting partition location feature

This closes #1984

commit b51d8186a82818672067dfd0387af6ff505f940c
Author: Jatin <jatin.demla@...>
Date: 2018-02-23T11:26:17Z

[CARBONDATA-2199] Fixed dimension column getting wrong block datatype after restructure

Problem: Changing the datatype of a measure that is in sort_columns calls for a restructure, after which the datatype is changed back to the actual datatype; accessing the data with the changed datatype then gives an incorrect-length exception.
Solution: Store the datatype in DimensionInfo while restructuring and use that same datatype to get the block data type.

This closes #1993

commit 7726b4f9b379b0eec4b9fff6571415f47fa55587
Author: Jatin <jatin.demla@...>
Date: 2018-02-27T10:43:40Z

[CARBONDATA-2207] Fix testcases after using hive metastore

CarbonTable was getting null in the case of hive metastore, so fetch it from the metastore instead of carbon.

This closes #2005

commit b360f9084f873bc096d7fabfde20730fbc752350
Author: chenliang613 <chenliang613@...>
Date: 2018-02-08T17:32:38Z

[HOTFIX] Add partition usage code

This closes #1956

commit b9a6b68658fd0f7f408102374b3ef31dcfe44cea
Author: akashrn5 <akashnilugal@...>
Date: 2018-02-28T11:58:43Z

[CARBONDATA-2217] Fix drop partition for non-existing partition, and set FactTimeStamp during compaction for partition table

Problem:
1) When drop partition is fired for a column which does not exist, it throws a null pointer exception.
2) select * is not working when the clean files operation is fired after a second level of compaction; it sometimes throws an exception.
3) A new segment is getting created for all the segments if any one partition is dropped.
Solution:
1) Add a null check for the case where the column does not exist.
2) Give a different timestamp to fact files during compaction, to avoid deletion of files during clean files.
3) Write a new segment file only for the partition which is dropped, not for all partitions.
4) This PR also contains a fix for creating a pre-aggregate table with the same name as one already created in another database.

This closes #2017

commit 660190fb544e338acd131e7cc30de171e7600df6
Author: akashrn5 <akashnilugal@...>
Date: 2018-02-28T12:08:50Z

[CARBONDATA-2103] Make show datamaps configurable in show tables command

Make the display of datamaps in show tables configurable: a new carbon property called carbon.query.show.datamaps is added. By default it is true, so show tables lists all tables, including main tables and datamaps. To filter datamaps out of show tables, set it to false.

This closes #2015

commit 092b5d58a50498a0a66bf6166907965612eb1fc5
Author: ravipesala <ravi.pesala@...>
Date: 2018-03-01T06:34:53Z

[CARBONDATA-2219] Added validation for external partition location to use same schema

This closes #2018

----
---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2691 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8288/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2691 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/219/ ---
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/2691 @xubo245 What does 'CSV table' mean in the title? ---
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/2691

> Carbon need upgrade commons-lang3 vision

Typo: "vision" should be "version". ---
Github user xubo245 commented on the issue:
https://github.com/apache/carbondata/pull/2691 @xuchuanyin A CSV table is one created with "create table ... using csv options ...". In branch-1.3, a CSV table can't load CSV data with Spark 2.2 because, with a low commons-lang version, the default timestampFormat of yyyy-MM-dd'T'HH:mm:ss.SSSXXX is an illegal argument that cannot be recognized after upgrading Spark from 2.1 to 2.2. It needs to be set explicitly when you are writing the dataframe out.
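For illustration, a minimal sketch of the scenario under discussion, assuming a Spark 2.x session; the table name, path, and timestamp pattern below are hypothetical, not taken from the PR:

```scala
import org.apache.spark.sql.SparkSession

object CsvTableLoadExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CsvTableLoadExample")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // A "CSV table" is a Spark datasource table backed by CSV files,
    // not a carbon table.
    spark.sql(
      """CREATE TABLE csv_table (id INT, ts TIMESTAMP)
        |USING csv
        |OPTIONS (path '/tmp/csv_table', header 'true')""".stripMargin)

    // Workaround: set timestampFormat explicitly when writing the
    // dataframe out, instead of relying on the default
    // yyyy-MM-dd'T'HH:mm:ss.SSSXXX pattern, which an old commons-lang
    // based parser cannot handle.
    val df = Seq((1, java.sql.Timestamp.valueOf("2018-09-05 10:00:00")))
      .toDF("id", "ts")
    df.write
      .option("header", "true")
      .option("timestampFormat", "yyyy-MM-dd HH:mm:ss")
      .mode("append")
      .csv("/tmp/csv_table")

    // Refresh so the table picks up the newly written files.
    spark.sql("REFRESH TABLE csv_table")
    spark.sql("SELECT * FROM csv_table").show()
    spark.stop()
  }
}
```
---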
Github user xubo245 commented on the issue:
https://github.com/apache/carbondata/pull/2691 retest this please ---
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/2691 Oh, I didn't know this syntax before. Is a CSV table a carbon table or a spark table? ---
Github user xubo245 commented on the issue:
https://github.com/apache/carbondata/pull/2691 Spark; it supports using parquet too.
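The same datasource syntax works for other formats as well; for example (a hypothetical snippet, reusing the spark session from the sketch above):

```scala
// A parquet-backed Spark table uses the same "USING <format>" clause.
spark.sql(
  """CREATE TABLE parquet_table (id INT, name STRING)
    |USING parquet
    |OPTIONS (path '/tmp/parquet_table')""".stripMargin)
```
---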
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2691 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8313/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2691 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/243/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2691 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/3/ ---
Github user xubo245 commented on the issue:
https://github.com/apache/carbondata/pull/2691 retest this please ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2691 Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.3/8340/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2691 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/102/ ---
Github user xubo245 commented on the issue:
https://github.com/apache/carbondata/pull/2691 retest this please ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2691 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/270/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2691 Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.3/8343/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2691 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/105/ ---
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2691 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/273/ ---