[GitHub] carbondata pull request #1565: [CARBONDATA-1518]Support creating timeseries ...

classic Classic list List threaded Threaded
47 messages Options
123
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1565: [CARBONDATA-1518]Support creating timeseries ...

qiuchenjian-2
GitHub user kumarvishal09 opened a pull request:

    https://github.com/apache/carbondata/pull/1565

    [CARBONDATA-1518]Support creating timeseries while creating main table.

    Support creating timeseries while creating main table.
    Be sure to do all of the following checklist to help us incorporate
    your contribution quickly and easily:
   
     - [ ] Any interfaces changed?
     
     - [ ] Any backward compatibility impacted?
     
     - [ ] Document update required?
   
     - [ ] Testing done
            Please provide details on
            - Whether new unit test cases have been added or why no new tests are required?
            - How it is tested? Please attach test report.
            - Is it a performance related change? Please attach the performance test report.
            - Any additional information to help reviewers in testing this change.
           
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kumarvishal09/incubator-carbondata master_timeseries

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1565.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1565
   
----
commit fc76b22e90a3836bae68a6c3061781faf3a047e8
Author: kumarvishal <[hidden email]>
Date:   2017-11-15T16:27:24Z

    Create Timeseries table
   
    timeseries support

----


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1565: [CARBONDATA-1518]Support creating timeseries while c...

qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1565
 
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1860/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1565: [CARBONDATA-1518]Support creating timeseries while c...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1565
 
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1861/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1565: [CARBONDATA-1518]Support creating timeseries while c...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1565
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1436/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1565: [WIP][CARBONDATA-1518][Pre-Aggregate]Support ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1565#discussion_r154094455
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/converter/ThriftWrapperSchemaConverterImpl.java ---
    @@ -197,6 +197,7 @@
           thriftColumnSchema.setColumnProperties(properties);
         }
         thriftColumnSchema.setAggregate_function(wrapperColumnSchema.getAggFunction());
    +    thriftColumnSchema.setTimeSeriesFunction(wrapperColumnSchema.getTimeSeriesFunction());
    --- End diff --
   
    I don't think we should add a new field to schema about this, let it be as tableProperty only.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1565: [WIP][CARBONDATA-1518][Pre-Aggregate]Support ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1565#discussion_r154094933
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/preagg/TimeSeriesUDF.java ---
    @@ -0,0 +1,115 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.core.preagg;
    +
    +import java.sql.Timestamp;
    +import java.util.Calendar;
    +import java.util.GregorianCalendar;
    +import java.util.Locale;
    +
    +/**
    + * class for applying timeseries udf
    + */
    +public class TimeSeriesUDF {
    +
    +  // thread local for keeping calender instance
    +  private ThreadLocal<Calendar> calanderThreadLocal = new ThreadLocal<>();
    +
    +  /**
    +   * singleton instance
    +   */
    +  public static final TimeSeriesUDF INSTANCE = new TimeSeriesUDF();
    +
    +  private TimeSeriesUDF() {
    +    initialize();
    +  }
    +
    +  /**
    +   * Below method will be used to apply udf on data provided
    +   * Method will work based on below logic.
    +   * Data: 2016-7-23 01:01:30,10
    +   * Year Level UDF will return: 2016-1-1 00:00:00,0
    +   * Month Level UDF will return: 2016-7-1 00:00:00,0
    +   * Day Level UDF will return: 2016-7-23 00:00:00,0
    +   * Hour Level UDF will return: 2016-7-23 01:00:00,0
    +   * Minute Level UDF will return: 2016-7-23 01:01:00,0
    +   * Second Level UDF will return: 2016-7-23 01:01:30,0
    +   * If function does not match with any of the above functions
    +   * it will throw exception
    +   *
    +   * @param data     timestamp data
    +   * @param function time series function name
    +   * @return data after applying udf
    +   */
    +  public Timestamp applyUDF(Timestamp data, String function) {
    +    if (null == data) {
    +      return data;
    +    }
    +    initialize();
    +    Calendar calendar = calanderThreadLocal.get();
    +    calendar.clear();
    +    calendar.setTimeInMillis(data.getTime());
    +    switch (function.toLowerCase(Locale.getDefault())) {
    +      case "second":
    --- End diff --
   
    Can you use enums


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1565: [WIP][CARBONDATA-1518][Pre-Aggregate]Support ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1565#discussion_r154095105
 
    --- Diff: format/src/main/thrift/schema.thrift ---
    @@ -122,6 +122,12 @@ struct ColumnSchema{
       *  will be usefull in case of pre-aggregate
       **/
      17: optional list<ParentColumnTableRelation> parentColumnTableRelations;
    +
    +  /**
    +  * function applied on timestamp column for timeseries for ex: hourTimestamp, dayTimestamp,
    +   * monthTimestamp, yearTimestamp
    +  **/
    + 18: optional string timeSeriesFunction;
    --- End diff --
   
    Keep as tableProperty only


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1565: [WIP][CARBONDATA-1518][Pre-Aggregate]Support ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1565#discussion_r154095462
 
    --- Diff: integration/spark-common/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchemaCommon.scala ---
    @@ -465,13 +472,22 @@ class TableNewProcessor(cm: TableModel) {
             encoders.add(Encoding.DICTIONARY)
             encoders
           }
    +      val timeSeriesFunction = if (cm.parentTable.isDefined && cm.timSeriesColumn.isDefined &&
    +                                   null != cm.parentTable.get.getColumnByName(
    +                                     cm.parentTable.get.getFactTableName,
    +                                       cm.timSeriesColumn.get)) {
    --- End diff --
   
    Format it properly


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1565: [WIP][CARBONDATA-1518][Pre-Aggregate]Support ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1565#discussion_r154095701
 
    --- Diff: integration/spark-common/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchemaCommon.scala ---
    @@ -491,13 +507,22 @@ class TableNewProcessor(cm: TableModel) {
               encoders.add(Encoding.DICTIONARY)
               encoders
             }
    +        val timeSeriesFunction = if (cm.parentTable.isDefined && cm.timSeriesColumn.isDefined &&
    +                                    null != cm.parentTable.get.getColumnByName(
    +                                      cm.parentTable.get.getFactTableName,
    +                                      cm.timSeriesColumn.get)) {
    --- End diff --
   
    Please format properly


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1565: [WIP][CARBONDATA-1518][Pre-Aggregate]Support ...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1565#discussion_r154097078
 
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/timeseries/TimeSeriesUtil.scala ---
    @@ -0,0 +1,138 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.sql.execution.command.timeseries
    +
    +import java.sql.Timestamp
    +
    +import org.apache.spark.sql.execution.command.{DataMapField, Field}
    +
    +import org.apache.carbondata.core.metadata.datatype.DataTypes
    +import org.apache.carbondata.core.metadata.schema.table.CarbonTable
    +import org.apache.carbondata.core.preagg.TimeSeriesUDF
    +import org.apache.carbondata.spark.exception.MalformedCarbonCommandException
    +
    +/**
    + * Utility class for time series to keep
    + */
    +object TimeSeriesUtil {
    --- End diff --
   
    I think this can be moved to core module


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1565: [WIP][CARBONDATA-1518][Pre-Aggregate]Support creatin...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on the issue:

    https://github.com/apache/carbondata/pull/1565
 
    please rebase


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1565: [CARBONDATA-1518][Pre-Aggregate]Support creating tim...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1565
 
    Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/455/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1565: [CARBONDATA-1518][Pre-Aggregate]Support creating tim...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1565
 
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1723/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1565: [CARBONDATA-1518][Pre-Aggregate]Support creating tim...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1565
 
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2105/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata issue #1565: [CARBONDATA-1518][Pre-Aggregate]Support creating tim...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1565
 
    Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/472/



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1565: [CARBONDATA-1518][Pre-Aggregate]Support creat...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1565#discussion_r154988752
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/preagg/TimeSeriesUDF.java ---
    @@ -0,0 +1,127 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.core.preagg;
    +
    +import java.sql.Timestamp;
    +import java.util.ArrayList;
    +import java.util.Calendar;
    +import java.util.GregorianCalendar;
    +import java.util.List;
    +
    +/**
    + * class for applying timeseries udf
    + */
    +public class TimeSeriesUDF {
    +
    +  public final List<String> TIMESERIES_FUNCTION = new ArrayList<>();
    +
    +  // thread local for keeping calender instance
    +  private ThreadLocal<Calendar> calanderThreadLocal = new ThreadLocal<>();
    +
    +  /**
    +   * singleton instance
    +   */
    +  public static final TimeSeriesUDF INSTANCE = new TimeSeriesUDF();
    +
    +  private TimeSeriesUDF() {
    +    initialize();
    +  }
    +
    +  /**
    +   * Below method will be used to apply udf on data provided
    +   * Method will work based on below logic.
    +   * Data: 2016-7-23 01:01:30,10
    +   * Year Level UDF will return: 2016-1-1 00:00:00,0
    +   * Month Level UDF will return: 2016-7-1 00:00:00,0
    +   * Day Level UDF will return: 2016-7-23 00:00:00,0
    +   * Hour Level UDF will return: 2016-7-23 01:00:00,0
    +   * Minute Level UDF will return: 2016-7-23 01:01:00,0
    +   * Second Level UDF will return: 2016-7-23 01:01:30,0
    +   * If function does not match with any of the above functions
    +   * it will throw exception
    --- End diff --
   
    mention it is IllegalArgumentException explicitly


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1565: [CARBONDATA-1518][Pre-Aggregate]Support creat...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1565#discussion_r154989550
 
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/datamap/CarbonCreateDataMapCommand.scala ---
    @@ -50,13 +51,30 @@ case class CarbonCreateDataMapCommand(
         val LOGGER = LogServiceFactory.getLogService(this.getClass.getCanonicalName)
         if (dmClassName.equals("org.apache.carbondata.datamap.AggregateDataMapHandler") ||
             dmClassName.equalsIgnoreCase("preaggregate")) {
    -      CreatePreAggregateTableCommand(
    -        dataMapName,
    -        tableIdentifier,
    -        dmClassName,
    -        dmproperties,
    -        queryString.get
    -      ).processMetadata(sparkSession)
    +      val timeHierarchyString = dmproperties.get("timeseries.hierarchy")
    --- End diff --
   
    please collect all datamap property key string for pre-agg table and put them into a new file, so that it is easier to find all the properties for pre-agg table


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1565: [CARBONDATA-1518][Pre-Aggregate]Support creat...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1565#discussion_r154990053
 
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/timeseries/TimeseriesUtil.scala ---
    @@ -0,0 +1,158 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.sql.execution.command.timeseries
    +
    +import java.sql.Timestamp
    +
    +import org.apache.spark.sql.execution.command.{DataMapField, Field}
    +
    +import org.apache.carbondata.core.metadata.datatype.DataTypes
    +import org.apache.carbondata.core.metadata.schema.table.CarbonTable
    +import org.apache.carbondata.core.preagg.TimeSeriesUDF
    +import org.apache.carbondata.spark.exception.MalformedCarbonCommandException
    +
    +/**
    + * Utility class for time series to keep
    + */
    +object TimeSeriesUtil {
    --- End diff --
   
    Can you move it to java code into core module


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1565: [CARBONDATA-1518][Pre-Aggregate]Support creat...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1565#discussion_r154990490
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/column/ColumnSchema.java ---
    @@ -126,8 +126,16 @@
        */
       private String aggFunction = "";
     
    +  /**
    +   * list of parent column relations
    +   */
       private List<ParentColumnTableRelation> parentColumnTableRelations;
     
    +  /**
    +   * timeseries function applied on column
    +   */
    +  private String timeSeriesFunction = "";
    --- End diff --
   
    I think we should create one class to hold all information needed for pre-agg table, instead of putting them here directly in ColumnSchema


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] carbondata pull request #1565: [CARBONDATA-1518][Pre-Aggregate]Support creat...

qiuchenjian-2
In reply to this post by qiuchenjian-2
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1565#discussion_r154990854
 
    --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/column/ColumnSchema.java ---
    @@ -476,6 +492,7 @@ public void write(DataOutput out) throws IOException {
         out.writeBoolean(invisible);
         out.writeBoolean(isSortColumn);
         out.writeUTF(null != aggFunction ? aggFunction : "");
    +    out.writeUTF(null != timeSeriesFunction ? timeSeriesFunction : "");
    --- End diff --
   
    It will not be null since initialization value is ""


---
123