GroupingSets Unary Logical Operator

GroupingSets is a unary logical operator that represents SQL’s GROUPING SETS variant of GROUP BY clause.

val q = sql("""
  SELECT customer, year, SUM(sales)
  FROM VALUES ("abc", 2017, 30) AS t1 (customer, year, sales)
  GROUP BY customer, year
  GROUPING SETS ((customer), (year))
  """)
scala> println(q.queryExecution.logical.numberedTreeString)
00 'GroupingSets [ArrayBuffer('customer), ArrayBuffer('year)], ['customer, 'year], ['customer, 'year, unresolvedalias('SUM('sales), None)]
01 +- 'SubqueryAlias t1
02    +- 'UnresolvedInlineTable [customer, year, sales], [List(abc, 2017, 30)]

GroupingSets operator is resolved to an Aggregate logical operator at analysis phase.

scala> println(q.queryExecution.analyzed.numberedTreeString)
00 Aggregate [customer#8, year#9, spark_grouping_id#5], [customer#8, year#9, sum(cast(sales#2 as bigint)) AS sum(sales)#4L]
01 +- Expand [List(customer#0, year#1, sales#2, customer#6, null, 1), List(customer#0, year#1, sales#2, null, year#7, 2)], [customer#0, year#1, sales#2, customer#8, year#9, spark_grouping_id#5]
02    +- Project [customer#0, year#1, sales#2, customer#0 AS customer#6, year#1 AS year#7]
03       +- SubqueryAlias t1
04          +- LocalRelation [customer#0, year#1, sales#2]
Note
GroupingSets can only be created using SQL.
Note
GroupingSets is not supported on Structured Streaming’s streaming Datasets.

GroupingSets is never resolved (as it can only be converted to an Aggregate logical operator).

The output schema of GroupingSets are exactly the attributes of aggregate named expressions.

Analysis Phase

GroupingSets operator is resolved at analysis phase in the following logical evaluation rules:

GroupingSets operator is resolved to an Aggregate with Expand logical operators.

val spark: SparkSession = ...
// using q from the example above
val plan = q.queryExecution.logical

scala> println(plan.numberedTreeString)
00 'GroupingSets [ArrayBuffer('customer), ArrayBuffer('year)], ['customer, 'year], ['customer, 'year, unresolvedalias('SUM('sales), None)]
01 +- 'SubqueryAlias t1
02    +- 'UnresolvedInlineTable [customer, year, sales], [List(abc, 2017, 30)]

// Note unresolvedalias for SUM expression
// Note UnresolvedInlineTable and SubqueryAlias

// FIXME Show the evaluation rules to get rid of the unresolvable parts

Creating GroupingSets Instance

GroupingSets takes the following when created:

results matching ""

    No results matching ""