AggregateFunction

AggregateFunction is the contract for Catalyst expressions that represent aggregate functions.

AggregateFunction is wrapped inside an AggregateExpression (using the toAggregateExpression method) when:

Analyzer resolves functions (for SQL mode)
…FIXME: Anywhere else?

import org.apache.spark.sql.functions.collect_list
scala> val fn = collect_list("gid")
fn: org.apache.spark.sql.Column = collect_list(gid)

import org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression
scala> val aggFn = fn.expr.asInstanceOf[AggregateExpression].aggregateFunction
aggFn: org.apache.spark.sql.catalyst.expressions.aggregate.AggregateFunction = collect_list('gid, 0, 0)

scala> println(aggFn.numberedTreeString)
00 collect_list('gid, 0, 0)
01 +- 'gid

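Note	Aggregate functions are not foldable, i.e. they are never candidates for static evaluation before the query is executed (`foldable` is always `false`).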
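
The non-foldable behaviour can be checked directly on Catalyst expressions. The following is a minimal sketch, assuming a spark-shell session (or any project with Spark's Catalyst classes on the classpath); `Count` and the literal `1` are illustrative choices only.

```scala
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.catalyst.expressions.aggregate.Count

// A literal is a constant, so it can be evaluated statically
val one = Literal(1)

// Count is an AggregateFunction; picked here only for illustration
val countFn = Count(one)

// Compare the foldable flag of the two expressions
val literalFoldable = one.foldable
val aggregateFoldable = countFn.foldable
println(s"literal: $literalFoldable, aggregate: $aggregateFoldable")
```

Even though the only child of `countFn` is a constant, the aggregate function itself reports that it is not foldable.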

Table 1. AggregateFunction Top-Level Catalyst Expressions
Name	Behaviour	Examples
DeclarativeAggregate
ImperativeAggregate
`TypedAggregateExpression`
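
To see which of the top-level contracts a given aggregate function belongs to, pattern matching on the expression is usually enough. The sketch below assumes Spark's Catalyst classes are on the classpath; `aggregateKind` is a hypothetical helper (not part of Spark), and `Count` and `CollectList` are used only as examples.

```scala
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.catalyst.expressions.aggregate.AggregateFunction
import org.apache.spark.sql.catalyst.expressions.aggregate.DeclarativeAggregate
import org.apache.spark.sql.catalyst.expressions.aggregate.ImperativeAggregate
import org.apache.spark.sql.catalyst.expressions.aggregate.{CollectList, Count}

// Hypothetical helper: classifies an aggregate function by its contract
def aggregateKind(fn: AggregateFunction): String = fn match {
  case _: DeclarativeAggregate => "declarative"
  case _: ImperativeAggregate  => "imperative"
  case _                       => "other"
}

// Count is defined in terms of Catalyst expressions
val countKind = aggregateKind(Count(Literal(1)))

// CollectList updates its aggregation buffer imperatively
val collectKind = aggregateKind(CollectList(Literal(1)))

println(s"count: $countKind, collect_list: $collectKind")
```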

AggregateFunction Contract

abstract class AggregateFunction extends Expression {
  def aggBufferSchema: StructType
  def aggBufferAttributes: Seq[AttributeReference]
  def inputAggBufferAttributes: Seq[AttributeReference]
  def defaultResult: Option[Literal] = None
}

Table 2. AggregateFunction Contract
Method	Description
`aggBufferSchema`	Schema of an aggregation buffer to hold partial aggregate results. Used mostly in ScalaUDAF and AggregationIterator.
`aggBufferAttributes`	Collection of `AttributeReference` objects of an aggregation buffer to hold partial aggregate results. Used in: `DeclarativeAggregateEvaluator`; `AggregateExpression` for references; `Expression`-based aggregate’s `bufferSchema` in DeclarativeAggregate; …
`inputAggBufferAttributes`
`defaultResult`	Defaults to `None`.
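
The contract methods can be inspected on any concrete aggregate function. The following is a sketch only, assuming Spark's Catalyst classes are on the classpath and using `Count` as an arbitrary example; the exact attribute names and default value depend on the function chosen.

```scala
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.catalyst.expressions.aggregate.{AggregateFunction, Count}

// Count is used purely as an example of an AggregateFunction
val fn: AggregateFunction = Count(Literal(1))

// Attributes and schema of the buffer that holds partial results
val bufferAttrs = fn.aggBufferAttributes
val bufferSchema = fn.aggBufferSchema

// Attributes of an input aggregation buffer
val inputAttrs = fn.inputAggBufferAttributes

// Optional default result; None unless the function overrides it
val default = fn.defaultResult

println(bufferAttrs.map(a => s"${a.name}: ${a.dataType}").mkString(", "))
println(bufferSchema.treeString)
println(default)
```

For this example the buffer schema has one field per buffer attribute.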

Creating AggregateExpression for AggregateFunction — `toAggregateExpression` Method

toAggregateExpression(): AggregateExpression  (1)
toAggregateExpression(isDistinct: Boolean): AggregateExpression

(1) Calls the other toAggregateExpression with isDistinct disabled (i.e. false).

toAggregateExpression creates an AggregateExpression for the current AggregateFunction with Complete aggregate mode.

Note	`toAggregateExpression` is used in: `functions` object’s `withAggregateFunction` block to create a Column with an AggregateExpression for an `AggregateFunction`; FIXME
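
Both variants of `toAggregateExpression` can be tried out on a concrete aggregate function. This is a minimal sketch, assuming Spark's Catalyst classes are on the classpath; `Count(Literal(1))` is an illustrative choice, not the only function this applies to.

```scala
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.catalyst.expressions.aggregate.{AggregateExpression, Complete, Count}

// Example aggregate function (illustrative only)
val countFn = Count(Literal(1))

// No-argument variant: isDistinct defaults to false
val aggExpr: AggregateExpression = countFn.toAggregateExpression()

// Explicit variant with DISTINCT semantics enabled
val distinctExpr: AggregateExpression = countFn.toAggregateExpression(isDistinct = true)

println(s"mode=${aggExpr.mode}, isDistinct=${aggExpr.isDistinct}")
println(s"mode=${distinctExpr.mode}, isDistinct=${distinctExpr.isDistinct}")
```

In both cases the resulting expression wraps the original function, which stays available as `aggregateFunction`.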