// evaluating an expression
// Use Literal expression to create an expression from a Scala object
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.catalyst.expressions.Literal
val e: Expression = Literal("hello")
import org.apache.spark.sql.catalyst.expressions.EmptyRow
val v: Any = e.eval(EmptyRow)
// Convert to Scala's String
import org.apache.spark.unsafe.types.UTF8String
scala> val s = v.asInstanceOf[UTF8String].toString
s: String = hello
Expression — Executable Node in Catalyst Tree
Expression is a executable node (in a Catalyst tree) that can be evaluated to a value given input values, i.e. can produce a JVM object per InternalRow.
|
Note
|
Expression is often called a Catalyst expression even though it is merely built using (not be part of) the Catalyst — Tree Manipulation Framework.
|
Expression can generate a Java source code that is then used in evaluation.
| Name | Scala Kind | Behaviour | Examples |
|---|---|---|---|
abstract class |
|||
trait |
Does not support code generation and falls back to interpreted mode |
||
trait |
|||
abstract class |
Has no child expressions (and hence "terminates" the expression tree). |
||
Can later be referenced in a dataflow graph. |
|||
trait |
|||
trait |
Expression with no SQL representation Gives the only custom sql method that is non-overridable (i.e. When requested SQL representation, |
||
abstract class |
|||
trait |
Timezone-aware expressions |
||
abstract class |
|||
trait |
Cannot be evaluated, i.e. eval and doGenCode are not supported and report an
|
|
Expression Contract
package org.apache.spark.sql.catalyst.expressions
abstract class Expression extends TreeNode[Expression] {
// only required methods that have no implementation
def dataType: DataType
def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode
def eval(input: InternalRow = EmptyRow): Any
def nullable: Boolean
}
| Method | Description | ||
|---|---|---|---|
Code-generated evaluation that generates a Java source code (that is used to evaluate the expression in a more optimized way not directly using eval). Used as part of genCode. |
|||
No-code-generated evaluation that evaluates the expression to a JVM object for a given InternalRow (without generating a corresponding Java code.)
|
|||
Code-generated evaluation that generates a Java source code (that is used to evaluate the expression in a more optimized way not directly using eval). Similar to doGenCode but supports expression reuse (aka subexpression elimination). |
|||
SQL representation prettyName followed by
|
Nondeterministic Expression
Nondeterministic expressions are non-deterministic and non-foldable, i.e. deterministic and foldable properties are disabled (i.e. false). They require explicit initialization before evaluation.
Nondeterministic expressions have two additional methods:
-
initInternalfor internal initialization (called beforeeval) -
evalInternaltoevaluate a InternalRow into a JVM object.
|
Note
|
Nondeterministic is a Scala trait.
|
Nondeterministic expressions have the additional initialized flag that is enabled (i.e. true) after the other additional initInternal method has been called.
Examples of Nondeterministic expressions are InputFileName, MonotonicallyIncreasingID, SparkPartitionID functions and the abstract RDG (that is the base for Rand and Randn functions).
|
Note
|
Nondeterministic expressions are the target of PullOutNondeterministic logical plan rule.
|