// evaluating an expression
// Use Literal expression to create an expression from a Scala object
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.catalyst.expressions.Literal
val e: Expression = Literal("hello")
import org.apache.spark.sql.catalyst.expressions.EmptyRow
val v: Any = e.eval(EmptyRow)
// Convert to Scala's String
import org.apache.spark.unsafe.types.UTF8String
val s = v.asInstanceOf[UTF8String].toString
// s: String = hello
Expression — Executable Node in Catalyst Tree
Expression is an executable node (in a Catalyst tree) that can be evaluated to a value given input values, i.e. it can produce a JVM object per InternalRow.
Note: Expression is often called a Catalyst expression even though it is merely built using (and is not part of) the Catalyst Tree Manipulation Framework.
Expression can generate Java source code that is then used in evaluation.
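To make the difference between interpreted and code-generated evaluation concrete, here is a minimal toy sketch. It does not use Spark's classes: Expr, Lit and Plus are hypothetical stand-ins invented for illustration, and the "generated code" is just a Java expression string, far simpler than what Catalyst actually emits.

```scala
// Toy model only: Expr, Lit and Plus are hypothetical, not Spark classes.
object CodegenSketch {
  sealed trait Expr {
    def eval(): Int        // interpreted path: walk the tree and compute the value
    def genCode(): String  // codegen path: emit Java source for the same computation
  }

  final case class Lit(value: Int) extends Expr {
    def eval(): Int = value
    def genCode(): String = value.toString
  }

  final case class Plus(left: Expr, right: Expr) extends Expr {
    def eval(): Int = left.eval() + right.eval()
    def genCode(): String = s"(${left.genCode()} + ${right.genCode()})"
  }

  val expr: Expr = Plus(Lit(1), Plus(Lit(2), Lit(3)))

  // Both paths describe the same computation.
  val interpreted: Int = expr.eval()      // 6
  val javaSource: String = expr.genCode() // "(1 + (2 + 3))"
}
```

In Spark itself the generated source is compiled at runtime and run instead of the tree walk; the toy only shows that one expression tree can drive both styles of evaluation.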
| Name | Scala Kind | Behaviour | Examples |
|---|---|---|---|
| | abstract class | | |
| | trait | Does not support code generation and falls back to interpreted mode | |
| | trait | | |
| | abstract class | Has no child expressions (and hence "terminates" the expression tree). | |
| | | Can later be referenced in a dataflow graph. | |
| | trait | | |
| | trait | Expression with no SQL representation. Gives the only custom sql method that is non-overridable (i.e. When requested SQL representation, | |
| | abstract class | | |
| | trait | Timezone-aware expressions | |
| | abstract class | | |
| | trait | Cannot be evaluated, i.e. eval and doGenCode are not supported and report an | |
Expression Contract
package org.apache.spark.sql.catalyst.expressions
abstract class Expression extends TreeNode[Expression] {
// only required methods that have no implementation
def dataType: DataType
def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode
def eval(input: InternalRow = EmptyRow): Any
def nullable: Boolean
}
| Method | Description |
|---|---|
| | Code-generated evaluation that generates Java source code (used to evaluate the expression in a more optimized way, without calling eval directly). Used as part of genCode. |
| | Interpreted evaluation that evaluates the expression to a JVM object for a given InternalRow (without generating any corresponding Java code). |
| | Code-generated evaluation that generates Java source code (used to evaluate the expression in a more optimized way, without calling eval directly). Similar to doGenCode but supports expression reuse (aka subexpression elimination). |
| | SQL representation prettyName followed by |
Nondeterministic Expression
Nondeterministic expressions are non-deterministic and non-foldable, i.e. the deterministic and foldable properties are disabled (i.e. false). They require explicit initialization before evaluation.
Nondeterministic expressions have two additional methods:
- initInternal for internal initialization (called before eval)
- evalInternal to evaluate an InternalRow into a JVM object.
Note: Nondeterministic is a Scala trait.
Nondeterministic expressions have the additional initialized flag that is enabled (i.e. true) after the other additional initInternal method has been called.
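The initialize-before-evaluate contract described above can be sketched in plain Scala. This is a simplified model under stated assumptions, not Spark's source: NondeterministicLike, IncreasingId and the public initialize method are hypothetical names, and the input row is modelled as a Seq[Any] rather than an InternalRow.

```scala
object NondeterministicSketch {
  // Hypothetical stand-in for the trait described above; not Spark's actual code.
  trait NondeterministicLike {
    val deterministic: Boolean = false
    val foldable: Boolean = false

    // Flag that is enabled only after initInternal has run.
    private var initialized: Boolean = false

    protected def initInternal(): Unit
    protected def evalInternal(input: Seq[Any]): Any

    // Assumed public entry point that runs initInternal and sets the flag.
    final def initialize(): Unit = { initInternal(); initialized = true }

    // eval refuses to run until the expression has been initialized.
    final def eval(input: Seq[Any]): Any = {
      require(initialized, "expression must be initialized before eval")
      evalInternal(input)
    }
  }

  // A counter-style expression: every evaluation returns the next id.
  final class IncreasingId extends NondeterministicLike {
    private var count: Long = 0L
    protected def initInternal(): Unit = { count = 0L }
    protected def evalInternal(input: Seq[Any]): Any = { val id = count; count += 1; id }
  }

  // Evaluating without initialization fails.
  val rejected: Boolean = scala.util.Try(new IncreasingId().eval(Seq.empty)).isFailure

  // After initialization, repeated evaluation gives different results.
  val ids: Seq[Any] = {
    val e = new IncreasingId
    e.initialize()
    Seq(e.eval(Seq.empty), e.eval(Seq.empty))
  }
}
```

Because the same call returns a different value each time, such an expression can be neither deterministic nor foldable, which is why both properties are fixed to false in the sketch.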
Examples of Nondeterministic expressions are the InputFileName, MonotonicallyIncreasingID and SparkPartitionID functions and the abstract RDG (that is the base for the Rand and Randn functions).
Note: Nondeterministic expressions are the target of the PullOutNondeterministic logical plan rule.