AbstractSqlParser — Base SQL Parsing Infrastructure

AbstractSqlParser is the only ParserInterface in Spark SQL and the foundation of the SQL parsing infrastructure. It has two concrete implementations, each of which is only required to define its own AstBuilder for the final transformation of SQL text into the equivalent Spark SQL entities, i.e. DataType, Expression, LogicalPlan and TableIdentifier.

AbstractSqlParser first sets up SqlBaseLexer and SqlBaseParser for parsing (and passes the latter on to a parsing function) and then uses AstBuilder for the actual parsing.

Table 1. AbstractSqlParser’s Implementations (in alphabetical order)
Name Description

SparkSqlParser

The default SQL parser available as sqlParser in SessionState.

val spark: SparkSession = ...
spark.sessionState.sqlParser

CatalystSqlParser

Parses DataType or StructType (schema) from their canonical string representation.

AbstractSqlParser simply relays all the SQL parsing, i.e. translating a SQL string into a Spark SQL entity, to the specialized AstBuilder of the concrete parser.
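The relay can be pictured with a small toy model. This is only a sketch of the design: every name in it (ToyParser, ToyAstBuilder, ToyAbstractParser, UpperCasingParser) is hypothetical, and none of it is Spark code.

```scala
// Toy model of the relay; all names are hypothetical, not Spark classes.
object RelaySketch {
  // Stand-in for the ANTLR-generated parser: it only holds the raw text.
  final class ToyParser(val text: String)

  // Stand-in for AstBuilder: turns parser state into a result entity.
  trait ToyAstBuilder {
    def buildDataType(parser: ToyParser): String
  }

  abstract class ToyAbstractParser {
    // The only member a concrete parser has to provide.
    protected def astBuilder: ToyAstBuilder

    // Sets up the parser and hands it to the caller-supplied function.
    protected def parse[T](command: String)(toResult: ToyParser => T): T =
      toResult(new ToyParser(command.trim))

    // Public entry point: relays the real work to astBuilder.
    def parseDataType(sqlText: String): String =
      parse(sqlText) { parser => astBuilder.buildDataType(parser) }
  }

  // A concrete parser only plugs in its own builder.
  object UpperCasingParser extends ToyAbstractParser {
    protected val astBuilder: ToyAstBuilder = new ToyAstBuilder {
      def buildDataType(parser: ToyParser): String =
        parser.text.toUpperCase(java.util.Locale.ROOT)
    }
  }
}
```

The point of the split is that the base class owns the setup and the public parse methods, while a subclass changes behavior by swapping the builder alone.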

AbstractSqlParser Contract

abstract class AbstractSqlParser extends ParserInterface {
  def astBuilder: AstBuilder
  def parse[T](command: String)(toResult: SqlBaseParser => T): T
  def parseDataType(sqlText: String): DataType
  def parsePlan(sqlText: String): LogicalPlan
  def parseExpression(sqlText: String): Expression
  def parseTableIdentifier(sqlText: String): TableIdentifier
  def parseTableSchema(sqlText: String): StructType
}
Table 2. AbstractSqlParser Contract (in alphabetical order)
Method Description

astBuilder

Returns the AstBuilder that is used for the actual SQL parsing.

Used in all the parse methods, i.e. parseDataType, parseExpression, parsePlan, parseTableIdentifier, and parseTableSchema.

Note
Both implementations, i.e. SparkSqlParser and CatalystSqlParser, come with their own astBuilder method.

parse

Sets up SqlBaseLexer and SqlBaseParser for parsing and passes the latter on to the input toResult function where the parsing finally happens.

Used in all the parse methods, i.e. parseDataType, parseExpression, parsePlan, parseTableIdentifier, and parseTableSchema.

parseDataType

Used when…​

parseExpression

Used when…​

parsePlan

Creates a LogicalPlan for a given SQL textual statement.

When a SQL statement cannot be parsed, parsePlan reports a ParseException:

Unsupported SQL statement

parseTableIdentifier

Used when…​

parseTableSchema

Used when…​
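The parse methods in the table above can be tried out directly. The following is a minimal sketch that assumes the spark-catalyst module is on the classpath. It uses CatalystSqlParser because that parser needs no SparkSession, and the SQL strings and the ParseMethodsDemo object name are illustrative only.

```scala
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.types.StructType

object ParseMethodsDemo {
  // A DataType from its canonical string representation.
  val dataType = CatalystSqlParser.parseDataType("array<int>")

  // A schema (StructType) from a comma-separated list of column definitions.
  val schema: StructType = CatalystSqlParser.parseTableSchema("id INT, name STRING")

  // An (unresolved) Catalyst Expression.
  val expression = CatalystSqlParser.parseExpression("a + 1")

  // A TableIdentifier with an optional database part.
  val tableId = CatalystSqlParser.parseTableIdentifier("db.tbl")

  // An (unresolved) LogicalPlan for a whole statement.
  val plan = CatalystSqlParser.parsePlan("SELECT 1")
}
```

All results are unresolved at this point: parsing only builds the tree, and resolving names such as a or db.tbl is left to later phases.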

Setting Up SqlBaseLexer and SqlBaseParser for Parsing — parse Method

parse[T](command: String)(toResult: SqlBaseParser => T): T

parse sets up a proper ANTLR parsing infrastructure with SqlBaseLexer and SqlBaseParser (which are the ANTLR-specific classes of Spark SQL that are auto-generated at build time from the SqlBase.g4 grammar).

Tip
Review the ANTLR grammar definition for Spark SQL in sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4.

Internally, parse first prints out the following INFO message to the logs:

INFO SparkSqlParser: Parsing command: [command]
Tip
Enable INFO logging level for the concrete AbstractSqlParser in use, i.e. SparkSqlParser or CatalystSqlParser, to see the above INFO message.

parse then creates and sets up a SqlBaseLexer and a SqlBaseParser, and passes the latter on to the input toResult function where the parsing finally happens.

Note
parse first tries the SLL prediction mode and falls back to the LL mode only if that attempt fails.
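The two-stage strategy in the note above can be illustrated with a toy model. This sketch does not use ANTLR or Spark: Mode, FastModeFailed, parseTwoStage and demoAttempt are all hypothetical names, and the rule that the fast mode gives up on input containing a question mark is made up for the demonstration.

```scala
// Toy model of "try the fast mode first, retry with the full mode".
// Nothing here is ANTLR or Spark code; all names are hypothetical.
object TwoStageSketch {
  sealed trait Mode
  case object Fast extends Mode // plays the role of SLL
  case object Full extends Mode // plays the role of LL

  // Signals that the fast mode could not handle the input.
  final class FastModeFailed(message: String) extends RuntimeException(message)

  // Runs the attempt in fast mode and retries in full mode on failure.
  def parseTwoStage[T](input: String)(attempt: (Mode, String) => T): T =
    try attempt(Fast, input)
    catch { case _: FastModeFailed => attempt(Full, input) }

  // Made-up attempt function: fast mode gives up on input containing '?'.
  def demoAttempt(mode: Mode, input: String): String = mode match {
    case Fast if input.contains("?") =>
      throw new FastModeFailed(s"fast mode gave up on: $input")
    case Fast => s"fast:$input"
    case Full => s"full:$input"
  }
}
```

The trade-off being modeled is that the cheaper mode handles most input, so the cost of the more general mode is paid only for the rare input that needs it.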

In case of parsing errors, parse reports a ParseException.
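A caller can handle such failures explicitly. The sketch below assumes spark-catalyst is on the classpath and uses CatalystSqlParser for convenience. The tryParse helper and the ParseErrorDemo object are hypothetical, and the misspelled statement is just an example of invalid input.

```scala
import org.apache.spark.sql.catalyst.parser.{CatalystSqlParser, ParseException}

object ParseErrorDemo {
  // Hypothetical helper: Right(plan as text) on success,
  // Left(error message) when the parser reports a ParseException.
  def tryParse(sqlText: String): Either[String, String] =
    try Right(CatalystSqlParser.parsePlan(sqlText).toString)
    catch { case e: ParseException => Left(e.getMessage) }
}
```

For instance, ParseErrorDemo.tryParse("SELEC 1") yields a Left with the parser's message, whereas ParseErrorDemo.tryParse("SELECT 1") yields a Right.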
