CatalogImpl

CatalogImpl is the one and only Catalog implementation in Spark SQL. It relies on the session's SessionCatalog (through SparkSession) to work with databases, tables and functions.

Figure 1. CatalogImpl uses SessionCatalog (through SparkSession)
Note
CatalogImpl is in the org.apache.spark.sql.internal package.

functionExists Method

Caution
FIXME
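
A minimal usage sketch until the internals are described (the function names below are illustrative; built-in functions are resolved through the session's function registry):

import org.apache.spark.sql.SparkSession
val spark: SparkSession = ...

// a built-in function registered in the session's function registry
spark.catalog.functionExists("upper")          // expected: true

// a function that was never registered or created
spark.catalog.functionExists("my_missing_udf") // expected: false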

refreshTable Method

Caution
FIXME
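
refreshTable invalidates and refreshes the cached metadata (and cached data) of the given table, e.g. after the files backing it have changed. A minimal usage sketch (the readme table matches the example later on this page):

import org.apache.spark.sql.SparkSession
val spark: SparkSession = ...

// invalidate and re-read the cached metadata and data of the readme table
spark.catalog.refreshTable("readme")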

Caching Table or View In-Memory — cacheTable Method

cacheTable(tableName: String): Unit

Internally, cacheTable first creates a DataFrame for the table and then requests CacheManager to cache it.
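
A minimal usage sketch (using a temporary view created only for the example; the name one is illustrative):

// register a temporary view and cache it
spark.range(1).createOrReplaceTempView("one")
spark.catalog.cacheTable("one")

// confirm the view is now cached
spark.catalog.isCached("one")  // expected: true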

Note
cacheTable uses the SparkSession's SharedState to access the CacheManager.
Note
cacheTable is a part of Catalog contract.

Removing All Cached Tables From In-Memory Cache — clearCache Method

clearCache(): Unit

clearCache requests CacheManager to remove all cached tables from the in-memory cache.
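
A minimal usage sketch (assuming the one view cached in the cacheTable example above):

// remove every cached table and view from the in-memory cache
spark.catalog.clearCache()

// previously cached entries now report as not cached
spark.catalog.isCached("one")  // expected: false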

Note
clearCache is a part of Catalog contract.

Creating External Table From Path — createExternalTable Method

createExternalTable(tableName: String, path: String): DataFrame
createExternalTable(tableName: String, path: String, source: String): DataFrame
createExternalTable(
  tableName: String,
  source: String,
  options: Map[String, String]): DataFrame
createExternalTable(
  tableName: String,
  source: String,
  schema: StructType,
  options: Map[String, String]): DataFrame

createExternalTable creates an external table tableName from the given path and returns the corresponding DataFrame.

import org.apache.spark.sql.SparkSession
val spark: SparkSession = ...

val readmeTable = spark.catalog.createExternalTable("readme", "README.md", "text")
readmeTable: org.apache.spark.sql.DataFrame = [value: string]

scala> spark.catalog.listTables.filter(_.name == "readme").show
+------+--------+-----------+---------+-----------+
|  name|database|description|tableType|isTemporary|
+------+--------+-----------+---------+-----------+
|readme| default|       null| EXTERNAL|      false|
+------+--------+-----------+---------+-----------+

scala> sql("select count(*) as count from readme").show(false)
+-----+
|count|
+-----+
|99   |
+-----+

The source input parameter is the name of the data source provider for the table, e.g. parquet, json, text. If not specified, createExternalTable uses the spark.sql.sources.default setting to determine the data source format.
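
For example, the default format can be inspected (and changed) through the runtime configuration before calling a variant without the source parameter (a sketch; parquet is the out-of-the-box value of spark.sql.sources.default):

// the data source format used when source is not specified
spark.conf.get("spark.sql.sources.default")    // parquet by default

// make text the default format for subsequent calls
spark.conf.set("spark.sql.sources.default", "text")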

Note
The source input parameter must not be hive as it leads to an AnalysisException.

createExternalTable sets the mandatory path option when the path is specified explicitly as an input parameter.
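
When no separate path parameter is given, the path can be passed in through the options map instead (a sketch; the table name readme_opts is illustrative):

spark.catalog.createExternalTable(
  "readme_opts",                  // table name (illustrative)
  "text",                         // data source provider
  Map("path" -> "README.md"))     // path passed as an option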

createExternalTable parses tableName into a TableIdentifier (using SparkSqlParser). It creates a CatalogTable and then executes (by toRDD) a CreateTable logical plan. The result DataFrame is a Dataset[Row] built from the QueryExecution of a SubqueryAlias logical plan and a RowEncoder.
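
You can observe the result of the above on the DataFrame returned by createExternalTable (a sketch reusing the readme table from the earlier example; the exact plan output depends on the Spark version):

// the analyzed logical plan is a SubqueryAlias over the relation
// that createExternalTable created for the readme table
readmeTable.queryExecution.analyzed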

Figure 2. CatalogImpl.createExternalTable
Note
createExternalTable is a part of Catalog contract.
