CatalogImpl

CatalogImpl is the one and only Catalog implementation in Spark SQL. It relies on the session's SessionCatalog (through SparkSession) to work with databases, tables and functions.

Figure 1. CatalogImpl uses SessionCatalog (through SparkSession)
Note
CatalogImpl is in the org.apache.spark.sql.internal package.

functionExists Method

Caution
FIXME
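
A minimal usage sketch until the internals are described (the function names below are illustrative; built-in functions are resolved through the session's function registry):

import org.apache.spark.sql.SparkSession
val spark: SparkSession = ...

// a built-in function registered in the session's function registry
spark.catalog.functionExists("upper")          // expected: true

// a function that was never registered or created
spark.catalog.functionExists("my_missing_udf") // expected: false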

refreshTable Method

Caution
FIXME
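
refreshTable invalidates and refreshes the cached metadata (and cached data) of the given table, e.g. after the files backing it have changed. A minimal usage sketch (the readme table matches the example later on this page):

import org.apache.spark.sql.SparkSession
val spark: SparkSession = ...

// invalidate and re-read the cached metadata and data of the readme table
spark.catalog.refreshTable("readme")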

Caching Table or View In-Memory — cacheTable Method

cacheTable(tableName: String): Unit

Internally, cacheTable first creates a DataFrame for the table and then requests CacheManager to cache it.
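
A minimal usage sketch (using a temporary view created only for the example; the name one is illustrative):

// register a temporary view and cache it
spark.range(1).createOrReplaceTempView("one")
spark.catalog.cacheTable("one")

// confirm the view is now cached
spark.catalog.isCached("one")  // expected: true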

Note
cacheTable uses the SparkSession's SharedState to access the CacheManager.
Note
cacheTable is a part of Catalog contract.

Removing All Cached Tables From In-Memory Cache — clearCache Method

clearCache(): Unit

clearCache requests CacheManager to remove all cached tables from the in-memory cache.
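
A minimal usage sketch (assuming the one view cached in the cacheTable example above):

// remove every cached table and view from the in-memory cache
spark.catalog.clearCache()

// previously cached entries now report as not cached
spark.catalog.isCached("one")  // expected: false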

Note
clearCache is a part of Catalog contract.

Creating External Table From Path — createExternalTable Method

createExternalTable(tableName: String, path: String): DataFrame
createExternalTable(tableName: String, path: String, source: String): DataFrame
createExternalTable(
  tableName: String,
  source: String,
  options: Map[String, String]): DataFrame
createExternalTable(
  tableName: String,
  source: String,
  schema: StructType,
  options: Map[String, String]): DataFrame

createExternalTable creates an external table tableName from the given path and returns the corresponding DataFrame.

import org.apache.spark.sql.SparkSession
val spark: SparkSession = ...

val readmeTable = spark.catalog.createExternalTable("readme", "README.md", "text")
readmeTable: org.apache.spark.sql.DataFrame = [value: string]

scala> spark.catalog.listTables.filter(_.name == "readme").show
+------+--------+-----------+---------+-----------+
|  name|database|description|tableType|isTemporary|
+------+--------+-----------+---------+-----------+
|readme| default|       null| EXTERNAL|      false|
+------+--------+-----------+---------+-----------+

scala> sql("select count(*) as count from readme").show(false)
+-----+
|count|
+-----+
|99   |
+-----+

The source input parameter is the name of the data source provider for the table, e.g. parquet, json, text. If not specified, createExternalTable uses the spark.sql.sources.default setting to determine the data source format.
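
For example, the default format can be inspected (and changed) through the runtime configuration before calling a variant without the source parameter (a sketch; parquet is the out-of-the-box value of spark.sql.sources.default):

// the data source format used when source is not specified
spark.conf.get("spark.sql.sources.default")    // parquet by default

// make text the default format for subsequent calls
spark.conf.set("spark.sql.sources.default", "text")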

Note
The source input parameter must not be hive as it leads to an AnalysisException.

createExternalTable sets the mandatory path option when the path is specified explicitly as an input parameter.
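
When no separate path parameter is given, the path can be passed in through the options map instead (a sketch; the table name readme_opts is illustrative):

spark.catalog.createExternalTable(
  "readme_opts",                  // table name (illustrative)
  "text",                         // data source provider
  Map("path" -> "README.md"))     // path passed as an option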

createExternalTable parses tableName into a TableIdentifier (using SparkSqlParser). It creates a CatalogTable and then executes (by toRDD) a CreateTable logical plan. The result DataFrame is a Dataset[Row] built from the QueryExecution of a SubqueryAlias logical plan and a RowEncoder.
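
You can observe the result of the above on the DataFrame returned by createExternalTable (a sketch reusing the readme table from the earlier example; the exact plan output depends on the Spark version):

// the analyzed logical plan is a SubqueryAlias over the relation
// that createExternalTable created for the readme table
readmeTable.queryExecution.analyzed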

Figure 2. CatalogImpl.createExternalTable
Note
createExternalTable is a part of Catalog contract.
