CatalogImpl
Note: CatalogImpl is in the org.apache.spark.sql.internal package.
functionExists Method
Caution: FIXME
refreshTable Method
Caution: FIXME
Caching Table or View In-Memory — cacheTable Method
cacheTable(tableName: String): Unit
Internally, cacheTable first creates a DataFrame for the table and then requests CacheManager to cache it.
Note: cacheTable uses the session-scoped SharedState to access the CacheManager.
Note: cacheTable is a part of the Catalog contract.
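A minimal sketch of caching and verifying a table, assuming a temporary view demo created from a small dataset:

import org.apache.spark.sql.SparkSession
val spark: SparkSession = ...
// Register a tiny temporary view to cache (illustrative only)
spark.range(5).createOrReplaceTempView("demo")
spark.catalog.cacheTable("demo")
scala> spark.catalog.isCached("demo")
res0: Boolean = true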
Removing All Cached Tables From In-Memory Cache — clearCache Method
clearCache(): Unit
clearCache requests CacheManager to remove all cached tables from in-memory cache.
Note: clearCache is a part of the Catalog contract.
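Continuing the sketch above, clearCache drops every cached table, including demo:

spark.catalog.clearCache()
scala> spark.catalog.isCached("demo")
res1: Boolean = false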
Creating External Table From Path — createExternalTable Method
createExternalTable(tableName: String, path: String): DataFrame
createExternalTable(tableName: String, path: String, source: String): DataFrame
createExternalTable(
  tableName: String,
  source: String,
  options: Map[String, String]): DataFrame
createExternalTable(
  tableName: String,
  source: String,
  schema: StructType,
  options: Map[String, String]): DataFrame
createExternalTable creates an external table tableName from the given path and returns the corresponding DataFrame.
import org.apache.spark.sql.SparkSession
val spark: SparkSession = ...
val readmeTable = spark.catalog.createExternalTable("readme", "README.md", "text")
readmeTable: org.apache.spark.sql.DataFrame = [value: string]
scala> spark.catalog.listTables.filter(_.name == "readme").show
+------+--------+-----------+---------+-----------+
| name|database|description|tableType|isTemporary|
+------+--------+-----------+---------+-----------+
|readme| default| null| EXTERNAL| false|
+------+--------+-----------+---------+-----------+
scala> sql("select count(*) as count from readme").show(false)
+-----+
|count|
+-----+
|99 |
+-----+
The source input parameter is the name of the data source provider for the table, e.g. parquet, json, text. If not specified, createExternalTable uses the spark.sql.sources.default setting to determine the data source format.
Note: The source input parameter must not be hive as that leads to an AnalysisException.
createExternalTable sets the mandatory path option when the path is specified explicitly in the input parameter list.
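For illustration, a sketch of the schema-and-options variant, with a hypothetical /tmp/users directory of parquet files passed as the path option:

import org.apache.spark.sql.types._
val schema = StructType(Seq(
  StructField("id", LongType),
  StructField("name", StringType)))
// "/tmp/users" is a hypothetical directory of existing parquet files
val users = spark.catalog.createExternalTable(
  "users",
  "parquet",
  schema,
  Map("path" -> "/tmp/users"))
users: org.apache.spark.sql.DataFrame = [id: bigint, name: string]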
createExternalTable parses tableName into a TableIdentifier (using SparkSqlParser). It creates a CatalogTable and then executes (by toRDD) a CreateTable logical plan. The result DataFrame is a Dataset[Row] with the QueryExecution after executing a SubqueryAlias logical plan and a RowEncoder.
Note: createExternalTable is a part of the Catalog contract.