CatalogImpl
Note: CatalogImpl is in the org.apache.spark.sql.internal package.
functionExists Method

Caution: FIXME
refreshTable Method

Caution: FIXME
Caching Table or View In-Memory — cacheTable Method
cacheTable(tableName: String): Unit
Internally, cacheTable first creates a DataFrame for the table and then requests CacheManager to cache it.
Note: cacheTable uses the session-scoped SharedState to access the CacheManager.
Note: cacheTable is a part of the Catalog contract.
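A minimal usage sketch (the table name nums is made up for illustration; any table or view registered in the catalog works):

val spark: SparkSession = ...
spark.range(5).write.saveAsTable("nums")  // hypothetical table for illustration
spark.catalog.cacheTable("nums")

scala> spark.catalog.isCached("nums")
res0: Boolean = true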
Removing All Cached Tables From In-Memory Cache — clearCache Method
clearCache(): Unit
clearCache requests CacheManager to remove all cached tables from the in-memory cache.
Note: clearCache is a part of the Catalog contract.
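Continuing the hypothetical nums example above, clearCache empties the cache for all tables at once:

spark.catalog.clearCache()

scala> spark.catalog.isCached("nums")
res1: Boolean = false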
Creating External Table From Path — createExternalTable Method
createExternalTable(tableName: String, path: String): DataFrame
createExternalTable(tableName: String, path: String, source: String): DataFrame
createExternalTable(
  tableName: String,
  source: String,
  options: Map[String, String]): DataFrame
createExternalTable(
  tableName: String,
  source: String,
  schema: StructType,
  options: Map[String, String]): DataFrame
createExternalTable creates an external table tableName from the given path and returns the corresponding DataFrame.
import org.apache.spark.sql.SparkSession
val spark: SparkSession = ...
val readmeTable = spark.catalog.createExternalTable("readme", "README.md", "text")
readmeTable: org.apache.spark.sql.DataFrame = [value: string]
scala> spark.catalog.listTables.filter(_.name == "readme").show
+------+--------+-----------+---------+-----------+
| name|database|description|tableType|isTemporary|
+------+--------+-----------+---------+-----------+
|readme| default| null| EXTERNAL| false|
+------+--------+-----------+---------+-----------+
scala> sql("select count(*) as count from readme").show(false)
+-----+
|count|
+-----+
|99 |
+-----+
The source input parameter is the name of the data source provider for the table, e.g. parquet, json, text. If not specified, createExternalTable uses the spark.sql.sources.default configuration property to determine the data source format.
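For instance, with spark.sql.sources.default left at its default (parquet), the two-argument variant below treats the path as a Parquet data source (the table name and path are made up for illustration):

// Equivalent to createExternalTable("events", "/tmp/events", "parquet")
// when spark.sql.sources.default is parquet (the default)
val events = spark.catalog.createExternalTable("events", "/tmp/events")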
Note: The source input parameter must not be hive as that leads to an AnalysisException.
createExternalTable sets the mandatory path option when the path is specified explicitly in the input parameter list.
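The path can also be given through the options map in the variants without a path parameter. A sketch of the schema-based variant (the table name, schema and options are made up for illustration):

import org.apache.spark.sql.types._
val schema = StructType(Seq(
  StructField("id", LongType),
  StructField("name", StringType)))
// Hypothetical CSV-backed external table; the path is passed as an option
val people = spark.catalog.createExternalTable(
  "people_ext",
  "csv",
  schema,
  Map("path" -> "people.csv", "header" -> "true"))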
createExternalTable parses tableName into a TableIdentifier (using SparkSqlParser), creates a CatalogTable and then executes (by toRDD) a CreateTable logical plan. The result DataFrame is a Dataset[Row] with the QueryExecution after executing a SubqueryAlias logical plan and RowEncoder.
Note: createExternalTable is a part of the Catalog contract.