RelationProvider — Data Sources With Schema Inference

RelationProvider is a contract for data source providers that support schema inference. Such providers can also be used through SQL's USING clause, e.g. in the CREATE TABLE and CREATE TEMPORARY VIEW DDL statements.

Note
Schema inference is also called schema discovery.

RelationProvider is used exclusively when:

  • DataSource creates a BaseRelation (either with no user-defined schema, or with a user-defined schema that matches the schema of the BaseRelation the RelationProvider creates)
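The condition above can be illustrated with a small, self-contained model of how a provider and an optional user-defined schema are combined. This is a simplified sketch, not Spark's DataSource code: Provider, Inferring, SchemaRequiring and resolveSchema are hypothetical names, a schema is reduced to a list of column names, and the error messages only paraphrase Spark's (their exact wording varies between versions).

```scala
// Hypothetical stand-ins: a "schema" is just a list of column names.
sealed trait Provider { def name: String }
// Plays the role of a RelationProvider: the relation knows (infers) its own schema.
final case class Inferring(name: String, inferred: Seq[String]) extends Provider
// Plays the role of a SchemaRelationProvider: the user must supply the schema.
final case class SchemaRequiring(name: String) extends Provider

object ResolveSketch {
  // Returns the schema the resulting relation would have, or an error message.
  def resolveSchema(p: Provider, userSchema: Option[Seq[String]]): Either[String, Seq[String]] =
    (p, userSchema) match {
      // No user-defined schema: use the inferred one.
      case (Inferring(_, inferred), None) => Right(inferred)
      // User-defined schema: accepted only if it matches the inferred one.
      case (Inferring(name, inferred), Some(s)) =>
        if (s == inferred) Right(inferred)
        else Left(s"$name does not allow user-specified schemas.")
      // Schema-requiring providers cannot work without a user-defined schema.
      case (SchemaRequiring(name), None) =>
        Left(s"A schema needs to be specified when using $name.")
      case (SchemaRequiring(_), Some(s)) => Right(s)
    }
}
```

In this model, a RelationProvider-like source only ever "accepts" a user-defined schema that it would have inferred anyway, which is why such providers are said to support schema inference rather than user-defined schemas.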

Note
BaseRelation models a collection of tuples from an external data source with a schema.
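Table 1. RelationProvider’s Known Implementations
Name                    Description
JdbcRelationProvider    Data source provider for JDBC tables (jdbc format)
KafkaSourceProvider     Data source provider for Apache Kafka (kafka format)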

Tip
Use SchemaRelationProvider for relation providers that require a user-defined schema.

RelationProvider Contract

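package org.apache.spark.sql.sources

import org.apache.spark.sql.SQLContext

trait RelationProvider {
  // Creates a relation whose schema is determined by the data source itself.
  // `parameters` are the data source options (e.g. from SQL's OPTIONS clause).
  def createRelation(
    sqlContext: SQLContext,
    parameters: Map[String, String]): BaseRelation
}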
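Table 2. RelationProvider Contract
Method            Description
createRelation    Creates a BaseRelation for the given SQLContext and parameters. The parameters are the (optional) data source options, e.g. from SQL’s OPTIONS clause or DataFrameReader.option.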
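To make the contract concrete, here is a minimal sketch of a custom RelationProvider, assuming spark-sql is on the classpath. RangeProvider, RangeRelation and the option n are hypothetical names made up for this example, not part of Spark. The relation fixes its own schema (a single id column), so users never have to supply one; TableScan is used only so the relation can actually produce rows.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, RelationProvider, TableScan}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

// Hypothetical relation producing the integers 0 until n in one column `id`.
// The schema comes from the relation itself, not from the user.
class RangeRelation(val n: Int)(@transient val sqlContext: SQLContext)
    extends BaseRelation with TableScan {

  override def schema: StructType =
    StructType(Seq(StructField("id", IntegerType, nullable = false)))

  // Full table scan: one Row per integer.
  override def buildScan(): RDD[Row] =
    sqlContext.sparkContext.parallelize(0 until n).map(i => Row(i))
}

// Hypothetical provider. `parameters` holds the data source options,
// e.g. OPTIONS (n '3') in SQL or .option("n", "3") in DataFrameReader.
class RangeProvider extends RelationProvider {
  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation = {
    val n = parameters.getOrElse("n", "10").toInt // hypothetical option with a default
    new RangeRelation(n)(sqlContext)
  }
}
```

Without a DataSourceRegister short name, the provider is referenced by its fully-qualified class name, e.g. spark.read.format(classOf[RangeProvider].getName).option("n", "3").load(), or CREATE TEMPORARY VIEW v USING <class name> OPTIONS (n '3') in SQL. Either way, the options arrive in createRelation as parameters.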
