scala> schemaTyped.foreach(println)
StructField(a,IntegerType,true)
StructField(b,StringType,true)
StructType — Data Type for Schema Definition
StructType is a built-in data type in Spark SQL that represents a collection of StructFields that together define a schema or a part of one.
Note: StructType is a Seq[StructField], so every Seq operation applies to it. Read the official documentation of scala.collection.Seq.
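Since StructType extends Seq[StructField], the usual collection methods work on a schema directly. The following is a minimal sketch, assuming a spark-shell session (use :paste for the multi-line definitions) or Spark SQL on the classpath; the value names are illustrative only.

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

// Illustrative schema; any StructType behaves the same way
val schema = new StructType()
  .add("a", IntegerType)
  .add("b", StringType)

// Standard Seq operations over the StructFields
val names = schema.map(_.name)                            // field names in declaration order
val intFields = schema.filter(_.dataType == IntegerType)  // only the integer-typed fields
val nullableCount = schema.count(_.nullable)              // fields added this way are nullable
```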
You can compare two StructType instances to see whether they are equal.
import org.apache.spark.sql.types.StructType
val schemaUntyped = new StructType()
  .add("a", "int")
  .add("b", "string")
import org.apache.spark.sql.types.{IntegerType, StringType}
val schemaTyped = new StructType()
  .add("a", IntegerType)
  .add("b", StringType)
scala> schemaUntyped == schemaTyped
res0: Boolean = true
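Equality takes the whole field definition and the field order into account, not just names and types. The sketch below illustrates this under the assumption that Spark SQL is on the classpath; base, nonNullable and reordered are names made up for the example.

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

val base = new StructType()
  .add("a", IntegerType)
  .add("b", StringType)

// Same names and types, but "a" is declared non-nullable
val nonNullable = new StructType()
  .add("a", IntegerType, nullable = false)
  .add("b", StringType)

// Same fields in the opposite order
val reordered = new StructType()
  .add("b", StringType)
  .add("a", IntegerType)

val sameAsBase = base == new StructType().add("a", "int").add("b", "string")
val differsByNullability = base != nonNullable
val differsByOrder = base != reordered
```

All three values should evaluate to true, so two schemas that look alike in a quick glance can still compare as different.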
StructType presents itself as <struct> or STRUCT in query plans or SQL.
fromAttributes Method
Caution: FIXME
toAttributes Method
Caution: FIXME
Adding Fields to Schema — add Method
You can add a new StructField to your StructType. The add method has several variants, all of which return a new StructType with the field added.
add(field: StructField): StructType
add(name: String, dataType: DataType): StructType
add(name: String, dataType: DataType, nullable: Boolean): StructType
add(
  name: String,
  dataType: DataType,
  nullable: Boolean,
  metadata: Metadata): StructType
add(
  name: String,
  dataType: DataType,
  nullable: Boolean,
  comment: String): StructType
add(name: String, dataType: String): StructType
add(name: String, dataType: String, nullable: Boolean): StructType
add(
  name: String,
  dataType: String,
  nullable: Boolean,
  metadata: Metadata): StructType
add(
  name: String,
  dataType: String,
  nullable: Boolean,
  comment: String): StructType
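A short sketch of a few of these variants in use, assuming Spark SQL is on the classpath. The field names (id, name, note) and the comment text are invented for the example. It also shows that add leaves the receiver untouched and returns a new schema each time.

```scala
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

val empty = new StructType()

// add(field: StructField)
val withId = empty.add(StructField("id", LongType, nullable = false))

// add(name: String, dataType: DataType)
val withName = withId.add("name", StringType)

// add(name: String, dataType: String, nullable: Boolean, comment: String)
val full = withName.add("note", "string", nullable = true, comment = "free-form text")

// Each call produced a new StructType; the earlier values did not change
val sizes = Seq(empty.length, withId.length, withName.length, full.length)

// The comment is stored with the field and can be read back
val noteComment = full("note").getComment()
```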
DataType Name Conversions
simpleString: String
catalogString: String
sql: String
As a custom DataType, StructType is used in query plans and SQL. It can present itself using simpleString, catalogString or sql (see DataType Contract).
scala> schemaTyped.simpleString
res0: String = struct<a:int,b:string>
scala> schemaTyped.catalogString
res1: String = struct<a:int,b:string>
scala> schemaTyped.sql
res2: String = STRUCT<`a`: INT, `b`: STRING>
Accessing StructField — apply Method
apply(name: String): StructField
StructType defines its own apply method that gives you easy access to a StructField by name.
scala> schemaTyped.printTreeString
root
|-- a: integer (nullable = true)
|-- b: string (nullable = true)
scala> schemaTyped("a")
res4: org.apache.spark.sql.types.StructField = StructField(a,IntegerType,true)
Creating StructType from Existing StructType — apply Method
apply(names: Set[String]): StructType
This variant of apply lets you create a StructType out of an existing StructType, keeping only the fields with the given names.
scala> schemaTyped(names = Set("a"))
res0: org.apache.spark.sql.types.StructType = StructType(StructField(a,IntegerType,true))
It throws an IllegalArgumentException when any of the requested fields cannot be found.
scala> schemaTyped(names = Set("a", "c"))
java.lang.IllegalArgumentException: Field c does not exist.
at org.apache.spark.sql.types.StructType.apply(StructType.scala:275)
... 48 elided
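One way to avoid the exception is to narrow the requested names to those the schema actually has before calling apply. This is only a sketch of that idea, assuming Spark SQL is on the classpath; requested, known and subset are illustrative names, and silently dropping unknown names may not be what every caller wants.

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}
import scala.util.Try

val schema = new StructType()
  .add("a", IntegerType)
  .add("b", StringType)

val requested = Set("a", "c") // "c" is not part of the schema

// Calling apply directly fails because of the unknown name
val direct = Try(schema(requested))

// Keep only the names that exist, then call apply
val known = requested.intersect(schema.fieldNames.toSet)
val subset = schema(known)
```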
Displaying Schema As Tree — printTreeString Method
printTreeString(): Unit
printTreeString prints out the schema to standard output.
scala> schemaTyped.printTreeString
root
|-- a: integer (nullable = true)
|-- b: string (nullable = true)
Internally, it uses the treeString method to build the tree and then prints it with println.
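When the tree is needed as a value rather than on standard output (for logging, say), treeString can be called directly. A minimal sketch, assuming Spark SQL is on the classpath; the "[schema]" prefix is an arbitrary choice for the example.

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

val schema = new StructType()
  .add("a", IntegerType)
  .add("b", StringType)

// treeString returns the text that printTreeString would print
val tree: String = schema.treeString

// For example, prefix every line before handing it to a logger
val prefixed = tree.split("\n").map(line => s"[schema] $line").mkString("\n")
```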