// Use ExpressionEncoder for simplicity
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
val stringEncoder = ExpressionEncoder[String]
val row = stringEncoder.toRow("hello world")
import org.apache.spark.sql.catalyst.expressions.UnsafeRow
val unsafeRow = row match { case ur: UnsafeRow => ur }
scala> println(unsafeRow.getSizeInBytes)
32
scala> unsafeRow.getBytes
res0: Array[Byte] = Array(0, 0, 0, 0, 0, 0, 0, 0, 11, 0, 0, 0, 16, 0, 0, 0, 104, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100, 0, 0, 0, 0, 0)
scala> unsafeRow.getUTF8String(0)
res1: org.apache.spark.unsafe.types.UTF8String = hello world
UnsafeRow — Mutable Raw-Memory Unsafe Binary Row Format
UnsafeRow
is a concrete InternalRow that represents a mutable internal raw-memory (and hence unsafe) binary row format.
In other words, UnsafeRow
is an InternalRow
that is backed by raw memory instead of Java objects.
UnsafeRow
supports Java’s Externalizable and Kryo’s KryoSerializable serialization/deserialization protocols.
The fields of a data row are placed using field offsets.
UnsafeRow’s mutable field data types (in alphabetical order):
-
BooleanType
-
ByteType
-
DateType
-
DoubleType
-
FloatType
-
IntegerType
-
LongType
-
NullType
-
ShortType
-
TimestampType
UnsafeRow
is composed of three regions:
-
Null Bit Set Bitmap Region (1 bit/field) for tracking null values
-
Fixed-Length 8-Byte Values Region
-
Variable-Length Data Section
That gives the property of rows being always 8-byte word aligned and so their size is always a multiple of 8 bytes.
Equality comparision and hashing of rows can be performed on raw bytes since if two rows are identical so should be their bit-wise representation. No type-specific interpretation is required.
isMutable
Method
static boolean isMutable(DataType dt)
isMutable
is enabled (i.e. returns true
) when the input dt
DataType is a mutable field type or DecimalType.
Otherwise, isMutable
is disabled (i.e. returns false
).
Note
|
|
Kryo’s KryoSerializable SerDe Protocol
Tip
|
Read up on KryoSerializable. |
Java’s Externalizable SerDe Protocol
Tip
|
Read up on java.io.Externalizable. |