val dataset: Dataset[Token] = ...
scala> val tokensByName = dataset.groupByKey(_.name)
tokensByName: org.apache.spark.sql.KeyValueGroupedDataset[String,Token] = org.apache.spark.sql.KeyValueGroupedDataset@1e3aad46
KeyValueGroupedDataset — Typed Grouping
KeyValueGroupedDataset is an experimental interface to calculate aggregates over groups of objects in a typed Dataset.
Note: RelationalGroupedDataset is used for untyped Row-based aggregates.
KeyValueGroupedDataset is the result of executing the groupByKey strongly-typed grouping operator.
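The difference between the typed and untyped grouping operators can be sketched as follows. This is a minimal illustration, not taken from the original text: the fields of Token other than name, the sample rows, and the local SparkSession settings are assumptions.

```scala
import org.apache.spark.sql.{Dataset, KeyValueGroupedDataset, RelationalGroupedDataset, SparkSession}

// Hypothetical record type: the text only shows that Token has a `name` field.
case class Token(name: String, productId: Int, score: Double)

object GroupingKinds {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumed settings).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("grouping-kinds")
      .getOrCreate()
    import spark.implicits._

    // Assumed sample data.
    val dataset: Dataset[Token] = Seq(
      Token("aaa", 100, 0.12),
      Token("bbb", 200, 0.53)
    ).toDS()

    // Typed grouping: the key is computed by a Scala function over Token objects.
    val typedGroups: KeyValueGroupedDataset[String, Token] =
      dataset.groupByKey(_.name)

    // Untyped grouping: the key is a column, and aggregates work on Rows.
    val untypedGroups: RelationalGroupedDataset =
      dataset.groupBy("name")

    spark.stop()
  }
}
```

The explicit type annotations are there only to make the two result types visible; in practice they can be left to type inference.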
Operator | Description |
---|---|
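A few of the operators that KeyValueGroupedDataset offers in the public Spark API (count, mapGroups, reduceGroups, and agg) can be sketched as below. This is an illustrative sketch under assumptions rather than a complete reference: the Token fields other than name, the sample rows, and the local SparkSession settings are not from the original text.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.functions.sum

// Hypothetical record type: the text only shows that Token has a `name` field.
case class Token(name: String, productId: Int, score: Double)

object TypedGroupingSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumed settings).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("typed-grouping-sketch")
      .getOrCreate()
    import spark.implicits._

    // Assumed sample data.
    val dataset: Dataset[Token] = Seq(
      Token("aaa", 100, 0.12),
      Token("aaa", 200, 0.29),
      Token("bbb", 200, 0.53),
      Token("bbb", 300, 0.42)
    ).toDS()

    val tokensByName = dataset.groupByKey(_.name)

    // count: number of objects per key, as Dataset[(String, Long)].
    val counts: Dataset[(String, Long)] = tokensByName.count()

    // mapGroups: exactly one output value per group, computed from an iterator.
    val maxScores: Dataset[(String, Double)] =
      tokensByName.mapGroups { (name, tokens) => (name, tokens.map(_.score).max) }

    // reduceGroups: folds the objects of a group pairwise with a binary function.
    val bestTokens: Dataset[(String, Token)] =
      tokensByName.reduceGroups((a, b) => if (a.score >= b.score) a else b)

    // agg: typed aggregation using a TypedColumn.
    val totals: Dataset[(String, Double)] =
      tokensByName.agg(sum("score").as[Double])

    // Row order after grouping is not guaranteed.
    counts.collect().foreach(println)
    maxScores.collect().foreach(println)
    bestTokens.collect().foreach(println)
    totals.collect().foreach(println)

    spark.stop()
  }
}
```

mapGroups materializes each group through an iterator, so for simple aggregations reduceGroups or agg is usually the better fit because they allow partial aggregation before the shuffle.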
KeyValueGroupedDataset holds the keys that were used to group the objects. The keys operator returns them as a Dataset.
scala> tokensByName.keys.show
+-----+
|value|
+-----+
|  aaa|
|  bbb|
+-----+