KeyValueGroupedDataset — Typed Grouping

KeyValueGroupedDataset is an experimental interface to calculate aggregates over groups of objects in a typed Dataset.

Note	RelationalGroupedDataset is used for untyped Row-based aggregates.

KeyValueGroupedDataset is the result of executing the strongly-typed groupByKey grouping operator.

val dataset: Dataset[Token] = ...
scala> val tokensByName = dataset.groupByKey(_.name)
tokensByName: org.apache.spark.sql.KeyValueGroupedDataset[String,Token] = org.apache.spark.sql.KeyValueGroupedDataset@1e3aad46
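
The session above elides how the dataset is built. The following is a minimal, self-contained sketch of the same groupByKey call, with the untyped groupBy shown for contrast. The fields of Token (other than name, which the original uses as the key), the sample rows, and the object and application names are assumptions made for illustration only.

```scala
import org.apache.spark.sql.{Dataset, KeyValueGroupedDataset, RelationalGroupedDataset, SparkSession}

// Hypothetical record type: the original does not show Token's fields,
// only that it has a `name` member used as the grouping key.
case class Token(name: String, productId: Int, score: Double)

object TypedGroupingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("typed-grouping-sketch")
      .getOrCreate()
    import spark.implicits._ // brings encoders for Token and String into scope

    // Sample data (assumed, not from the original)
    val dataset: Dataset[Token] = Seq(
      Token("aaa", 100, 0.12),
      Token("aaa", 200, 0.29),
      Token("bbb", 200, 0.53),
      Token("bbb", 300, 0.42)
    ).toDS()

    // Typed grouping: the key function is an ordinary Scala function on Token
    val tokensByName: KeyValueGroupedDataset[String, Token] =
      dataset.groupByKey(_.name)

    // Untyped grouping, for contrast: the column is referenced by name
    val untyped: RelationalGroupedDataset = dataset.groupBy("name")

    // Both count tokens per name, but the result types differ
    tokensByName.count().show() // Dataset[(String, Long)]
    untyped.count().show()      // DataFrame with columns `name` and `count`

    spark.stop()
  }
}
```

With groupByKey a mistyped field name fails at compile time, whereas a mistyped column name passed to groupBy only fails when the query is analyzed.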

Table 1. KeyValueGroupedDataset’s Aggregate Operators (in alphabetical order)
Operator	Description
`agg`
`cogroup`
`count`
`flatMapGroups`
`flatMapGroupsWithState`
`keyAs`
`keys`
`mapGroups`
`mapGroupsWithState`
`mapValues`
`reduceGroups`
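
A few of the operators in Table 1 can be sketched as follows. This is an illustration under assumptions rather than a complete reference: the Token fields, the sample rows, and all value names are hypothetical, and only count, mapGroups, reduceGroups, mapValues and agg are shown.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.functions.avg

// Hypothetical record type (the original does not show Token's fields)
case class Token(name: String, productId: Int, score: Double)

object AggregateOperatorsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("aggregate-operators-sketch")
      .getOrCreate()
    import spark.implicits._

    // Sample data (assumed), grouped by name
    val tokensByName = Seq(
      Token("aaa", 100, 0.12),
      Token("aaa", 200, 0.29),
      Token("bbb", 200, 0.53),
      Token("bbb", 300, 0.42)
    ).toDS().groupByKey(_.name)

    // count: number of objects per key
    val counts: Dataset[(String, Long)] = tokensByName.count()

    // mapGroups: one output record per group, computed from an iterator over the group
    val maxScores: Dataset[(String, Double)] =
      tokensByName.mapGroups { (name, tokens) => (name, tokens.map(_.score).max) }

    // reduceGroups: combine the objects of each group with a binary function
    val bestTokens: Dataset[(String, Token)] =
      tokensByName.reduceGroups((a: Token, b: Token) => if (a.score >= b.score) a else b)

    // mapValues: transform the grouped values while keeping the keys
    val totalScores: Dataset[(String, Double)] =
      tokensByName.mapValues(_.score).reduceGroups((x: Double, y: Double) => x + y)

    // agg: apply a typed aggregate column to each group
    val avgScores: Dataset[(String, Double)] =
      tokensByName.agg(avg($"score").as[Double])

    // Row order in the output is not guaranteed
    counts.show()
    maxScores.show()
    bestTokens.show()
    totalScores.show()
    avgScores.show()

    spark.stop()
  }
}
```

mapGroups receives each group as an iterator and so cannot use partial aggregation before the shuffle. For simple aggregates, reduceGroups or agg is usually the better choice.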

KeyValueGroupedDataset holds the keys that were used to group the objects. The keys operator returns them as a Dataset with one row per distinct key.

scala> tokensByName.keys.show
+-----+
|value|
+-----+
|  aaa|
|  bbb|
+-----+
