Learning Spark internals using groupBy (to cause shuffle)

Execute the following operation in spark-shell and use the shell together with the web UI to explain transformations, actions, jobs, stages, tasks, and partitions.

sc.parallelize(0 to 999, 50).zipWithIndex.groupBy(_._1 / 10).collect
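One way to approach the exercise is to split the one-liner into named steps so that each RDD can be inspected separately. The following is a minimal sketch, assuming it is pasted into spark-shell (which provides `sc`); the names `numbers`, `indexed`, `grouped`, and `result` are made up for illustration.

```scala
// Assumes spark-shell, where `sc` (the SparkContext) is already defined.
val numbers = sc.parallelize(0 to 999, 50)   // 1000 elements split into 50 partitions

// zipWithIndex is a transformation, but on an RDD with more than one
// partition it has to run a small job to count the elements per partition.
val indexed = numbers.zipWithIndex           // RDD[(Int, Long)]

// groupBy needs all pairs with the same key in one place, so it adds a shuffle.
val grouped = indexed.groupBy(_._1 / 10)     // RDD[(Int, Iterable[(Int, Long)])]

// Inspect the partitioning and the lineage before running the action.
println(indexed.getNumPartitions)
println(grouped.getNumPartitions)
println(grouped.toDebugString)               // indentation marks the shuffle boundary

// collect is the action: it submits a job and brings the groups to the driver.
val result = grouped.collect()
println(result.length)                       // keys 0 to 99, so 100 groups
```

After running it, the Jobs and Stages tabs of the web UI can be compared with the output of `toDebugString`: the shuffle introduced by `groupBy` is where the job for `collect` is cut into stages, and each stage runs one task per partition.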

You can also make the exercise a bit more demanding by explaining how the data is distributed across the cluster and by going over the concepts of drivers, masters, workers, and executors.
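To ground that discussion, the shell can report where the application is running and how the elements are spread over partitions. This is a rough sketch, again assuming spark-shell with `sc` available; `indexed` and `sizes` are illustrative names, and the values printed depend on how the shell was started (for example local mode versus a standalone cluster).

```scala
// Assumes spark-shell, where `sc` (the SparkContext) is already defined.
println(sc.master)                   // master URL the driver connected to
println(sc.defaultParallelism)       // default number of partitions for this deployment
sc.uiWebUrl.foreach(println)         // address of the web UI, if it is enabled

// Block managers known to the driver, listed as host:port
// (this includes the driver itself).
sc.getExecutorMemoryStatus.keys.foreach(println)

// Count the elements held by each partition of the input RDD.
val indexed = sc.parallelize(0 to 999, 50).zipWithIndex
val sizes = indexed
  .mapPartitionsWithIndex((idx, it) => Iterator((idx, it.size)))
  .collect()
sizes.take(5).foreach { case (idx, n) => println(s"partition $idx holds $n elements") }
```

The Executors tab of the web UI shows the same executors along with the tasks each one ran, which helps connect partitions to the processes that handle them.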
