Powered by GitBook

InMemoryTableScanExec Physical Operator

InMemoryTableScanExec is a leaf physical operator that…FIXME

InMemoryTableScanExec is created exclusively when InMemoryScans execution planning strategy finds InMemoryRelation logical operators.

// Sample DataFrames
val tokens = Seq(
  (0, "playing"),
  (1, "with"),
  (2, "InMemoryTableScanExec")
).toDF("id", "token")
val ids = spark.range(10)

// Cache DataFrames
tokens.cache
ids.cache

val q = tokens.join(ids, Seq("id"), "outer")
scala> q.explain
== Physical Plan ==
*Project [coalesce(cast(id#5 as bigint), id#10L) AS id#33L, token#6]
+- SortMergeJoin [cast(id#5 as bigint)], [id#10L], FullOuter
   :- *Sort [cast(id#5 as bigint) ASC NULLS FIRST], false, 0
   :  +- Exchange hashpartitioning(cast(id#5 as bigint), 200)
   :     +- InMemoryTableScan [id#5, token#6]
   :           +- InMemoryRelation [id#5, token#6], true, 10000, StorageLevel(disk, memory, deserialized, 1 replicas)
   :                 +- LocalTableScan [id#5, token#6]
   +- *Sort [id#10L ASC NULLS FIRST], false, 0
      +- Exchange hashpartitioning(id#10L, 200)
         +- InMemoryTableScan [id#10L]
               +- InMemoryRelation [id#10L], true, 10000, StorageLevel(disk, memory, deserialized, 1 replicas)
                     +- *Range (0, 10, step=1, splits=8)

Table 1. InMemoryTableScanExec SQLMetrics (in alphabetical order)
Name	Description
`numOutputRows`	Number of output rows

spark sql InMemoryTableScanExec webui query details.png

Figure 1. InMemoryTableScanExec in web UI (Details for Query)

InMemoryTableScanExec uses spark.sql.inMemoryTableScanStatistics.enable flag (default: disabled) to enable accumulators (that appears exclusively for testing purposes).

Creating InMemoryTableScanExec Instance

InMemoryTableScanExec takes the following when created:

Attribute expressions
Predicate expressions
InMemoryRelation logical operator

results matching ""

No results matching ""