import org.apache.spark.mllib.linalg.Vectors
// You can create dense vectors explicitly by giving values per index
val denseVec = Vectors.dense(Array(0.0, 0.4, 0.3, 1.5))
val almostAllZeros = Vectors.dense(Array(0.0, 0.4, 0.3, 1.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0))
// Alternatively, create a sparse vector from its size and its non-zero elements
val sparse = Vectors.sparse(10, Seq((1, 0.4), (2, 0.3), (3, 1.5)))
// Convert a sparse vector to a dense one
val fromSparse = sparse.toDense
scala> almostAllZeros == fromSparse
res0: Boolean = true
Vector
The Vector sealed trait represents a numeric vector of values (of Double type) and their indices (of Int type).
It belongs to the org.apache.spark.mllib.linalg package.
Note: To Scala and Java developers: this is not the Vector type from the Scala (scala.collection.immutable.Vector) or Java (java.util.Vector) standard libraries. Train your eyes to see two types of the same name. You've been warned.
A Vector object knows its size.
A Vector object can be converted to:
- Array[Double] using toArray.
- a dense vector as DenseVector using toDense.
- a sparse vector as SparseVector using toSparse.
- (1.6.0) a JSON string using toJson.
- (internal) a breeze vector as BV[Double] using toBreeze.
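As a minimal sketch of the conversions listed above, assuming Spark MLlib 1.6.0 or later is on the classpath (for example in spark-shell), the public conversion methods can be exercised as follows. The value names are illustrative only, and toBreeze is left out because it is internal to Spark.

```scala
import org.apache.spark.mllib.linalg.{DenseVector, SparseVector, Vector, Vectors}

// A small vector with two zero entries (illustrative values)
val v: Vector = Vectors.dense(0.0, 0.4, 0.0, 1.5)

// All values as a plain Scala array, zeros included
val arr: Array[Double] = v.toArray

// Dense and sparse representations of the same vector
val asDense: DenseVector = v.toDense
val asSparse: SparseVector = v.toSparse

// toSparse stores only the non-zero entries
val storedIndices: Array[Int] = asSparse.indices
val storedValues: Array[Double] = asSparse.values

// JSON representation (assumes Spark 1.6.0 or later)
val json: String = v.toJson
```

Both representations describe the same logical vector, so asDense == asSparse is expected to hold even though the concrete classes differ.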
There are exactly two available implementations of the Vector sealed trait (both also in the org.apache.spark.mllib.linalg package):
- DenseVector
- SparseVector
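Because the trait is sealed and has only these two implementations, a pattern match over a Vector needs just two cases. The following is a sketch under that assumption; describe is a hypothetical helper written for illustration and is not part of Spark.

```scala
import org.apache.spark.mllib.linalg.{DenseVector, SparseVector, Vector, Vectors}

// describe is a hypothetical helper, not a Spark API
def describe(v: Vector): String = v match {
  case dv: DenseVector  => s"dense vector storing ${dv.values.length} values"
  case sv: SparseVector => s"sparse vector of size ${sv.size} storing ${sv.indices.length} values"
}

// The same logical vector in both representations
val d = describe(Vectors.dense(1.0, 0.0, 3.0))
val s = describe(Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0)))
```

A dense vector stores every value, zeros included, while a sparse vector stores only the entries it was given, which is why the two descriptions report different counts.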
Tip: Use the Vectors factory object to create vectors, be it DenseVector or SparseVector.
Note: The factory object is called Vectors (plural).
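A short sketch of the factory methods, assuming Spark MLlib is on the classpath (the value names are illustrative only): Vectors.dense builds a dense vector, the two Vectors.sparse overloads build a sparse vector from either parallel arrays or (index, value) pairs, and Vectors.zeros builds an all-zero vector.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Dense vector from its values
val dense: Vector = Vectors.dense(1.0, 0.0, 3.0)

// Sparse vector from the size plus parallel arrays of indices and values
val sparseFromArrays: Vector = Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0))

// Sparse vector from the size plus (index, value) pairs
val sparseFromPairs: Vector = Vectors.sparse(3, Seq((0, 1.0), (2, 3.0)))

// Vector of the given size filled with zeros
val zeros: Vector = Vectors.zeros(3)
```

All of these are typed as Vector, so calling code does not need to care which implementation it received.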
import org.apache.spark.mllib.linalg._
// prepare elements for a sparse vector
// NOTE: This part is plain Scala rather than anything Spark-specific
val indices = 0 to 4
val elements = indices.zip(Stream.continually(1.0))
val sv = Vectors.sparse(elements.size, elements)
// Notice how Vector is printed out
scala> sv
res4: org.apache.spark.mllib.linalg.Vector = (5,[0,1,2,3,4],[1.0,1.0,1.0,1.0,1.0])
scala> sv.size
res0: Int = 5
scala> sv.toArray
res1: Array[Double] = Array(1.0, 1.0, 1.0, 1.0, 1.0)
scala> sv == sv.copy
res2: Boolean = true
scala> sv.toJson
res3: String = {"type":0,"size":5,"indices":[0,1,2,3,4],"values":[1.0,1.0,1.0,1.0,1.0]}