Apache Spark (HDP or HDInsight)

Apache Spark is an open source cluster computing framework. It provides API based on resilient distributed dataset (RDD), a read-only multiset of data items distributed over a cluster of machines. RDDs function as a working set for distributed programs that offers a form of distributed shared memory.

Components built on top of Spark: Spark SQL, Spark Streaming, MLlib, GraphX. It also can work with R Server.


%d bloggers like this: