Spark Streaming

Spark Streaming is used to build interactive and analytical applications, such as low-latency dashboards and security alert systems, that help optimize operations or prevent specific outcomes. It includes high-level operators for reading streaming data from Apache Flume, Apache Kafka, and Twitter, and historical data from HDFS.

Architecture: Spark Streaming collects incoming events into small batches (micro-batches), each covering a short time window, and then processes each batch as a unit.
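
The micro-batch model can be illustrated with a small sketch in plain Python. It does not use the Spark API; the function names (micro_batches, process_batch), the sample events, and the 1000 ms batch interval are all assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def micro_batches(events, batch_interval_ms):
    """Group (timestamp_ms, payload) events into fixed-length time windows.

    Returns a list of (window_start_ms, [payloads]) sorted by window start.
    """
    buckets = defaultdict(list)
    for timestamp_ms, payload in events:
        # Every event falls into the window that contains its timestamp.
        window_start = (timestamp_ms // batch_interval_ms) * batch_interval_ms
        buckets[window_start].append(payload)
    return sorted(buckets.items())


def process_batch(payloads):
    """Example per-batch job: count words across all lines in one batch."""
    counts = Counter()
    for line in payloads:
        counts.update(line.split())
    return dict(counts)


# Hypothetical input: (arrival time in milliseconds, text line).
events = [
    (200, "error disk"),
    (900, "login ok"),
    (1100, "error net"),
    (2500, "login ok"),
    (2700, "error disk"),
]

# Each window is processed only after all of its events have been collected.
results = [
    (start, process_batch(batch))
    for start, batch in micro_batches(events, batch_interval_ms=1000)
]
for start, counts in results:
    print(start, counts)
```

With a 1000 ms interval the five sample events fall into three batches, starting at 0, 1000, and 2000 ms. Windows that receive no events are skipped in this sketch. A shorter interval lowers latency at the cost of more, smaller batches.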

Development: Scala with the DStream (discretized stream) API.

Performance: hundreds of MB/s with low latency (a few seconds).

Concerns: not integrated with the Azure platform.