Apache Spark

Apache Spark is a unified analytics engine for large-scale distributed data processing and machine learning. On top of the Spark core data processing engine sit libraries for SQL, machine learning, graph computation, and stream processing. These libraries can be combined across the stages of a data pipeline, which allows code to be reused across batch, interactive, and streaming applications. Spark is well suited to ETL processing, data analytics, and machine learning workloads, including batch and interactive SQL queries, machine learning inference, and artificial intelligence applications.