Spark provides a machine learning library known as MLlib. Its goal is to make practical machine learning scalable and easy.
MLlib covers common learning tasks such as classification, regression, clustering, and collaborative filtering. It also ships supporting tools for featurization, pipelines, and persistence, along with utilities for linear algebra, statistics, and data handling.
At a high level, it provides tools such as:
- ML Algorithms: Common learning algorithms such as classification, regression, clustering, and collaborative filtering.
- Featurization: Feature extraction, transformation, dimensionality reduction, and selection.
- Pipelines: Tools for constructing, evaluating, and tuning ML Pipelines.
- Persistence: Saving and loading algorithms, models, and Pipelines.
- Utilities: Linear algebra, statistics, data handling, etc.