The Distributed Computing Library (dislib) provides distributed algorithms that are ready to use as a library. So far, dislib focuses mainly on machine learning algorithms and is greatly inspired by scikit-learn. However, other types of numerical algorithms might be added in the future. The main objective of dislib is to facilitate the execution of big data analytics algorithms on distributed platforms, such as clusters, clouds, and supercomputers.
The following plot shows the fit time of some dislib models on the MareNostrum 4 supercomputer (using 8 worker nodes):
Labels on the horizontal axis have the form algorithm-dataset, where:
- ALS = AlternatingLeastSquares
- CSVM = CascadeSVM
- GMM = GaussianMixture
- RF = RandomForestClassifier
If you have questions about dislib or run into issues, you can join us on Slack.
Alternatively, you can send us an e-mail at email@example.com.