API Reference

dislib.array: Distributed array


data.Array - A 2-dimensional array divided into blocks that can be operated on in a distributed way.
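To make the idea of a block-partitioned array concrete, here is a minimal conceptual sketch in plain Python. It does not use dislib and is not how the library is implemented; the helper names `split_into_blocks` and `block_sum` are hypothetical. It only illustrates how a 2-dimensional array can be cut into rectangular blocks so that each block can be processed independently and the partial results combined afterwards.

```python
def split_into_blocks(matrix, block_size):
    """Split a list-of-rows matrix into a grid of rectangular blocks."""
    block_rows, block_cols = block_size
    n_cols = len(matrix[0]) if matrix else 0
    grid = []
    for r in range(0, len(matrix), block_rows):
        grid.append([
            [row[c:c + block_cols] for row in matrix[r:r + block_rows]]
            for c in range(0, n_cols, block_cols)
        ])
    return grid


def block_sum(block):
    """Work that could run as one independent task per block."""
    return sum(sum(row) for row in block)


# A 4x6 matrix holding the values 0..23, split into 2x3 blocks.
matrix = [[6 * i + j for j in range(6)] for i in range(4)]
grid = split_into_blocks(matrix, block_size=(2, 3))

# Combine the per-block partial results into the global result.
total = sum(block_sum(block) for grid_row in grid for block in grid_row)
```

Blocks on the last row or column of the grid may be smaller than `block_size` when the matrix dimensions are not exact multiples of it.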

Array creation routines

dislib.array - Build a distributed array (ds-array) from an array-like structure, such as a NumPy array, a list, or a SciPy sparse matrix.

dislib.load_svmlight_file - Build a ds-array from a file in SVMlight format.

dislib.load_txt_file - Build a ds-array from a text file.

dislib.random_array - Build a random ds-array.

Other functions

dislib.apply_along_axis - Apply a function to a ds-array along a given axis.

dislib.utils: Utility functions

utils.shuffle - Randomly shuffle the rows of a ds-array.

dislib.preprocessing: Data pre-processing


preprocessing.StandardScaler - Scale a ds-array to zero mean and unit variance.
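As an illustration of what scaling to zero mean and unit variance means, the following plain-Python sketch standardizes each column of a small matrix. It is not dislib code; the function name `standardize_columns` is hypothetical, and both the use of the population variance (dividing by the number of rows) and the handling of constant columns are assumptions made for this example.

```python
import math


def standardize_columns(rows):
    """Return a copy of rows with every column shifted and scaled."""
    n = len(rows)
    n_cols = len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(n_cols)]
    stds = []
    for j in range(n_cols):
        # Population variance (assumption): divide by n, not n - 1.
        variance = sum((row[j] - means[j]) ** 2 for row in rows) / n
        std = math.sqrt(variance)
        # Constant columns are only centered (assumption), to avoid 0/0.
        stds.append(std if std > 0 else 1.0)
    return [
        [(row[j] - means[j]) / stds[j] for j in range(n_cols)]
        for row in rows
    ]


data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
scaled = standardize_columns(data)
```

After the transformation, each column of `scaled` has mean 0 and variance 1 up to floating-point error.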

dislib.decomposition: Matrix Decomposition


decomposition.PCA - Principal component analysis (PCA).

dislib.cluster: Clustering


cluster.DBSCAN - Perform DBSCAN clustering.

cluster.KMeans - Perform K-Means clustering.

cluster.GaussianMixture - Fit a Gaussian mixture model.

dislib.classification: Classification


classification.CascadeSVM - Distributed support vector classification using a cascade of classifiers.

classification.RandomForestClassifier - Build a random forest for classification.

dislib.recommendation: Recommendation


recommendation.ALS - Distributed alternating least squares for collaborative filtering.

dislib.regression: Regression


regression.LinearRegression - Simple linear regression using ordinary least squares.

dislib.neighbors: Neighbor queries


neighbors.NearestNeighbors - Perform k-nearest neighbors queries.
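For readers unfamiliar with k-nearest neighbors queries, the sketch below shows a brute-force version in plain Python. It does not call dislib and says nothing about how the library distributes the search; the function name `k_nearest` and the choice of Euclidean distance are assumptions made for illustration.

```python
import math


def k_nearest(points, query, k):
    """Return (index, distance) pairs for the k points closest to query."""
    # Pair each distance with its index so ties break on the lower index.
    distances = sorted(
        (math.dist(point, query), index)
        for index, point in enumerate(points)
    )
    return [(index, distance) for distance, index in distances[:k]]


points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.5, 0.0)]
neighbors = k_nearest(points, query=(0.0, 0.0), k=2)
```

Here `neighbors` holds the indices of the two closest points together with their distances to the query, ordered from nearest to farthest.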

dislib.model_selection: Model selection


model_selection.GridSearchCV - Exhaustive search over specified parameter values for an estimator.

model_selection.KFold - K-fold splitter for cross-validation.
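The two model selection ideas listed above can be sketched in plain Python: splitting sample indices into K folds, and enumerating every combination in a parameter grid. This is a conceptual illustration only and does not use dislib; the helper names `k_fold_indices` and `parameter_grid`, the unshuffled contiguous folds, and the parameter names in the example grid are all assumptions.

```python
from itertools import product


def k_fold_indices(n_samples, n_splits):
    """Yield (train, test) index lists for contiguous, unshuffled folds."""
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if fold < extra else 0)
        stop = start + size
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


def parameter_grid(grid):
    """Yield one dict per combination of the listed parameter values."""
    names = sorted(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


folds = list(k_fold_indices(n_samples=10, n_splits=3))

# Illustrative parameter names only; any estimator's parameters would do.
candidates = list(parameter_grid({"n_clusters": [2, 3], "max_iter": [10, 50]}))
```

An exhaustive search would then train and score one model per candidate and per fold, keeping the candidate with the best average score.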