dislib package

dislib.load_txt_file(path, block_size, delimiter=',')

Loads a text file into a distributed array.

Parameters:
  • path (string) – File path.
  • block_size (tuple (int, int)) – Size of the blocks of the array.
  • delimiter (string, optional (default=",")) – String that separates columns in the file.
Returns:

x – A distributed representation of the data divided into blocks.

Return type:

ds-array
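Each line of the file is one row, and delimiter separates its columns. A minimal pure-Python sketch of just that parsing step follows; parse_delimited is a hypothetical helper, not part of dislib, and it assumes every value is numeric. Distributing the parsed rows into blocks of block_size is not shown.

```python
def parse_delimited(text, delimiter=","):
    """Hypothetical helper: parse delimited text into rows of floats.

    Each non-blank line becomes one row; ``delimiter`` separates the columns.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rows.append([float(value) for value in line.split(delimiter)])
    return rows


# Comma-separated input (the default delimiter)
print(parse_delimited("1,2,3\n4,5,6\n"))  # [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
# Tab-separated input
print(parse_delimited("1\t2\n3\t4\n", delimiter="\t"))  # [[1.0, 2.0], [3.0, 4.0]]
```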

dislib.load_svmlight_file(path, block_size, n_features, store_sparse)

Loads an SVMLight file into distributed arrays.

Parameters:
  • path (string) – File path.
  • block_size (tuple (int, int)) – Size of the blocks for the output ds-array.
  • n_features (int) – Number of features.
  • store_sparse (boolean) – Whether to use scipy.sparse data structures to store data. If False, numpy.array is used instead.
Returns:

x, y – Distributed representations (ds-arrays) of the samples X and the labels y.

Return type:

(ds-array, ds-array)
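An SVMLight file stores one sample per line as a label followed by index:value pairs for the nonzero features. The sketch below parses such lines into dense rows; it is a plain-Python illustration, not dislib's loader. parse_svmlight is a hypothetical name, and it assumes 1-based feature indices, which is the usual SVMLight convention. It also shows why n_features has to be passed in: a line lists only its nonzero features, so the row width cannot be inferred from it.

```python
def parse_svmlight(lines, n_features):
    """Hypothetical helper: parse SVMLight lines into dense rows and labels.

    Assumes the layout ``<label> <index>:<value> ...`` with 1-based feature
    indices; features that do not appear on a line are zero.
    """
    X, y = [], []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        label, *pairs = line.split()
        row = [0.0] * n_features  # width must be known in advance
        for pair in pairs:
            index, value = pair.split(":")
            row[int(index) - 1] = float(value)  # 1-based -> 0-based index
        X.append(row)
        y.append(float(label))
    return X, y


X, y = parse_svmlight(["1 1:0.5 3:2.0", "-1 2:1.5  # trailing comment"], n_features=4)
print(X)  # [[0.5, 0.0, 2.0, 0.0], [0.0, 1.5, 0.0, 0.0]]
print(y)  # [1.0, -1.0]
```

With store_sparse=True, only the nonzero index:value pairs need to be kept, which is what makes scipy.sparse storage attractive for this format.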

dislib.random_array(shape, block_size, random_state=None)

Returns a distributed array of random floats in the half-open interval [0.0, 1.0). Values are drawn from the "continuous uniform" distribution over the stated interval.

Parameters:
  • shape (tuple of two ints) – Shape of the output ds-array.
  • block_size (tuple of two ints) – Size of the ds-array blocks.
  • random_state (int or RandomState, optional (default=None)) – Seed or numpy.random.RandomState instance to generate the random numbers.
Returns:

dsarray – Distributed array of random floats.

Return type:

ds-array
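Because the array is built block by block, one way to honour random_state is to let a single seeded generator draw one seed per block, so each block can be generated independently while the whole array stays reproducible. The sketch below illustrates that idea with Python's random module, whose random() also returns floats in [0.0, 1.0). This is an assumption for illustration, not a description of dislib's internals; random_blocks is a hypothetical name, and it returns a nested-list grid of blocks rather than a ds-array.

```python
import random


def random_blocks(shape, block_size, random_state=None):
    """Hypothetical sketch: a grid of blocks filled with uniform floats in [0.0, 1.0).

    A master generator seeded by random_state draws one seed per block, so
    each block could be produced by a separate task yet remain reproducible.
    """
    master = random.Random(random_state)
    n_rows, n_cols = shape
    b_rows, b_cols = block_size
    grid = []
    for r0 in range(0, n_rows, b_rows):
        grid_row = []
        for c0 in range(0, n_cols, b_cols):
            block_rng = random.Random(master.randrange(2**31 - 1))  # per-block generator
            h = min(b_rows, n_rows - r0)  # bottom edge blocks may be shorter
            w = min(b_cols, n_cols - c0)  # right edge blocks may be narrower
            grid_row.append([[block_rng.random() for _ in range(w)] for _ in range(h)])
        grid.append(grid_row)
    return grid


a = random_blocks((5, 4), (2, 3), random_state=0)
b = random_blocks((5, 4), (2, 3), random_state=0)
print(a == b)             # True: same seed, same values
print(len(a), len(a[0]))  # 3 2: a 3 x 2 grid of blocks
```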

dislib.apply_along_axis(func, axis, x, *args, **kwargs)

Apply a function to slices along the given axis.

Execute func(a, *args, **kwargs), where func operates on nd-arrays and a is a slice of x along axis. The size of the slices is determined by the block shape of x.

func must meet the following conditions:

  • Take an nd-array as argument
  • Accept axis as a keyword argument
  • Return an array-like structure
Parameters:
  • func (function) – This function should accept nd-arrays and an axis argument. It is applied to slices of x along the specified axis.
  • axis (integer) – Axis along which x is sliced. Can be 0 or 1.
  • x (ds-array) – Input distributed array.
  • args (any) – Additional arguments to func.
  • kwargs (any) – Additional named arguments to func.
Returns:

out – The output array. The shape of out is identical to the shape of x, except along the axis dimension. The output ds-array is dense regardless of the type of the input array.

Return type:

ds-array
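The slicing rule described above can be illustrated without dislib. The following is a pure-Python approximation, not dislib's implementation: apply_along_axis_sketch and list_mean are hypothetical names, x is a plain list of lists, and list_mean stands in for np.mean. It assumes func reduces each slice to one value per column (axis=0) or per row (axis=1), so the axis dimension of the output collapses to 1.

```python
def list_mean(a, axis):
    """Stand-in for np.mean on a 2-D list; accepts axis as a keyword."""
    if axis == 0:
        return [sum(col) / len(col) for col in zip(*a)]
    return [sum(row) / len(row) for row in a]


def apply_along_axis_sketch(func, axis, x, block_size, *args, **kwargs):
    """Hypothetical sketch of the slicing semantics of apply_along_axis.

    axis=0: x is cut into full-height column slices of width block_size[1].
    axis=1: x is cut into full-width row slices of height block_size[0].
    func is called on each slice with axis as a keyword argument and the
    partial results are concatenated in order.
    """
    if axis not in (0, 1):
        raise ValueError("axis must be 0 or 1")
    b_rows, b_cols = block_size
    out = []
    if axis == 0:
        for c0 in range(0, len(x[0]), b_cols):
            piece = [row[c0:c0 + b_cols] for row in x]
            out.extend(func(piece, *args, axis=0, **kwargs))
        return [out]  # shape (1, n_cols)
    for r0 in range(0, len(x), b_rows):
        piece = x[r0:r0 + b_rows]
        out.extend(func(piece, *args, axis=1, **kwargs))
    return [[v] for v in out]  # shape (n_rows, 1)


x = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
print(apply_along_axis_sketch(list_mean, 0, x, (2, 2)))  # [[5.5, 6.5, 7.5]]
print(apply_along_axis_sketch(list_mean, 1, x, (2, 2)))  # [[2.0], [5.0], [8.0], [11.0]]
```

Slicing only along one axis keeps each slice complete in the reduced dimension, which is why func can compute its result on each slice without combining partial results from other slices.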

Examples

>>> import dislib as ds
>>> import numpy as np
>>> x = ds.random_array((100, 100), block_size=(25, 25))
>>> mean = ds.apply_along_axis(np.mean, 0, x)
>>> print(mean.collect())

dislib.array(x, block_size)

Loads data into a distributed array.

Parameters:
  • x (spmatrix or array-like, shape=(n_samples, n_features)) – Array of samples.
  • block_size ((int, int)) – Size of the blocks as (number of rows, number of columns).
Returns:

dsarray – A distributed representation of the data divided into blocks.

Return type:

ds-array
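Conceptually, block_size cuts the input into a grid of blocks, with smaller blocks along the bottom and right edges when the shape is not a multiple of the block size. The following pure-Python sketch shows that partitioning and its inverse on a list of lists; split_into_blocks and merge_blocks are hypothetical helpers, not dislib functions.

```python
def split_into_blocks(x, block_size):
    """Hypothetical sketch: partition a 2-D list into a grid of blocks."""
    b_rows, b_cols = block_size
    return [
        [[row[c0:c0 + b_cols] for row in x[r0:r0 + b_rows]]  # one block
         for c0 in range(0, len(x[0]), b_cols)]              # blocks in a grid row
        for r0 in range(0, len(x), b_rows)                   # grid rows
    ]


def merge_blocks(grid):
    """Inverse of split_into_blocks: stitch the block grid back into one 2-D list."""
    rows = []
    for grid_row in grid:
        for i in range(len(grid_row[0])):  # all blocks in a grid row share a height
            rows.append([v for block in grid_row for v in block[i]])
    return rows


x = [[r * 5 + c for c in range(5)] for r in range(5)]  # 5 x 5 matrix
grid = split_into_blocks(x, (2, 2))
print(len(grid), len(grid[0]))  # 3 3: a 3 x 3 grid of blocks
print(grid[2][2])               # [[24]]: the 1 x 1 bottom-right edge block
print(merge_blocks(grid) == x)  # True
```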