verde.train_test_split
verde.train_test_split(coordinates, data, weights=None, **kwargs)
Split a dataset into a training and a testing set for cross-validation.
Similar to sklearn.model_selection.train_test_split but tuned to work on multi-component spatial data with optional weights.
Extra keyword arguments will be passed to sklearn.model_selection.ShuffleSplit, except for n_splits, which is always 1.
Parameters:
- coordinates : tuple of arrays
Arrays with the coordinates of each data point. Should be in the following order: (easting, northing, vertical, …).
- data : array or tuple of arrays
The data values of each data point. If the data has more than one component, data must be a tuple of arrays (one for each component).
- weights : None or array or tuple of arrays
If not None, then the weights assigned to each data point. If more than one data component is provided, you must provide a weights array for each data component (if not None).
Returns:
- train, test : tuples
Each is a tuple (coordinates, data, weights) generated by randomly splitting the input values. Within each, data and weights are themselves tuples with one entry per data component.
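For contrast with the grouped (coordinates, data, weights) tuples described above, the sketch below shows what plain sklearn.model_selection.train_test_split returns for the same kind of input: a flat list with a train/test pair for each array, which the caller must regroup by hand. The variable names are illustrative only.

```python
import numpy as np
from sklearn.model_selection import train_test_split as sk_split

# Illustrative inputs: two coordinate arrays and one data component.
easting, northing = np.arange(4), np.arange(-4, 0)
data = np.array([1, 3, 5, 7])

# scikit-learn returns a flat list: one (train, test) pair per input array,
# in the order the arrays were passed.
e_train, e_test, n_train, n_test, d_train, d_test = sk_split(
    easting, northing, data, random_state=0
)

# Regrouping into coordinates and data is left to the caller.
train_coordinates = (e_train, n_train)
test_coordinates = (e_test, n_test)
```

All arrays are indexed with the same shuffled indices, so points stay aligned across coordinates and data.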
Examples
>>> import numpy as np
>>> from verde import train_test_split
>>> # Split 2-component data with weights
>>> data = (np.array([1, 3, 5, 7]), np.array([0, 2, 4, 6]))
>>> coordinates = (np.arange(4), np.arange(-4, 0))
>>> weights = (np.array([1, 1, 2, 1]), np.array([1, 2, 1, 1]))
>>> train, test = train_test_split(coordinates, data, weights,
...                                random_state=0)
>>> print("Coordinates:", train[0], test[0], sep='\n ')
Coordinates:
 (array([3, 1, 0]), array([-1, -3, -4]))
 (array([2]), array([-2]))
>>> print("Data:", train[1], test[1], sep='\n ')
Data:
 (array([7, 3, 1]), array([6, 2, 0]))
 (array([5]), array([4]))
>>> print("Weights:", train[2], test[2], sep='\n ')
Weights:
 (array([1, 1, 1]), array([1, 2, 1]))
 (array([2]), array([1]))
>>> # Split single component data without weights
>>> train, test = train_test_split(coordinates, data[0], None,
...                                random_state=0)
>>> print("Coordinates:", train[0], test[0], sep='\n ')
Coordinates:
 (array([3, 1, 0]), array([-1, -3, -4]))
 (array([2]), array([-2]))
>>> print("Data:", train[1], test[1], sep='\n ')
Data:
 (array([7, 3, 1]),)
 (array([5]),)
>>> print("Weights:", train[2], test[2], sep='\n ')
Weights:
 (None,)
 (None,)
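To make the behaviour in these examples concrete, here is a minimal sketch of how such a split could be built on sklearn.model_selection.ShuffleSplit with n_splits fixed to 1. It is an illustration under stated assumptions, not Verde's actual implementation: split_sketch and its inner select helper are hypothetical names, and input validation is omitted.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit


def split_sketch(coordinates, data, weights=None, **kwargs):
    """Hypothetical helper: split spatial data with a single ShuffleSplit."""
    # Normalize single arrays to 1-tuples so every group is a tuple of arrays.
    if not isinstance(data, tuple):
        data = (data,)
    if weights is None:
        weights = tuple(None for _ in data)
    elif not isinstance(weights, tuple):
        weights = (weights,)

    n_points = np.asarray(data[0]).size
    # n_splits is always 1; all other keyword arguments are forwarded.
    splitter = ShuffleSplit(n_splits=1, **kwargs)
    train_idx, test_idx = next(splitter.split(np.arange(n_points)))

    def select(arrays, index):
        # Apply the same indices to every array, passing None through.
        return tuple(
            None if a is None else np.asarray(a).ravel()[index] for a in arrays
        )

    groups = (coordinates, data, weights)
    train = tuple(select(group, train_idx) for group in groups)
    test = tuple(select(group, test_idx) for group in groups)
    return train, test


# Usage: forward test_size and random_state to ShuffleSplit.
coordinates = (np.arange(8), np.arange(-8, 0))
data = np.arange(8) * 2.0
train, test = split_sketch(coordinates, data, test_size=0.25, random_state=0)
```

With 8 points and test_size=0.25, the test set holds 2 points and the training set 6. Because one set of indices is applied to every array, coordinates, data, and weights remain aligned point by point.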