create_features_split_regular_ge

tangles.convenience.create_features_split_regular_ge(single_col_data: Series | ndarray, num_bins: int = 5, max_num_values_for_extensive_seps: int | None = 50, invalid_values: list | ndarray | None = None) → Tuple[ndarray, ndarray]

A feature factory function creating bipartitions that split the dataset at equidistant thresholds between the minimum and the maximum of the variable’s range.

The function usually creates multiple features for one question. The range of a variable is divided into regular sections, that is, into intervals of the same size. Their boundaries are used as thresholds.

Each feature describes the subset of respondents who gave an answer at least as high as the corresponding threshold.
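The thresholding described above can be sketched with plain NumPy. This is only an illustration of the idea, not the tangles implementation: the helper name `regular_ge_features_sketch` is hypothetical, and the choice to use only the interior interval boundaries as thresholds is an assumption (the outermost boundaries would give trivial splits). Handling of invalid values and the ordinal fallback are left out.

```python
import numpy as np


def regular_ge_features_sketch(values, num_bins=5):
    """Hypothetical illustration of 'greater or equal' splits at regular thresholds."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    # num_bins equal-width intervals have num_bins + 1 boundaries.
    # Assumption: only the interior boundaries serve as thresholds.
    thresholds = np.linspace(lo, hi, num_bins + 1)[1:-1]
    # Column j is True for every row whose value is >= thresholds[j].
    features = values[:, None] >= thresholds[None, :]
    return features, thresholds


answers = np.array([0.0, 2.5, 5.0, 7.5, 10.0])
features, thresholds = regular_ge_features_sketch(answers, num_bins=5)
```

In this sketch, each column of `features` is one bipartition of the respondents, and the number of `True` entries in a row equals the number of thresholds that respondent's answer reaches.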

Parameters

single_col_data (pd.Series or np.ndarray): The data of the single column for which features are created.
num_bins (int): Number of bins, that is, the number of equally sized intervals into which the variable’s range is divided.
max_num_values_for_extensive_seps (int or None): If a variable takes at most this number of different values, we fall back to ordinal-style features. This is useful if the survey metadata was configured inaccurately, so that ordinal variables were incorrectly declared to be numerical.
invalid_values (list or np.ndarray or None): The invalid values in single_col_data.

Returns

tuple[np.ndarray, np.ndarray]: The features in the first entry and the corresponding metadata in the second entry.