create_features_split_regular_ge

tangles.convenience.create_features_split_regular_ge(single_col_data: Series | ndarray, num_bins: int = 5, max_num_values_for_extensive_seps: int | None = 50, invalid_values: list | ndarray | None = None) -> Tuple[ndarray, ndarray]

A feature factory function that creates bipartitions by splitting the dataset at equidistant thresholds between the minimum and the maximum of the variable's range.

The function usually creates multiple features for one question. The range of a variable is divided into regular sections, that is, into intervals of the same size. Their boundaries are used as thresholds.

Each feature describes the subset of respondents who gave an answer at least as high as the corresponding threshold.
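The idea described above can be illustrated with a minimal NumPy sketch. This is not the library's implementation: the helper name `regular_ge_features_sketch` is hypothetical, and it assumes that only the interior bin boundaries serve as thresholds and that features are encoded as booleans. The actual function may differ on both points, and it also returns metadata and handles invalid values.

```python
import numpy as np


def regular_ge_features_sketch(values, num_bins=5):
    """Hypothetical illustration of equidistant 'greater or equal' splits."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    # num_bins equal-width intervals have num_bins + 1 edges.
    # Assumption: only the interior edges are used as thresholds.
    thresholds = np.linspace(lo, hi, num_bins + 1)[1:-1]
    # One column per threshold: True where the answer is >= that threshold.
    features = values[:, None] >= thresholds[None, :]
    return features, thresholds


answers = np.array([0.0, 2.0, 5.0, 7.5, 10.0])
features, thresholds = regular_ge_features_sketch(answers, num_bins=5)
```

Each column of `features` is one bipartition of the respondents: those at or above the threshold, and everyone else.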

Parameters

single_col_data : pd.Series or np.ndarray

The data of a single column (one variable) from which the features are created.

num_bins : int

Number of equal-sized bins into which the variable's range is divided. Defaults to 5.

max_num_values_for_extensive_seps : int or None

If a variable takes at most this number of different values, we fall back to ordinal-style features. This is useful if the survey metadata was configured inaccurately, so that ordinal variables were incorrectly declared as numerical. Defaults to 50.
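The fallback condition can be sketched as a count of distinct values. The helper `uses_ordinal_fallback` is hypothetical and not part of the library; treating `None` as "never fall back" is an assumption, since the behaviour for `None` is not stated here.

```python
import numpy as np


def uses_ordinal_fallback(values, max_num_values_for_extensive_seps=50):
    """Hypothetical check: does the variable have few enough distinct values?"""
    if max_num_values_for_extensive_seps is None:
        # Assumption: None disables the fallback entirely.
        return False
    num_distinct = np.unique(np.asarray(values)).size
    return bool(num_distinct <= max_num_values_for_extensive_seps)


likert_like = np.array([1, 2, 3, 1, 2, 5, 4])  # few distinct values
continuous_like = np.arange(100)               # 100 distinct values
```

Under these assumptions, a five-point scale mislabelled as numerical would trigger the fallback, while a variable with 100 distinct values would be split at regular thresholds.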

invalid_values : list or np.ndarray, optional

The invalid values in single_col_data.

Returns

tuple[np.ndarray, np.ndarray]

The features in the first entry and the corresponding metadata in the second entry.