create_features

SimpleSurveyFeatureFactoryMissingValuesImplicit.create_features(column_selection: str | int | list | ndarray | Series | range | Callable[[SurveyVariable], bool] | None = None) -> Tuple[ndarray, ndarray]

Create features for a set of variables corresponding to the columns specified by column_selection.

Parameters

column_selection : str, int, list, np.ndarray, pd.Series, range, QuestionSelector or None

If None, all columns are taken into account. Otherwise, a subset will be considered. The parameter is interpreted as described in Survey.interpret_column_selection().

Returns

tuple[np.ndarray, np.ndarray]

The first entry is a matrix of shape (len(col_data), num_features) that contains the features as (oriented) indicator vectors, one feature per matrix column. Here, num_features is the number of features created by this method.

The second entry is a one-dimensional array of shape (num_features,) that contains the metadata for each feature.

Every entry of this array is a tuple (column_name, operation, value). For example, the tuple ('feature_name', '>=', 8) describes the feature (or separation) 'feature_name' that splits the column into one group that answered less than 8 and one group that answered at least 8.
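The relationship between a metadata tuple and its feature column can be illustrated with a small, self-contained sketch. It does not call the library: the toy answers, the column name 'satisfaction', the helper indicator() and the 0/1 encoding are all assumptions made for illustration (the library's "oriented" indicator vectors may use a different encoding or sign convention).

```python
import operator

import numpy as np

# Hypothetical answers of five respondents to one survey column.
answers = np.array([3, 8, 10, 7, 9])

# Metadata in the documented (column_name, operation, value) form,
# stored as a 1-D object array of shape (num_features,).
meta = np.empty(2, dtype=object)
meta[0] = ("satisfaction", ">=", 8)
meta[1] = ("satisfaction", ">=", 4)

# Map the operation strings to comparison functions (assumed set of operations).
OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq}


def indicator(column, op, value):
    """Return a 0/1 vector marking the rows that satisfy `column op value`."""
    return OPS[op](column, value).astype(int)


# One matrix column per feature: shape (len(answers), num_features).
features = np.column_stack([indicator(answers, op, v) for _, op, v in meta])

print(features.shape)
print(features[:, 0])
```

In this sketch, the first feature column marks the respondents who answered at least 8, which mirrors how the tuple ('feature_name', '>=', 8) separates a column into two groups.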