auto_prep.utils package
Submodules
auto_prep.utils.abstract module
- class auto_prep.utils.abstract.Categorical[source]
Bases:
ABC
Abstract interface to indicate a categorical step.
- class auto_prep.utils.abstract.Classifier[source]
Bases:
ABC
Abstract interface to indicate classification problem.
- class auto_prep.utils.abstract.ModulesHandler[source]
Bases:
ABC
- static construct_pipelines_steps(step_name: str, module_name: str, called_from: str, pipelines: List[List[Step]] = [], required_only_: bool = False) List[List[Step]] [source]
Constructs new pipelines (lists of steps) by adding steps from the provided module.
The method first dynamically loads and groups the classes found in the module. It then expands the existing pipelines by adding required steps. If the required_only_ flag is False, non-required steps are added as well.
- Parameters:
step_name (str) – The name of the step, used for logging purposes.
module_name (str) – The name of the module from which to load and group classes.
called_from (str) – The calling location, used to resolve relative imports.
pipelines (List[List[Step]]) – A list of existing pipelines to be extended.
required_only_ (bool, optional) – If True, only required steps are added to the pipelines. If False, both required and non-required steps are added. Defaults to False.
- Returns:
A list of new pipelines steps created by adding the corresponding required and non-required steps to the original pipelines.
- Return type:
List[List[Step]]
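The expansion described above can be pictured with a small, self-contained sketch. It does not use auto_prep: `extend_pipelines` and the string step names are hypothetical, and the branching rule (required steps appended to every pipeline, each non-required step creating one extra variant) is an assumption about how the pipelines are expanded, not confirmed library behavior.

```python
from typing import List, Optional, Sequence


def extend_pipelines(
    pipelines: Optional[List[List[str]]],
    required: Sequence[str],
    non_required: Sequence[str],
    required_only: bool = False,
) -> List[List[str]]:
    """Hypothetical stand-in for pipeline expansion; steps are plain strings."""
    # Start from a single empty pipeline when nothing exists yet.
    current = pipelines if pipelines else [[]]
    # Assumption: every required step is appended to every pipeline.
    base = [list(p) + list(required) for p in current]
    if required_only:
        return base
    # Assumption: each non-required step adds one extra variant per pipeline,
    # and the variant without it is kept.
    extended = list(base)
    for step in non_required:
        extended.extend(p + [step] for p in base)
    return extended


result = extend_pipelines([["impute"]], ["encode"], ["scale", "bin"])
required_only_result = extend_pipelines(
    [["impute"]], ["encode"], ["scale", "bin"], required_only=True
)
```

Under these assumptions the number of pipelines grows with every non-required step, which may be why a flag restricting the expansion to required steps exists.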
- static construct_pipelines_steps_helper(step_name: str, package_name: str, called_from: str, pipelines: List[List[Step]], required_only_: bool = False) List[List[Step]] [source]
A helper method to construct and extend pipelines steps by incorporating modules dynamically from a specified package.
This method uses ModulesHandler.construct_pipelines_steps to add modules to existing pipelines based on the package’s name and the calling file’s context. It logs the start and end of the operation.
- Parameters:
step_name (str) – The name of the step, used for logging purposes.
package_name (str) – The name of the package containing the modules to be dynamically added to the pipelines.
called_from (str) – The calling location, used to resolve relative imports.
pipelines (List[List[Step]]) – A list of existing pipelines to which new modules will be added.
required_only_ (bool, optional) – If True, only the required modules (determined by the package) will be added. If False, both required and non-required modules will be included. Defaults to False.
- Returns:
The updated list of pipelines steps after incorporating the modules from the specified package.
- Return type:
List[List[Step]]
- static get_subpackage(__file__)[source]
Returns the name of the package (directory) containing the given file, expressed as a subpackage relative to auto_prep.
- Parameters:
__file__ (str) – The absolute or relative path to the current file.
- Returns:
The name of the directory containing the file, which is treated as the package name.
- Return type:
str
- Raises:
ValueError – If the module cannot be found.
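As a rough illustration of this lookup, the sketch below derives a dotted subpackage name from a file path. The function name `subpackage_of`, the dotted return format, and the use of POSIX-style paths are assumptions made for the example; the actual return format of get_subpackage is not specified here.

```python
from pathlib import PurePosixPath


def subpackage_of(file_path: str, root: str = "auto_prep") -> str:
    """Hypothetical helper: dotted path of the file's directory below `root`."""
    parts = PurePosixPath(file_path).parent.parts
    if root not in parts:
        # Mirrors the documented ValueError when the package cannot be located.
        raise ValueError(f"{file_path!r} is not inside a {root!r} package")
    # Use the last occurrence of `root` in case the name appears twice in the path.
    root_index = len(parts) - 1 - parts[::-1].index(root)
    return ".".join(parts[root_index + 1:])


name = subpackage_of("/home/user/auto_prep/utils/abstract.py")
nested = subpackage_of("/srv/auto_prep/preprocessing/imputing/mean.py")
```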
- supported_combinations: List[List[object]] = [('NumericalRequired', (<class 'auto_prep.utils.abstract.Numerical'>, <class 'auto_prep.utils.abstract.RequiredStep'>)), ('NumericalNonRequired', (<class 'auto_prep.utils.abstract.Numerical'>, <class 'auto_prep.utils.abstract.NonRequiredStep'>)), ('CategoricalRequired', (<class 'auto_prep.utils.abstract.Categorical'>, <class 'auto_prep.utils.abstract.RequiredStep'>)), ('CategoricalNonRequired', (<class 'auto_prep.utils.abstract.Categorical'>, <class 'auto_prep.utils.abstract.NonRequiredStep'>)), ('NumericalCategoricalRequired', (<class 'auto_prep.utils.abstract.NumericalCategorical'>, <class 'auto_prep.utils.abstract.RequiredStep'>)), ('NumericalCategoricalNonRequired', (<class 'auto_prep.utils.abstract.NumericalCategorical'>, <class 'auto_prep.utils.abstract.NonRequiredStep'>))]
- supported_interfaces: List[object] = [<class 'auto_prep.utils.abstract.Numerical'>, <class 'auto_prep.utils.abstract.Categorical'>, <class 'auto_prep.utils.abstract.NumericalCategorical'>, <class 'auto_prep.utils.abstract.RequiredStep'>, <class 'auto_prep.utils.abstract.NonRequiredStep'>]
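The combinations listed above pair a data-type interface with a required or non-required step type. A minimal sketch of how classes could be grouped by such pairs is shown below. All classes here are local stand-ins that only mirror the names from this module, `group_by_combination` is a hypothetical helper, and the use of `issubclass` is an assumption about how the grouping might be done.

```python
from abc import ABC
from typing import Dict, List, Tuple, Type


class Step(ABC):
    """Stand-in for the library's Step base class."""


class RequiredStep(Step):
    """Stand-in marker for steps that are always applied."""


class NonRequiredStep(Step):
    """Stand-in marker for optional steps."""


class Numerical(ABC):
    """Stand-in marker for numerical steps."""


class Categorical(ABC):
    """Stand-in marker for categorical steps."""


# A subset of the documented combinations, rebuilt with the stand-ins.
COMBINATIONS: List[Tuple[str, Tuple[Type, Type]]] = [
    ("NumericalRequired", (Numerical, RequiredStep)),
    ("NumericalNonRequired", (Numerical, NonRequiredStep)),
    ("CategoricalRequired", (Categorical, RequiredStep)),
    ("CategoricalNonRequired", (Categorical, NonRequiredStep)),
]


def group_by_combination(classes: List[Type]) -> Dict[str, List[Type]]:
    """Place each class in every group whose interfaces it all implements."""
    groups: Dict[str, List[Type]] = {name: [] for name, _ in COMBINATIONS}
    for cls in classes:
        for name, interfaces in COMBINATIONS:
            if all(issubclass(cls, interface) for interface in interfaces):
                groups[name].append(cls)
    return groups


class MeanImputer(Numerical, RequiredStep):
    """Hypothetical required numerical step."""


class RareLabelGrouper(Categorical, NonRequiredStep):
    """Hypothetical optional categorical step."""


groups = group_by_combination([MeanImputer, RareLabelGrouper])
```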
- class auto_prep.utils.abstract.NonRequiredStep[source]
Bases:
Step
Non-required step that will only be considered for preprocessing.
- class auto_prep.utils.abstract.Numerical[source]
Bases:
ABC
Abstract interface to indicate a numerical step.
- class auto_prep.utils.abstract.NumericalCategorical[source]
Bases:
ABC
Abstract interface to indicate a step that is both categorical and numerical.
- class auto_prep.utils.abstract.Regressor[source]
Bases:
ABC
Abstract interface to indicate regression problem.
- class auto_prep.utils.abstract.RequiredStep[source]
Bases:
Step
Required step that will always be considered in preprocessing.
auto_prep.utils.config module
- class auto_prep.utils.config.GlobalConfig(*args, **kwargs)[source]
Bases:
object
Global config class.
- set(raport_name: str = 'raport', raport_title: str = 'ML Raport', raport_author: str = 'AutoPrep', raport_abstract: str = NoEscape( \begin{abstract} This raport has been generated with AutoPrep. \end{abstract} ), root_dir: str = 'raport', return_tex_: bool = True, logger_colors_map: dict = {'CRITICAL': '\x1b[41m', 'DEBUG': '\x1b[36m', 'ERROR': '\x1b[31m', 'INFO': '\x1b[32m', 'RESET': '\x1b[0m', 'WARNING': '\x1b[33m'}, log_format: str = '%(asctime)s %(levelname)s %(name)s: %(message)s', log_date_format: str = '%Y-%m-%d %H:%M:%S', log_level: str = 50, log_dir: str | None = None, max_log_file_size_in_mb: int = 5, tex_geomatry: dict = {'bmargin': '0.5in', 'footskip': '0.2in', 'headheight': '10pt', 'margin': '0.5in', 'tmargin': '0.5in'}, train_size: float = 0.8, test_size: float = 0.1, valid_size: float = 0.1, random_state: int = 42, max_datasets_after_preprocessing: int = 3, perform_only_required_: bool = False, raport_decimal_precision: int = 4, chart_settings: dict = {'heatmap_cmap': 'coolwarm', 'heatmap_fmt': '.2f', 'palette': 'pastel', 'plot_height_per_row': 8, 'plot_width': 20, 'theme': 'white', 'tick_label_rotation': 45, 'title_fontsize': 18, 'title_fontweight': 'bold', 'xlabel_fontsize': 15, 'ylabel_fontsize': 15}, correlation_selectors_settings: dict = {'k': 10, 'threshold': 0.8}, outlier_detector_settings: dict = {'cook_threshold': 1, 'isol_forest_n_estimators': 100, 'zscore_threshold': 3}, imputer_settings: dict = {'categorical_strategy': 'most_frequent', 'n_iter': 10, 'numerical_strategy': 'mean'}, umap_components: int = 50, correlation_threshold: float = 0.8, correlation_percent: float = 0.7, n_bins: int = 4, outlier_detector_method: str = 'zscore', max_unique_values_classification: int = 20, regression_pipeline_scoring_model: ~sklearn.base.BaseEstimator = RandomForestRegressor(max_depth=5, n_jobs=-1, random_state=42, warm_start=True), classification_pipeline_scoring_model: ~sklearn.base.BaseEstimator = RandomForestClassifier(max_depth=5, n_jobs=-1, 
random_state=42, warm_start=True), regression_pipeline_scoring_func: callable | str = (<function mean_squared_error>, 'min'), classification_pipeline_scoring_func_bin: callable | str = (<function roc_auc_score>, 'max'), classification_pipeline_scoring_func_multi: callable | str = (<function accuracy_score>, 'max'), max_workers: int | None = None, tuning_params: dict = {'cv': 3, 'n_iter': 10, 'n_jobs': -1, 'random_state': 42, 'verbose': 0}, max_models: int = 3)[source]
- Parameters:
raport_name (str)
raport_title (str)
raport_author (str)
raport_abstract (str) – Defaults to DEFAULT_ABSTRACT.
root_dir (str) – Directory where the report and all cache are stored. Defaults to “raport”.
return_tex_ (bool) – Whether to return the .tex file alongside the PDF. Defaults to True.
logger_colors_map (dict) – Defaults to COLORS.
log_format (str) – Defaults to LOG_FORMAT.
log_date_format (str) – Defaults to LOG_DATE_FORMAT.
log_level (str) – Defaults to LOG_LEVEL.
log_dir (str) – If None is provided, defaults to “logs” in the directory from which the program was called. -1 means no logging to file.
max_log_file_size_in_mb (int) – Maximum log file size in MB for each logger. Defaults to 5.
tex_geomatry (dict) – Defaults to DEFAULT_TEX_GEOMETRY.
train_size (float)
test_size (float)
valid_size (float)
random_state (int)
max_datasets_after_preprocessing (int) – Maximum number of datasets kept after the preprocessing steps. Further models are trained on these datasets. Strongly affects performance.
perform_only_required_ (bool) – Affects the entire process.
raport_decimal_precision (int) – Uses standard Python rounding.
chart_settings (dict) – Settings for customizing chart appearance. Defaults to None, which initializes default settings.
correlation_selectors_settings (dict) – Settings for correlation selectors.
outlier_detector_settings (dict) – Settings for outlier detectors
imputer_settings (dict) – Settings for imputers
umap_components (int) – Number of components for UMAP.
max_unique_values_classification (int) – In task “auto”, the number of unique values is calculated. If that number is lower than this value, classification is performed.
regression_pipeline_scoring_model (BaseEstimator) – Model used to score pipelines in a regression task.
classification_pipeline_scoring_model (BaseEstimator) – Model used to score pipelines in a classification task.
regression_pipeline_scoring_func (Union[callable, str] pair) – Scoring function paired with “min” or “max”. Defaults to (mean_squared_error, “min”).
classification_pipeline_scoring_func_bin (Union[callable, str] pair) – Scoring function paired with “min” or “max”. Defaults to (roc_auc_score, “max”).
classification_pipeline_scoring_func_multi (Union[callable, str] pair) – Scoring function paired with “min” or “max”. Defaults to (accuracy_score, “max”).
raport_chart_color_pallete (List[str])
correlation_threshold (float)
correlation_percent (float)
n_bins (int)
outlier_detector_method (str)
max_workers (int)
tuning_params (dict)
max_models (int)
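The default train_size, test_size and valid_size values (0.8, 0.1, 0.1) sum to 1. The sketch below shows one way such fractions could be turned into row counts. The helper `split_counts`, the requirement that the fractions sum to 1, and the rounding rule are assumptions for illustration, not the library's documented behavior.

```python
import math
from typing import Tuple


def split_counts(
    n_rows: int,
    train_size: float = 0.8,
    test_size: float = 0.1,
    valid_size: float = 0.1,
) -> Tuple[int, int, int]:
    """Hypothetical helper: convert split fractions into row counts."""
    total = train_size + test_size + valid_size
    # Assumption: the three fractions are expected to cover the whole dataset.
    if not math.isclose(total, 1.0):
        raise ValueError(f"split sizes must sum to 1, got {total}")
    n_train = int(n_rows * train_size)
    n_test = int(n_rows * test_size)
    # The validation set takes the remainder so that no row is dropped.
    n_valid = n_rows - n_train - n_test
    return n_train, n_test, n_valid


counts = split_counts(1000)
```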
auto_prep.utils.logging_config module
- class auto_prep.utils.logging_config.ColoredFormatter(fmt=None, datefmt=None, style='%', validate=True, *, defaults=None)[source]
Bases:
Formatter
Custom formatter adding colors to levelname and timing information.
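A minimal sketch of a formatter that colors the level name is given below. It reuses the ANSI codes shown as the default logger_colors_map, but `SketchColoredFormatter` is a hypothetical class written for illustration: it leaves out the timing information and is not the library's implementation.

```python
import logging

# ANSI codes taken from the default logger_colors_map shown in the config.
COLORS = {
    "CRITICAL": "\x1b[41m",
    "DEBUG": "\x1b[36m",
    "ERROR": "\x1b[31m",
    "INFO": "\x1b[32m",
    "RESET": "\x1b[0m",
    "WARNING": "\x1b[33m",
}


class SketchColoredFormatter(logging.Formatter):
    """Hypothetical formatter that wraps the level name in an ANSI color."""

    def format(self, record: logging.LogRecord) -> str:
        color = COLORS.get(record.levelname, "")
        # Work on a copy so other handlers still see the plain level name.
        colored = logging.makeLogRecord(record.__dict__)
        colored.levelname = f"{color}{record.levelname}{COLORS['RESET']}"
        return super().format(colored)


formatter = SketchColoredFormatter("%(levelname)s %(message)s")
record = logging.LogRecord("demo", logging.ERROR, "demo.py", 1, "boom", None, None)
line = formatter.format(record)
```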
- class auto_prep.utils.logging_config.TimedLogger(name: str, level: int = 0)[source]
Bases:
Logger
Logger subclass that tracks operation timing.
- start_operation(operation: str) None [source]
Initiates the timing of a specific operation.
This method marks the beginning of an operation and starts the timer. It is used to track the duration of a specific operation or task.
- Parameters:
operation (str) – The name or description of the operation being timed.
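The idea of a logger that times operations can be sketched as follows. Only start_operation is documented above; `end_operation`, the internal dictionary, and the log messages in this sketch are hypothetical additions used to show how a started timer might be read back.

```python
import logging
import time
from typing import Dict


class SketchTimedLogger(logging.Logger):
    """Hypothetical Logger subclass that records when operations start."""

    def __init__(self, name: str, level: int = logging.NOTSET) -> None:
        super().__init__(name, level)
        self._started: Dict[str, float] = {}

    def start_operation(self, operation: str) -> None:
        # Remember the start time under the operation's name.
        self._started[operation] = time.perf_counter()
        self.info("Starting %s", operation)

    def end_operation(self, operation: str) -> float:
        # Hypothetical counterpart: returns the elapsed time in seconds.
        elapsed = time.perf_counter() - self._started.pop(operation)
        self.info("Finished %s in %.3fs", operation, elapsed)
        return elapsed


logger = SketchTimedLogger("demo")
logger.start_operation("preprocessing")
elapsed = logger.end_operation("preprocessing")
```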
- auto_prep.utils.logging_config.setup_logger(name: str) TimedLogger [source]
Sets up a logger with both file and console handlers.
- Parameters:
name (str) – The name of the logger.
- Returns:
An instance of the TimedLogger class, a subclass of the standard Python logger that adds timing functionality.
- Return type:
TimedLogger
auto_prep.utils.other module
- auto_prep.utils.other.get_scoring(task: str, y_train: Series) callable | str [source]
Retrieve proper scoring function from config.
- Parameters:
task (str) – regression / classification
y_train (pd.Series) – Training target dataset.
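The config stores scoring functions as pairs of a function and a direction, for example (mean_squared_error, 'min'). The sketch below shows how a pair might be selected from the task and the target values. The stub metrics, the `SCORING` table, `pick_scoring`, and the rule of treating two or fewer unique target values as binary are all assumptions made for illustration. In particular, the binary stub stands in for roc_auc_score.

```python
from typing import Callable, Dict, Sequence, Tuple

Scoring = Tuple[Callable[[Sequence, Sequence], float], str]


def mse_stub(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Plain-Python mean squared error (stand-in for a library metric)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy_stub(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of exact matches (stand-in for a multiclass metric)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_stub(y_true: Sequence, y_pred: Sequence) -> float:
    """Stand-in for a binary metric; NOT an implementation of ROC AUC."""
    return accuracy_stub(y_true, y_pred)


# Hypothetical table mirroring the (function, direction) pairs in the config.
SCORING: Dict[str, Scoring] = {
    "regression": (mse_stub, "min"),
    "classification_bin": (binary_stub, "max"),
    "classification_multi": (accuracy_stub, "max"),
}


def pick_scoring(task: str, y_train: Sequence) -> Scoring:
    """Choose a scoring pair from the task and the training target values."""
    if task == "regression":
        return SCORING["regression"]
    if task == "classification":
        # Assumption: two or fewer unique target values means binary.
        key = "classification_bin" if len(set(y_train)) <= 2 else "classification_multi"
        return SCORING[key]
    raise ValueError(f"unknown task: {task!r}")


def is_improvement(new: float, best: float, direction: str) -> bool:
    """Compare two scores according to the pair's direction."""
    return new < best if direction == "min" else new > best


func, direction = pick_scoring("regression", [1.0, 2.0, 3.0])
score = func([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Keeping the direction next to the function means the code that compares pipelines does not need to know whether a given metric should be minimized or maximized.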
- auto_prep.utils.other.save_chart(name: str, *args, **kwargs) str [source]
Saves chart to directory specified in config.
- Parameters:
name (str)
- Returns:
Path where the chart has been saved.
- Return type:
str