
LiveML

LiveML Core

adele.modeling.liveml2.core.LiveMLS3Boto

LiveML Boto3 connections (can only be used in datalake account)

adele.modeling.liveml2.core.LiveMLCore (LiveMLS3Boto)

Core LiveML class for all of the other clients to inherit from

Parameters:

Name Type Description Default
environment str

Distinguishes between different deployment environments. Currently only dev (development) available. Defaults to None.

None
local bool

Whether this client is being deployed on a local machine (which requires a bastion connection to the RDS instance) or not.

False

App

adele.modeling.liveml2.classes.app.core.App (LiveMLCore)

Core App-level class that deals with anything at the application level. This includes listing all available projects and managing the benchmark of models.

Parameters:

Name Type Description Default
environment str

Distinguishes between different deployment environments. Currently only dev (development) available. Defaults to None.

required
local bool

Whether this client is being deployed on a local machine (which requires a bastion connection to the RDS instance) or not.

required

Examples:

Example 1, get the list of all available projects and their info

>>> client = AppClient("dev")
>>> result = client.list_projects()

Example 2, get the list of projects and their info given a list of project ids

>>> client = AppClient("dev")
>>> result = client.list_projects(['P123456','P555555'])

adele.modeling.liveml2.classes.app.benchmarks.AppBenchmarks (App)

Benchmark class

Examples:

>>> client = AppBenchmarks(
        "dev",
        local=True)
>>> client.retrieve_benchmark()
>>> client.get_scorers()
retrieve_benchmark(self, criteria='neutral')

Retrieves all models that have been marked as part of the benchmark, returning the information of interest reformatted to fit our explorations. Models get marked as 'benchmark' when running get_best_model within a project, to avoid storing those that perform worse than no-skill classifiers.

Parameters:

Name Type Description Default
criteria str

Scoring criteria, can prioritize precision, recall or none. Defaults to 'neutral'.

'neutral'

Returns:

Type Description
DataFrame

dataframe containing all properly-formatted models.

calculate_benchmark_model(self, model, criteria='neutral', by_metric=True)

Retrieves the benchmarked models, ranks them and calculates evaluation metrics (avg_pct) for the given model uuid.

Parameters:

Name Type Description Default
model str

model uuid of the model we are interested in.

required
criteria str

Scoring criteria, can prioritize precision, recall or none. Defaults to 'neutral'.

'neutral'
by_metric bool

Defaults to True.

True

Returns:

Type Description
dict

dictionary with all the relevant evaluation metrics, including our own avg_pct.
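The avg_pct metric referenced throughout this benchmark client appears to be an average-percentile score of a model against the pool of benchmarked models. As an illustration only, a minimal sketch of what an average-percentile calculation could look like (the helper names percentile_rank and avg_pct here are hypothetical, not the library's API):

```python
def percentile_rank(value, pool):
    """Percentile of `value` within `pool`: the share of benchmark
    values it matches or beats, on a 0-100 scale."""
    if not pool:
        return 0.0
    return 100.0 * sum(1 for v in pool if v <= value) / len(pool)

def avg_pct(model_metrics, benchmark_metrics):
    """Average the model's percentile rank over each evaluation metric.

    model_metrics: {metric_name: value for the model of interest}
    benchmark_metrics: {metric_name: [values from benchmarked models]}
    """
    ranks = [percentile_rank(model_metrics[m], benchmark_metrics[m])
             for m in model_metrics]
    return sum(ranks) / len(ranks)
```

A model scoring at the 75th percentile on precision and the 50th on recall would get an avg_pct of 62.5 under this sketch.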

calculate_benchmark_scorers(self, criteria='neutral')

Retrieves the benchmarked models, ranks them and calculates the evaluation metric (avg_pct) descriptors for each stored scorer.

Parameters:

Name Type Description Default
criteria str

Scoring criteria, can prioritize precision, recall or none. Defaults to 'neutral'.

'neutral'

Returns:

Type Description
List

List of dictionaries with the scorers names and information.

get_scorers(self)

Returns all the available scorers and their info.

Returns:

Type Description
dict

Dict with scorers as key and their available info in the items.

adele.modeling.liveml2.classes.app.projects.AppProjects (App)

Project level class at the app level

Examples:

>>> client = AppProjects(
        "dev",
        local=True)
>>> client.get_projects()
get_projects(self, filter=None)

Returns all the available projects and their info, or only the ones specified in the filter.

Parameters:

Name Type Description Default
filter list

List of project_ids whose information we want to retrieve. Defaults to None.

None

Returns:

Type Description
list

List of projects. Each project is a dictionary with all the relevant fields: 'project_id','description',etc.

File

adele.modeling.liveml2.classes.file.core.File (LiveMLCore)

Client within the LiveML modeling framework that deals with the upload, deletion and validation of files within a given project. For a file to be validated, it must be tagged with one of the available file types or tags. File existence checks and reading the content of the file are also within the scope of the client.

Parameters:

Name Type Description Default
environment str

Distinguishes between different deployment environments. Currently only dev (development) available. Defaults to None.

None

Examples:

Example 1, upload file or get content if it exists.

>>> client = FileClient("dev")
>>> client.project_id = "P123456"
>>> client.filename='FEATURES.csv'
>>> if client.check_existence():
        print(client.dataframe)
    else:
        client.create_upload_url()
        print(client.upload_url)

Example 2, validate file.

>>> client = FileClient("dev")
>>> client.project_id = "P123456"
>>> client.filename ='FEATURES.csv'
>>> client.tag = 'features'
>>> client.validate()

Example 3, delete file.

>>> client = FileClient("dev")
>>> client.project_id = "P123456"
>>> client.filename ='FEATURES.csv'
>>> client.delete()
get_validated_pairs(self)

Given the filename of the client, get the validation status of that file and the results of its validation against other files in the same project. Validation statuses include STARTED and COMPLETED, while validation results include SUCCESS, FAIL, and a null value if it has not yet been run.

Exceptions:

Type Description
FileNotInDB

Raised if the client filename cannot be found in file_info

Returns:

Type Description
dict

complete_set is a boolean that is True if the target file and the other files it is validated against consist of at least one categorical, scale, features, and outcomes file. These files must all pass intravalidation and intervalidation against the target file in order to count towards a complete set.

missing_validations contains a list of file tags still required to make a complete set of training
files using the target file.

filelist is a list of all intervalidations run against the target file, their intervalidation 
status with the target file (verdict), their intravalidation status, their tag, and a list of 
errors found when performing intervalidation against the target file.

Example output:

{'complete_set': 'False',
 'missing_validations': ['categorical', 'scale'],
 'filelist': [{'filename': 'outcomes4.csv', 'tag': 'outcomes', 'inter_validation': 'PASS', 'inter_errors': None, 'intra_validation': 'PASS', 'intra_errors': 'warning: no user-defined weights detected - will use default weights, which is to try to equalize classes'}],
 'target_file': {'filename': 'FEATURES.csv', 'intra_validation': 'PASS', 'intra_val_errors': None},
 'final_val_status': 'PASS'}
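The complete-set logic described above can be sketched with plain set arithmetic. This is an illustrative reconstruction rather than the client's actual implementation; the required tags are those named in the description:

```python
REQUIRED_TAGS = {"categorical", "scale", "features", "outcomes"}

def summarize_validations(target_tag, filelist):
    """Given the target file's tag and the files it was inter-validated
    against, determine whether they form a complete training set and
    which tags are still missing. Only files that passed both intra-
    and inter-validation count towards the set."""
    passed = {f["tag"] for f in filelist
              if f["inter_validation"] == "PASS"
              and f["intra_validation"] == "PASS"}
    passed.add(target_tag)  # the target file contributes its own tag
    missing = sorted(REQUIRED_TAGS - passed)
    return {"complete_set": not missing, "missing_validations": missing}
```

With only a validated outcomes file paired against a features target, this reports ['categorical', 'scale'] as missing, matching the example output above.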

check_existence(self)

Checks for the presence of the file in the DB, which implies its s3 existence.

Exceptions:

Type Description
ValidationError

When an input fails its pydantic validation.

Returns:

Type Description
bool

True if it exists and False otherwise.

create_upload_url(self)

In order to upload a file, this method generates an AWS presigned url, first checking whether the file already exists and, if it does, whether the parameter upload_override has been set to True.

Exceptions:

Type Description
PresignedUrlError

The AWS client had an error creating the url.

FileExistNoOverride

File exists and override was not set to True.

ValidationError

When an input fails its pydantic validation.

create_download_url(self)

In order to download a file, this method generates an AWS presigned url, first checking that the file exists.

Exceptions:

Type Description
PresignedUrlError

The AWS client had an error creating the url.

FileExistNoOverride

File exists and override was not set to True.

ValidationError

When an input fails its pydantic validation.

preview(self, num_rows=30)

Returns a preview of the top rows of the file as a dataframe.

Parameters:

Name Type Description Default
num_rows int

Number of rows at the top of the dataframe to return. Defaults to 30.

30

Exceptions:

Type Description
FileDoesNotExist

Exception if the file does not exist in the S3 path

Returns:

Type Description
pd.DataFrame

A dataframe containing the top num_rows rows of the file.

delete(self)

Deletes the given file from the s3 bucket after checking for its existence.

Exceptions:

Type Description
FileDoesNotExist

File does not exist in the db and therefore not in s3.

ValidationError

When an input fails its pydantic validation.

adele.modeling.liveml2.classes.file.engineering.FileFeatureEngineering (File)

feature_engineering(self, new_file_name, remappings)

This endpoint creates new features files in the project directory with the new columns appended, which will then be visible through the API. The user must specify either a categorical or a numeric variable, providing a dictionary for mapping values or bins for discretizing values, respectively. The new features file is saved as a csv in S3.

Parameters:

Name Type Description Default
new_file_name str

the name of the new feature file

required
remappings list

A list of dictionaries indicating how to perform the remapping. example: [{ 'old_feature_name': 'Cluster', 'new_feature_name': 'ClusterNew', 'type': 'categorical', 'value_map': { 'TO': ['t','o'], 'PAQ': ['p','a','q'] } }, { 'old_feature_name': 'v6MonthTotalUnits', 'new_feature_name': 'v6MonthTotalUnitsNew', 'type': 'numeric', 'value_bins': [0, 30, 100, 1000, 10000] }]

required

Exceptions:

Type Description
NoFilesFound

Raised if the features file cannot be found in S3

FeatureTypeError

Raised if the indicated variable type does not match the data type in the file.

DiscretizationError

Error raised if there are issues discretizing a numeric variable

Returns:

Type Description
str

Confirmation message containing the filename of the new features file.
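The two remapping modes accepted in the remappings parameter can be sketched as below. This is a minimal illustration assuming half-open bins labeled 'lo-hi'; the helper names are hypothetical, and the real method operates on the features csv stored in S3:

```python
import bisect

def remap_categorical(values, value_map):
    """Apply a value_map like {'TO': ['t', 'o']} by inverting it into
    a per-value lookup; unmapped values pass through unchanged."""
    lookup = {old: new for new, olds in value_map.items() for old in olds}
    return [lookup.get(v, v) for v in values]

def discretize_numeric(values, bins):
    """Assign each value the label of the bin it falls into, e.g.
    value_bins [0, 30, 100] yield labels '0-30' and '30-100'."""
    labels = [f"{lo}-{hi}" for lo, hi in zip(bins, bins[1:])]
    out = []
    for v in values:
        i = bisect.bisect_right(bins, v) - 1
        i = min(max(i, 0), len(labels) - 1)  # clamp edge values into the outer bins
        out.append(labels[i])
    return out
```

Under the example remappings above, 't' and 'o' collapse into 'TO', while a count of 50 lands in the '30-100' bin.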

adele.modeling.liveml2.classes.file.validate.FileValidate (File)

validate(self, rerun=True)

For a file that has been previously tagged, it performs a series of validations.

Exceptions:

Type Description
TagNotAvailable

Tag needs to be set before running.

IdVarNotAvailable

Id variable needs to be set before running.

Job

adele.modeling.liveml2.classes.job.core.Job (LiveMLCore)

Client within the LiveML modeling framework that deals with the creation and polling of the ECS tasks that train and evaluate xgboost models for a given project and data, together with other configuration parameters.

Each task produces a model. Each set of tasks that are created from within a given project, with a given set of data, one or more scorers and one or more sets of dependent variables (each called a solution) is called a job.

Parameters:

Name Type Description Default
environment str

Distinguishes between different deployment environments. Currently only dev (development) available. Defaults to None.

None

Examples:

Example 1, submit a job.

>>> client = Job("dev")
    client.project_id = "P123456"
    client.job = my_job
    client.submit(payload)

Example 2, poll a job.

>>> client = Job("dev")
    client.project_id = "P123456"
    client.job_id = "e7818082-b400-46cb-87f9-1761ed1149f0"
    client.poll()
    results = client.polled_tasks
create(self)

One of the main methods of the job client, generates the job identifier as well as the model identifiers that correspond to each of the possible combinations of solutions (dv) and scorers. Possible solutions and modelUIDs are stored as a TaskCreationPayload object. Needs to be called before submit.

Exceptions:

Type Description
ValidationError

When an input fails its pydantic validation.

JobOrProjectIdMissing

The necessary project and job identifiers are missing.
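The job/task relationship described above (one task per combination of solution and scorer, grouped under one job identifier) can be sketched as follows. The payload shape is illustrative, not the actual TaskCreationPayload schema:

```python
import itertools
import uuid

def build_task_payload(solutions, scorers):
    """One task (and one model uuid) per (solution, scorer)
    combination, all grouped under a single job identifier."""
    job_id = str(uuid.uuid4())
    tasks = [
        {"job_id": job_id, "solution": sol, "scorer": sc,
         "model_uid": str(uuid.uuid4())}
        for sol, sc in itertools.product(solutions, scorers)
    ]
    return {"job_id": job_id, "tasks": tasks}
```

Two solutions and two scorers thus yield a job of four tasks, each producing its own model.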

submit(self)

One of the main methods of the job client, creates the necessary ECS tasks for a given job (project and set of files), one task per combination of scorer and solution and stores them in the task_status table of the database. To be used as BackgroundTask in the LiveML API, always after calling the create method.

Exceptions:

Type Description
ValidationError

When an input fails its pydantic validation.

SubmitPayloadMissing

The necessary TaskCreationPayload object which is set in create is missing.

poll(self)

One of the main methods of the job client, returns status information of the task(s) associated with a given job identifier. It leverages both the information in the database (which the ECS task updates when it starts RUNNING, gets COMPLETE or has an ERROR) and the information stored in AWS, in case of AWS infrastructure errors or errors the ECS task did not detect. The client stores in polled_tasks the list of all available tasks within the given job id together with their status.

Exceptions:

Type Description
JobNotInDatabase

The given job identifier is not present in the database.

ValidationError

When an input fails its pydantic validation.

adele.modeling.liveml2.classes.job.summary.JobSummary (Job)

all_jobs property readonly

This property getter reads the list of jobs for the project from the db.

Returns:

Type Description
pd.DataFrame

dataframe with all the updated jobs info for the project

get_summary_and_update(self)

This summary method returns the list of existing jobs for a project together with other project-level job-related metrics. To avoid race conditions when updating on the ECS training tasks, which are likely due to parallelization, it calculates all metrics for existing jobs that have not yet been marked as Finished. The only way to mark a job as Finished is for this same method to have found that all of its tasks are completed.

Exceptions:

Type Description
NoJobsAvailableForProject

Project has no jobs

Returns:

Type Description
dict

Includes a list of jobs with their metrics and some project-level job summary metrics.

Model

adele.modeling.liveml2.classes.model.core.Model (LiveMLCore)

Core Model-level class that deals with anything at the model level. This includes retrieving model information and performance metrics.

Parameters:

Name Type Description Default
model_uid str

model uuid provided by the model client

required

info(self)

Returns performance metrics, scorecard, P/R, etc.

Returns:

Type Description
dict

Data from model_results corresponding to model_id.

adele.modeling.liveml2.classes.model.package.ModelPackage (Model)

Subclass of Model that deals with creating a deployment package for a given model. Deployment packages are given as a download from a presigned url and include a report together with the model and different ways of deploying it: API in docker locally, API in docker in AWS, local python code. It uses jinja2 to fill custom templates with each model's information.

create_deployment_package(self)

Method for the creation of a deployment package.

Exceptions:

Type Description
PresignedUrlError

Error creating a presigned url.

Returns:

Type Description
str

Presigned url.

adele.modeling.liveml2.classes.model.predictions.ModelPredictions (Model)

Subclass of Model that deals with creating model predictions

prepare_predictions(self, targetfile)

Computes predictions, which are then saved as a csv, for a given compatible file. Predictions contain columns for ids, class probabilities, and softmax classification.

Parameters:

Name Type Description Default
targetfile str

The name of the file to use as inputs.

required

Exceptions:

Type Description
IVsFileWrongFormat

Format of the target file is not a proper IVs format

Returns:

Type Description
dict

background_task_payload to be passed to calculate_predictions_background

get_training_predictions(self)

Method for returning the training predictions file.

Exceptions:

Type Description
PresignedUrlError

Any kind of error related to generating a presigned url.

Returns:

Type Description
dict

A dictionary containing the name of the training predictions file as well as a presigned URL for retrieving the file

calculate_out_of_sample_predictions(self, ivs, respids)

Background task function to be run in route logic.

Parameters:

Name Type Description Default
ivs pd.DataFrame

Independent variables.

required
respids pd.Series

Respondent ids.

required

Examples:

>>> client = ModelClient(PostgresConnection, "R111111", model_id="1b6ace9d-2986-435e-9bff-8a78e3fb52d8")
>>> background_payload = client.prepare_predictions("target_file.csv")
>>> calculate_predictions(**background_payload)
get_out_of_sample_predictions(self, target_file=None)

Method for returning the out-of-sample predictions file.

Parameters:

Name Type Description Default
target_file str

Filename of out of sample features. Defaults to None.

None

Exceptions:

Type Description
PresignedUrlError

Any kind of error related to generating a presigned url.

Returns:

Type Description
dict

A dictionary containing the name of the out of sample predictions file as well as a presigned URL for retrieving the file

profile_predicted(self)

Profile a specified predicted outcome using features information, split by scales and categoricals.

!!!NOTE: This needs to be updated to profile any prediction, not just the training predictions

Exceptions:

Type Description
FileNotFound

Error if the client cannot find a particular file

Returns:

Type Description
dict

A profile table containing the profiling variables in the rows and the classes in the columns. This dataframe is converted to a dictionary for the purpose of returning the values through the API. The means/percentages are in each row.

Project

adele.modeling.liveml2.classes.project.models.ProjectModels (Project)

list_models(self, column_list=['uuid', 'created_at', 'email'])

Lists all of the model_ids for the project

run_no_skill_models(self, solution, proportions)

For the current project and given solution, generates two no-skill models: a random classifier and a majority-class classifier. The same evaluation metrics that are calculated for xgboost models are run on them, using the holdout dataset.

Parameters:

Name Type Description Default
solution str

name of the solution (outcome classes or target column in dvs)

required
proportions pd.DataFrame

dataframe with the proportions per class for given solution

required

Returns:

Type Description
list

list of models which are defined as a list containing model id, solution and performance results.
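The two no-skill baselines described above can be sketched on plain label lists; this is an illustration only, leaving out the library's holdout handling and evaluation metrics:

```python
import random
from collections import Counter

def no_skill_predictions(train_labels, n_holdout, seed=0):
    """Two no-skill baselines: a random classifier drawing classes
    uniformly from those seen in training, and a majority-class
    classifier that always predicts the most frequent training class."""
    rng = random.Random(seed)
    classes = sorted(set(train_labels))
    majority = Counter(train_labels).most_common(1)[0][0]
    return {
        "random": [rng.choice(classes) for _ in range(n_holdout)],
        "majority": [majority] * n_holdout,
    }
```

Any model whose metrics fail to beat both baselines would not be worth benchmarking, which is the rationale given in save_benchmark below.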

check_class_imbalance(self, solution)

For the current project and given solution, it checks whether the training data outcomes are balanced (imbalance is defined as any class having a proportion of less than 1/(3*nclasses)).

Parameters:

Name Type Description Default
solution str

name of the solution (outcome classes or target column in dvs)

required

Returns:

Type Description
bool

True if the training data is imbalanced, False otherwise.

pd.DataFrame

Proportions for each class.
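The 1/(3*nclasses) rule above can be sketched directly. This standalone version takes a list of outcome labels rather than reading the project's training data:

```python
from collections import Counter

def check_class_imbalance(outcomes):
    """Flag imbalance if any class holds less than 1/(3*nclasses) of
    the records; also return the per-class proportions."""
    counts = Counter(outcomes)
    total = len(outcomes)
    proportions = {cls: n / total for cls, n in counts.items()}
    threshold = 1 / (3 * len(counts))
    imbalanced = any(p < threshold for p in proportions.values())
    return imbalanced, proportions
```

With two classes the threshold is 1/6, so a 11-to-1 split is flagged while a 5-to-5 split is not.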

save_benchmark(self, df)

Given a DataFrame of models that includes no-skill models and skilled ones, this method compares each model with its corresponding (same-solution) no-skill models. If its avg_pct metric is better than the no-skill equivalents, it gets saved in the benchmark (the Benchmark column in the database is set to 'Yes', or 'No' on failure).

Parameters:

Name Type Description Default
df pd.DataFrame

DataFrame of models that includes no-skill models and skilled ones

required
get_best_model(self, criteria, solution=None)

Detects the model with the best performance for a project, given a precision vs recall criteria, based on average percentile. It also saves in the benchmark those models considered better than no-skill models.

Parameters:

Name Type Description Default
criteria str

scoring criteria to be used for selecting the best model

required
solution str

filters best models by solution if specified

None

Exceptions:

Type Description
SolutionNotAvailable

Given solution must be in the outcomes_info table

Returns:

Type Description
dict

dictionary containing the best model for that criteria and solution together with their performance metrics

adele.modeling.liveml2.classes.project.flags.ProjectFlags (Project)

toggle_star(self, filename)

Stars or unstars a file in the database

Parameters:

Name Type Description Default
filename str

target file for toggling star

required

Exceptions:

Type Description
UnstarrableFile

Raised if the target file is not a scale, categorical, features, or outcomes file

ExistingStarredFile

Raised if another file of the same tag and project_id has already been starred

Returns:

Type Description
str

Success message stating if the file has been starred or unstarred

set_winning_model(self, model_id, overwrite=False)

Set a model as the winner for a given project_id.

Parameters:

Name Type Description Default
model_id str

The model_id to set as the winning model

required
overwrite bool

Whether to overwrite the winning_model if one has already been chosen. Defaults to False.

False

Exceptions:

Type Description
ExistingWinningModel

Error if a winning model has been chosen for the project_id and the overwrite argument is set to False.

Examples:

>>> client = ProjectClient('P123456', 'dev', local=True)
>>> client.set_winning_model('019912be-d4bd-43aa-8cd2-23706eddc405', overwrite=True)

adele.modeling.liveml2.classes.project.stats.ProjectStats (Project)

class_counts(self, header, resolve_multiple_files=True)

Reads the class counts for a selected outcome within any file.

Parameters:

Name Type Description Default
header str

Column header to count classes

required
resolve_multiple_files bool

Whether to produce a result when the same header is found in multiple files; if so, the first file found is used.

True

Exceptions:

Type Description
NoHeaderFound

Error if the specified header cannot be found in the outcomes_info given the project_id

Returns:

Type Description
list

An array of dictionaries giving the class name and class count. As an example:

[{'class_name': 1, 'count': 202}, {'class_name': 2, 'count': 261}, {'class_name': 3, 'count': 193}, {'class_name': 4, 'count': 328}]
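The counting itself can be sketched with collections.Counter, assuming the file has already been read into a list of row dictionaries (the real method resolves the file from S3 first):

```python
from collections import Counter

def class_counts(rows, header):
    """Count class occurrences for one column, returned in the same
    shape as the API response shown above."""
    counts = Counter(row[header] for row in rows)
    return [{"class_name": cls, "count": n}
            for cls, n in sorted(counts.items())]
```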

cross_table(self, header_1, header_2, resolve_multiple_files=True)

This method creates a cross table given 2 headers for a project.

Parameters:

Name Type Description Default
header_1 str

The name of the first header

required
header_2 str

The name of the second header

required

Exceptions:

Type Description
NoFilesFound

Raised if there are no S3 files for either of the indicated headers

CrossTableError

General error for anything that comes up during cross table calculations

Returns:

Type Description
dict

Dictionary containing fields for the cross table, row percents, and column percents.
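The three parts of the return value (cross table, row percents, column percents) can be sketched as below, assuming the two columns have been read into a list of value pairs; the real method reads the headers from the project's S3 files:

```python
from collections import Counter

def cross_table(pairs):
    """Build a contingency table from (header_1, header_2) value
    pairs, plus row and column percentages."""
    counts = Counter(pairs)
    rows = sorted({a for a, _ in counts})
    cols = sorted({b for _, b in counts})
    table = {r: {c: counts.get((r, c), 0) for c in cols} for r in rows}
    # Row percents: each cell divided by its row total.
    row_pct = {r: {c: table[r][c] / sum(table[r].values()) for c in cols}
               for r in rows}
    # Column percents: each cell divided by its column total.
    col_totals = {c: sum(table[r][c] for r in rows) for c in cols}
    col_pct = {r: {c: table[r][c] / col_totals[c] for c in cols}
               for r in rows}
    return {"cross_table": table,
            "row_percents": row_pct,
            "column_percents": col_pct}
```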

calculate_bias(self, training_filename, outof_filename, scale_filename=None, categ_filename=None)

This method calculates sample bias between the training data and the out-of-sample data to ascertain data quality. We use the classic t-test of means to compute the p-value under the null hypothesis for scale variables, and chi-square for categoricals. The null hypothesis for the t-test is that the means are equal, while for the chi-square it is that the frequencies are equal.

Parameters:

Name Type Description Default
training_filename str

The training data filename to compare (features file)

required
outof_filename str

The out-of-sample data filename to compare

required
scale_filename str

Chosen scale file. If None obtained automatically.

None
categ_filename str

Chosen categories file. If None obtained automatically.

None

Exceptions:

Type Description
NotBiasFileTypes

File tags are not correct

Returns:

Type Description
List

List of analyzed variables (dictionaries) for scales.

List

List of analyzed variables (dictionaries) for categories.
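For the categorical side, the chi-square statistic under the equal-frequencies null hypothesis can be sketched as below. This computes only the statistic, assuming both samples are non-empty; the actual method presumably uses a statistics library to convert it into a p-value:

```python
from collections import Counter

def chi_square_statistic(train_values, oos_values):
    """Pearson chi-square statistic comparing category frequencies in
    the training vs out-of-sample data. Expected counts come from the
    pooled frequency of each category."""
    categories = sorted(set(train_values) | set(oos_values))
    t_counts, o_counts = Counter(train_values), Counter(oos_values)
    n_t, n_o = len(train_values), len(oos_values)
    n = n_t + n_o
    stat = 0.0
    for cat in categories:
        total = t_counts[cat] + o_counts[cat]
        for observed, size in ((t_counts[cat], n_t), (o_counts[cat], n_o)):
            expected = total * size / n
            stat += (observed - expected) ** 2 / expected
    return stat
```

Identical category distributions yield a statistic of zero; the more the frequencies diverge, the larger the statistic and the smaller the resulting p-value.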

adele.modeling.liveml2.classes.project.validation.ProjectValidation (Project)

validation_token(self)

Create a uuid token to be used for validation. This can be retrieved from the client to be returned to an end-user to poll for the validation.

validate_all(self, dv_idvar=None, iv_idvar=None)

Runs intra and inter validation on all files associated with a project_id. Writes validation status and problem list into validation_info. Token must be generated and passed to this function.

Parameters:

Name Type Description Default
dv_idvar str

Column name for the dependent variable. Optional (default None)

None
iv_idvar str

Column name for the independent variable. Optional (default None)

None

Examples:

>>> client = ProjectClient('P111111', 'dev', 'liveml', local=True)
>>> iv_idvar = 'LRW_ID'
>>> dv_idvar= 'Respondent_ID'
>>> client.validate_all(dv_idvar, iv_idvar)
get_validation_status(self, token=None)

Retrieves the validation status from validation_info of a given validation task for the project. Return includes the pass/fail result, along with the list of problems.

Parameters:

Name Type Description Default
token str

Token for the validation attempt. The value in self.token takes precedence.

None

Returns:

Type Description
dict

Contains token, the progress status of the validation, the validation result, and a list of problems

Validation

adele.modeling.liveml2.classes.validation.core.Validator

Validator is the parent class that implements the generic methods for intravalidation (validation of the given file as its given type without any exterior context) and intervalidation (validation of the given file against other existing files of different tags with which certain requirements must be met).

intra_validate(self, func)

Generic method for intravalidation; checks for a prior existing validation in the db and, if one exists, requires the rerun parameter to be set to True before performing the validation again.

Parameters:

Name Type Description Default
func function

Function implemented in the children validation classes that includes the different intra validation rules that the file has to follow and returns the errors.

required

Returns:

Type Description
list(str)

List of database-related errors that may have stopped the validation from being adequately executed.

inter_validate(self, func, targettag)

Generic method for intervalidation implemented by the children Validators. Given a file and the type of file to be validated against, it gets the pairs of files to be validated, checks for a prior existing validation in the db and, if one exists, requires the rerun parameter to be set to True before performing the validation for that pair. This method might be called more than once within the children Validators if more than one kind of file is susceptible to paired validation with the given file.

Parameters:

Name Type Description Default
func function

Function implemented in the children validation classes that includes the different inter validation rules.

required
targettag str

The types of file the given file is going to be validated against.

required

Returns:

Type Description
list(str)

List of database-related errors that may have stopped the validation from being adequately executed.

adele.modeling.liveml2.classes.validation.features.FeaturesValidator (Validator)

Interfile checks would be a “joinability” check against outcomes, and “exists” checks against scales and categoricals. However, we will consider ‘features’ as the base type and those checks will only be performed against it. The intrafile check verifies minimum missingness for every column.

adele.modeling.liveml2.classes.validation.outcomes.OutcomesValidator (Validator)

Intrafile check is that every class for every outcome has a minimum base size. Joinability checks for existing matching ids in both files, using a matching threshold (project parameter). It also looks to guarantee a minimum amount of data points (project parameter).

update_outcomes_table(self, rerun=True)

Inserts headers into outcomes_info table according to their project_id and filename.

adele.modeling.liveml2.classes.validation.outofsample.OutofValidator (Validator)

Intrafile checks are the same as for features files. The interfile check is a check against scales and categoricals, in the same manner as for features files.

adele.modeling.liveml2.classes.validation.categoricals.CategValidator (Validator)

Interfile check is an “exists” check against features file. Intrafile check is that it’s a single column with regex on Vvar in the column.

adele.modeling.liveml2.classes.validation.scales.ScalesValidator (Validator)

Interfile check is an “exists” check against features file and that selected variables can be typed as int or float by python. Intrafile check is that it’s a single column with regex on Vvar in the column.

adele.modeling.liveml2.classes.validation.weights.WeightsValidator (Validator)

Intravalidation checks for no missing values, exactly two columns, the presence of id_var, and that all weight values are numbers. Intervalidation with the outcomes file checks that every record within the outcomes file has a corresponding weight. *Note: we assume the first column is the id and the second is the weights; do we have a fixed variable name?
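The intravalidation rules above can be sketched as a standalone check, assuming rows follow the assumed (id, weight) structure:

```python
def validate_weights(rows):
    """Intravalidation sketch for a weights file: exactly two columns
    (id first, weight second), no missing values, and every weight
    must parse as a number. Returns a list of error strings."""
    errors = []
    for i, row in enumerate(rows):
        if len(row) != 2:
            errors.append(f"row {i}: expected 2 columns, got {len(row)}")
            continue
        resp_id, weight = row
        if resp_id in (None, "") or weight in (None, ""):
            errors.append(f"row {i}: missing value")
            continue
        try:
            float(weight)
        except (TypeError, ValueError):
            errors.append(f"row {i}: weight {weight!r} is not a number")
    return errors
```

An empty error list means the file passes intravalidation; the intervalidation against outcomes (one weight per outcomes record) is a separate join-style check not shown here.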


Last update: 2022-06-07