LiveML¶
LiveML Core¶
adele.modeling.liveml2.core.LiveMLS3Boto
¶
LiveML Boto3 connections (can only be used in the datalake account).
adele.modeling.liveml2.core.LiveMLCore (LiveMLS3Boto)
¶
Core LiveML class from which all the other clients inherit.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
environment | str | Distinguishes between different deployment environments. Currently only dev (development) is available. Defaults to None. | None |
local | bool | Whether this client is being deployed on a local machine (which requires a bastion connection to the RDS instance). | False |
App¶
adele.modeling.liveml2.classes.app.core.App (LiveMLCore)
¶
Core App-level class that deals with anything at the application level. This includes listing all available projects and managing the benchmark of models.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
environment | str | Distinguishes between different deployment environments. Currently only dev (development) is available. Defaults to None. | required |
local | bool | Whether this client is being deployed on a local machine (which requires a bastion connection to the RDS instance). | required |
Examples:
Example 1, get the list of all available projects and their info
>>> client = AppClient("dev")
>>> result = client.list_projects()
Example 2, get the list of projects and their info given a list of project ids
>>> client = AppClient("dev")
>>> result = client.list_projects(['P123456','P555555'])
adele.modeling.liveml2.classes.app.benchmarks.AppBenchmarks (App)
¶
Benchmark class
Examples:
>>> client = AppBenchmarks(
...     "dev",
...     local=True)
>>> client.retrieve_benchmark()
>>> client.get_scorers()
retrieve_benchmark(self, criteria='neutral')
¶
Retrieves all models that have been marked as part of the benchmark, returning the information of interest reformatted to fit our explorations. Models get marked as 'benchmark' when running get_best_model within a project, to avoid storing those that perform worse than no-skill classifiers.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
criteria | str | Scoring criteria; can prioritize precision, recall, or neither. Defaults to 'neutral'. | 'neutral' |
Returns:
Type | Description |
---|---|
DataFrame | Dataframe containing all properly formatted models. |
calculate_benchmark_model(self, model, criteria='neutral', by_metric=True)
¶
Retrieves the benchmarked models, ranks them, and calculates evaluation metrics (avg_pct) for the given model uuid.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | str | Model uuid of the model we are interested in. | required |
criteria | str | Scoring criteria; can prioritize precision, recall, or neither. Defaults to 'neutral'. | 'neutral' |
by_metric | bool | Defaults to True. | True |
Returns:
Type | Description |
---|---|
dict | Dictionary with all the relevant evaluation metrics, including our own avg_pct. |
calculate_benchmark_scorers(self, criteria='neutral')
¶
Retrieves the benchmarked models, ranks them, and calculates the evaluation metric (avg_pct) descriptors for each stored scorer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
criteria | str | Scoring criteria; can prioritize precision, recall, or neither. Defaults to 'neutral'. | 'neutral' |
Returns:
Type | Description |
---|---|
List | List of dictionaries with the scorers' names and information. |
get_scorers(self)
¶
Returns all the available scorers and their info.
Returns:
Type | Description |
---|---|
dict | Dict with scorers as keys and their available info as values. |
adele.modeling.liveml2.classes.app.projects.AppProjects (App)
¶
Project-level class at the app level.
Examples:
>>> client = AppProjects(
...     "dev",
...     local=True)
>>> client.get_projects()
get_projects(self, filter=None)
¶
Returns all the available projects and their info, or only the ones specified in the filter.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filter | list | List of project_ids whose information we want to retrieve. Defaults to None. | None |
Returns:
Type | Description |
---|---|
list | List of projects. Each project is a dictionary with all the relevant fields: 'project_id', 'description', etc. |
File¶
adele.modeling.liveml2.classes.file.core.File (LiveMLCore)
¶
Client within the LiveML modeling framework that deals with the upload, deletion, and validation of files within a given project. For validation, files must be tagged with one of the available file types or tags. File existence checks and reading the content of the file are also within the scope of the client.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
environment | str | Distinguishes between different deployment environments. Currently only dev (development) is available. Defaults to None. | None |
Examples:
Example 1, upload a file, or get its content if it already exists.
>>> client = FileClient("dev")
>>> client.project_id = "P123456"
>>> client.filename = 'FEATURES.csv'
>>> if client.check_existence():
...     print(client.dataframe)
... else:
...     client.create_upload_url()
...     print(client.upload_url)
Example 2, validate a file.
>>> client = FileClient("dev")
>>> client.project_id = "P123456"
>>> client.filename = 'FEATURES.csv'
>>> client.tag = 'features'
>>> client.validate()
Example 3, delete a file.
>>> client = FileClient("dev")
>>> client.project_id = "P123456"
>>> client.filename = 'FEATURES.csv'
>>> client.delete()
get_validated_pairs(self)
¶
Given the filename of the client, get the validation status of that file and the results of its validation against other files in the same project. Validation statuses include STARTED and COMPLETED, while validation results include SUCCESS, FAIL, and a null value if validation has not yet been run.
Exceptions:
Type | Description |
---|---|
FileNotInDB | Raised if the client filename cannot be found in file_info. |
Returns:
Type | Description |
---|---|
dict | complete_set is a boolean that is True if the target file and the other files it is validated against consist of at least one categorical, scale, features, and outcomes file. These files must all pass intravalidation and intervalidation against the target file in order to count towards a complete set. |
Example output:
{'complete_set': 'False', 'missing_validations': ['categorical', 'scale'], 'filelist': [{'filename': 'outcomes4.csv', 'tag': 'outcomes', 'inter_validation': 'PASS', 'inter_errors': None, 'intra_validation': 'PASS', 'intra_errors': 'warning: no user-defined weights detected - will use default weights, which is to try to equalize classes'}], 'target_file': {'filename': 'FEATURES.csv', 'intra_validation': 'PASS', 'intra_val_errors': None}, 'final_val_status': 'PASS'}
check_existence(self)
¶
Checks for the presence of the file in the DB, which implies its existence in S3.
Exceptions:
Type | Description |
---|---|
ValidationError | When an input fails its pydantic validation. |
Returns:
Type | Description |
---|---|
bool | True if the file exists and False otherwise. |
create_upload_url(self)
¶
In order to upload a file, this method generates an AWS presigned URL, first checking for prior file existence and, if the file exists, whether the upload_override parameter has been set to True.
Exceptions:
Type | Description |
---|---|
PresignedUrlError | The AWS client had an error creating the URL. |
FileExistNoOverride | File exists and override was not set to True. |
ValidationError | When an input fails its pydantic validation. |
create_download_url(self)
¶
In order to download a file, this method generates an AWS presigned URL, first checking for prior file existence.
Exceptions:
Type | Description |
---|---|
PresignedUrlError | The AWS client had an error creating the URL. |
FileExistNoOverride | File exists and override was not set to True. |
ValidationError | When an input fails its pydantic validation. |
preview(self, num_rows=30)
¶
Returns a preview of the top rows of the file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
num_rows | int | Number of rows at the top of the dataframe to return. Defaults to 30. | 30 |
Exceptions:
Type | Description |
---|---|
FileDoesNotExist | Exception if the file does not exist in the S3 path. |
Returns:
Type | Description |
---|---|
pd.DataFrame | A dataframe containing the top num_rows rows of the file. |
delete(self)
¶
Deletes the given file from the s3 bucket after checking for its existence.
Exceptions:
Type | Description |
---|---|
FileDoesNotExist | File does not exist in the DB and therefore not in S3. |
ValidationError | When an input fails its pydantic validation. |
adele.modeling.liveml2.classes.file.engineering.FileFeatureEngineering (File)
¶
feature_engineering(self, new_file_name, remappings)
¶
This endpoint creates a new features file in the project directory with the new columns appended, which should then be visible via the API. For each remapping, the user must specify either a categorical or a numeric variable, providing a dictionary for mapping values or a list of bins for discretizing values, respectively. The new features file is saved as a CSV in S3.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
new_file_name | str | The name of the new features file. | required |
remappings | list | A list of dictionaries indicating how to perform the remapping. Example: [{'old_feature_name': 'Cluster', 'new_feature_name': 'ClusterNew', 'type': 'categorical', 'value_map': {'TO': ['t','o'], 'PAQ': ['p','a','q']}}, {'old_feature_name': 'v6MonthTotalUnits', 'new_feature_name': 'v6MonthTotalUnitsNew', 'type': 'numeric', 'value_bins': [0, 30, 100, 1000, 10000]}] | required |
Exceptions:
Type | Description |
---|---|
NoFilesFound | Raised if the features file cannot be found in S3. |
FeatureTypeError | Raised if the indicated variable type does not match the data type in the file. |
DiscretizationError | Raised if there are issues discretizing a numeric variable. |
Returns:
Type | Description |
---|---|
str | Confirmation message containing the filename of the new features file. |
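As a minimal, hypothetical sketch of how one remappings entry could be applied to rows of data (plain stdlib Python; the actual implementation operates on the features CSV in S3 and is not shown here):

```python
import bisect

def apply_remapping(rows, remapping):
    """Append a remapped column to each row dict, following one remappings entry."""
    old, new = remapping['old_feature_name'], remapping['new_feature_name']
    if remapping['type'] == 'categorical':
        # Invert {'TO': ['t', 'o']} into a per-value lookup {'t': 'TO', 'o': 'TO'}.
        lookup = {label: group
                  for group, values in remapping['value_map'].items()
                  for label in values}
        for row in rows:
            row[new] = lookup.get(row[old])
    elif remapping['type'] == 'numeric':
        bins = remapping['value_bins']
        for row in rows:
            # bisect gives the index of the bin the value falls into.
            row[new] = bisect.bisect_right(bins, row[old])
    return rows

rows = [{'Cluster': 't'}, {'Cluster': 'p'}]
remap = {'old_feature_name': 'Cluster', 'new_feature_name': 'ClusterNew',
         'type': 'categorical', 'value_map': {'TO': ['t', 'o'], 'PAQ': ['p', 'a', 'q']}}
apply_remapping(rows, remap)
# rows[0]['ClusterNew'] == 'TO', rows[1]['ClusterNew'] == 'PAQ'
```

The helper name and row format are illustrative only; the real method resolves files by name and raises the exceptions listed above.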
adele.modeling.liveml2.classes.file.validate.FileValidate (File)
¶
validate(self, rerun=True)
¶
For a file that has been previously tagged, performs a series of validations.
Exceptions:
Type | Description |
---|---|
TagNotAvailable | Tag needs to be set before running. |
IdVarNotAvailable | Id variable needs to be set before running. |
Job¶
adele.modeling.liveml2.classes.job.core.Job (LiveMLCore)
¶
Client within the LiveML modeling framework that deals with the creation and polling of the ECS tasks that train and evaluate xgboost models for a given project and dataset, together with other configuration parameters.
Each task produces a model. Each set of tasks created from within a given project, with a given set of data, one or more scorers, and one or more sets of dependent variables (each called a solution), is called a job.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
environment | str | Distinguishes between different deployment environments. Currently only dev (development) is available. Defaults to None. | None |
Examples:
Example 1, submit a job.
>>> client = Job("dev")
>>> client.project_id = "P123456"
>>> client.job = my_job
>>> client.create()
>>> client.submit()
Example 2, poll a job.
>>> client = Job("dev")
>>> client.project_id = "P123456"
>>> client.job_id = "e7818082-b400-46cb-87f9-1761ed1149f0"
>>> client.poll()
>>> results = client.polled_tasks
create(self)
¶
One of the main methods of the job client: generates the job identifier as well as the model identifiers that correspond to each of the possible combinations of solutions (dv) and scorers. Possible solutions and modelUIDs are stored as a TaskCreationPayload object. Needs to be called before submit.
Exceptions:
Type | Description |
---|---|
ValidationError | When an input fails its pydantic validation. |
JobOrProjectIdMissing | The necessary project and job identifiers are missing. |
submit(self)
¶
One of the main methods of the job client: creates the necessary ECS tasks for a given job (project and set of files), one task per combination of scorer and solution, and stores them in the task_status table of the database. To be used as a BackgroundTask in the LiveML API, always after calling the create method.
Exceptions:
Type | Description |
---|---|
ValidationError | When an input fails its pydantic validation. |
SubmitPayloadMissing | The necessary TaskCreationPayload object, which is set in create, is missing. |
poll(self)
¶
One of the main methods of the job client: returns status information for the task(s) associated with a given job identifier. It leverages both the information in the database (which the ECS task updates when it starts RUNNING, gets COMPLETE, or has an ERROR) and the information stored in AWS, in case of AWS infrastructure errors or undetected errors in the ECS task. The client class stores in polled_tasks the list of all available tasks within the given job id together with their status.
Exceptions:
Type | Description |
---|---|
JobNotInDatabase | The given job identifier is not present in the database. |
ValidationError | When an input fails its pydantic validation. |
adele.modeling.liveml2.classes.job.summary.JobSummary (Job)
¶
all_jobs
property
readonly
¶
This property getter for the list of jobs reads them from the DB.
Returns:
Type | Description |
---|---|
pd.DataFrame | Dataframe with all the updated job info for the project. |
get_summary_and_update(self)
¶
This summary method returns the list of existing jobs for a project together with other project-level job-related metrics. To avoid race conditions when updating against the ECS training tasks, which are likely due to parallelization, it calculates all metrics only for existing jobs that have not yet been marked as Finished. The only way for jobs to be marked as Finished is for this same method to have found that all their tasks are completed.
Exceptions:
Type | Description |
---|---|
NoJobsAvailableForProject | Project has no jobs. |
Returns:
Type | Description |
---|---|
dict | Includes a list of jobs with their metrics and some project-level job summary metrics. |
Model¶
adele.modeling.liveml2.classes.model.core.Model (LiveMLCore)
¶
Core Model-level class that deals with anything at the model level.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_uid | str | Model uuid provided by the model client. | required |
info(self)
¶
Returns performance metrics, scorecard, P/R, etc.
Returns:
Type | Description |
---|---|
dict | Data from model_results corresponding to model_id. |
adele.modeling.liveml2.classes.model.package.ModelPackage (Model)
¶
Subclass of Model that deals with creating a deployment package for a given model. Deployment packages are provided as a download from a presigned URL and include a report together with the model and different ways of deploying it: API in docker locally, API in docker in AWS, local python code. It uses jinja2 to fill custom templates with each model's information.
create_deployment_package(self)
¶
Method for the creation of a deployment package.
Exceptions:
Type | Description |
---|---|
PresignedUrlError | Error creating a presigned URL. |
Returns:
Type | Description |
---|---|
str | Presigned URL. |
adele.modeling.liveml2.classes.model.predictions.ModelPredictions (Model)
¶
Subclass of Model that deals with creating model predictions
prepare_predictions(self, targetfile)
¶
Computes predictions, which are then saved as a CSV, for a given compatible file. Predictions contain columns for ids, class probabilities, and softmax classification.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
targetfile | str | The name of the file to use as inputs. | required |
Exceptions:
Type | Description |
---|---|
IVsFileWrongFormat | Format of the target file is not proper IVs format. |
Returns:
Type | Description |
---|---|
dict | background_task_payload to be passed to calculate_predictions_background. |
get_training_predictions(self)
¶
Method for returning the training predictions file.
Exceptions:
Type | Description |
---|---|
PresignedUrlError | Any kind of error related to generating a presigned URL. |
Returns:
Type | Description |
---|---|
dict | A dictionary containing the name of the training predictions file as well as a presigned URL for retrieving the file. |
calculate_out_of_sample_predictions(self, ivs, respids)
¶
Background task function to be run in route logic.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
ivs | pd.DataFrame | Independent variables. | required |
respids | pd.Series | Respondent ids. | required |
Examples:
>>> client = ModelClient(PostgresConnection, "R111111", model_id="1b6ace9d-2986-435e-9bff-8a78e3fb52d8")
>>> background_payload = client.prepare_predictions("target_file.csv")
>>> calculate_predictions(**background_payload)
get_out_of_sample_predictions(self, target_file=None)
¶
Method for returning the out-of-sample predictions file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
target_file | str | Filename of the out-of-sample features. Defaults to None. | None |
Exceptions:
Type | Description |
---|---|
PresignedUrlError | Any kind of error related to generating a presigned URL. |
Returns:
Type | Description |
---|---|
dict | A dictionary containing the name of the out-of-sample predictions file as well as a presigned URL for retrieving the file. |
profile_predicted(self)
¶
Profile a specified predicted outcome using features information, split by scales and categoricals.
!!!NOTE: This needs to be updated to profile any prediction, not just the training predictions.
Exceptions:
Type | Description |
---|---|
FileNotFound | Error if the client cannot find a particular file. |
Returns:
Type | Description |
---|---|
dict | A profile table containing the profiling variables in the rows and the classes in the columns; the means/percentages are in each row. The dataframe is converted to a dictionary for the purpose of returning the values through the API. |
Project¶
adele.modeling.liveml2.classes.project.models.ProjectModels (Project)
¶
list_models(self, column_list=['uuid', 'created_at', 'email'])
¶
Lists all of the model_ids for the project.
run_no_skill_models(self, solution, proportions)
¶
For the current project and given solution, generates two no-skill models: one a random classifier and the other a majority-class classifier. The same evaluation metrics that are calculated for xgboost models are run on them, using the holdout dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
solution | str | Name of the solution (outcome classes or target column in dvs). | required |
proportions | pd.DataFrame | Dataframe with the proportions per class for the given solution. | required |
Returns:
Type | Description |
---|---|
list | List of models, each defined as a list containing model id, solution, and performance results. |
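The two baselines can be sketched in plain Python (a hypothetical illustration of the idea, not the library's implementation):

```python
import random
from collections import Counter

def no_skill_predictions(train_labels, n_holdout, seed=0):
    """Build predictions for the two no-skill baselines described above."""
    classes = sorted(set(train_labels))
    # Majority-class classifier: always predict the most frequent training class.
    majority_class = Counter(train_labels).most_common(1)[0][0]
    majority = [majority_class] * n_holdout
    # Random classifier: predict a uniformly random class for each holdout row.
    rng = random.Random(seed)
    rand = [rng.choice(classes) for _ in range(n_holdout)]
    return majority, rand

majority, rand = no_skill_predictions(['a', 'a', 'b'], n_holdout=4)
# majority == ['a', 'a', 'a', 'a']; rand is 4 draws from {'a', 'b'}
```

Evaluating these baselines with the same metrics as the trained models is what gives the no-skill reference scores that benchmark membership is compared against.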
check_class_imbalance(self, solution)
¶
For the current project and given solution, checks whether the training data outcomes are balanced (defined as: no class has a proportion less than 1/(3*nclasses) of the share).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
solution | str | Name of the solution (outcome classes or target column in dvs). | required |
Returns:
Type | Description |
---|---|
bool | True if the training data is imbalanced, False otherwise. |
pd.DataFrame | Proportions for each class. |
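The imbalance rule can be sketched as a hypothetical helper, following the 1/(3*nclasses) threshold stated above:

```python
from collections import Counter

def is_imbalanced(labels):
    """Flag the data if any class holds less than 1 / (3 * n_classes) of the share."""
    counts = Counter(labels)
    n_classes = len(counts)
    total = sum(counts.values())
    threshold = 1 / (3 * n_classes)
    proportions = {cls: c / total for cls, c in counts.items()}
    return any(p < threshold for p in proportions.values()), proportions

imbalanced, props = is_imbalanced(['a'] * 11 + ['b'])
# Two classes -> threshold 1/6; class 'b' holds 1/12 < 1/6, so imbalanced is True.
```

The real method reads the training outcomes for the solution from the project files and returns the proportions as a pd.DataFrame rather than a dict.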
save_benchmark(self, df)
¶
Given a DataFrame of models that includes both no-skill and skilled models, this method compares each model with the corresponding (same-solution) no-skill models; if its avg_pct metric is better than the no-skill equivalents, the model gets saved in the benchmark (the Benchmark column in the database is set to 'Yes', or 'No' on failure).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | pd.DataFrame | DataFrame of models that includes both no-skill and skilled models. | required |
get_best_model(self, criteria, solution=None)
¶
Detects the model for a project that has the best performance, given a precision-vs-recall criteria, based on average percentile. It also saves in the benchmark those models considered better than the no-skill models.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
criteria | str | Scoring criteria to be used for selecting the best model. | required |
solution | str | Filters best models by solution if specified. | None |
Exceptions:
Type | Description |
---|---|
SolutionNotAvailable | Given solution must be in the outcomes_info table. |
Returns:
Type | Description |
---|---|
dict | Dictionary containing the best model for that criteria and solution, together with its performance metrics. |
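The selection logic can be sketched as follows (hypothetical field names 'uuid', 'solution', and 'avg_pct', following the vocabulary of these docs; the sketch ignores how the precision/recall criteria weights avg_pct):

```python
def best_model(models, solution=None):
    """Pick the model with the highest avg_pct, optionally filtered by solution."""
    candidates = [m for m in models if solution is None or m['solution'] == solution]
    if not candidates:
        # Stand-in for the SolutionNotAvailable exception raised by the client.
        raise ValueError('SolutionNotAvailable: no models for that solution')
    return max(candidates, key=lambda m: m['avg_pct'])

models = [
    {'uuid': 'm1', 'solution': 's1', 'avg_pct': 0.61},
    {'uuid': 'm2', 'solution': 's1', 'avg_pct': 0.74},
    {'uuid': 'm3', 'solution': 's2', 'avg_pct': 0.58},
]
best = best_model(models, solution='s1')
# best['uuid'] == 'm2'
```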
adele.modeling.liveml2.classes.project.flags.ProjectFlags (Project)
¶
toggle_star(self, filename)
¶
Stars or unstars a file in the database.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filename | str | Target file for toggling the star. | required |
Exceptions:
Type | Description |
---|---|
UnstarrableFile | Raised if the target file is not a scale, categorical, features, or outcomes file. |
ExistingStarredFile | Raised if another file of the same tag and project_id has already been starred. |
Returns:
Type | Description |
---|---|
str | Success message stating whether the file has been starred or unstarred. |
set_winning_model(self, model_id, overwrite=False)
¶
Set a model as the winner for a given project_id.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_id | str | The model_id to set as the winning model. | required |
overwrite | bool | Whether to overwrite the winning_model if a model has already been chosen. Defaults to False. | False |
Exceptions:
Type | Description |
---|---|
ExistingWinningModel | Error if a winning model has already been chosen for the project_id and the overwrite argument is set to False. |
Examples:
>>> client = ProjectClient('P123456', 'dev', local=True)
>>> client.set_winning_model('019912be-d4bd-43aa-8cd2-23706eddc405', overwrite=True)
adele.modeling.liveml2.classes.project.stats.ProjectStats (Project)
¶
class_counts(self, header, resolve_multiple_files=True)
¶
Reads the class counts for a selected outcome within any file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
header | str | Column header for which to count classes. | required |
resolve_multiple_files | bool | Whether to produce a result when the same header is found in multiple files; the resolution is to pick the first file. | True |
Exceptions:
Type | Description |
---|---|
NoHeaderFound | Error if the specified header cannot be found in outcomes_info for the given project_id. |
Returns:
Type | Description |
---|---|
list | An array of dictionaries giving the class name and class count. |
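The returned shape can be sketched with collections.Counter (hypothetical dictionary keys 'class' and 'count'; the real method reads the column from the project's files):

```python
from collections import Counter

def class_counts(values):
    """Count classes in a column, returning an array-of-dicts result."""
    return [{'class': cls, 'count': n} for cls, n in sorted(Counter(values).items())]

class_counts(['yes', 'no', 'yes'])
# -> [{'class': 'no', 'count': 1}, {'class': 'yes', 'count': 2}]
```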
cross_table(self, header_1, header_2, resolve_multiple_files=True)
¶
This method creates a cross table given 2 headers for a project.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
header_1 | str | The name of the first header. | required |
header_2 | str | The name of the second header. | required |
Exceptions:
Type | Description |
---|---|
NoFilesFound | Raised if there are no S3 files for either of the indicated headers. |
CrossTableError | General error for anything that comes up during cross table calculations. |
Returns:
Type | Description |
---|---|
dict | Dictionary containing fields for the cross table, row percents, and column percents. |
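A stdlib sketch of the three returned fields (hypothetical key names; the real method reads both headers from the project's files in S3):

```python
from collections import Counter

def cross_table(col_1, col_2):
    """Cross-tabulate two equal-length columns: counts, row percents, column percents."""
    counts = Counter(zip(col_1, col_2))
    row_totals = Counter(col_1)
    col_totals = Counter(col_2)
    return {
        'table': dict(counts),
        'row_pct': {k: v / row_totals[k[0]] for k, v in counts.items()},
        'col_pct': {k: v / col_totals[k[1]] for k, v in counts.items()},
    }

ct = cross_table(['a', 'a', 'b'], ['x', 'y', 'x'])
# ct['table'][('a', 'x')] == 1; ct['row_pct'][('a', 'x')] == 0.5
```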
calculate_bias(self, training_filename, outof_filename, scale_filename=None, categ_filename=None)
¶
This method calculates sample bias between the training data and the out-of-sample data to ascertain data quality. We use the classic t-test of the means to assess the p-value of the null hypothesis for scale variables, and chi-square for categoricals. The null hypothesis for the t-test is that the means are equal, while for the chi-square it is that the frequencies are equal.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
training_filename | str | The training data filename to compare (features file). | required |
outof_filename | str | The out-of-sample data filename to compare. | required |
scale_filename | str | Chosen scale file. If None, obtained automatically. | None |
categ_filename | str | Chosen categories file. If None, obtained automatically. | None |
Exceptions:
Type | Description |
---|---|
NotBiasFileTypes | File tags are not correct. |
Returns:
Type | Description |
---|---|
List | List of analyzed variables (dictionaries) for scales. |
List | List of analyzed variables (dictionaries) for categories. |
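A hedged sketch of the two tests, assuming scipy is available (hypothetical helpers; the library's own column selection and file handling are not shown):

```python
import numpy as np
from scipy import stats

def scale_bias(train_values, oos_values):
    # Welch's t-test of the means; the null hypothesis is that the means are equal.
    t_stat, p_value = stats.ttest_ind(train_values, oos_values, equal_var=False)
    return {'t': float(t_stat), 'p_value': float(p_value)}

def categ_bias(train_values, oos_values):
    # Chi-square test on the class-frequency contingency table; the null
    # hypothesis is that the frequencies are equal across the two samples.
    cats = sorted(set(train_values) | set(oos_values))
    table = np.array([[train_values.count(c) for c in cats],
                      [oos_values.count(c) for c in cats]])
    chi2, p_value, _, _ = stats.chi2_contingency(table)
    return {'chi2': float(chi2), 'p_value': float(p_value)}

scale_result = scale_bias([1.0, 2.0, 3.0, 4.0], [1.2, 2.1, 3.3, 3.9])
```

A small p-value suggests the training and out-of-sample distributions differ for that variable.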
adele.modeling.liveml2.classes.project.validation.ProjectValidation (Project)
¶
validation_token(self)
¶
Create a uuid token to be used for validation. This can be retrieved from the client to be returned to an end-user to poll for the validation.
validate_all(self, dv_idvar=None, iv_idvar=None)
¶
Runs intra and inter validation on all files associated with a project_id. Writes validation status and problem list into validation_info. Token must be generated and passed to this function.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dv_idvar | str | Column name for the dependent variable. Optional; defaults to None. | None |
iv_idvar | str | Column name for the independent variable. Optional; defaults to None. | None |
Examples:
>>> client = ProjectClient('P111111', 'dev', 'liveml', local=True)
>>> iv_idvar = 'LRW_ID'
>>> dv_idvar= 'Respondent_ID'
>>> client.validate_all(dv_idvar, iv_idvar)
get_validation_status(self, token=None)
¶
Retrieves the validation status from validation_info of a given validation task for the project. Return includes the pass/fail result, along with the list of problems.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
token | str | Token for the validation attempt. The value in self.token takes precedence. | None |
Returns:
Type | Description |
---|---|
dict | Contains the token, the progress status of the validation, the validation result, and a list of problems. |
Validation¶
adele.modeling.liveml2.classes.validation.core.Validator
¶
Validator is the parent class where the generic methods are implemented for intravalidation (validation of the given file as its given type, without any exterior context) and intervalidation (validation of the given file against other existing files of different tags, with which certain requirements must be met).
intra_validate(self, func)
¶
Generic method for intravalidation: checks the DB for a prior existing validation and, if one exists, requires the rerun parameter to be set to True before performing it again.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
func | function | Function implemented in the children validation classes that includes the different intra validation rules that the file has to follow and returns the errors. | required |
Returns:
Type | Description |
---|---|
list(str) | List of database-related errors that may have stopped the validation from being adequately executed. |
inter_validate(self, func, targettag)
¶
Generic method for intervalidation implemented by the children Validators. Given a file and the type of file to be validated against, it gets the pairs of files to be validated and checks the DB for a prior existing validation; if one exists, the rerun parameter must be set to True for that pair's validation to be performed. This method may be called more than once within the children Validators if more than one kind of file is susceptible to paired validation with the given file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
func | function | Function implemented in the children validation classes that includes the different inter validation rules. | required |
targettag | str | The type of file the given file is going to be validated against. | required |
Returns:
Type | Description |
---|---|
list(str) | List of database-related errors that may have stopped the validation from being adequately executed. |
adele.modeling.liveml2.classes.validation.features.FeaturesValidator (Validator)
¶
Interfile checks are a “joinability” check against outcomes, and an “exists” check against scales and categoricals. However, we consider ‘features’ the base type, and those checks are only performed against it. The intrafile check tests every column for minimum missingness.
adele.modeling.liveml2.classes.validation.outcomes.OutcomesValidator (Validator)
¶
Intrafile check is that every class for every outcome has a minimum base size. Joinability checks for existing matching ids in both files, using a matching threshold (project parameter). It also looks to guarantee a minimum amount of data points (project parameter).
update_outcomes_table(self, rerun=True)
¶
Inserts headers into outcomes_info table according to their project_id and filename.
adele.modeling.liveml2.classes.validation.outofsample.OutofValidator (Validator)
¶
Internal checks are the same as for features. The interfile check is a check against scales and categoricals, in the same manner as for features files.
adele.modeling.liveml2.classes.validation.categoricals.CategValidator (Validator)
¶
Interfile check is an “exists” check against features file. Intrafile check is that it’s a single column with regex on Vvar in the column.
adele.modeling.liveml2.classes.validation.scales.ScalesValidator (Validator)
¶
Interfile check is an “exists” check against features file and that selected variables can be typed as int or float by python. Intrafile check is that it’s a single column with regex on Vvar in the column.
adele.modeling.liveml2.classes.validation.weights.WeightsValidator (Validator)
¶
Intravalidation checks that there are no missing values, that there are exactly two columns, that id_var is present, and that all weight values are numbers. Intervalidation with the outcomes file checks that every record within the outcomes file has a corresponding weight. *We are assuming the structure is: first column is the id, second is the weights. Do we have a fixed var name?
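The intravalidation rules can be sketched as plain Python (hypothetical helper; assumes each record is a dict with an id field and a weight field):

```python
def validate_weights(rows, id_var='id'):
    """Apply the weights intravalidation rules, returning a list of error strings."""
    errors = []
    for i, row in enumerate(rows):
        if len(row) != 2:
            errors.append(f"row {i}: expected exactly two columns")
        if id_var not in row:
            errors.append(f"row {i}: id_var '{id_var}' missing")
        for key, value in row.items():
            if value is None or value == '':
                errors.append(f"row {i}: missing value in '{key}'")
            elif key != id_var and not isinstance(value, (int, float)):
                errors.append(f"row {i}: weight '{value}' is not a number")
    return errors

validate_weights([{'id': 'r1', 'weight': 1.2}, {'id': 'r2', 'weight': 'x'}])
# -> ["row 1: weight 'x' is not a number"]
```

An empty error list means the file passes intravalidation; the intervalidation check against the outcomes file is a separate join on the ids.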