What is the ML manager service?
Previously, you identified a business use case that requires a machine learning model, and you transformed your raw data into a format your machine learning algorithms can consume. Because your goal is to predict breakdown probability, you will need to train a binary classification model. To do this, you will need to perform the following tasks:
- Identify which machine learning algorithms to use
- Search for the optimum model parameters
- Make sure your model is robust and does not overfit, and
- Search for the best model among all the algorithms available.
All these tasks can be performed using the ML manager service, which allows you to build predictive models without coding. With this service, you can train and tune machine learning models, evaluate model performance, and deploy trained models into production. ML manager also lets you create multiple projects and run multiple experiments using multiple algorithms.
How can the ML manager service simplify your work?
ML manager simplifies your work by automating the model development process under the hood. It takes care of model validation, hyperparameter tuning, model selection, and model deployment without coding. This automation will shorten your development time, minimize errors, and speed up your business time-to-value.
In ML manager, users can also collaborate on a project to improve productivity and minimize redundant work.
ML Manager Service Capabilities in ZX AnalytiX
Fig 1. ML Experiments Features in ZX AnalytiX
- Guided model development processes: validation, tuning, model selection
- Guided feature engineering and feature selection
- Train machine learning models without coding
- Support for multiple algorithms for supervised learning use cases
- Tracking of model development results and versions
- Integrated with deployment service to automate ML prediction
- Support for collaboration to improve productivity
How do you perform ML experiments?
To start your experiments, first you need to create a project workspace. This workspace will group all of your experiments and test runs in one place. In each experiment, you can select multiple machine learning algorithms to fit your dataset. The detailed experiment flow is as follows:
Create a new experiment inside the project workspace
Fill in some basic information about your experiment, i.e., name, description, dataset, and model type (regression, binary classification, or multiclass classification).
Set your experiment configuration and parameters
- Split your data for training and validation. The ML manager service supports train-test and time-dependent splits out of the box. Testing your model performance outside of the training set reduces the risk of overfitting.
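The service performs these splits for you, but the logic underneath is simple. The sketch below (plain Python, illustrative only; function names are my own, not the service's API) shows both strategies. Note that the time-dependent split sorts by timestamp so the model is always validated on records "from the future" relative to its training data.

```python
import random

def train_test_split(rows, test_ratio=0.2, seed=42):
    """Random train-test split: shuffle the rows, then cut at the ratio."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_ratio))
    return rows[:cut], rows[cut:]

def time_dependent_split(rows, timestamp_key, test_ratio=0.2):
    """Time-dependent split: train on the oldest records, validate on the
    newest, which mirrors how the model will be used in production."""
    rows = sorted(rows, key=timestamp_key)
    cut = int(len(rows) * (1 - test_ratio))
    return rows[:cut], rows[cut:]

# Toy dataset of 10 timestamped records.
data = [{"t": i, "x": i * 0.1} for i in range(10)]
train, test = time_dependent_split(data, timestamp_key=lambda r: r["t"])
# the newest 20% of records form the test set
```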
- Hyperparameter tuning. The ML manager service can search for the optimal model parameters. The tuning process can use either train-validation or cross-validation. Train-validation evaluates each parameter combination once, whereas k-fold cross-validation evaluates it k times (once per fold) and averages the scores.
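To make the train-validation vs. cross-validation trade-off concrete, here is a minimal sketch (plain Python, not the service's API) of how k-fold cross-validation partitions a dataset: each of the k folds serves as the validation set exactly once, so every parameter combination is scored k times instead of once.

```python
def kfold_indices(n, k):
    """Yield (train_indices, val_indices) pairs for k-fold cross-validation.

    Each index appears in exactly one validation fold, so a parameter
    combination evaluated with this scheme is scored k times in total.
    """
    # Distribute any remainder across the first n % k folds.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, val
        start += size

# 10 records, 5 folds: each parameter combination gets 5 evaluations,
# versus a single evaluation under plain train-validation.
folds = list(kfold_indices(n=10, k=5))
```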
- Select model performance metrics. Metrics are used to evaluate your model performance and are grouped by model type. You can select one metric as the reference for choosing the best model during the hyperparameter tuning process.
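For a binary classification model like the breakdown predictor above, typical metrics are derived from the confusion matrix. The service computes these for you; the sketch below (plain Python, illustrative only) shows how the standard definitions relate to each other.

```python
def binary_metrics(y_true, y_pred):
    """Common binary-classification metrics computed from the confusion
    matrix (true/false positives and negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Example: 1 = breakdown, 0 = no breakdown.
scores = binary_metrics(y_true=[1, 1, 0, 0, 1], y_pred=[1, 0, 0, 1, 1])
```

Any one of these could serve as the reference metric during tuning; recall, for instance, is a natural choice when missing a real breakdown is costlier than a false alarm.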
- Automated feature engineering. This (optional) step will automatically add columns to your dataset using selected transformations, e.g., non-linear transformations, Principal Component Analysis (PCA), and feature scaling.
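As a rough idea of what such generated columns look like, the sketch below (plain Python, my own illustration, not the service's implementation) derives a standardized copy and two non-linear transforms of a numeric feature; PCA is omitted since it needs linear algebra beyond a short example.

```python
import math

def standardize(column):
    """Scaling transformation: rescale a feature to zero mean and unit
    variance, so features on different scales become comparable."""
    mean = sum(column) / len(column)
    var = sum((x - mean) ** 2 for x in column) / len(column)
    std = math.sqrt(var) or 1.0  # guard against zero-variance columns
    return [(x - mean) / std for x in column]

def engineer_features(column):
    """Derive extra columns from one numeric feature, mimicking what an
    automated feature-engineering step might generate."""
    return {
        "scaled": standardize(column),
        "squared": [x ** 2 for x in column],          # non-linear transform
        "log1p": [math.log1p(x) for x in column],     # assumes x >= 0
    }

derived = engineer_features([1.0, 2.0, 3.0])
```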
- Automated feature selection. This (optional) step will automatically select the feature columns to include in model training, based on feature correlations or on the significance of the relationship between each feature and the target variable.
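One common correlation-based approach is to keep only features whose absolute Pearson correlation with the target exceeds a threshold. The sketch below (plain Python; the function names and threshold are my own, not the service's) shows that filter.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def select_features(features, target, threshold=0.5):
    """Keep the features whose absolute correlation with the target
    reaches the threshold; drop the weakly related ones."""
    return [name for name, col in features.items()
            if abs(pearson(col, target)) >= threshold]

features = {"usage_hours": [1, 2, 3, 4], "operator_id": [4, 1, 3, 2]}
target = [2, 4, 6, 8]
kept = select_features(features, target)  # drops the weakly correlated column
```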
- Select algorithms to train. In this step you can select multiple algorithms to train and set the parameters that you would like to tune. For a random forest algorithm, for example, you can choose to tune the number of trees, tree depth, feature subset strategy, subsampling rate, etc.
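When you mark parameters for tuning, the tuner effectively expands them into a grid of candidate models. The sketch below (plain Python; the search-space values are hypothetical, only the parameter names come from the text above) shows how a random forest grid multiplies out.

```python
from itertools import product

# Hypothetical search space over the random forest parameters named above.
param_grid = {
    "num_trees": [100, 300],
    "max_depth": [5, 10],
    "feature_subset_strategy": ["sqrt", "log2"],
    "subsampling_rate": [0.8, 1.0],
}

def expand_grid(grid):
    """Enumerate every parameter combination the tuner would evaluate."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*grid.values())]

combinations = expand_grid(param_grid)
# 2 * 2 * 2 * 2 = 16 candidate models for this one algorithm; with k-fold
# cross-validation, each candidate is trained and scored k times.
```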
- Compare multiple algorithms and runs. The ML manager service records all your experiments and model runs, which makes it easy to track your experiments and find the algorithm with the best performance.
Below is a visualization of the end-to-end structure we have created so far: from data integration to the data pipeline, then the ML pipeline (experimentation), and finally pipeline deployment.
Fig 2. ML Experiments Result in ZX AnalytiX