Dataset governance and advanced settings for LGBM and XGBoost models

We are happy to share the latest updates to the Datrics analytics platform. In this release, we have enhanced dataset and data source management and added monotone constraints to the LightGBM and XGBoost algorithms. The dataset updates give users more control, more flexibility, and better collaboration on the platform. Monotone constraints make LightGBM and XGBoost models easier to interpret, help them generalize better, and can speed up convergence.

Datasets management

This update brings several features and improvements that enhance the control and flexibility users have over their datasets. Let's explore the details of each user story included in this release.

1. Datasets access rights

In this update, we have implemented a rights management system for datasets. Access is now split into viewer and editor roles. Only the dataset creator can edit a dataset, while any user can view its settings and the loaded data.
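The access rule above is simple enough to state as code. This is a hypothetical sketch. `Dataset`, `can_view`, and `can_edit` are illustrative names and are not part of any Datrics API. Users are modeled here as plain string IDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    creator: str  # hypothetical field: user ID of whoever created the dataset


def can_view(user: str, dataset: Dataset) -> bool:
    # Any user can view the dataset settings and the loaded data.
    return True


def can_edit(user: str, dataset: Dataset) -> bool:
    # Edit access is reserved for the dataset creator.
    return user == dataset.creator


sales = Dataset(name="sales_2023", creator="alice")
print(can_view("bob", sales), can_edit("bob", sales))  # True False
print(can_edit("alice", sales))                        # True
```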

2. Data source in the group

We have introduced a new feature that lets you manage data sources at the group level. Data sources are now created and edited in the group, so you no longer need to connect to your data warehouse or object storage separately for each project. Data sources that were previously created in projects are migrated to the group automatically.
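Here is a hypothetical sketch of what the automatic migration amounts to. Each project's data sources move up to its group, where every project can reuse the same connection. The classes and the `migrate_to_group` function are illustrative and are not Datrics code. The sketch also assumes that data sources are identified by name: if a later project has a source with a name already present in the group, that source is treated as the one already there.

```python
from dataclasses import dataclass, field


@dataclass
class DataSource:
    name: str
    kind: str  # illustrative values such as "postgres" or "s3"


@dataclass
class Project:
    name: str
    data_sources: list[DataSource] = field(default_factory=list)


@dataclass
class Group:
    name: str
    projects: list[Project] = field(default_factory=list)
    data_sources: dict[str, DataSource] = field(default_factory=dict)


def migrate_to_group(group: Group) -> None:
    """Move project-level data sources up to the group (hypothetical sketch)."""
    for project in group.projects:
        for source in project.data_sources:
            # Assumption: one connection per name at group level; the first one wins.
            group.data_sources.setdefault(source.name, source)
        # Projects no longer hold their own connections after migration.
        project.data_sources.clear()


team = Group("analytics", projects=[
    Project("churn", [DataSource("warehouse", "postgres")]),
    Project("forecast", [DataSource("warehouse", "postgres"), DataSource("raw_files", "s3")]),
])
migrate_to_group(team)
print(sorted(team.data_sources))  # ['raw_files', 'warehouse']
```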

3. Duplicating datasets

Datrics lets users reuse the same dataset in many pipelines. This is useful when you have multiple data flows that work with the same data. You can now also duplicate a dataset, either within the same project or into a different one. To create a copy, create a new version from the dataset editor view or use the context menu action.

The copy functionality has been enhanced to handle two scenarios:

Duplicating a dataset as an asset. A new dataset asset is created with a unique name and remains connected to the same data source.
Copying a pipeline with datasets to a different project. A new dataset asset is created in the selected project and is connected to the same data source as the original.
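Both scenarios rest on one idea. The copy is a new asset with its own name and project, but it points to the original data source, so both assets read from the same connection. The hypothetical sketch below models this. The `DatasetAsset` class, the `duplicate` function, and the `"<name> copy"`, `"<name> copy 2"` naming scheme are assumptions for illustration and do not describe how Datrics actually names copies.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DatasetAsset:
    name: str
    project: str
    data_source: str  # identifier of the connected data source


def unique_name(base: str, taken: set[str]) -> str:
    # Hypothetical naming scheme: "<name> copy", then "<name> copy 2", ...
    candidate = f"{base} copy"
    n = 2
    while candidate in taken:
        candidate = f"{base} copy {n}"
        n += 1
    return candidate


def duplicate(asset: DatasetAsset, target_project: str,
              existing: list[DatasetAsset]) -> DatasetAsset:
    # Names only need to be unique within the target project.
    taken = {a.name for a in existing if a.project == target_project}
    name = asset.name if asset.name not in taken else unique_name(asset.name, taken)
    # New asset, same data source: the connection itself is not copied.
    return replace(asset, name=name, project=target_project)


orders = DatasetAsset("orders", "sales", "warehouse")
assets = [orders]
copy1 = duplicate(orders, "sales", assets)
assets.append(copy1)
copy2 = duplicate(orders, "sales", assets)
print(copy1.name, "|", copy2.name)  # orders copy | orders copy 2
moved = duplicate(orders, "marketing", assets)
print(moved.name, moved.project, moved.data_source)  # orders marketing warehouse
```

In this sketch, a copy into another project keeps its original name when that name is free there. This is also an assumption.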


Monotone constraints

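Monotone constraints play an important role in the XGBoost (Extreme Gradient Boosting) and LightGBM (Light Gradient Boosting Machine) algorithms. A monotone constraint on a feature requires the model's predictions to move in only one direction as that feature increases while all other features are held fixed. Under a non-decreasing constraint the prediction never goes down, and under a non-increasing constraint it never goes up. This lets the model capture meaningful, known relationships between the features and the target variable.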
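One way to see what the constraint guarantees is to probe a model directly. Hold every feature but one fixed, sweep that feature over an increasing grid, and check that the predictions never move against the chosen direction. The sketch below does this for any `predict` function. `toy_predict` is a hypothetical hand-written step function standing in for a tree ensemble; it is not a Datrics or library model. Passing the check on one slice is evidence, not proof, that the model is monotone everywhere.

```python
from typing import Callable, Sequence


def is_monotone_in_feature(
    predict: Callable[[list[float]], float],
    base_row: Sequence[float],
    feature: int,
    grid: Sequence[float],
    direction: int,
) -> bool:
    """Sweep one feature over an increasing grid, keep the others fixed, and
    check that predictions never move against `direction` (+1 or -1)."""
    preds = []
    for value in grid:
        row = list(base_row)
        row[feature] = value
        preds.append(predict(row))
    # direction * (next - previous) must never be negative.
    return all(direction * (b - a) >= 0 for a, b in zip(preds, preds[1:]))


def toy_predict(row: list[float]) -> float:
    # Hypothetical step-function model: price rises with area and falls with age.
    area, age = row
    return 100.0 * (area > 50) + 50.0 * (area > 80) - 20.0 * (age > 10)


grid = [float(v) for v in range(0, 121, 10)]
print(is_monotone_in_feature(toy_predict, [60.0, 5.0], 0, grid, +1))  # True
print(is_monotone_in_feature(toy_predict, [60.0, 5.0], 1, grid, -1))  # True
print(is_monotone_in_feature(toy_predict, [60.0, 5.0], 1, grid, +1))  # False
```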

Here are the key benefits of using monotone constraints in both XGBoost and LightGBM:

Enhanced Interpretability: Enforcing monotonicity makes the model's predictions easier to interpret. For example, if a feature has a positive monotonic relationship with the target variable, an increase in that feature's value never lowers the predicted target value.
Improved Generalization: Monotone constraints help XGBoost and LightGBM generalize better to unseen data by building prior knowledge about feature-target relationships into the model. This is particularly useful when domain expertise indicates a monotonic relationship between certain features and the target.
Faster Convergence: By constraining the direction of the model's response, XGBoost and LightGBM can converge more quickly during training. Splits that would violate the specified direction are rejected. This narrows the search space and guides the optimization toward splits that respect the constraints.
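The following simplified sketch illustrates how a constraint narrows the split search. It follows the spirit of XGBoost's greedy split finding. Leaf weights use the second-order formula w = -G / (H + λ). A candidate split on a constrained feature is skipped when its left leaf (smaller feature values) and right leaf weights move against the constraint. This is not either library's actual code. The real implementations also propagate lower and upper bounds to descendant leaves so the whole tree stays monotone, and they include the γ split penalty. Both are omitted here, and the gradient and hessian sums in the example are made-up numbers.

```python
def leaf_weight(g: float, h: float, lam: float) -> float:
    # Optimal leaf value for a second-order boosting step: -G / (H + lambda).
    return -g / (h + lam)


def split_gain(gl: float, hl: float, gr: float, hr: float, lam: float) -> float:
    # Gain of splitting a node into left/right children (gamma penalty omitted).
    def score(g: float, h: float) -> float:
        return g * g / (h + lam)
    return 0.5 * (score(gl, hl) + score(gr, hr) - score(gl + gr, hl + hr))


def best_constrained_split(candidates, constraint: int, lam: float = 1.0):
    """candidates: (threshold, G_left, H_left, G_right, H_right) tuples, where
    'left' holds the smaller feature values.
    constraint: +1 non-decreasing, -1 non-increasing, 0 unconstrained.
    Returns (threshold, gain) of the best allowed split, or None."""
    best = None
    for thr, gl, hl, gr, hr in candidates:
        wl, wr = leaf_weight(gl, hl, lam), leaf_weight(gr, hr, lam)
        if constraint * (wr - wl) < 0:
            continue  # predictions would move against the constraint: reject
        gain = split_gain(gl, hl, gr, hr, lam)
        if best is None or gain > best[1]:
            best = (thr, gain)
    return best


# Made-up sums: threshold 10 gives a decreasing step, threshold 20 an increasing one.
cands = [(10, -4.0, 3.0, 6.0, 5.0), (20, 3.0, 4.0, -1.0, 4.0)]
print(best_constrained_split(cands, 0)[0])   # 10 (highest gain wins)
print(best_constrained_split(cands, +1)[0])  # 20 (the decreasing split is rejected)
print(best_constrained_split(cands, -1)[0])  # 10
```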

In the latest release, we have added a monotone constraints setting to the LGBM (Binary, Multiclass, Regression) and XGBoost (Classification, Regression) models. Monotonicity means that as a feature value increases, the target variable consistently moves in one direction: it either never decreases (non-decreasing) or never increases (non-increasing).

To train a model, add the relevant model brick to the scene and connect it to the training dataset. All models are available in the Machine Learning section or via search. Monotone constraints are part of the advanced mode settings. Select the features that have a non-decreasing or non-increasing relationship with the target variable.
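Both libraries expose this setting through a parameter named `monotone_constraints`. It takes one value per feature in training-column order: `1` for non-decreasing, `-1` for non-increasing, and `0` for unconstrained. We assume the Datrics advanced setting is ultimately passed as this parameter, but that mapping is an assumption. The helper `build_monotone_constraints` below is hypothetical. It turns two feature selections into the vector and rejects features that are listed in both directions or are missing from the training columns. The feature names are illustrative. LightGBM accepts the list directly, and XGBoost also accepts the tuple-string form `"(1,-1,0)"`.

```python
def build_monotone_constraints(features, increasing=(), decreasing=()):
    """Hypothetical helper: one constraint per feature, in training-column order.
    1 = non-decreasing, -1 = non-increasing, 0 = unconstrained."""
    overlap = set(increasing) & set(decreasing)
    if overlap:
        raise ValueError(f"features listed in both directions: {sorted(overlap)}")
    unknown = (set(increasing) | set(decreasing)) - set(features)
    if unknown:
        raise ValueError(f"unknown features: {sorted(unknown)}")
    return [1 if f in increasing else -1 if f in decreasing else 0 for f in features]


features = ["income", "debt_ratio", "age"]  # illustrative training columns
vec = build_monotone_constraints(features, increasing=["income"], decreasing=["debt_ratio"])
print(vec)  # [1, -1, 0]

# LightGBM takes a list of ints; XGBoost also accepts the tuple-string form.
lgbm_params = {"monotone_constraints": vec}
xgb_params = {"monotone_constraints": "(" + ",".join(map(str, vec)) + ")"}
print(xgb_params["monotone_constraints"])  # (1,-1,0)
```

Column order matters here. When constraints are given as a list or tuple string, both libraries match them to features by position. If the training columns are reordered without rebuilding the vector, the constraints silently attach to the wrong features.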

*XGBoost model training with monotone constraints*


In summary, monotone constraints are a valuable tool in both XGBoost and LightGBM. They make models easier to interpret, help them generalize to unseen data and avoid overfitting, and can speed up convergence. By building known monotonic relationships between features and the target variable into the model, these gradient boosting frameworks make more reliable predictions and provide valuable insights for decision-making.

