Added the ability to filter, compare, and indicate missing values in different bricks
Bricks now work even when the data contains missing values. We automatically detect missing values and handle them according to the operation being performed:
Filter Rows - added the option to filter by missing values (remove or select the rows with missing values)
Compare Data - added the option to compare a cell value with a not-available (NA) value or an infinite value
Machine Learning and Transformer bricks (Logistic Regression, Linear Regression, K-Means, DBSCAN, Isolation Forest, Anomaly Detection, Dimensionality Reduction) - the bricks now include missing values treatment; the treatment strategy depends on the data
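As a rough illustration of the Filter Rows and Compare Data behaviour listed above, here is a minimal plain-Python sketch. It is not Datrics code: the row format (a list of dicts), the hypothetical helper names `is_missing`, `filter_missing` and `compare_special`, and the choice to treat both `None` and NaN as missing are all assumptions made for the example.

```python
import math
from typing import Any

# Assumption: a dataset is a list of rows, each row a dict of column -> value.
Row = dict[str, Any]


def is_missing(value: Any) -> bool:
    """Treat None and float NaN as missing (an assumption for this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def filter_missing(rows: list[Row], column: str, keep_missing: bool) -> list[Row]:
    """Select (keep_missing=True) or remove (keep_missing=False) the rows
    whose value in `column` is missing; an absent column counts as missing."""
    return [row for row in rows if is_missing(row.get(column)) == keep_missing]


def compare_special(value: Any, target: str) -> bool:
    """Compare a cell with a special value: "NA" (missing) or "inf" (+/- infinity)."""
    if target == "NA":
        return is_missing(value)
    if target == "inf":
        return isinstance(value, (int, float)) and math.isinf(value)
    raise ValueError(f"unsupported special value: {target!r}")


rows = [{"age": 31}, {"age": None}, {"age": float("nan")}, {"age": float("inf")}]
print(filter_missing(rows, "age", keep_missing=False))  # [{'age': 31}, {'age': inf}]
print([compare_special(r["age"], "inf") for r in rows])  # [False, False, False, True]
```

Note that an infinite value is not treated as missing here, which is why it survives the "remove missing" filter but matches the "inf" comparison.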
Improved search through the list of bricks
Finding a brick is now easier because the search also matches keywords. For example, enter "cleansing" to find the Missing Values Treatment brick, "scaling" to find the Normalization brick, or "PCA" to find the Dimensionality Reduction brick.
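Keyword search of this kind can be pictured as a small index from brick names to extra keywords. The sketch below is a hypothetical illustration, not the Datrics search implementation: the `BRICK_KEYWORDS` mapping and `search_bricks` function are invented for the example, and only the three keyword pairs come from the release note.

```python
# Hypothetical keyword index: brick name -> extra search keywords.
# The keyword examples mirror the release note; the data structure is illustrative.
BRICK_KEYWORDS: dict[str, list[str]] = {
    "Missing Values Treatment": ["cleansing"],
    "Normalization": ["scaling"],
    "Dimensionality Reduction": ["PCA"],
    "Filter Rows": [],
}


def search_bricks(query: str, index: dict[str, list[str]]) -> list[str]:
    """Return brick names whose name or any keyword contains the query
    (case-insensitive substring match)."""
    q = query.strip().lower()
    if not q:
        return []
    return [
        name
        for name, keywords in index.items()
        if q in name.lower() or any(q in kw.lower() for kw in keywords)
    ]


print(search_bricks("PCA", BRICK_KEYWORDS))      # ['Dimensionality Reduction']
print(search_bricks("scaling", BRICK_KEYWORDS))  # ['Normalization']
```

Matching on both the name and the keywords means a brick is still found by its own title when it has no keywords, as "Filter Rows" shows.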
Improved notifications tab
We've moved all notifications to a separate tab in the right sidebar, which makes them easier to navigate. Notifications are sorted by severity, so errors always appear at the top of the list.
Improved scene editor
Finally, you can select, copy, and paste multiple bricks on the scene. Select bricks by holding the Shift key or by framing them with your mouse.
Data Segmentation Brick
Added out-of-the-box data segmentation, which produces cluster analysis results directly from raw data, without the need to build a data preparation pipeline. The Data Segmentation brick reproduces the data cleansing → feature engineering → modeling pipeline and returns both the data segmentation model and the data processing scenario, which can be implemented as a Datrics pipeline. The data processing scenario includes data cleansing, encoding, missing values treatment, and feature selection. The prepared features are used to fit a K-Means clustering model with the optimal number of clusters. The brick supports simple and advanced modes. Simple mode is fully automated: it detects the optimal number of clusters and performs feature engineering without user involvement. In advanced mode, the user can configure the model's hyperparameters manually.
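The release note does not say how the "optimal number of clusters" is chosen, so the following plain-Python sketch of the modeling step assumes one common criterion: fit K-Means for several values of k and keep the one with the highest mean silhouette score. The deterministic farthest-point seeding, the candidate range 2 to 5, the toy data, and the function names (`kmeans`, `silhouette`, `best_k`) are all assumptions; the cleansing, encoding, and feature selection steps are not shown.

```python
import math

Point = tuple[float, ...]


def farthest_point_init(points: list[Point], k: int) -> list[Point]:
    """Deterministic seeding: start from the first point, then repeatedly add
    the point farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points: list[Point], k: int, max_iter: int = 100) -> list[int]:
    """Lloyd's algorithm; returns the cluster label of each point."""
    centers = farthest_point_init(points, k)
    labels: list[int] = []
    for _ in range(max_iter):
        # Assignment step: nearest center (ties go to the lower index).
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


def silhouette(points: list[Point], labels: list[int]) -> float:
    """Mean silhouette coefficient s = (b - a) / max(a, b) over all points."""
    clusters = set(labels)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not own:
            continue  # singleton clusters contribute 0 by convention
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(points)


def best_k(points: list[Point], candidates: range = range(2, 6)) -> int:
    """Pick the number of clusters with the highest mean silhouette score."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in candidates if k < len(points)}
    return max(scores, key=scores.get)


# Three well-separated toy blobs of four points each (hypothetical data).
blobs = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(x + dx, y + dy) for x, y in blobs for dx, dy in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(best_k(points))  # 3
```

Silhouette is used here only because it needs no extra parameters; an elbow rule or an information criterion would be equally plausible stand-ins.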
SSL termination support in databases
You can now use SSL termination when connecting to your databases. Just attach the certificates when you create a new data source connection.
New operations in math formula
New operations have been added to the math formula: you can now build complex conditions with the AND, OR, and NOT logical operators, and string processing is more flexible.
Binary classification models improvements
Added a configurable threshold to binary classification models: the cutoff point can now be changed from its default value of 0.5. The threshold determines whether an item is assigned to the positive class based on its predicted probability, and it is taken into account in all applicable model performance metrics and visualizations.
Added the Model Score Distribution dashboard to the Model Performance tab for binary classification models.
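To show why moving the cutoff changes every threshold-dependent metric, here is a small plain-Python sketch that turns predicted probabilities into class labels and recomputes precision and recall. The function names and toy data are hypothetical, and whether Datrics compares with `>=` or `>` at the threshold is an assumption.

```python
def apply_threshold(probs: list[float], threshold: float = 0.5) -> list[int]:
    """Assign the positive class (1) when the predicted probability reaches the
    threshold. Whether the comparison is >= or > is an assumption here."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0, 1]")
    return [1 if p >= threshold else 0 for p in probs]


def precision_recall(y_true: list[int], y_pred: list[int]) -> tuple[float, float]:
    """Precision and recall of the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels and model scores.
y_true = [1, 1, 0, 0, 1]
probs = [0.9, 0.6, 0.55, 0.2, 0.35]
for threshold in (0.3, 0.5, 0.7):
    p, r = precision_recall(y_true, apply_threshold(probs, threshold))
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
```

With these toy scores, the thresholds 0.3, 0.5, and 0.7 give (precision, recall) of (0.75, 1.00), (0.67, 0.67), and (1.00, 0.33): the same model yields quite different metrics depending on the cutoff.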
Auto ML Bricks: Time Series
Added the Time Series Forecasting brick, which supports stratification and can be used for time series forecasting without complex settings. The brick trains and applies a forecasting model based on the analysis of historical time series data, with built-in preprocessing capabilities.
The brick runs an analysis pipeline that consists of three stages: time series feature extraction, time series preprocessing, and model fitting and application. First, we detect time series features such as trend, seasonality, and data logging frequency, including features that might be considered additional regressors. Next, we preprocess the time series data: outlier and missing values treatment, denoising, and discretization. Finally, we fit the stratified forecasting model and produce the forecast.
The brick has two modes: simple and advanced. In simple mode, data preprocessing and model hyperparameter settings are configured automatically based on the dependencies extracted from the time series; the user only needs to define the target and date-time variables. Advanced mode keeps all the advantages of simple mode without its limitations: it offers a flexible combination of manual and automatic configuration that lets you bring expert knowledge into the time series processing pipeline. The Time Series Forecasting brick also includes a Forecasting Dashboard, which describes the time series processing stages and the forecasting results in detail.
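Two of the steps described above, detecting the data logging frequency and discretizing the series while treating missing values, can be sketched in plain Python. This is an illustration under assumptions, not the brick's actual method: the frequency is taken as the most common gap between consecutive timestamps, gaps are filled by linear interpolation, and the function names and data are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def infer_frequency(timestamps: list[datetime]) -> timedelta:
    """Estimate the logging frequency as the most common gap between
    consecutive timestamps (an assumed heuristic)."""
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    if not gaps:
        raise ValueError("need at least two timestamps")
    return Counter(gaps).most_common(1)[0][0]


def regularize(series: dict[datetime, float]) -> dict[datetime, float]:
    """Discretize the series onto a regular grid at the inferred frequency and
    fill missing grid points by linear interpolation between known neighbours."""
    step = infer_frequency(list(series))
    known = sorted(series.items())
    result: dict[datetime, float] = {}
    k = 0  # index of the last known point at or before the current grid time
    t = known[0][0]
    while t <= known[-1][0]:
        while k + 1 < len(known) and known[k + 1][0] <= t:
            k += 1
        t0, v0 = known[k]
        if t == t0:
            result[t] = v0
        else:
            t1, v1 = known[k + 1]
            result[t] = v0 + (t - t0) / (t1 - t0) * (v1 - v0)
        t += step
    return result


# Hypothetical daily series with 2024-01-03 missing.
series = {
    datetime(2024, 1, 1): 10.0,
    datetime(2024, 1, 2): 12.0,
    datetime(2024, 1, 4): 16.0,
    datetime(2024, 1, 5): 18.0,
}
filled = regularize(series)
print(infer_frequency(list(series)))  # 1 day, 0:00:00
print(filled[datetime(2024, 1, 3)])   # 14.0
```

Using the most common gap rather than the smallest one keeps a single duplicated or jittered timestamp from collapsing the grid to a very fine step.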
Updated Model Performance Dashboard for Regression and Clustering
We have improved the interpretability of binary classification models by extending the Model Performance dashboard with the Model Score Distribution plot. The new plot shows the distribution of output scores per target class, including the probability density function and range- and quantile-based discretization plots, which show the share of each class's items whose scores fall into a specific score range.
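The range- and quantile-based discretization plots boil down to counting, per class, which share of items land in each score bin. Below is a minimal plain-Python sketch of that computation. The number of bins, the simple nearest-rank quantile rule, the function names, and the data are assumptions, not the dashboard's actual settings.

```python
from bisect import bisect_right


def equal_width_edges(n_bins: int) -> list[float]:
    """Range-based bins: equal-width intervals over the score range [0, 1]."""
    return [i / n_bins for i in range(n_bins + 1)]


def quantile_edges(scores: list[float], n_bins: int) -> list[float]:
    """Quantile-based bins: edges chosen so each bin holds roughly the same
    number of scores (simple nearest-rank quantiles)."""
    s = sorted(scores)
    return [s[0]] + [s[i * len(s) // n_bins] for i in range(1, n_bins)] + [s[-1]]


def bin_index(score: float, edges: list[float]) -> int:
    """Bins are [edge_i, edge_{i+1}); the last bin also includes its right edge."""
    i = bisect_right(edges, score) - 1
    return min(max(i, 0), len(edges) - 2)


def class_shares(scores: list[float], labels: list[int], edges: list[float]) -> dict[int, list[float]]:
    """For each class, the share of its items whose score falls into each bin."""
    shares: dict[int, list[float]] = {}
    for cls in sorted(set(labels)):
        cls_scores = [s for s, y in zip(scores, labels) if y == cls]
        counts = [0] * (len(edges) - 1)
        for s in cls_scores:
            counts[bin_index(s, edges)] += 1
        shares[cls] = [c / len(cls_scores) for c in counts]
    return shares


# Hypothetical model scores and true classes.
scores = [0.1, 0.2, 0.4, 0.6, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 1]
print(class_shares(scores, labels, equal_width_edges(2)))
print(quantile_edges(scores, 2))  # [0.1, 0.6, 0.9]
```

Here two thirds of class 0 falls below 0.5 and two thirds of class 1 falls above it; a well-separated model would push those shares toward 1.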
Pivot Spreadsheet Brick
The Datrics data processing section now includes the Pivot Spreadsheet brick, which reorganizes and summarizes input data as a table of grouped values that aggregates the items of the input dataset by categorical values.
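The grouping and aggregation a pivot table performs can be sketched in a few lines of plain Python. This is only an illustration of the technique, not the Pivot Spreadsheet brick itself: the `pivot` function, its parameter names, the default `sum` aggregation, and the sales data are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, Optional

Row = dict[str, Any]


def pivot(
    rows: list[Row],
    index: str,
    columns: str,
    values: str,
    aggfunc: Callable[[list[Any]], Any] = sum,
) -> dict[Any, dict[Any, Optional[Any]]]:
    """Group rows by (index, columns) category pairs and aggregate `values`;
    combinations that never occur are reported as None."""
    groups: dict[tuple[Any, Any], list[Any]] = defaultdict(list)
    for row in rows:
        groups[(row[index], row[columns])].append(row[values])
    row_keys = sorted({r for r, _ in groups})
    col_keys = sorted({c for _, c in groups})
    return {
        r: {c: aggfunc(groups[(r, c)]) if (r, c) in groups else None for c in col_keys}
        for r in row_keys
    }


# Hypothetical sales records.
sales = [
    {"region": "EU", "quarter": "Q1", "amount": 100},
    {"region": "EU", "quarter": "Q1", "amount": 50},
    {"region": "EU", "quarter": "Q2", "amount": 70},
    {"region": "US", "quarter": "Q1", "amount": 80},
]
print(pivot(sales, index="region", columns="quarter", values="amount"))
# {'EU': {'Q1': 150, 'Q2': 70}, 'US': {'Q1': 80, 'Q2': None}}
```

Passing a different `aggfunc`, for example `len` to count rows, changes the summary without changing the grouping.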
New Visual CSV Uploader and Editor
You can now upload CSV, XLS, and XLSX files through a new, intuitive visual interface. In the new uploader interface, users can:
We've made significant speed improvements, and your pipelines will now run 5x faster. We've also made many improvements to the user interface to make creating pipelines easier. For advanced users, we've implemented several new bricks: dimensionality reduction, binning without a target, and a new encoding brick.