Unlock the Power of Custom Features, Cloud Storage Export, JSON Transformation, and More
Datrics has a new update that lets data analysts do more cool stuff with no or low code! You can use non-data inputs/outputs in custom bricks, load Python libraries from Git, export data to AWS S3 and GCS, and transform JSON-like data into a dataframe in one step.
Let's dive deeper.
Low-code data science platform
Datrics gives data engineers, analysts, and scientists new capabilities to extend the no-code platform with custom features. Our objective is to equip data scientists with tools that let them apply their expertise to intricate data analytics and machine learning tasks without constraints imposed by the platform. Let's go through the new features and how they may help your team.
Reusable custom bricks with custom types
We are adding the capability to use non-data (custom) inputs/outputs in custom bricks. This seemingly small feature makes it possible to create custom bricks for specific features and tasks and reuse those bricks across multiple steps. Let's look at an example.
The goal is to add an SVM classifier to Datrics and perform predictions with the model. With the standard approach to custom bricks, a data scientist would pack all the functionality into one brick: importing the model, training, and predicting. This limits the possibilities to reuse the brick in different pipelines and use cases.
It is more efficient to create two bricks: one to train the model and another to perform predictions. This way each brick solves one particular problem and can be used separately in a variety of pipelines. In this example, it becomes easier to train the model on one data set and test the quality of predictions on a different one.
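The split can be sketched with two plain Python functions standing in for the two bricks. The function names and signatures below are illustrative assumptions, not the actual Datrics custom brick API; the point is that the trained model travels between bricks as a non-data output/input.

```python
from sklearn.svm import SVC


def train_brick(df, target_column):
    """Training brick (hypothetical): fits an SVM on the input dataframe
    and emits the fitted model as a custom (non-data) output."""
    X = df.drop(columns=[target_column])
    y = df[target_column]
    model = SVC()
    model.fit(X, y)
    return model  # non-data output consumed by the prediction brick


def predict_brick(df, model):
    """Prediction brick (hypothetical): takes the model as a custom input
    and returns the dataframe with an appended prediction column."""
    out = df.copy()
    out["prediction"] = model.predict(df)
    return out
```

Because the model is a first-class output, the prediction brick can be fed a holdout dataframe from a completely different branch of the pipeline.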
Moreover, creating separate custom bricks for each task expands the pool of features data analysts and data scientists can use in their ETL pipelines or data experiments.
Load Python libraries from Git
In one of the previous Datrics product updates, we introduced the capability to load external Python libraries from PyPI. It lifted the constraint of a no-code platform limited to pre-installed Python libraries, giving users the ability to load supplementary resources.
We are going one step further and adding the capability to load libraries from Git. It works similarly to loading resources from PyPI:
Go to the Libraries tab in the Custom brick editor.
Select git as the library source, set the name, and provide the URL.
Press Add. We will attempt to access and install the provided resource.
Once installed, the library may be used in the custom brick to create cool features in Datrics.
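Inside the brick's code, a library installed from Git behaves like any other installed package. One small defensive pattern (a sketch, not part of the Datrics API) is to check for the distribution before importing it, so a brick fails with a clear message if the install step was skipped:

```python
import importlib.metadata


def library_available(name: str) -> bool:
    """Return True if a distribution with this name is installed.
    Works for packages installed from Git, PyPI, or anywhere else."""
    try:
        importlib.metadata.version(name)
        return True
    except importlib.metadata.PackageNotFoundError:
        return False
```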
Export data to AWS S3 and Google Cloud Storage
A new export connector for AWS S3 and Google Cloud Storage is coming to Datrics. The connector supports export to the CSV and Parquet file formats. We also provide the option to export the dataset into multiple Parquet files, using the data in the partition columns as the key.
Here is how to set up the export brick:
Create a data source for your object storage in the Datasets tab.
In the pipeline scene, add the Export Data brick and select the data source.
Define the file format: CSV or Parquet.
Define the target file path:
For CSV, set the full path to the file.
For Parquet, there are two options: a static or a dynamic path. A static path is the full path to the file. A dynamic path consists of a prefix (the static part of the path) and partition columns (the key defining how to split the dataset into files). As a result, folders with files will be created: prefix/partition=value/uuid.parquet
Parse JSON brick
Parse JSON is a new brick that transforms JSON into a dataframe. Three JSON types are supported: default, split, and index.
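The split and index types appear to mirror pandas' JSON orientations (that mapping is an assumption on our part); a quick illustration of what each shape looks like:

```python
import io

import pandas as pd

# "split" orientation: separate lists for columns, index, and data
split_json = '{"columns": ["a", "b"], "index": [0, 1], "data": [[1, 2], [3, 4]]}'
df_split = pd.read_json(io.StringIO(split_json), orient="split")

# "index" orientation: {index -> {column -> value}}
index_json = '{"0": {"a": 1, "b": 2}, "1": {"a": 3, "b": 4}}'
df_index = pd.read_json(io.StringIO(index_json), orient="index")
```

Both snippets produce the same two-row, two-column dataframe; only the JSON layout differs.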
Confidence intervals in the Prophet model brick
There is an important update to the Prophet time series forecasting model in the Datrics no-code platform. We are adding confidence intervals for the prediction to the output dataframe of the brick. The confidence interval will also be displayed in the model performance dashboard. This gives more options for working with the predictions.
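In Prophet itself, the forecast frame carries `yhat` together with the `yhat_lower` and `yhat_upper` interval bounds. A sketch of consuming those columns downstream of the brick (the frame below is mocked for illustration, not a real forecast):

```python
import pandas as pd

# Mocked forecast output with the interval columns Prophet produces.
forecast = pd.DataFrame({
    "ds": pd.date_range("2024-01-01", periods=3, freq="D"),
    "yhat": [10.0, 11.0, 12.0],
    "yhat_lower": [8.0, 9.5, 10.0],
    "yhat_upper": [12.0, 12.5, 14.0],
})

# Width of the confidence interval per forecast step: a simple way
# to flag periods where the prediction is least certain.
forecast["ci_width"] = forecast["yhat_upper"] - forecast["yhat_lower"]
```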