Inside ZX AnalytiX's Data Wrangling Process

David W. Sijabat
Sr. Product Manager, Advanced Analytics
March 10, 2021

WHAT IS DATA WRANGLING?

Suppose you have identified a business use case that requires a machine learning model to predict the probability of asset failure over the coming one-month period. You have access to the operational dataset, which records the historical maintenance and failures of the asset you are investigating. Besides the maintenance data, you are considering using the engine's historical vibration data to understand the signals that precede failure events.

You start your journey with exploratory data analysis, then create data processing steps that cleanse, combine, and transform your raw data into the consumable format that machine learning algorithms require. These data steps are known as data wrangling processes.

Data wrangling is the set of data steps we perform to improve the quality and value of our data through selection, cleansing, aggregation, merging, and augmentation. Its other goal is to reshape the data as input for advanced analytics tasks such as business intelligence and machine learning.
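To make these steps concrete, here is a minimal sketch in plain Python of cleansing, aggregation, and merging applied to the asset-failure scenario above. The records and field names (asset_id, month, failed, rms) are hypothetical and chosen only for illustration; the sketch shows the kind of code these steps normally require, not how ZX AnalytiX performs them.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw records; every field name here is an illustrative assumption.
maintenance = [
    {"asset_id": "A1", "month": "2021-01", "failed": 0},
    {"asset_id": "A1", "month": "2021-02", "failed": 1},
    {"asset_id": "A2", "month": "2021-01", "failed": 0},
]
vibration = [
    {"asset_id": "A1", "month": "2021-01", "rms": 0.8},
    {"asset_id": "A1", "month": "2021-01", "rms": 1.2},
    {"asset_id": "A1", "month": "2021-02", "rms": None},  # missing reading
    {"asset_id": "A1", "month": "2021-02", "rms": 2.0},
    {"asset_id": "A2", "month": "2021-01", "rms": 0.5},
]

# Cleansing: drop vibration readings that have no value.
clean = [r for r in vibration if r["rms"] is not None]

# Aggregation: mean vibration per asset per month.
groups = defaultdict(list)
for r in clean:
    groups[(r["asset_id"], r["month"])].append(r["rms"])
mean_rms = {key: mean(values) for key, values in groups.items()}

# Merge: attach the aggregated vibration to each maintenance record.
model_input = [
    {**m, "mean_rms": mean_rms.get((m["asset_id"], m["month"]))}
    for m in maintenance
]

for row in model_input:
    print(row)
```

Each row of model_input now carries one failure label and one vibration feature per asset and month, which is the reshaped form a machine learning algorithm can consume.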

HOW CAN ZX ANALYTIX HELP?

According to O’Reilly’s 2016 Data Science Salary Survey, 69% of data scientists spend a significant amount of their day-to-day time on basic exploratory data analysis, while 53% spend time cleaning their data. These laborious tasks usually require data scientists to write large amounts of complex code to perform the data wrangling processes.

The ZX AnalytiX data wrangling service has two main features that help data scientists build their data workloads: automated exploratory data analysis (EDA) and a data pipeline design canvas. Some of the key benefits of using these two features are:

  • Zero-coding environment that simplifies and speeds up the data wrangling process
  • Fast data exploration using automated metrics calculation and visualization
  • Drag-and-drop functionality inside the data pipeline canvas using comprehensive data processors
  • Easy deployment of data pipelines with scheduling capabilities
  • Fewer errors during data pipeline creation thanks to standard data processors
  • Shorter business use case development time
  • More time for data scientists to focus on problem solving and less on coding

DATA WRANGLING CAPABILITIES IN ZX ANALYTIX

The detailed capabilities of our data wrangling service are as follows:

Automated Exploratory Data Analysis

The main purpose of this feature is to help data scientists quickly understand the data they are currently working on by providing pre-calculated information:

  1. Physical data profile, which includes:

    • Data value, size, and schema
    • Basic data column statistics
    • Data distribution with visualization
  2. Association between numerical columns, measured with the Pearson correlation coefficient
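For readers who want to see what this pre-calculated information amounts to, the following is a minimal sketch of a basic column profile and the Pearson correlation coefficient in plain Python. The function names profile and pearson and the sample columns are assumptions made for illustration; they do not describe ZX AnalytiX's internal implementation.

```python
from math import sqrt
from statistics import mean, stdev

def profile(values):
    """Basic statistics for one numerical column; None marks a missing value.

    Assumes at least two non-missing values, which stdev requires.
    """
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": mean(present),
        "stdev": stdev(present),  # sample standard deviation
    }

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical columns, for illustration only.
vibration_rms = [0.5, 0.8, 1.1, 1.9, 2.4]
temperature_c = [61.0, 64.0, 67.0, 75.0, 80.0]

print(profile(vibration_rms))
print(pearson(vibration_rms, temperature_c))
```

A coefficient close to 1 or -1 indicates a strong linear association between two columns, while a value near 0 indicates little linear association.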

Data Pipeline Design Canvas

Fig 1. Design Canvas of ZX AnalytiX Data Wrangling

The design canvas provides drag-and-drop functionality for building data pipelines from predefined processors:

  • Column transformations
  • Data aggregation
  • Table merges and joins
  • Row filtering and column selection
  • Row deduplication
  • Missing value imputation
  • Numerical column binning
  • Save pre-processed data to a database
  • Custom SQL processor for custom ETL
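As a rough illustration of what three of these processors do (row deduplication, missing value imputation, and numerical column binning), here is a minimal sketch in plain Python. The helper names deduplicate, impute_mean, and bin_equal_width, the choice of mean imputation and equal-width bins, and the sample rows are all assumptions made for this example; the actual processors may offer different strategies.

```python
from statistics import mean

def deduplicate(rows):
    """Drop exact duplicate rows, keeping the first occurrence."""
    seen, out = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the row
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

def impute_mean(rows, column):
    """Replace missing (None) values in `column` with the column mean."""
    fill = mean(r[column] for r in rows if r[column] is not None)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

def bin_equal_width(value, low, high, n_bins):
    """Return the 0-based index of the equal-width bin that contains `value`."""
    if not low <= value <= high:
        raise ValueError("value outside [low, high]")
    width = (high - low) / n_bins
    # The upper edge belongs to the last bin.
    return min(int((value - low) / width), n_bins - 1)

# Hypothetical rows, for illustration only.
rows = [
    {"asset_id": "A1", "rms": 0.5},
    {"asset_id": "A1", "rms": 0.5},   # exact duplicate
    {"asset_id": "A2", "rms": None},  # missing value
    {"asset_id": "A3", "rms": 2.5},
]

rows = deduplicate(rows)
rows = impute_mean(rows, "rms")
binned = [{**r, "rms_bin": bin_equal_width(r["rms"], 0.0, 3.0, 3)} for r in rows]

for row in binned:
    print(row)
```

Chaining the three functions in this order mirrors how processors are connected one after another in a pipeline: each step takes the output of the previous one.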

Other functionalities include:

  • Execute a data pipeline and save the result to a destination table
  • Review and update previously created data pipelines
  • Share pipelines for further collaboration with other users

Fig 2. Pipeline List of ZX AnalytiX Data Wrangling