Intelligent applications are only as valuable as the data they act upon. Cleansing, merging, and combining data sources are among the biggest obstacles data scientists face in their day-to-day work. Data transformations are a critical step in any machine learning exercise, allowing for a better understanding of the data and the creation of more predictive models.
Ayasdi has been expanding its Transformation Services in order to both facilitate and expedite its data ingestion process. In Release 7.9, Ayasdi added Null Imputation, Z-Scoring, Joins, and In-Source Transformations to its rapidly growing list of supported Transformation Services:
- Null Imputation Transformation – specifies replacement of null values
- Z-Scoring Transformation – specifies standard scaling of data column values
- Joining Multiple Datasets – joins multiple data sources together
- In-Source Data Transformation – option to perform transformations directly on the source data
Transformation Services already available prior to Release 7.9:
- Group By Transformation – specifies aggregation of data rows
- One Hot Encoding Transformation – converts categorical values to binary (on/off) value columns per categorical value
- Square Transformation – squares the values in a column
- Date Time Transformation – creates 10 new columns, slicing the date value into various aspects
- Lag and Lag Difference Transformations – specifies a lag or lag difference amount to apply to a column
- Logarithm Transformations – converts values to their logarithms, which helps make highly skewed data less skewed
- Prepend Transformations – prepends a literal string to the value, transforming the new value into a string data type
- Text Feature Extraction Transformation – extracts features from unstructured text
- Transpose Transformations – transposes a file’s rows and columns
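To make two of the listed transformations concrete, here is a minimal sketch in plain Python of what one hot encoding and lag difference compute. It does not use the Ayasdi SDK; the function names `one_hot_encode`, `lag`, and `lag_difference` are hypothetical, and filling the first rows of a lagged column with `None` is an assumption about how missing predecessors might be handled.

```python
# Illustrative only: plain-Python stand-ins, not Ayasdi SDK calls.

def one_hot_encode(values):
    """Return one 0/1 column per distinct categorical value."""
    categories = sorted(set(values))
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}


def lag(values, amount):
    """Shift a column down by `amount` rows.

    Assumption: rows with no predecessor are filled with None.
    """
    n = len(values)
    k = min(amount, n)
    return [None] * k + values[:n - k]


def lag_difference(values, amount):
    """Subtract the value `amount` rows earlier from each value."""
    lagged = lag(values, amount)
    return [None if prev is None else cur - prev
            for cur, prev in zip(values, lagged)]


colors = ["red", "blue", "red"]
encoded = one_hot_encode(colors)
# encoded == {"blue": [0, 1, 0], "red": [1, 0, 1]}

sales = [10, 12, 15, 11]
lagged_sales = lag(sales, 1)             # [None, 10, 12, 15]
sales_delta = lag_difference(sales, 1)   # [None, 2, 3, -4]
```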
Ayasdi Transformation Services are available through the Ayasdi Python SDK and can be built into Envision applications. Full documentation for the Ayasdi SDK can be found at https://platform.ayasdi.com/sdkdocs/
The following reviews the new Ayasdi Transformation Services recently released with the Ayasdi Python SDK Version 7.9.
Null Imputation Transformations
The Ayasdi Python SDK now supports Null Imputation Transformations, which replace null data values with either a statistic computed from the column or a value provided by the user. With NullImputationTransformationStep, null values can be replaced with the mean, minimum, maximum, or median value of the original column. This is helpful because a column often contains null values that would be better represented by a real value for analysis, and the required transformation can be done automatically. For example, the null values might be set to 0, the mean, or the median. Whether nulls should be imputed depends on the context. If you are not sure how to proceed, drop us a note at email@example.com and we will be sure to respond.
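The replacement strategies described above can be sketched in plain Python. This is an illustration of the idea, not the SDK's implementation: the function `impute_nulls`, its `strategy` and `fill_value` parameters, and the use of `None` to represent a null are all assumptions made for the example.

```python
import statistics

# Illustrative only: a plain-Python stand-in, not an Ayasdi SDK call.

def impute_nulls(column, strategy="mean", fill_value=None):
    """Replace None entries with a column statistic or a user-supplied value."""
    present = [v for v in column if v is not None]
    if strategy == "constant":
        replacement = fill_value
    elif strategy == "mean":
        replacement = statistics.mean(present)
    elif strategy == "median":
        replacement = statistics.median(present)
    elif strategy == "min":
        replacement = min(present)
    elif strategy == "max":
        replacement = max(present)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [replacement if v is None else v for v in column]


readings = [10, None, 20, 60, None]
by_mean = impute_nulls(readings, "mean")                      # nulls become 30
by_median = impute_nulls(readings, "median")                  # nulls become 20
by_zero = impute_nulls(readings, "constant", fill_value=0)    # nulls become 0
```

The statistic is computed from the non-null values only, so the choice of strategy matters most when the column is skewed: here the mean (30) and the median (20) already differ noticeably.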
Z-Scoring Virtual Transformations
The Ayasdi Python SDK now supports standard scaling of a data column, or Z-scoring. Z-scoring is a standard way to transform numerical columns before applying a machine learning method. This transformation is especially useful when columns have different scales and the user would like to put them on a common scale for effective comparison. The new StandardScalingTransformationStep provides two standard scaling options: standard deviation (relative weight) and mean.
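For reference, a Z-score is computed as z = (x - mean) / standard deviation. The sketch below shows this in plain Python, with the two options modeled as independent switches for centering on the mean and dividing by the standard deviation. The function `standard_scale`, its `with_mean` and `with_std` flags, and the use of the population standard deviation are assumptions for illustration; the SDK's actual parameters and conventions may differ.

```python
import statistics

# Illustrative only: a plain-Python stand-in, not an Ayasdi SDK call.

def standard_scale(column, with_mean=True, with_std=True):
    """Center a numeric column on its mean and/or divide by its std deviation."""
    mean = statistics.mean(column) if with_mean else 0.0
    # Assumption: population standard deviation (divide by N, not N - 1).
    std = statistics.pstdev(column) if with_std else 1.0
    if std == 0:
        std = 1.0  # constant column: avoid dividing by zero
    return [(v - mean) / std for v in column]


scores = [2, 4, 4, 4, 5, 5, 7, 9]        # mean 5, population std 2
z_scores = standard_scale(scores)
# z_scores == [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]

centered_only = standard_scale(scores, with_std=False)
# centered_only == [-3.0, -1.0, -1.0, -1.0, 0.0, 0.0, 2.0, 4.0]
```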
Ability to Join Multiple Datasets
The Ayasdi Python SDK now supports joining multiple datasets, a key feature-engineering capability. Ayasdi’s Machine Intelligence Platform Release 7.9 has added infrastructure to support merging data from different sources for combined analysis. The new JoinTransformStep merges a primary source with any number of secondary sources.
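The idea of merging one primary source with several secondary sources can be sketched in plain Python as a sequence of key-based joins. This is not the SDK's implementation: the function `left_join`, the choice of left-join semantics (every primary row is kept, unmatched columns become `None`), and the sample data are all assumptions for illustration.

```python
# Illustrative only: a plain-Python stand-in, not an Ayasdi SDK call.

def left_join(primary, secondaries, key):
    """Join each secondary table onto the primary table by a shared key.

    Assumption: left-join semantics. Every primary row is kept, and
    columns with no matching secondary row are filled with None.
    """
    joined = [dict(row) for row in primary]  # copy so the input is untouched
    for secondary in secondaries:
        lookup = {row[key]: row for row in secondary}
        new_columns = sorted({c for row in secondary for c in row if c != key})
        for row in joined:
            match = lookup.get(row[key], {})
            for col in new_columns:
                row[col] = match.get(col)
    return joined


patients = [{"id": 1, "age": 34}, {"id": 2, "age": 58}]
labs = [{"id": 1, "glucose": 5.4}]
visits = [{"id": 2, "visits": 3}, {"id": 3, "visits": 1}]

combined = left_join(patients, [labs, visits], key="id")
# combined == [
#     {"id": 1, "age": 34, "glucose": 5.4, "visits": None},
#     {"id": 2, "age": 58, "glucose": None, "visits": 3},
# ]
```

Note that a join can itself introduce nulls (as in the `glucose` and `visits` columns here), which is one reason joins and null imputation are often used together.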
In-Place Source Transformations
Prior to Release 7.9, the Ayasdi Machine Intelligence Platform always created a new source for transformations and, therefore, a new associated Topological Network, which would not have any previously applied Comparisons, Row Groups, Column Sets, or Colorings. Release 7.9 supports calling the ApplyTransform function without specifying a new_source_name. If new_source_name is set to None or left blank, the newly transformed columns are added to the original source.
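The difference between the two behaviors can be modeled with a small plain-Python sketch, in which a source is just a dictionary of named columns. The function `add_transformed_columns` and the dictionary model are hypothetical and exist only to illustrate the semantics described above; they are not the SDK's ApplyTransform function or its data model.

```python
# Illustrative only: a toy model of sources, not the Ayasdi SDK.

def add_transformed_columns(sources, source_name, new_columns,
                            new_source_name=None):
    """Attach transformed columns in place, or write them to a new source.

    `sources` maps a source name to a dict of column name -> list of values.
    Returns the name of the source that received the new columns.
    """
    if not new_source_name:  # None or blank: modify the original source
        sources[source_name].update(new_columns)
        return source_name
    combined = dict(sources[source_name])  # original source stays untouched
    combined.update(new_columns)
    sources[new_source_name] = combined
    return new_source_name


sources = {"patients": {"age": [34, 58]}}

# In-place: the existing source gains a column; no new source is created.
in_place = add_transformed_columns(
    sources, "patients", {"age_squared": [1156, 3364]})

# New source: the original is left as it was.
separate = add_transformed_columns(
    sources, "patients", {"age_lag_1": [None, 34]},
    new_source_name="patients_v2")
```

In this model, anything attached to the original source (the analogue of Comparisons, Row Groups, Column Sets, or Colorings) would survive the in-place call, because the source object itself is never replaced.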
Transformations are critical elements of data preparation. Not only do they add context to the data, they also make it easier to understand. Keep checking back, as we will continue to roll out new Transformation Services in future releases.