This article gives an overview of the tools currently available in Spotlight for transforming data.
Spotlight provides a range of tools for transforming data into a form suitable for analysis. The full list of tools is shown when you press the "Add Operation" button:
- Filter, applied to a column, extracts only the records that satisfy the filtering condition. It takes as input the column, the comparison operation, and a value to compare against.
- Replace creates an additional column in which the original value, typically a substring, is replaced with another value provided by the user.
- Split, as the name suggests, splits a string into an array of substrings. It takes as arguments the name or names of the columns to generate, a separator, and a limit on the number of results to keep.
- Extract takes out a substring of an existing value. It takes as arguments the index at which to start the extraction and the length of the substring.
- Blend joins two datasets based on matching values in one or more columns. The join types provided are inner join, left outer join, right outer join, and full outer join.
- SQL lets users write SQL queries to retrieve the information they need.
- Manage Columns lets users choose the columns needed in a dataset.
- Explode JSON Array, extracts the elements of a JSON array and puts each one of them in a separate row.
- Expand JSON Object creates new columns, one for each key of the JSON object.
- Extract JSON Object lets users choose only specific elements to extract.
- Derive Column creates new columns based on existing ones. Typically, the new columns contain the result of a formula or an aggregation function.
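
To make the behaviour of some of these operations more concrete, the following is a minimal Python sketch of what Filter, Replace, Split, Extract, Explode JSON Array, and Expand JSON Object might do to a small table. It is not Spotlight's implementation or API: the function names, parameters, and sample columns are all hypothetical, and details such as zero-based indexing in Extract and the meaning of the Split limit are assumptions made for illustration.

```python
import json
import operator

# Hypothetical sample rows; the column names are illustrative only.
rows = [
    {"id": 1, "name": "Ann Lee", "tags": '["a", "b"]',
     "meta": '{"city": "Rome", "age": 30}'},
    {"id": 2, "name": "Bob Ray", "tags": '["c"]',
     "meta": '{"city": "Oslo", "age": 41}'},
]

# Comparison operations a filter might support (assumed set).
OPS = {"==": operator.eq, "!=": operator.ne,
       ">": operator.gt, "<": operator.lt}


def filter_rows(rows, column, op, value):
    """Filter: keep only rows where `row[column] <op> value` holds."""
    return [r for r in rows if OPS[op](r[column], value)]


def replace(rows, column, old, new, target):
    """Replace: add column `target` with `old` swapped for `new`."""
    return [{**r, target: r[column].replace(old, new)} for r in rows]


def split(rows, column, names, sep, limit):
    """Split: cut `column` at `sep` and keep at most `limit` pieces,
    storing them under the given column names (assumed semantics)."""
    out = []
    for r in rows:
        parts = r[column].split(sep)[:limit]
        out.append({**r, **dict(zip(names, parts))})
    return out


def extract(rows, column, start, length, target):
    """Extract: take `length` characters from zero-based index `start`."""
    return [{**r, target: r[column][start:start + length]} for r in rows]


def explode_json_array(rows, column):
    """Explode JSON Array: one output row per element of the array."""
    return [{**r, column: item}
            for r in rows for item in json.loads(r[column])]


def expand_json_object(rows, column):
    """Expand JSON Object: one new column per key of the object."""
    return [{**r, **json.loads(r[column])} for r in rows]


filtered = filter_rows(rows, "id", ">", 1)
replaced = replace(rows, "name", "Ann", "Anna", "name_fixed")
split_rows = split(rows, "name", ["first", "last"], " ", 2)
extracted = extract(rows, "name", 0, 3, "short")
exploded = explode_json_array(rows, "tags")
expanded = expand_json_object(rows, "meta")
```

Each helper returns a new list of rows and leaves the input untouched, mirroring the idea that an operation produces a transformed view of the dataset rather than editing it in place.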