Goal
This guide walks you through an example of the Union feature in Datameer, which combines similar data from different data sources into a single worksheet.
Sample data
Download these sets of sample data to follow along with the example.
Learn
Upload the data sets
Upload the three data sets using file upload.
All three files can be uploaded as CSVs.
All files include column names and use " as the quote character.
For DeviceData_2012 and DeviceData_2013, change the DeviceAge column type to INTEGER.
Click through until you can save your files. Name your uploads DeviceData_2012, DeviceData_2013, and DeviceData_2014.
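The upload settings above (column names included, a " quote character, DeviceAge as INTEGER) can be illustrated outside Datameer with a minimal Python sketch. This is not how Datameer parses files internally; it only shows what those settings mean for a CSV. The sample text and the DeviceId column are invented for illustration; only DeviceAge comes from this guide.

```python
import csv
import io

# Hypothetical CSV content: a header row, with every value wrapped in " quotes.
sample_csv = '"DeviceId","DeviceAge"\n"A-100","3"\n"B-200","12"\n'

# DictReader takes the first row as column names; quotechar strips the " wrapping.
reader = csv.DictReader(io.StringIO(sample_csv), quotechar='"')

# CSV values are read as text, so DeviceAge is converted to an integer explicitly,
# which mirrors choosing INTEGER as the column type during upload.
rows = [{**row, "DeviceAge": int(row["DeviceAge"])} for row in reader]

for row in rows:
    print(row)
```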
Explanation of a union
A union in Datameer is an easy way to bring two or more similar datasets together to create one long dataset. Unlike a join, a union is not meant to add new columns (which would make your data wider). Instead, a union brings together separate files that contain the same type of data. This works well when you have similar data sets living in different locations.
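The difference between "longer" and "wider" can be sketched in plain Python. This is only a conceptual illustration, not Datameer's implementation; the rows, the DeviceId column, and the Owner lookup are hypothetical.

```python
# Hypothetical rows; only the DeviceAge column name comes from this guide.
data_2012 = [
    {"DeviceId": "A-100", "DeviceAge": 3},
    {"DeviceId": "B-200", "DeviceAge": 5},
]
data_2013 = [
    {"DeviceId": "C-300", "DeviceAge": 1},
]

# Union: stack the rows of similar datasets. The result is longer, not wider.
stacked = [dict(row) for rows in (data_2012, data_2013) for row in rows]

# Join (for contrast): look up extra information by key. The result is wider.
owners = {"A-100": "Ana", "B-200": "Ben"}  # hypothetical second source
joined = [{**row, "Owner": owners.get(row["DeviceId"])} for row in data_2012]

print(len(stacked), sorted(stacked[0]))
print(len(joined), sorted(joined[0]))
```

The union keeps the same columns and adds rows, while the join keeps the same rows and adds a column.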
Deciding to use a union
A union requires two or more similar datasets. In this example, you have three similar datasets for device data. DeviceData_2012 and DeviceData_2013 have exactly the same columns, but DeviceData_2014 has a new column, IsActive, that indicates whether the device is currently active. These data sources contain (mostly) the same data, and you want to build an analysis using all three. A union is perfect for this.
Using a union
Create a workbook with all three of the sample data sets. To do this, open up a new workbook.
Select all three of the sample data sets and then click Add Data.
All three datasets will now show up in your workbook.
To begin linking these three data sets into one worksheet, click the Union button on the toolbar or select Union from the Data menu.
The union wizard will pop up. Select DeviceData_2012 in the first drop-down and DeviceData_2013 in the second drop-down. Then, click the plus button (+) next to the DeviceData_2013 drop-down. This will allow you to add DeviceData_2014 to the union.
Click Create Union Sheet. You will see all three data sets combined in one worksheet, and you can start building out an analysis based on this combined data.
You will also notice that there are blank, or null, values for the data records that did not include IsActive data. Datameer keeps this column from the wider data set and fills in null for the records that come from the other data sets.
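The null-filling behavior described above can be sketched in plain Python. This is a conceptual illustration under stated assumptions, not Datameer's implementation: union_with_nulls is a hypothetical helper, the rows and the DeviceId column are invented, and Python's None stands in for null.

```python
def union_with_nulls(*datasets):
    """Stack rows from all datasets, filling columns a row lacks with None."""
    # Collect every column name in first-seen order.
    columns = []
    for rows in datasets:
        for row in rows:
            for name in row:
                if name not in columns:
                    columns.append(name)
    # row.get(name) returns None when the row has no such column.
    return [
        {name: row.get(name) for name in columns}
        for rows in datasets
        for row in rows
    ]

# Hypothetical rows: the 2013 data has no IsActive column, the 2014 data does.
data_2013 = [{"DeviceId": "A-100", "DeviceAge": 3}]
data_2014 = [{"DeviceId": "B-200", "DeviceAge": 1, "IsActive": True}]

combined = union_with_nulls(data_2013, data_2014)
for row in combined:
    print(row)
```

Every row in the result has the same three columns, and the 2013 row carries None for IsActive.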