Once you have decided on a focus point from your data, finding outliers is a simple and repeatable process using Datameer.
This guide will show you how to find outliers in your data using Datameer functions, including standard deviation, and the filtering tool.
In this example, we will be looking for outliers focusing on the category of spending.
Download the sample data and try it yourself!
Download sample file: CreditCardData.csv
Next, upload that file to Datameer.
1) Set a Baseline
Open a new workbook and add the sample data.
Create a new sheet in your workbook and name it Baseline.
Because we are looking for outliers by category, we will create our first GROUPBY on the VendorCategoryCode data from our source:
Finding the Average is a vital step to determining a baseline, here’s your next column:
Finding the Standard Deviation will play a key role in our definition of an outlier:
2) Compare Averages
We want to compare original data with our baseline, so we have to join those sheets in order to reference information from each; here’s how:
Create a joined sheet by clicking on the Join Sheets button in the tool bar.
Preform an inner join based on the VendorCategoryCode from the source sheet and the and the Category grouping that we created on the new Baseline sheet.
3) Define Outliers
Though there are many ways to do this including a new sheet with mathematical functions, using advanced filtering keeps your workbooks clean and efficient. Here’s how:
Create a filter on the join page and use the Advanced Filter setting.
Open the filter dialogue and limit the results based on this simple equation:
Is the amount minus the average more than twice the size of the standard deviation? If so, then it's an outlier. The remaining results will represent the instances in which the filter is true!
Full expression: #CreditCardData.AmountUSD-#Baseline.AverageSpend>2*#Baseline.SpendStandardDeviation
If you used our data, run the workbook. And you can look at the results.