Unique Record count

Comments

1 comment

  • Konsta Danyliuk

    Hello Ganesh.
    There are several ways to get the unique record count for a dataset accessed via a DataLink. Because a DataLink is only a pointer to the data, you first need to materialize the data in a saved Workbook Sheet. You could use any of the following methods.

    1. 

    • Create a new Workbook based on the DataLink.
    • Use the Deduplicating Data tool to create a new Sheet without duplicate records (based on all columns or on particular ones).
    • Execute the Workbook with the Dedup Sheet kept and check the unique record count via Inspector -> Column -> Data Profile.

    2. 

    • Create a new Workbook based on the DataLink.
    • Create a new Sheet.
    • Add the GROUPBY function on the desired column(s) to suppress duplicate records.
    • Execute the Workbook and check the unique record count via Inspector -> Column -> Data Profile.

    3. 

    • Create a new Workbook based on the DataLink.
    • Create a new Sheet.
    • Add the GROUPBY function on the desired column(s) to suppress duplicate records.
    • In the next column, named Index, enter a constant such as 1; this creates a column with the same value for every record.
    • Create a new Sheet.
    • Add the GROUPBY function on the Index column, then GROUPCOUNT.
    • Execute the Workbook to get the record count for the whole dataset.

    Note that the last method might be resource-consuming for the cluster.
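
    To make the logic behind the three methods concrete, here is a minimal sketch in plain Python. It is not Datameer syntax: the sample rows, the column names, and the helper unique_rows are all hypothetical and only illustrate the idea of deduplicating first and counting afterwards.

```python
from collections import Counter

# Hypothetical sample rows standing in for the data behind a DataLink.
records = [
    {"customer": "alice", "city": "Kyiv"},
    {"customer": "bob", "city": "Lviv"},
    {"customer": "alice", "city": "Kyiv"},   # exact duplicate of the first row
    {"customer": "alice", "city": "Odesa"},
]


def unique_rows(rows, columns):
    """Return one key tuple per distinct combination of the given columns."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row[column] for column in columns)
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


# Methods 1 and 2: suppress duplicates (on all or on particular columns),
# then read the number of remaining rows.
unique_all = unique_rows(records, ["customer", "city"])
unique_customers = unique_rows(records, ["customer"])

# Method 3: give every unique row the same Index value (1), then group by
# Index and count the rows in that single group.
indexed = [(1, key) for key in unique_all]
group_counts = Counter(index for index, _ in indexed)

print(len(unique_all))        # unique rows across both columns
print(len(unique_customers))  # unique values in the customer column
print(group_counts[1])        # same count, obtained via the constant Index
```

    In this sketch the constant Index puts every unique row into one group, so the count must pass through a single aggregation. That is one plausible reason why the third method can be heavier on a cluster.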
