How to generate a unique row number
We do not have any columns that uniquely identify a record in a workbook, so we need to generate a unique row number for every record.
Thanks for the help.
Official comment
This should help you generate unique row numbers for each record:
1) Create a new sheet in the workbook.
2) In the first column, add the function GROUPBY(1)
3) In the second column, add the function GROUPROWNUMBER()
4) Add any desired columns after these first two columns have been established.
Note that this approach forces all processing of this worksheet to occur in a single reducer. This is the only way to ensure that the row numbers are consistent. It may have a significant negative performance impact when generating the results.
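To illustrate the idea outside of Datameer, here is a minimal Python sketch of what the steps above rely on. It is not Datameer's implementation. The helper name group_row_numbers and the sample records are hypothetical, and it assumes that numbering starts at 1 and restarts for each group. Grouping on the constant 1 puts every record in the same group, so the per-group row number is unique across the whole sheet.

```python
from collections import defaultdict


def group_row_numbers(records, key_func):
    """Return (group_key, row_number, record) tuples.

    Hypothetical helper: numbers rows within each group, starting at 1
    (an assumption about how a per-group row number behaves).
    """
    counters = defaultdict(int)  # next row number for each group key
    numbered = []
    for record in records:
        key = key_func(record)
        counters[key] += 1
        numbered.append((key, counters[key], record))
    return numbered


records = ["alice", "bob", "carol"]

# Constant key, analogous to GROUPBY(1): one group, so numbers never repeat.
unique = group_row_numbers(records, lambda record: 1)

# Non-constant key (here, the string length): numbering restarts per group,
# so the row number alone no longer identifies a record.
per_group = group_row_numbers(records, lambda record: len(record))

print(unique)
print(per_group)
```

Because every record shares one group key, all of them have to be numbered together, which is consistent with the single-reducer note above.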
Hope this helps!
Datameer will dynamically control the number of reducers for a given job. We generally recommend that you don't manually adjust this value.
That said...
As listed in our documentation, you can control the number of reducers with a custom Hadoop property: https://documentation.datameer.com/documentation/display/DAS50/Custom+Hadoop+Properties
Specifically with:
das.job.reduce.tasks=NumberOfReducers
Hope this helps.