How to generate a unique row number

Comments

  • Official comment
    Saurabh Agashe

    This should help you generate unique row numbers for each record:

    1) Create a new sheet in the workbook.

    2) In the first column, add the function GROUPBY(1).

    3) In the second column, add the function GROUPROWNUMBER().

    4) Add any other columns you need after these first two.
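    To show what the two functions do to the data, here is a rough Python simulation. It is not Datameer code: group_row_number is a hypothetical helper, and it assumes GROUPROWNUMBER() numbers rows starting at 1 in the order they arrive within the group.

```python
from itertools import groupby


def group_row_number(records):
    """Hypothetical simulation of a GROUPBY(1) + GROUPROWNUMBER() sheet.

    Every record gets the same constant group key (1), so all records fall
    into one group; rows are then numbered 1..n within that group
    (numbering from 1 is an assumption).
    """
    keyed = [(1, rec) for rec in records]  # GROUPBY(1): constant key for every row
    result = []
    for key, rows in groupby(keyed, key=lambda pair: pair[0]):
        # GROUPROWNUMBER(): position of the row inside its group
        for row_number, (_, rec) in enumerate(rows, start=1):
            result.append((key, row_number, rec))
    return result


print(group_row_number(["a", "b", "c"]))
```

    Because the key never changes, there is exactly one group, and the row number doubles as a unique ID across the whole sheet.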

    Note that this approach forces all processing for this worksheet to happen in a single reducer, because every record shares the same group key (1). That is the only way to guarantee consistent row numbers, but it can significantly slow down generating the results.
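    The single-reducer effect follows from how grouping keys are routed to reducers. The Python sketch below assumes simple hash partitioning and uses zlib.crc32 as a stand-in hash (this is not Hadoop's actual partitioner); assign_partition and reducers_used are hypothetical helpers. Because GROUPBY(1) gives every record the same key, every record lands on the same reducer, whereas varied keys spread across many.

```python
import zlib


def assign_partition(key, num_reducers):
    """Pick a reducer for a key by hashing it (simplified stand-in for
    hash partitioning; not Hadoop's real hash function)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_reducers


def reducers_used(keys, num_reducers):
    """Return the set of reducer indexes that would receive at least one key."""
    return {assign_partition(k, num_reducers) for k in keys}


# Constant key, as with GROUPBY(1): one reducer does all the work.
print(len(reducers_used([1] * 1000, 10)))
# Varied keys: work is spread across several reducers.
print(len(reducers_used(range(1000), 10)))
```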

    Hope this helps!

  • Saurabh Agashe

    Thank you for the solution and the heads-up. Can you please let me know how I can control the number of reducers from the Datameer side?

  • Saurabh Agashe

    Datameer will dynamically control the number of reducers for each specific job, so we generally recommend that you don't adjust this value manually.

    That said...

    As described in our documentation, you can set the number of reducers with a custom Hadoop property: https://documentation.datameer.com/documentation/display/DAS50/Custom+Hadoop+Properties

    Specifically, use:

    das.job.reduce.tasks=NumberOfReducers

    Hope this helps.

  • Saurabh Agashe

    Thank you. But this setting will apply to all jobs running during that time, right? Is there a way to restrict it to a specific job?

  • Saurabh Agashe

    You can add Custom Hadoop Properties in the save/configure dialog of each individual job, instead of applying them cluster-wide.
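    To make the per-job scoping concrete, here is a small hypothetical Python model. It assumes job-level properties are layered over any cluster-wide ones and take precedence on conflict, which the reply does not state explicitly; effective_properties is an illustrative helper and the value "4" is an arbitrary example, not a recommendation.

```python
def effective_properties(cluster_props, job_props):
    """Hypothetical model of property scoping: start from the cluster-wide
    properties, then apply the job's own custom properties on top
    (assumed to take precedence)."""
    merged = dict(cluster_props)  # copy so the cluster-wide dict is not mutated
    merged.update(job_props)
    return merged


cluster_props = {}  # no cluster-wide reducer override
tuned_job = effective_properties(
    cluster_props, {"das.job.reduce.tasks": "4"}  # illustrative value only
)
other_job = effective_properties(cluster_props, {})  # keeps Datameer's dynamic default

print(tuned_job)
print(other_job)
```

    Only the job that declares the property sees it; other jobs running at the same time are unaffected.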

  • Saurabh Agashe

    Thank you!
