Datameer Workbook Best Practices

Comments

5 comments

  • Official comment
    Joel Stewart

    Hi Jyoti, I recommend starting with this article: How to Optimize a Workbook 

    If you and your team have other specific questions, let us know and we'll gladly assist. 

  • Jyoti Khairnar

    Joel,

    Thanks for the link to Workbook Optimization. My question is a bit different, though: when creating workbooks that have many data sources with lots of complex joins and calculations, is it advisable to break them into smaller workbooks, each with its own joins and calculations, and then join these smaller workbooks into a final workbook used for reporting? Would doing so add any overhead on the environment in terms of memory, performance, etc.? Does anyone apply data modeling and normalization concepts, as in an RDBMS, here too?

    Thanks

     

  • Joel Stewart

    Thank you for clarifying Jyoti. There's not a single answer that I can recommend based on your additional context.

    From a teamwork perspective, it may be best to split up the content into smaller workbooks so that intermediate data results are available for other teams to access. 

    From a purely performance perspective, Datameer will optimize the job best if it knows the full pipeline in a single workbook. This can be improved even further by not saving intermediate sheets. 

    Splitting the workbooks does add some overhead to the overall calculations and forces the intermediate results (the results from each workbook) to be saved to HDFS. This overhead may be worth it, though, if it improves reusability for your team.
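
    As a rough analogy only (plain Python, not Datameer's API or its actual execution model), the trade-off can be sketched like this. All function names, the sample rows, and the `intermediate.json` file are hypothetical: one version chains every step in a single pass, while the other saves the result of the first step and reads it back, much as a second workbook would read the saved output of the first.

```python
import json
import os
import tempfile


def load_rows():
    """Hypothetical source data standing in for a workbook's data source."""
    return [{"id": i, "amount": i * 10} for i in range(5)]


def filter_step(rows):
    """Keep only rows with an amount of at least 20 (lazy generator)."""
    return (r for r in rows if r["amount"] >= 20)


def enrich_step(rows):
    """Add a derived column to each row (lazy generator)."""
    return ({**r, "doubled": r["amount"] * 2} for r in rows)


def single_pipeline():
    """All steps chained in one pass; no intermediate result is stored."""
    return list(enrich_step(filter_step(load_rows())))


def split_pipeline(workdir):
    """Step 1 saves its result to disk; step 2 reads it back before continuing."""
    path = os.path.join(workdir, "intermediate.json")
    with open(path, "w") as f:
        json.dump(list(filter_step(load_rows())), f)  # extra write
    with open(path) as f:
        intermediate = json.load(f)  # extra read
    return list(enrich_step(intermediate)), path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        split_result, saved_path = split_pipeline(workdir)
        # Both approaches produce the same rows; only the split version
        # leaves a reusable intermediate file behind.
        print(split_result == single_pipeline(), os.path.basename(saved_path))
```

    Both versions return the same rows. The split version pays for an extra write and read, but the saved file is something another consumer could pick up without recomputing the first step.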

  • Jyoti Khairnar

    Joel, thank you so much for your response. This is what I was looking for. I appreciate it!

  • Dave Olsson

    Thanks for this thread. I am a new user and had exactly the same questions. As I sandbox my first solution, I'm noticing that the thought process I use to develop my answer is reflected in the sheets of the workbook, so my workbook has a lot of sheets. My first approach usually isn't the best, and I usually go back and optimize for performance once I know it works. This thread helps me see that I probably shouldn't split my process into separate workbooks, but should instead maintain a linear progression through the sheets.
