MapReduce framework setting DM version 6.1.8

Comments

  • Konsta Danyliuk

    Hello Mattijs,
    The default execution engine for Datameer 6.x is Tez. MapReduce still works, but it is marked as deprecated in Datameer 6.1.x. We plan to remove it completely in a future major release, so that our customers can work with the more powerful and convenient Tez and Spark engines.

    To warn users that the MapReduce engine is deprecated in Datameer 6, we've introduced the warnings below. There is no way to switch this notification off.

    WARN [2017-02-19 16:38:53.007] [JobScheduler thread-1] (JobScheduler.java:448) - ============================================================
    WARN [2017-02-19 16:38:53.007] [JobScheduler thread-1] (JobScheduler.java:449) - == Deprecation warning: This job is running on MapReduce, which got deprecated with Datameer 6.1
    WARN [2017-02-19 16:38:53.007] [JobScheduler thread-1] (JobScheduler.java:450) - == and will be removed in future versions of Datameer. Please remove the property
    WARN [2017-02-19 16:38:53.007] [JobScheduler thread-1] (JobScheduler.java:451) - == 'das.execution-framework=MapReduce' from your Hadoop configuration to run on a current execution framework.
    WARN [2017-04-19 16:38:53.008] [JobScheduler thread-1] (JobScheduler.java:452) - ============================================================
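
    As a rough illustration of what the warning asks for, the sketch below drops the override from a block of properties. It assumes the custom Hadoop properties are kept as plain key=value lines; strip_mapreduce_override and example.other.property are hypothetical names used only for this example, not part of Datameer.

```python
def strip_mapreduce_override(properties_text: str) -> str:
    """Return the text without any das.execution-framework=MapReduce line.

    Assumption: properties are plain key=value lines, one per line.
    """
    kept = []
    for line in properties_text.splitlines():
        key, sep, value = line.partition("=")
        is_override = (
            sep == "="
            and key.strip() == "das.execution-framework"
            and value.strip().lower() == "mapreduce"
        )
        if not is_override:
            kept.append(line)  # every other line is passed through unchanged
    return "\n".join(kept)


if __name__ == "__main__":
    # example.other.property is a made-up placeholder property.
    sample = "example.other.property=1\ndas.execution-framework=MapReduce"
    print(strip_mapreduce_override(sample))
```

    With the override gone, jobs would fall back to the default engine, which the text above states is Tez for Datameer 6.x.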

    Are you facing any problems with the Tez engine in your environment, or are there restrictions that require you to use MapReduce only?

  • Suurland, Mattijs M (NTR)

    Hello Konsta,

    Thanks for the quick reply.

    What I would expect from DM, however, is that it would continue to support the execution frameworks that ship with the HDP distributions.

    IMHO, it would have been enough to show the warning only the first time a job is scheduled. As mentioned, one tends to ignore default warnings and can then miss the important functional warnings. But your answer is clear, thanks for that.
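
    Since the notification cannot be switched off, one reading-side workaround is to filter the banner out when inspecting a log. The sketch below is only an assumption-based example: it treats every JobScheduler.java message that starts with "==" as part of the banner, matching the sample lines quoted earlier, and drop_deprecation_banner is a hypothetical helper name.

```python
import re

# Assumption: banner lines come from JobScheduler.java and their message
# part (after " - ") starts with "==", as in the sample log in this thread.
BANNER = re.compile(r"\(JobScheduler\.java:\d+\) - ==")


def drop_deprecation_banner(lines):
    """Return the log lines that are not part of the deprecation banner."""
    return [line for line in lines if not BANNER.search(line)]
```

    Any other JobScheduler message that happens to start with "==" would be hidden as well, so the pattern may need tightening for real logs.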

    The reason we use MapReduce instead of Tez has to do with the (Python) models running on our cluster: we find that the results with Tez are unreliable, so we abandoned Tez and switched back to MapReduce.

    Best,

    Mattijs

  • Konsta Danyliuk

    Mattijs,

    I understand your concern regarding MR vs Tez for your existing jobs.

    Datameer is just a job compiler, so it doesn't matter which execution framework you choose (MR, Tez or Spark): it compiles the job accordingly, including all required libraries, and sends it to the cluster for execution. Job performance (execution time) may differ between engines, most likely depending on data volume and cluster configuration (e.g. resource distribution or security constraints), but as soon as a job completes you will get your results.

  • Alan

    Hi Mattijs,

    Sorry to hear the transition to 6.1 hasn't been smooth for you.

    Unfortunately, in 6.1 the MapReduce framework was deprecated. This is why the warnings are present. It is not possible to disable them.

    The correct solution is to use Tez as the execution framework. Tez is more performant than MapReduce, so it's a win-win for everyone at the end of the day.

    Alan
