Datameer acts as a job compiler or code generator, much like Hive. Every function, filter, or join that a user designs in the spreadsheet is translated into native Tez code, and Tez is well suited to splitting workloads into smaller pieces. Datameer compiles the job and sends it to a Hadoop cluster for execution. Once the job has been compiled and submitted, Datameer no longer controls its execution and can only receive the telemetry metrics provided by the cluster's services. The job runs under whatever scheduling settings apply on the cluster and uses the resources granted by the scheduler.
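The compile step described above can be pictured with a small sketch. It is only a conceptual illustration: the class SheetCompiler, the SheetOp type, and the "vertex-N" naming are hypothetical and are not part of Datameer or the Tez API. It assumes that each spreadsheet operation becomes one step of a job plan, which is a simplification of what a real compiler would produce.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: none of these types are Datameer or Tez APIs.
final class SheetCompiler {
    private SheetCompiler() {}

    /** The kinds of spreadsheet operations named in the text. */
    enum OpType { FUNCTION, FILTER, JOIN }

    /** One operation a user designed in the spreadsheet. */
    static final class SheetOp {
        final OpType type;
        final String expression;

        SheetOp(OpType type, String expression) {
            this.type = type;
            this.expression = expression;
        }
    }

    /**
     * Design time: translate each spreadsheet operation into one step of a job plan.
     * In this sketch a step is just a descriptive string; the plan would then be
     * handed to the cluster, after which the compiler has no control over execution.
     */
    static List<String> compile(List<SheetOp> ops) {
        List<String> plan = new ArrayList<>();
        for (int i = 0; i < ops.size(); i++) {
            SheetOp op = ops.get(i);
            plan.add("vertex-" + i + ":" + op.type + "(" + op.expression + ")");
        }
        return plan;
    }
}
```

The point of the sketch is the division of labor: everything in compile happens before submission, and nothing in it can influence how the cluster schedules or executes the resulting steps.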
Every user working in Datameer's Excel-like user interface (UI) is in effect generating a Java program for distributed computing on the cluster backend. This high level of abstraction is one of the key features that make Datameer such an outstanding technology. However, this approach does mean that business users need to keep in mind the kinds of problems every programmer deals with, such as data types, memory, and disk usage.
This separates analytics work into two stages: first, the design/edit time, and second, the execution/runtime of a data link, import job, or workbook. The two stages run on different parts of your distributed computing system (cluster).
The first stage is served on the Datameer application server, which runs the Datameer service Java Virtual Machine (JVM), started and executed under the Datameer service account user. Depending on whether (Secure) Impersonation is configured, calls are made from <datameerServiceAccountUser>@<datameerHost> or <loggedinUser>@<datameerHost>.
The second stage is served on arbitrary DataNodes (DN) in the cluster. Each DN runs a container JVM, started by the ApplicationMaster (AM) and executed under the YARN service account user. Depending on whether (Secure) Impersonation is configured, calls are made from <yarnServiceAccountUser>@<dataNode> or <impersonatedUser>@<dataNode>.
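As a minimal sketch of the identity rules for the two stages, the following Java snippet encodes which user@host a call would come from. The class and method names are hypothetical and exist only for illustration; they are not Datameer, YARN, or Hadoop APIs. The parameter names mirror the placeholders used above, and the sample values in main (datameer, yarn, alice, dm-host, dn-07) are made up.

```java
// Hypothetical helper for illustration; not part of Datameer or Hadoop.
final class CallerIdentity {
    private CallerIdentity() {}

    /**
     * Stage 1 (design/edit time): calls originate on the Datameer host.
     * With impersonation they are made as the logged-in user,
     * otherwise as the Datameer service account user.
     */
    static String designTimeCaller(boolean impersonation,
                                   String datameerServiceAccountUser,
                                   String loggedInUser,
                                   String datameerHost) {
        String user = impersonation ? loggedInUser : datameerServiceAccountUser;
        return user + "@" + datameerHost;
    }

    /**
     * Stage 2 (execution/runtime): calls originate on the DataNode running the container.
     * With impersonation they are made as the impersonated user,
     * otherwise as the YARN service account user.
     */
    static String runtimeCaller(boolean impersonation,
                                String yarnServiceAccountUser,
                                String impersonatedUser,
                                String dataNode) {
        String user = impersonation ? impersonatedUser : yarnServiceAccountUser;
        return user + "@" + dataNode;
    }

    public static void main(String[] args) {
        // Made-up sample values, shown for both configurations.
        System.out.println(designTimeCaller(false, "datameer", "alice", "dm-host"));
        System.out.println(designTimeCaller(true, "datameer", "alice", "dm-host"));
        System.out.println(runtimeCaller(false, "yarn", "alice", "dn-07"));
        System.out.println(runtimeCaller(true, "yarn", "alice", "dn-07"));
    }
}
```

This can help when reading audit or access logs: the host part indicates which stage a call belongs to, and the user part indicates whether impersonation was in effect.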
This description is at a very high level. For more detail on the technology and framework, see books such as Hadoop: The Definitive Guide, 4th Ed. by Tom White and Hadoop Security, 1st Ed. by Ben Spivey and Joey Echeverria.