Data Staging

Comments


  • Ben Weiss

    Hi Hani,

    Thanks very much for your interest in Datameer.

    The short answer to your question is no: Datameer does not stage data in another environment. We execute our calculations directly in Hadoop, so you don't need a shadow cluster of servers with Datameer. A longer answer follows.

    To clarify further, I'll define some terms from the point of view of Datameer and then go into details.

    We view the analysis phase as an activity that happens in the Datameer workbook in which data sets are analyzed via Datameer functions. The visualization phase happens in Datameer charts and graphs.

    When users build analyses in a Datameer worksheet, they work with an intelligent sample of the data so that they get an immediate response. When the user is ready to run the workbook against the full data set, all analyses run directly in Hadoop via optimized MapReduce using DAG containers. This way you do not need to duplicate your data onto other servers. It also lets Datameer customers scale as large as their Hadoop cluster can scale, with no memory limitations. Further, see Datameer's Smart Execution to learn how Datameer's industry-leading approach avoids locking you into a particular execution framework: http://www.datameer.com/documentation/pages/viewpage.action?pageId=55217867
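
    As a rough illustration of the sample-then-full-run idea above, here is a minimal Python sketch. It is not Datameer code or a Datameer API: the names preview_sample and total_revenue are hypothetical, plain uniform random sampling stands in for whatever the product's intelligent sampling actually does, and an in-memory list stands in for data stored in Hadoop.

```python
import random

def preview_sample(rows, k, seed=0):
    """Return up to k rows chosen uniformly at random (hypothetical helper).

    Assumption: uniform sampling is only a stand-in for a smarter sampler.
    """
    if len(rows) <= k:
        return list(rows)
    rng = random.Random(seed)  # fixed seed so the preview is repeatable
    return rng.sample(rows, k)

def total_revenue(rows):
    """The 'analysis': the same function runs on the sample and the full set."""
    return sum(row["revenue"] for row in rows)

# Illustrative data set: 1000 rows with revenue 10, 20, ..., 10000.
rows = [{"id": i, "revenue": i * 10} for i in range(1, 1001)]

sample = preview_sample(rows, 50)
preview = total_revenue(sample)  # quick, approximate feedback while designing
full = total_revenue(rows)       # the complete run over every row
```

    The point of the sketch is that the analysis logic is defined once; only the input changes between the interactive preview and the full run.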

    Datameer visualizations operate on calculated Datameer workbooks. That means visualizations render immediately, because no calculation takes place at render time. You can configure your visualization to work on the latest calculation of a workbook so that it is always up to date. Or you can tie your visualization to an earlier calculation of a workbook so that the visualization never changes. If you choose to keep a number of previous calculations, you can create visualizations in which each graph shows the same data set at a different point in time.
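
    The difference between following the latest calculation and pinning an earlier one can be sketched in a few lines of Python. This is only a conceptual model under my own assumptions: the Workbook and Chart classes and the pinned_run parameter are hypothetical names, not Datameer objects or settings.

```python
class Workbook:
    """Keeps the stored results of each calculation, oldest first (hypothetical)."""

    def __init__(self):
        self.runs = []

    def record_run(self, result):
        """Store one calculation result and return its index."""
        self.runs.append(result)
        return len(self.runs) - 1


class Chart:
    """Reads stored results only; it never recalculates anything."""

    def __init__(self, workbook, pinned_run=None):
        self.workbook = workbook
        self.pinned_run = pinned_run  # None means "follow the latest run"

    def data(self):
        if self.pinned_run is None:
            return self.workbook.runs[-1]
        return self.workbook.runs[self.pinned_run]


wb = Workbook()
first = wb.record_run({"total": 100})

live = Chart(wb)                       # always shows the newest calculation
frozen = Chart(wb, pinned_run=first)   # stays on the first calculation

wb.record_run({"total": 250})          # a later calculation of the same workbook
```

    After the second run, live.data() returns the newer result while frozen.data() still returns the first one, which is how two graphs can show the same data set at different points in time.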

    Regarding Spark capabilities: today Datameer can import from and export to Spark deployments via Spark SQL. See http://www.datameer.com/documentation/current/5.10+New+and+Noteworthy#id-5.10NewandNoteworthy-Connect,Import,andExportwithSPARK! Datameer is also building out additional Spark capabilities. Stay tuned to www.datameer.com in the coming months to learn how that plays out!

    In summary, Datameer's sampling approach provides immediate feedback as you build analyses. Datameer's Smart Execution avoids lock-in to a specific execution framework and gives you great performance today. Datameer visualizations render immediately. And Datameer today supports import from and export to Spark.

    Ben
