Some Workbooks not running after update to Datameer 6.3.3
After upgrading Datameer from 6.1.23 to 6.3.3, some workbooks can no longer be processed. Editing the workbooks in the browser works fine, but when I try to run them on the cluster I get an error like this:
INFO [2017-11-23 12:58:11.501] [JobScheduler thread-1] (JobScheduler.java:412) - Starting job 3386275 (DAS Version: 6.3.3, Revision: 02d087faefb94aca6628b6b6a67ff5ed67833d8c, Hadoop-Distribution: 2.6.0-cdh5.11.0 (cdh-5.11.0), JVM: 1.7)
INFO [2017-11-23 12:58:11.504] [JobScheduler thread-1] (NormalJobDriver.java:124) - Checking if JobExecutionValueObject{_id=3386275} can be started
INFO [2017-11-23 12:58:11.537] [JobScheduler thread-1] (JobScheduler.java:444) - [Job 3386275] Preparing job in job scheduler thread for WorkbookConfigurationImpl{id=1770}...
INFO [2017-11-23 12:58:11.704] [JobScheduler thread-1] (JobScheduler.java:447) - [Job 3386275] Preparing job in job scheduler thread for WorkbookConfigurationImpl{id=1770}... done (0 sec)
INFO [2017-11-23 12:58:11.707] [JobScheduler worker1-thread-24438] (JobSchedulerJob.java:89) - [Job 3386275] Preparing job for WorkbookConfigurationImpl{id=1770}...
INFO [2017-11-23 12:58:11.731] [JobScheduler worker1-thread-24438] (JobSchedulerJob.java:94) - [Job 3386275] Preparing job for WorkbookConfigurationImpl{id=1770}... done (0 sec)
INFO [2017-11-23 12:58:11.739] [JobScheduler worker1-thread-24438] (JobSchedulerJob.java:109) - Starting job ...
INFO [2017-11-23 12:58:11.775] [JobScheduler worker1-thread-24438] (WorkbookJob.java:231) - Registering operations for sheet 'EXOP' (keep=false) with datameer.dap.common.sheet.StaticDataSheetBuilder@e00635e
INFO [2017-11-23 12:58:11.787] [JobScheduler worker1-thread-24438] (WorkbookJob.java:231) - Registering operations for sheet 'EXOP_Dedup' (keep=false) with GroupBySheetBuilder{EXOP_Dedup, source-sheet=EXOP, key-expressions=[GROUPBY(#EXOP!ID)]}
INFO [2017-11-23 12:58:11.787] [JobScheduler worker1-thread-24438] (WorkbookJob.java:231) - Registering operations for sheet 'splitMessages' (keep=false) with FormulaSheetBuilder{splitMessages, source-sheet=EXOP_Dedup, expressions=ExpressionContext[column='ID', id='0', index=0, expression=COPY(#EXOP_Dedup!ID)], ExpressionContext[column='Raw', id='1', index=5, expression=IF(STARTSWITH(#EXOP_Dedup!Messages;"[");EXPAND(JSONTOLIST(#EXOP_Dedup!Messages));#EXOP_Dedup!Messages)], ExpressionContext[column='Message_ID', id='2', index=1, expression=INT(JSON_VALUE(#Raw;"ID"))], ...}
INFO [2017-11-23 12:58:11.790] [JobScheduler worker1-thread-24438] (WorkbookJob.java:231) - Registering operations for sheet 'splitTextBlocks' (keep=false) with GroupBySheetBuilder{splitTextBlocks, source-sheet=splitMessages, key-expressions=[GROUPBY(#splitMessages!Message_ID)]}
INFO [2017-11-23 12:58:11.791] [JobScheduler worker1-thread-24438] (WorkbookJob.java:231) - Registering operations for sheet 'splitPoints' (keep=false) with FormulaSheetBuilder{splitPoints, source-sheet=EXOP_Dedup, expressions=ExpressionContext[column='ID', id='0', index=0, expression=COPY(#EXOP_Dedup!ID)], ExpressionContext[column='Points_Raw', id='1', index=10, expression=IF(STARTSWITH(#EXOP_Dedup!Features_Points;"[");EXPAND(JSONTOLIST(#EXOP_Dedup!Features_Points));#EXOP_Dedup!Features_Points)], ExpressionContext[column='Point_Geometry', id='2', index=1, expression=JSON_VALUE(#Points_Raw;"Geometry")], ...}
ERROR [2017-11-23 12:58:12.748] [JobScheduler thread-1] (JobScheduler.java:885) - Job 3386275 failed with exception.
datameer.com.google.common.base.VerifyException: RecordSource for sheet 'splitPoints' (das.internal.FormulaSheetType) of incorrect type. Is RecordType{[STRING, STRING, INTEGER, STRING, FLOAT, FLOAT, STRING, STRING]} but expected RecordType{[INTEGER, STRING, FLOAT, FLOAT, STRING, STRING, STRING, STRING]}.
at datameer.com.google.common.base.Verify.verify(Verify.java:125)
at datameer.dap.common.job.WorkbookJob.registerJobOperations(WorkbookJob.java:238)
at datameer.dap.common.job.DatameerJob.createExecutionPlan(DatameerJob.java:78)
at datameer.dap.common.job.DasJobCallable.call(DasJobCallable.java:107)
at datameer.dap.common.job.DasJobCallable.call(DasJobCallable.java:69)
at datameer.dap.conductor.jobscheduler.JobSchedulerJob$1.call(JobSchedulerJob.java:110)
at datameer.dap.conductor.jobscheduler.JobSchedulerJob$1.call(JobSchedulerJob.java:77)
at datameer.dap.common.security.DatameerSecurityService.runAsUser(DatameerSecurityService.java:117)
at datameer.dap.conductor.jobscheduler.JobSchedulerJob.call(JobSchedulerJob.java:77)
at datameer.dap.conductor.jobscheduler.JobSchedulerJob.call(JobSchedulerJob.java:40)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
INFO [2017-11-23 12:58:12.782] [JobScheduler thread-1] (JobScheduler.java:960) - Computing after job completion operations for execution 3386275 (type=NORMAL)
INFO [2017-11-23 12:58:12.782] [JobScheduler thread-1] (JobScheduler.java:964) - Finished computing after job completion operations for execution 3386275 (type=NORMAL) [0 sec]
WARN [2017-11-23 12:58:12.787] [JobScheduler thread-1] (JobScheduler.java:782) - Job DapJobExecution{id=3386275, type=NORMAL, status=ERROR} completed with status ERROR.
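The VerifyException compares the record type that sheet splitPoints actually produced with the one the job expected. Below is a minimal sketch that parses both type lists and reports where they differ. It assumes the message format shown in this log. The helper names parse_record_types, mismatched_positions and find_shift are hypothetical and not part of Datameer:

```python
import re

# Message copied verbatim from the VerifyException in the job log.
MESSAGE = (
    "RecordSource for sheet 'splitPoints' (das.internal.FormulaSheetType) of incorrect type. "
    "Is RecordType{[STRING, STRING, INTEGER, STRING, FLOAT, FLOAT, STRING, STRING]} "
    "but expected RecordType{[INTEGER, STRING, FLOAT, FLOAT, STRING, STRING, STRING, STRING]}."
)


def parse_record_types(message):
    """Return (actual, expected) type lists from a RecordType mismatch message."""
    groups = re.findall(r"RecordType\{\[([^\]]*)\]\}", message)
    if len(groups) != 2:
        raise ValueError("expected exactly two RecordType lists in the message")
    actual, expected = ([t.strip() for t in group.split(",")] for group in groups)
    return actual, expected


def mismatched_positions(actual, expected):
    """Zero-based positions where the two type lists disagree (up to the shorter list)."""
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


def find_shift(actual, expected):
    """Smallest offset k > 0 such that actual[k:] equals the start of expected, else None."""
    for k in range(1, len(actual)):
        if actual[k:] == expected[: len(actual) - k]:
            return k
    return None


if __name__ == "__main__":
    actual, expected = parse_record_types(MESSAGE)
    print("differing positions:", mismatched_positions(actual, expected))
    print("shift:", find_shift(actual, expected))
```

For this message, the types differ at positions 0, 2, 3, 4 and 5, and find_shift returns 2. The actual list looks like the first six expected types preceded by two STRING columns. That pattern may point to columns being emitted in a different order than expected, rather than to malformed values. The sketch alone cannot confirm this.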
-
This is the sheet summary of sheet splitPoints, taken from the browser:
Column 1 : =COPY(#EXOP_Dedup!ID)
Column 2 : =JSON_VALUE(#Points_Raw;"Geometry")
Column 3 : =FLOAT(JSON_VALUE(#Koordinates;"x"))
Column 4 : =FLOAT(JSON_VALUE(#Koordinates;"y"))
Column 5 : =JSON_VALUE(#Points_Raw;"Accuracy")
Column 6 : =JSON_VALUE(#Points_Raw;"Countrywide")
Column 7 : =JSON_VALUE(#Points_Raw;"Country")
Column 8 : =JSON_VALUE(#Points_Raw;"Admin0")
Column 9 : =JSON_VALUE(#Points_Raw;"Admin1")
Column 10 : =JSON_VALUE(#Points_Raw;"Admin2")
Column 11 : =IF(STARTSWITH(#EXOP_Dedup!Features_Points;"[");EXPAND(JSONTOLIST(#EXOP_Dedup!Features_Points));#EXOP_Dedup!Features_Points)
Column 12 : =WKB2JSON(#Point_Geometry)
-
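Several columns in this summary read other columns of the same sheet that sit further to the right. Column 2 reads #Points_Raw, which the job log lists with index=10, and Columns 3 and 4 read #Koordinates. In the log line for splitPoints, the ExpressionContext ids (0, 1, 2 for ID, Points_Raw, Point_Geometry) appear to follow these dependencies, while the index values (0, 10, 1) follow the display position. Below is a small sketch that lists each column's same-sheet inputs. It assumes that #Sheet!Column is a cross-sheet reference and a bare #Column is a same-sheet reference. The names FORMULAS and same_sheet_references are illustrative only:

```python
import re

# Formulas copied from the splitPoints sheet summary, keyed by 1-based display position.
FORMULAS = {
    1: '=COPY(#EXOP_Dedup!ID)',
    2: '=JSON_VALUE(#Points_Raw;"Geometry")',
    3: '=FLOAT(JSON_VALUE(#Koordinates;"x"))',
    4: '=FLOAT(JSON_VALUE(#Koordinates;"y"))',
    5: '=JSON_VALUE(#Points_Raw;"Accuracy")',
    6: '=JSON_VALUE(#Points_Raw;"Countrywide")',
    7: '=JSON_VALUE(#Points_Raw;"Country")',
    8: '=JSON_VALUE(#Points_Raw;"Admin0")',
    9: '=JSON_VALUE(#Points_Raw;"Admin1")',
    10: '=JSON_VALUE(#Points_Raw;"Admin2")',
    11: '=IF(STARTSWITH(#EXOP_Dedup!Features_Points;"[");'
        'EXPAND(JSONTOLIST(#EXOP_Dedup!Features_Points));#EXOP_Dedup!Features_Points)',
    12: '=WKB2JSON(#Point_Geometry)',
}

# Assumption: "#Sheet!Column" points into another sheet, a bare "#Column" into the same sheet.
REFERENCE = re.compile(r"#(\w+)(!\w+)?")


def same_sheet_references(formula):
    """Names of same-sheet columns that a formula reads, in order of appearance."""
    return [name for name, other_sheet in REFERENCE.findall(formula) if not other_sheet]


if __name__ == "__main__":
    for position, formula in FORMULAS.items():
        refs = same_sheet_references(formula)
        if refs:
            print(f"Column {position} reads {', '.join(refs)}")
```

A column that reads a column displayed after it must still be computed after its input. This is one plausible reason why an internal column id could differ from the display index.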
Can you share the workbook definition for this job, as returned by the REST API? The command is described in the documentation page "REST API Workbook - Read Workbook".
The exception seems to indicate that the records produced for sheet splitPoints do not have the column types the job expects.
-
You can download the workbook here:
https://www.dropbox.com/s/gbrcnknekobjeew/workbook.json?dl=0
-
Thank you for sharing, Michael. Within the JSON, I noticed a difference between the columnID and columnIndex fields for the splitPoints sheet. As a test, I manually updated these values so that they match each other. The updated workbook is available here: https://datameer.box.com/s/qf2s2k7gpyji4d0w35deskovjqy97q3i
Can you let us know if this resolves the issue? If so, we'll also work on reproducing the problem so that it can be fixed programmatically going forward.
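One way to find such mismatches before editing the file by hand is to scan the exported JSON for columns whose ID and index disagree. Below is a minimal sketch. The key names (sheets, columns, name, columnId, columnIndex) and the nesting are hypothetical, because the exact layout of the Datameer workbook JSON is not shown in this thread. Adjust them after inspecting workbook.json. The sample values borrow the id/index pairs from the job log, which may not correspond one-to-one to the JSON fields:

```python
import json

# Hypothetical key names: the real workbook JSON may use different names or nesting.
SHEETS_KEY = "sheets"
COLUMNS_KEY = "columns"
NAME_KEY = "name"
ID_KEY = "columnId"
INDEX_KEY = "columnIndex"


def misaligned_columns(workbook, sheet_name):
    """Return (name, id, index) for each column of a sheet whose id differs from its index."""
    for sheet in workbook.get(SHEETS_KEY, []):
        if sheet.get(NAME_KEY) != sheet_name:
            continue
        return [
            (col.get(NAME_KEY), col[ID_KEY], col[INDEX_KEY])
            for col in sheet.get(COLUMNS_KEY, [])
            if col[ID_KEY] != col[INDEX_KEY]
        ]
    raise KeyError(f"sheet {sheet_name!r} not found")


if __name__ == "__main__":
    # Sample shaped like the assumed layout; values taken from the job log's ExpressionContext.
    sample = json.loads("""
    {"sheets": [{"name": "splitPoints", "columns": [
        {"name": "ID", "columnId": 0, "columnIndex": 0},
        {"name": "Points_Raw", "columnId": 1, "columnIndex": 10},
        {"name": "Point_Geometry", "columnId": 2, "columnIndex": 1}
    ]}]}
    """)
    print(misaligned_columns(sample, "splitPoints"))
```

To check the real export, load the downloaded workbook.json with json.load and pass the result in place of sample. The function only reports mismatches and does not modify the workbook.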