Unable to create or run import job after Hive schema change
Hi,
I have an import job that uses a Hive connection.
The table is stored as Parquet and uses GZIP compression.
Recently the schema of the backing table changed: some columns were removed.
From a Hive shell I can still query this table without any issue,
but the import job broke.
When I try to recreate the import job to reflect the new schema, the message box on the "Define Fields" tab reads:
"Not all preview records parsed successfully"
It is followed by a handful of errors, all reading:
"java.lang.UnsupportedOperationException: Cannot inspect org.apache.hadoop.io.LongWritable"
I can still save the job, but when I run it I get 0 records in, 0 out, and 0 dropped.
Another thing to note: if I copy the entire contents of the table to a temp table and back, I
can create the import job with no problem.
So it would seem that Datameer is having some issue reconciling the Hive and Parquet schemas.
Has anyone seen this kind of problem before?
Any advice?
Thanks.
-
Thanks for the detailed description and history of the Import Job and associated Hive table. It's easy to understand the changes in the environment that led up to this exception. Are you able to share a longer stacktrace of the exceptions? This can help us trace down the cause in detail.
-
Hi,
Here is the full stack trace.
java.lang.UnsupportedOperationException: Cannot inspect org.apache.hadoop.io.LongWritable at org.apache.hadoop.hive.ql.io.parquet.serde.primitive.ParquetStringInspector.getPrimitiveJavaObject(ParquetStringInspector.java:77) at org.apache.hadoop.hive.ql.io.parquet.serde.primitive.ParquetStringInspector.getPrimitiveJavaObject(ParquetStringInspector.java:28) at datameer.das.plugin.hive.HiveUtil.copyToStandardObject(HiveUtil.java:319) at datameer.das.plugin.hive.HiveUtil.copyToStandardObject(HiveUtil.java:366) at datameer.das.plugin.hive.HiveUtil.copyToStandardObject(HiveUtil.java:296) at datameer.das.plugin.hive.HiveRecordParser.extractListOfValuesFromRecordSource(HiveRecordParser.java:140) at datameer.das.plugin.hive.HiveRecordParser.parse(HiveRecordParser.java:119) at datameer.dap.sdk.importjob.DelegateRecordParser.parse(DelegateRecordParser.java:23) at datameer.dap.sdk.importjob.extensions.DecoratingRecordParser.parse(DecoratingRecordParser.java:26) at datameer.dap.sdk.importjob.RecordParserWrapper.parse(RecordParserWrapper.java:34) at datameer.dap.sdk.importjob.enrichment.RecordEnricher.parse(RecordEnricher.java:73) at datameer.dap.common.job.dapimport.ImportPreview$1.run(ImportPreview.java:248) at datameer.dap.common.job.dapimport.ImportPreview$1.run(ImportPreview.java:221) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at datameer.dap.common.filesystem.Impersonator.doAs(Impersonator.java:31) at datameer.dap.common.job.dapimport.ImportPreview.parseRawRecords(ImportPreview.java:217) at datameer.dap.common.job.dapimport.ImportPreview.parse(ImportPreview.java:182) at datameer.dap.conductor.webapp.controller.data.DefineFieldsController.parseRecords(DefineFieldsController.java:215) at datameer.dap.conductor.webapp.controller.data.DefineFieldsController.pushRecordsAndLogUrl(DefineFieldsController.java:191) at 
datameer.dap.conductor.webapp.controller.data.DefineFieldsController.provideRecords(DefineFieldsController.java:91) at datameer.dap.conductor.webapp.controller.data.importjob.DataSourceW3DefineFieldsController.showDefineFields(DataSourceW3DefineFieldsController.java:97) at sun.reflect.GeneratedMethodAccessor701.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.springframework.web.bind.annotation.support.HandlerMethodInvoker.invokeHandlerMethod(HandlerMethodInvoker.java:177) at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.invokeHandlerMethod(AnnotationMethodHandlerAdapter.java:446) at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.handle(AnnotationMethodHandlerAdapter.java:434) at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:959) at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:893) at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:967) at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:858) at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:843) at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:808) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1669) at datameer.dap.conductor.webapp.filter.MinifyJsCssFilter.doFilter(MinifyJsCssFilter.java:49) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at datameer.dap.conductor.webapp.filter.HeaderInjectFilter.doFilter(HeaderInjectFilter.java:24) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at 
datameer.dap.conductor.webapp.filter.CsrfFilter.doHttpFilter(CsrfFilter.java:87) at datameer.dap.conductor.webapp.filter.CsrfFilter.doFilter(CsrfFilter.java:66) at org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:344) at org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:261) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at datameer.dap.conductor.webapp.filter.SessionFilter.doFilter(SessionFilter.java:53) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at datameer.dap.conductor.webapp.filter.RestAuthenticationOriginFilter.doFilter(RestAuthenticationOriginFilter.java:52) at org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:344) at org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:261) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at org.eclipse.jetty.servlets.UserAgentFilter.doFilter(UserAgentFilter.java:83) at org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:364) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at datameer.dap.conductor.webapp.filter.OptimisticLockRetryFilter.doFilter(OptimisticLockRetryFilter.java:31) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at datameer.dap.conductor.webapp.filter.PermissionDeniedExceptionFilter.doFilter(PermissionDeniedExceptionFilter.java:29) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at datameer.dap.conductor.webapp.filter.RequestMetaDataFilter.doFilter(RequestMetaDataFilter.java:27) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at datameer.dap.conductor.webapp.filter.PersistenceFilter.doFilter(Unknown Source) at 
org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:344) at org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:261) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:316) at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:126) at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:90) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330) at org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:114) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330) at org.springframework.security.web.session.SessionManagementFilter.doFilter(SessionManagementFilter.java:122) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330) at org.springframework.security.web.authentication.AnonymousAuthenticationFilter.doFilter(AnonymousAuthenticationFilter.java:111) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330) at org.springframework.security.web.authentication.rememberme.RememberMeAuthenticationFilter.doFilter(RememberMeAuthenticationFilter.java:157) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330) at org.springframework.security.web.servletapi.SecurityContextHolderAwareRequestFilter.doFilter(SecurityContextHolderAwareRequestFilter.java:169) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330) at 
org.springframework.security.web.savedrequest.RequestCacheAwareFilter.doFilter(RequestCacheAwareFilter.java:48) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330) at org.springframework.security.web.authentication.www.BasicAuthenticationFilter.doFilterInternal(BasicAuthenticationFilter.java:158) at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330) at datameer.dap.conductor.authentication.CompositeAuthenticationFilter$InternalFilterChain.doFilter(CompositeAuthenticationFilter.java:64) at org.springframework.security.web.authentication.AbstractAuthenticationProcessingFilter.doFilter(AbstractAuthenticationProcessingFilter.java:205) at datameer.dap.conductor.authentication.CompositeAuthenticationFilter$InternalFilterChain.doFilter(CompositeAuthenticationFilter.java:68) at datameer.dap.conductor.authentication.CompositeAuthenticationFilter.doFilter(CompositeAuthenticationFilter.java:49) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330) at datameer.dap.conductor.webapp.controller.fte.AccessTokenBasedAuthenticationProcessingFilter.doFilter(AccessTokenBasedAuthenticationProcessingFilter.java:94) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330) at datameer.dap.conductor.webapp.filter.AutoAdminAuthenticationFilter.doFilter(AutoAdminAuthenticationFilter.java:65) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330) at org.springframework.security.web.authentication.logout.LogoutFilter.doFilter(LogoutFilter.java:120) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330) at 
org.springframework.security.web.context.request.async.WebAsyncManagerIntegrationFilter.doFilterInternal(WebAsyncManagerIntegrationFilter.java:53) at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330) at org.springframework.security.web.context.SecurityContextPersistenceFilter.doFilter(SecurityContextPersistenceFilter.java:91) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330) at org.springframework.security.web.FilterChainProxy.doFilterInternal(FilterChainProxy.java:213) at org.springframework.security.web.FilterChainProxy.doFilter(FilterChainProxy.java:176) at org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:344) at org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:261) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at datameer.dap.conductor.webapp.filter.ClickJackingPreventionFilter.doFilter(ClickJackingPreventionFilter.java:40) at org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:344) at org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:261) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at org.springframework.web.multipart.support.MultipartFilter.doFilterInternal(MultipartFilter.java:118) at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at datameer.dap.conductor.webapp.filter.OptimisticLockRetryFilter.doFilter(OptimisticLockRetryFilter.java:31) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at 
datameer.dap.conductor.webapp.filter.PermissionDeniedExceptionFilter.doFilter(PermissionDeniedExceptionFilter.java:29) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at datameer.dap.conductor.webapp.filter.RequestMetaDataFilter.doFilter(RequestMetaDataFilter.java:27) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at datameer.dap.conductor.webapp.filter.DeprecatedRestUrlsFilter.doFilter(DeprecatedRestUrlsFilter.java:90) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at datameer.dap.conductor.webapp.filter.BrowserExceptionFilter.doFilter(BrowserExceptionFilter.java:31) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at datameer.dap.conductor.webapp.filter.RestExceptionFilter.doFilter(RestExceptionFilter.java:30) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at datameer.dap.conductor.webapp.filter.NoResultExceptionFilter.doFilter(NoResultExceptionFilter.java:24) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at datameer.dap.conductor.webapp.filter.DasExceptionFilter.doFilter(DasExceptionFilter.java:29) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:85) at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:99) at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at 
datameer.dap.conductor.webapp.filter.RequestContextFilter.doFilter(RequestContextFilter.java:43) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at datameer.dap.conductor.authentication.FailedLoginServletFilter.doFilter(FailedLoginServletFilter.java:63) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) at com.jamonapi.http.JAMonJettyHandlerNew.handle(JAMonJettyHandlerNew.java:36) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1129) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) at org.eclipse.jetty.server.Server.handle(Server.java:497) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257) at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540) at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555) at java.lang.Thread.run(Thread.java:748)
-
Davor,
Thank you for the detailed exception, we can see the issue here:
java.lang.UnsupportedOperationException: Cannot inspect org.apache.hadoop.io.LongWritable
    at org.apache.hadoop.hive.ql.io.parquet.serde.primitive.ParquetStringInspector.getPrimitiveJavaObject(ParquetStringInspector.java:77)
    at org.apache.hadoop.hive.ql.io.parquet.serde.primitive.ParquetStringInspector.getPrimitiveJavaObject(ParquetStringInspector.java:28)
The underlying problem is that the schema defined in Hive's metadata for this column doesn't match the schema actually stored in the Parquet file. The exception is thrown from ParquetStringInspector, which belongs to Hive's Parquet SerDe (the org.apache.hadoop.hive.ql.io.parquet.serde package), so we can tell that the reader expected a string for this column. Per the exception, it received a LongWritable instead. The Hive shell and Beeline tolerate metadata that doesn't match the structure stored within the files, but ParquetStringInspector does not. That is to say, any application that reads the table through this inspector would hit the same problem.
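If you want to find which columns are affected, one option is to compare the column types Hive reports (for example from DESCRIBE) with the types recorded in the Parquet file itself (for example from the parquet-tools schema command). Below is a minimal Python sketch of that comparison. It assumes you have already collected both sides as name-to-type dictionaries; the function name, the column names, and the small type mapping are illustrative assumptions, not part of Datameer or Hive.

```python
# Assumed mapping from Hive column types to the Parquet physical types they
# are normally stored as. Only a few common types are listed here.
HIVE_TO_PARQUET = {
    "string": "binary",
    "bigint": "int64",
    "int": "int32",
    "double": "double",
    "float": "float",
    "boolean": "boolean",
}


def find_type_mismatches(hive_columns, parquet_columns):
    """Return (column, hive_type, parquet_type) for columns whose types disagree.

    Both arguments map column name -> type name. Columns that are absent
    from the Parquet file, or whose Hive type is not in the mapping, are
    skipped rather than reported.
    """
    mismatches = []
    for name, hive_type in hive_columns.items():
        parquet_type = parquet_columns.get(name)
        if parquet_type is None:
            continue
        expected = HIVE_TO_PARQUET.get(hive_type.lower())
        if expected is not None and expected != parquet_type.lower():
            mismatches.append((name, hive_type, parquet_type))
    return mismatches


# Made-up example: Hive declares "status" as a string, the file stores a long.
hive_columns = {"id": "bigint", "status": "string"}
parquet_columns = {"id": "int64", "status": "int64"}
print(find_type_mismatches(hive_columns, parquet_columns))
# prints [('status', 'string', 'int64')]
```

A column reported here is one where a string inspector would be handed a long value, which is the situation the exception describes.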
Copying the data out to a new table works because the files for the new table are written with a schema that matches what's in the metadata. So this would be your solution: copy the table to get Parquet files that match the schema specified in the metadata.
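As a rough illustration of that copy, the small Python helper below only builds the HiveQL statements for copying a table to a temporary table and back; it does not run them. The helper name and the table names are hypothetical, and the sketch assumes an unpartitioned table and identifiers you trust, since it does no validation. Please review the statements before running them, because INSERT OVERWRITE replaces the existing data.

```python
def build_rewrite_statements(table, temp_table):
    """Return HiveQL statements that copy a table out and back.

    The round trip makes Hive rewrite the Parquet files using the schema
    currently held in the metadata. Identifiers are inserted as-is.
    """
    return [
        # Copy the current contents into a new Parquet-backed table.
        f"CREATE TABLE {temp_table} STORED AS PARQUET AS SELECT * FROM {table}",
        # Write the data back, replacing the old files.
        f"INSERT OVERWRITE TABLE {table} SELECT * FROM {temp_table}",
        # Clean up the temporary copy.
        f"DROP TABLE {temp_table}",
    ]


# Hypothetical table names, for illustration only.
for statement in build_rewrite_statements("events", "events_tmp"):
    print(statement + ";")
```

The printed statements can then be pasted into the Hive shell or Beeline, one at a time, so you can check the temporary table before overwriting the original.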
Please let us know if you've got any further questions.