Cannot create a data link

Comments

  • Konsta Danyliuk

    Hello Michael.

    The error you mentioned might be caused by a format mismatch in the dataset you are trying to load.

    • Could you please clarify exactly what data you are trying to ingest into Datameer with this DataLink?

    • Are you able to get a correct Data Preview while setting up the DataLink?

    • Please try selecting the Drop record strategy under the How to handle invalid data section and check whether the job then runs without the error.

  • Michael Ahn

    I am trying to link to Avro data that I exported from a Workbook through an HDFS connection. I see the correct data preview in the DataLink wizard. Changing the invalid data handling makes no difference.

  • Konsta Danyliuk

    Michael, thank you for the additional information.

    Could you create a baseline test for this activity to narrow down the root cause:

    • Create a very simple dataset (let's say 5 or 10 records).
    • Export it to HDFS in Avro format.
    • Ingest it back via the DataLink.

  • Michael Ahn

    I tried with a very simple table and got almost the same error:

    [anonymous] ERROR [2016-10-13 15:37:55.508] [qtp563628874-15226] (DasExceptionFilter.java:43) - Unhandled exception handling /import-job/create/save
    org.springframework.web.util.NestedServletException: Request processing failed; nested exception is java.lang.StringIndexOutOfBoundsException: String index out of range: -2

  • Konsta Danyliuk

    Thank you, Michael.

    • Do you face the same issue with any other exported files on HDFS?
    • Could you save this tiny Avro file on your local machine and try to upload it directly?
  • Michael Ahn

    Doing an import instead of a DataLink works with the HDFS connection. But I need the DataLink.

  • Konsta Danyliuk

    Just to confirm that I understand correctly: you are able to ingest the mentioned Avro file from HDFS via an ImportJob, but the DataLink gives you the error, right?

  • Michael Ahn

    Yes

  • Konsta Danyliuk

    Michael,

    Can you try the following, please:

    • Create a new folder in HDFS.
    • Create a new HDFS connection and point it to this folder.
    • Copy the mentioned Avro file (or just the test one) into this folder.
    • Create a new DataLink and try again to ingest the file.

  • Michael Ahn

    I tried it with a new HDFS connection; it did not work. I tried the same file with an SSH connection; it did not work. I tried another file format (Excel); it did not work. I tried another user (admin); it did not work. I just cannot create any DataLink. I always get that exception on save.

  • Michael Ahn

    Any news about this? Looks like a bug to me.

  • Konsta Danyliuk

    Hello Michael.

    Could you please let me know which Datameer version you are using and what your Hadoop distribution is?

  • Michael Ahn

    Datameer 6.1.5 with CDH 5.6.1

  • Konsta Danyliuk

    Michael,

    Let me run a few tests in my environment.

    By the way, have you tried restarting Datameer?

  • Michael Ahn

    Of course. I also tried with a freshly installed Datameer, and it failed as well.

  • Konsta Danyliuk

    I just ran a few tests on a freshly installed 6.1.5 (but with a different Hadoop distribution, HDP 2.4). Everything works fine, and I was able to create and use DataLinks for HDFS and Hive.

    To gather more information, could you please switch Datameer to embedded cluster mode, set up a connection to the Datameer server file system, and try to create a DataLink?

  • Michael Ahn

    I found the reason for the exception: the file paths.

    I used "/data" as the path in the connection and "/" as the path in the DataLink. It also does not work the other way round. But now I'm using "/data" in the connection and "/mydir" in the DataLink, and it works. So I have a workaround, but you should be able to reproduce the bug and open a ticket.

  • Konsta Danyliuk

    Hello Michael.

    Good to hear that you were able to find a workaround. Thank you for pointing us to this issue; I'll check the problem with the engineers.
