How to handle double quotes in JSON values

Comments

7 comments

  • Official comment
    Joel Stewart

    Thanks for the clarification on the data source. In this circumstance, you would want to use the REGEX function to specifically read the white space characters from in-between the quotes and replace them with a temporary value. Here is the formula that I used in a lab to simulate your example: 

    REGEX(#A;"([^\"]+)(\")(\\S+)(\\s+)(\\S+)(\")([^\"]+)";"$1 $3WHITESPACE$5 $7")

    Here's the documentation link for the REGEX function in case you're not familiar with it: REGEX


    This formula with the Regex inside it is quite complicated, so here is an overview of what is intended overall and within this function itself: 

    1. Use the Regex function (example above) to remove the existing double quote characters and white space within the existing double quotes. These are replaced with placeholders that are assumed to be unique within the rest of the text and will later be replaced
    2. Perform the SUBSTITUTEALL function that you referenced before. This should no longer generate pairs of double quotes.
    3. Use the REPLACEALL function to replace the "WHITESPACE" text placeholder with a space character.

    Please note that the example I shared here is limited to matching exactly one instance of a quoted section with exactly one white section of white space inside of it. It can be expanded to fit more use cases by updating the internal regex. 

    Comment actions Permalink
  • Konsta Danyliuk

    Hello Amaranth.

    May I ask you to provide more details on your request, please, would you like to keep double quotes during ingestion or you'd like to remove them? It also would be helpful if you will provide sample of data you'd like to ingest and the way you want it to be displayed in Datameer.

    Thank you in advance.

    -1
    Comment actions Permalink
  • Amarnath Reddy Athuri

    Hi, 

    Input in data in one column of worksheed:

    Columnx: 

    Input:

    0 0 0 0 0 6 0 0 0 0 7 "2017-6-16 15:35:14" 24 0 300 0 0 0 0 0 0 0

    I want to display it in columny:

    Output:

    ["0","0","0","0","0","6","0","0","0","0","7","2017-6-16 15:35:14","24","0","300","0","0","0","0","0","0","0"]

    Basically I want to add " and , around every value and remove spaces. And want to keep the space where it found between double quotees.

     

    0
    Comment actions Permalink
  • Joel Stewart

    Amarnath, when you upload, import or link to the original data, are you able to use a space-character as a column separator? If so, you could also use the double-quote character as the quote character in the configuration. This would bring in the date values that you shared without any quotes around them and as single values. 

    Once the data is brought in as separate columns, you could use the LIST function to bring the columns together into a single entity and lastly the TOJSON function on the List object to output a JSON string. 

    0
    Comment actions Permalink
  • Amarnath Reddy Athuri

    Hi Joel, Thanks for replying.

    We are parsing a xml file to get this data as a string into a column. I need to convert this data into to JSON array somehow, so that I can JSON MAP using another JSON array I have in my data.

    I think regex should help here, unfortunately I don't have knowledge around it. 

    0
    Comment actions Permalink
  • Amarnath Reddy Athuri

    Thanks very much Joel for helping with the Regex. It works fine when I have single date value with in quotes. Howeve I have different behaviour in input data where I receive some records without this date data and multiple dates in some records. In such cases regex is not returning any output. Could you please help.

    0
    Comment actions Permalink
  • Joel Stewart

    Hi Amaranth, this is the expected behavior from the example that I had given before. For those that have zero entries, you could easily introduce an IF statement to check if any quote characters are included in the value at all. If not, then you could skip over the REGEX function altogether. 

    If you have multiple entries in a column, it will be required to adjust the regex statement itself to accommodate the different formats of your data set. What's important to note is the REGEX function itself needs to match the entire value of the data entity. The parentheses are used to create referenceable groups for controlling the output. 

    0
    Comment actions Permalink

Please sign in to leave a comment.