Hello I have 3 columns of data that involve information for help desk tickets, such as description, action and resolution. I want to try and find what words are the most repeated across these 3 columns and then have that word returned in another column to try and help set up categories. For example if Description says: The database is down and not responding. The action says: User reported that the database is not responding. The resolution says: the database server was rebooted, the database is now back up. Then this would return the word database. I know there is some risk that little words like is and the could throw this off so ideal I would combine the text, and have it ignore those types of words. Right now I am doing this but it is through using the following column calculations: CONCAT (all 3 fields), LOWER (the concat result), EXTRACT_WORDS (lower result), REMOVE_STOP_WORDS (extract_words result), STEM_WORDS (remove_stop_words result), T (remove_stop_words result), COUNTMATCHES (t result and then a list of categories I had to come up with on my own first), MAX (results of COUNTMATCHES), IF(EQUALS (against the results of the Countmatches and the Max), and then COALESCE.
This meant putting in hours of work trying to manually figure out what categories I might have feed the formula for COUNTMATCHES. I want to know if there is a way to do this heavy lifting on the datameer side by having it return the most likely categories instead. In the example above if I have 6 categories then I have to have six columns of COUNTMATCHES where I have to tell it what keywords there are for each category.
Is there a better way?
Please sign in to leave a comment.