Removing tuples with the same elements in lists
Hello,
assuming I have the following table with sample data (ID is a GROUPBY and list is a GROUPCONCAT of another sheet):
ID | list
1 | [a,b,c,c,b,a]
2 | [a,a,a,c,d,b,a]
What I want to do now is remove all pairs with the same elements so that
[a,b,c,c,b,a] becomes [a,b,c,b,a] and
[a,a,a,c,d,b,a] becomes [a,c,d,b,a]
How can I achieve this? GROUPCONCATDISCTINCT is not an option since it would remove every reoccuring points it the list.
Thanks in advance and with best regards
Tim
-
Official comment
Thanks for the additional details Tim and for confirming that you have already made considerations about the sorting to ensure consistent results.
In this particular scenario, I'd recommend using the GROUP_PATH_CHANGES function.
Comment actions -
There appears to be an implied assumption in your proposed use-case. Specifically, using the GROUPBY and GROUPCONCAT functions does not guarantee the order of the results between steps. The results may shift order as they are distributed and may complete at different times.
For this reason, the GROUPCONCATDISTINCT function is recommended here because it will search globally for duplicates and only return one record per instance.
If a use case requires the order of the rows to be maintained explicitly, be sure to use a function such as GROUP_SORT_ASC to maintain the precise ordering.
Does this use case require that order be maintained?
-
Thanks for the answer and for the suggestion Joel!
In fact I have a sorting variable (date) and applied it now on the GROUPCONCAT function sorting the table beforehand by date.
The problem remains. For further explanation: Imagine these lists as paths through a network where I want to remove the hops to the same location where the subject already is:
[a,a,a] -> [a]
[a,b,b,b,a] -> [a,b,a]
Please sign in to leave a comment.
Comments
4 comments