Creating Datalinks with Parquet
Is it possible to create a datalink to Parquet files so that a column's data type is set to list in Datameer? We would like to keep the data only in HDFS, without duplicating it in Datameer just to tokenize the string into a list. What is the best practice for storing the values in Parquet, and what data type should the group of values be stored as?
Official comment
Hi Brandon,
The Parquet LIST data type is unfortunately not supported with our current implementation. You can find our documentation for supported Parquet field types here: Data Mapping in Parquet
However, I think we can solve your problem through Workbook configuration. You do not need to keep intermediate sheets within a workbook. This means you can create a sheet just for tokenizing the string into a list and then set that sheet as 'unkept'. Its data will not be materialized into HDFS, so there will be no duplicate.
You can still use the results from this intermediate, unkept sheet in subsequent sheets within the same workbook. Then keep only the final result sheet you care about. You can find more detail about this in our documentation here: Saving results and time based partitions
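To illustrate the storage side of this approach, here is a minimal Python sketch. It assumes the group of values is written to a plain Parquet string column as one delimited string and split back into a list later, which is the step the unkept sheet would perform. The helper names and the `|` delimiter are hypothetical and are not part of Datameer or Parquet; choose a delimiter that never occurs in your values.

```python
# Hypothetical convention: one delimited string per row instead of a Parquet LIST.
DELIMITER = "|"  # assumption: this character never appears inside a value


def encode_values(values):
    """Join a group of values into a single string for a Parquet string column."""
    for value in values:
        if DELIMITER in value:
            raise ValueError(f"value {value!r} contains the delimiter {DELIMITER!r}")
    return DELIMITER.join(values)


def decode_values(field):
    """Split the stored string back into a list (what the tokenizing sheet does)."""
    return field.split(DELIMITER) if field else []


if __name__ == "__main__":
    stored = encode_values(["red", "green", "blue"])
    print(stored)
    print(decode_values(stored))
```

Rejecting values that contain the delimiter at write time keeps the later tokenizing step unambiguous.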
Let me know if that helps!