Goal
Import bz2 or other compressed files.
Learn
Usually, the codec org.apache.hadoop.io.compress.BZip2Codec is included in the list of supported codecs in the Hadoop configuration by default. You can verify this by checking the io.compression.codecs property in the core-site.xml file or with a cluster management tool (e.g., Ambari).
<!-- COMPRESSION related -->
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.SnappyCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
</property>
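If you prefer to check the property from a script rather than by eye, the following is a minimal sketch in Python. It assumes you have the contents of core-site.xml available as a string; the function name configured_codecs and the SAMPLE text are hypothetical and only illustrate the check. An empty result means the property is not set in that particular file, not necessarily that the codec is unavailable on the cluster.

```python
import xml.etree.ElementTree as ET

BZIP2 = "org.apache.hadoop.io.compress.BZip2Codec"


def configured_codecs(core_site_xml):
    """Return the codec class names listed under io.compression.codecs.

    Hypothetical helper: returns an empty list when the property is
    not present in the given XML text.
    """
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        name = prop.findtext("name", default="").strip()
        if name == "io.compression.codecs":
            value = prop.findtext("value", default="")
            return [c.strip() for c in value.split(",") if c.strip()]
    return []


# Sample input mirroring the property shown in this article.
SAMPLE = """<configuration>
  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.SnappyCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    codecs = configured_codecs(SAMPLE)
    print("BZip2Codec configured:", BZIP2 in codecs)
```

To run this against a real cluster configuration, read the file from wherever your distribution stores core-site.xml and pass its contents to configured_codecs.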
The Datameer configuration file conf/das-conductor.properties lists the default extensions for compressed files, and bz2 is among them.
## Importing files ending with one of those specified suffixes will result in an exception if
## the proper compression-codec can't be loaded. This helps to fail fast and clear instead of displaying spaghetti data.
das.import.compression.known-file-suffixes=zip,gz,bz2,lzo,lzo_deflate,Z
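To see which file names fall under this setting, here is a minimal sketch in Python. It parses the property line and tests whether a file name ends with one of the listed suffixes. The helper names known_suffixes and has_known_suffix are hypothetical, and the exact, case-sensitive comparison is an assumption; Datameer's own matching logic is not documented here and may differ.

```python
KEY = "das.import.compression.known-file-suffixes"


def known_suffixes(properties_text):
    """Return the suffix list configured under KEY, or [] if absent."""
    for line in properties_text.splitlines():
        line = line.strip()
        # Skip comment lines and anything that is not key=value.
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == KEY:
            return [s.strip() for s in value.split(",") if s.strip()]
    return []


def has_known_suffix(filename, suffixes):
    """True if the text after the last dot is one of the suffixes.

    Assumption: exact, case-sensitive comparison.
    """
    if "." not in filename:
        return False
    return filename.rsplit(".", 1)[-1] in suffixes


# Sample input mirroring the property shown in this article.
SAMPLE = """## comment lines are ignored
das.import.compression.known-file-suffixes=zip,gz,bz2,lzo,lzo_deflate,Z
"""

if __name__ == "__main__":
    suffixes = known_suffixes(SAMPLE)
    for name in ("events.csv.bz2", "events.csv"):
        print(name, has_known_suffix(name, suffixes))
```

Files whose suffix appears in the list are the ones for which, per the comment in the properties file, an import fails with an exception when the matching codec cannot be loaded.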
You should be able to import files compressed with zip, gz, bz2, etc. without any special configuration changes.
Related documentation.