I'm trying to print one column from a parquet file using parquet-tools.jar (https://github.com/Parquet/parquet-mr/tree/master/parquet-tools). I'm using this command:
java -jar parquet-tools-1.6.1-SNAPSHOT.jar dump -c COLUMNNAME someParquet.parquet
But I get:
Invalid arguments: missing required arguments
usage: parquet-dump [option...] <input>
where option is one of:
-c,--column <arg> Dump only the given column, can be specified more than
once
-d,--disable-data Do not dump column data
--debug Enable debug output
-h,--help Show this help string
-m,--disable-meta Do not dump row group and page metadata
--no-color Disable color output even if supported
where <input> is the parquet file to print to stdout
Not sure where I'm getting the syntax wrong.
The -c/--column option assumes you are passing multiple columns to the "dump" command, so it consumes every argument that follows it, including the file name. Nothing is left for <input>, which is why you see the "missing required arguments" error.
As a workaround, add one more option right after the -c value. This stops the CLI parser from consuming further arguments for -c.
With the command below (--debug added), you should be able to run the program:
java -jar parquet-tools-1.6.1-SNAPSHOT.jar dump -c COLUMNNAME --debug someParquet.parquet
You can try --no-color instead of --debug too.
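To see why an extra option helps, here is a minimal sketch of the same greedy behaviour using Python's argparse. It is only an analogy, not the parser parquet-tools actually uses: the program name toy-dump and the helper try_parse are made up for illustration, and the option names are copied from the usage text above.

```python
import argparse


def build_parser():
    # Toy stand-in for the dump command; NOT the real parquet-tools parser.
    parser = argparse.ArgumentParser(prog="toy-dump")
    # nargs="+" makes -c greedy: it takes every plain argument that follows
    # until the next option string.
    parser.add_argument("-c", "--column", nargs="+")
    parser.add_argument("--debug", action="store_true")
    # Required positional argument, like <input> in the usage text.
    parser.add_argument("input")
    return parser


def try_parse(argv):
    # Hypothetical helper: return the parsed namespace, or None when
    # argparse rejects the arguments (it raises SystemExit on errors).
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        return None


# -c swallows the file name too, so the required input is missing.
failed = try_parse(["-c", "COLUMNNAME", "someParquet.parquet"])

# An option after the column name ends the -c value list.
ok = try_parse(["-c", "COLUMNNAME", "--debug", "someParquet.parquet"])
```

In this sketch the first call fails because the file name is taken as a second column, while the second call parses with column set to ["COLUMNNAME"] and input set to "someParquet.parquet".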
Hope this helps.