My script does a simple job: it runs SQL from a file and saves the result to CSV.
The code is up and running, but there is odd behaviour in the CSV output: the data starts at around line 70 rather than at the very beginning of the file.
#!/bin/bash
beeline -u jdbc:hive2:default -n -p --silent=true --outputformat=csv2 -f code.sql > file_$(date +%Y%m%d%H%M).csv
I would like the actual data to start at the very first row of the file.
blank;blank;blank
blank;blank;blank
blank;blank;blank
attr;attr;attr
data;data;data
data;data;data
data;data;data
data;data;data
data;data;data
Why does the csv writer add blank rows? The way Python handles newlines on Windows can result in blank lines appearing between rows when using csv.writer. In Python 2, opening the file in binary mode disables universal newline translation and the data is written properly.
You can try this in your HQL:

INSERT OVERWRITE DIRECTORY '/user/user1/results'
select count(*) from sample_table;

This will write the output of your query into the results directory on HDFS.
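Note that with this approach the result ends up in an HDFS directory rather than in a local CSV file, so a follow-up step is needed to pull it down. A minimal sketch, assuming the results path from the snippet above and an example local filename:

# Merge the HDFS output into a single local file (the local name is just an example)
hdfs dfs -getmerge /user/user1/results results_local.csv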
A workaround, embedded in the next step of my automation:
sed -i '/^$/d' file.txt
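For completeness, here is a minimal sketch of folding that cleanup into the original script. The connection string, credentials and filename pattern are placeholders based on the question, not a verified setup:

#!/bin/bash
# Run the query, then strip the empty rows that appear before the header.
out="file_$(date +%Y%m%d%H%M).csv"
beeline -u jdbc:hive2:default -n "$HIVE_USER" -p "$HIVE_PASS" \
        --silent=true --outputformat=csv2 -f code.sql > "$out"
sed -i '/^$/d' "$out"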