How do I write the following query using the hive -e "QUERY" syntax? The difficulty is that the query itself contains double quotes as well as %.
create external table tmp2(logdate string, time string, computername string, clientip string, uri string, qs string, localfile string, status string, referer string, w3status string, sc_bytes string, cs_bytes string, w3wpbytes string, cs_username string, cs_user_agent string, time_local string, timetakenms string, sc_substatus string, s_sitename string, s_ip string, s_port string, RequestsPerSecond string, s_proxy string, cs_version string, c_protocol string, cs_method string, cs_Host string, EndRequest_UTC string, date_local string, CPU_Utilization string, cs_Cookie string, BeginRequest_UTC string) ROW FORMAT SERDE
'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" ="([0-9-]+) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) (\".*\"|[^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) (\".*\"|[^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) (\".*\"|[^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([0-9-]+ [0-9:.]+) ([^ ]*) ([^ ]*) (\".*\"|[^ ]*) ([0-9-]+ [0-9:.]+)",
"output.format.string"="%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s %10$s %11$s %12$s %13$s %14$s %15$s %16$s %17$s %18$s %19$s %20$s %21$s %22$s %23$s %24$s %25$s %26$s %27$s %28$s %29$s %30$s %31$s %32$s")
This depends entirely on how you are sending it to Hive. If you are just running it from the command line, you have to follow the shell's standard escaping rules for double quotes. If you have a quote within double quotes, you have to escape it with a backslash. Similarly, you have to escape a backslash with another backslash.
Also, if you split the command across several lines, end each line inside the quotes with a backslash so that the shell joins them into a single line.
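The rules above can be tried out without Hive. The following is a minimal sketch, assuming a POSIX-style shell such as bash; the variable names are only illustrative. It also covers the dollar sign, which the shell would otherwise expand as a variable inside double quotes.

```shell
# Each assignment uses double quotes, as hive -e "..." would.
with_quote="name = \"value\""    # \" becomes a literal "
with_backslash="a\\.b"           # \\ becomes a single backslash
with_dollar="%1\$s"              # \$ becomes a literal $ (no variable expansion)
joined="first \
second"                          # backslash-newline is removed, joining the lines

# Print what a program such as hive would actually receive.
printf '%s\n' "$with_quote" "$with_backslash" "$with_dollar" "$joined"
```

Swapping hive -e for printf '%s\n' in this way is a cheap check that the string survives the shell intact before Hive ever sees it.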
If we simplify your example like this:
create external table tmp2(logdate string, time string) ROW FORMAT SERDE
'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "(\".*\"|[^ ]*) (\".*\"|[^ ]*)",
"output.format.string"="%1$s %2$s")
then you should be able to get it to run from the command line like this (each $ is escaped as well, because inside double quotes the shell would otherwise expand $s as a variable and drop it):
hive -e "create external table tmp2(logdate string, time string) ROW FORMAT SERDE \
'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' \
WITH SERDEPROPERTIES ( \
\"input.regex\" = \"(\\\".*\\\"|[^ ]*) (\\\".*\\\"|[^ ]*)\", \
\"output.format.string\"=\"%1\$s %2\$s\")"
I would recommend simplifying your question before posting it. Sometimes the act of simplification will solve your problems for you.
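As an alternative to escaping by hand, the query can be kept verbatim in a here-document with a quoted delimiter and then passed to hive -e through a variable. This is only a sketch, assuming a POSIX-style shell such as bash; the variable name query is illustrative, and the hive call is left commented out so that the snippet can be checked without a Hive installation.

```shell
# A quoted delimiter ('EOF') turns off all shell expansion in the body,
# so the query is written exactly as Hive should see it: no extra
# backslashes and no escaping of $.
query=$(cat <<'EOF'
create external table tmp2(logdate string, time string) ROW FORMAT SERDE
'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "(\".*\"|[^ ]*) (\".*\"|[^ ]*)",
"output.format.string"="%1$s %2$s")
EOF
)

# Inspect the exact text that would be handed to Hive.
printf '%s\n' "$query"

# Uncomment to run it; "$query" must stay double-quoted so that the
# shell passes it as a single argument.
# hive -e "$query"
```

The trade-off is that the query keeps its line breaks here, whereas the backslash-continued version collapses into one line.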