
Showing wrong count after importing table in Hive

Tags:

hive

sqoop

I have imported about 10 tables into Hive from MS SQL Server. But when I cross-checked the records in Hive, one of the tables returned more records than expected when I ran the query (select count(*) from tblName;).

I then dropped that table and imported it into Hive again. The console log showed (Retrieved 203 records). But when I ran (select count(*) from tblName;) again, I got a count of 298.

I don't understand why this happens. Is something wrong with the query, or is it caused by an incorrect sqoop-import command?

The record counts of all the other tables are correct.

Please help me out with this.

Bhavesh Shah asked Feb 08 '12 10:02



1 Answer

I got the solution to this problem from the mailing list and would like to share it. Their reply was:

We were experiencing a similar issue in the past: the table in Hive appeared to have more rows than Sqoop reported as imported, and more than were actually available in the database.

The problem on our side was caused by special characters in the exported data that broke lines in the exported CSV text file. For example, some of our rows contained data with newline characters. Because a couple of exported rows were split across several lines, the number of Hive rows appeared larger than the number of imported rows. You might be experiencing a similar issue. We solved it by using the parameter --hive-drop-import-delims (or you could possibly use --hive-delims-replacement). For semantics and usage, please take a look at the manual:
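To make the mechanism concrete, here is a small Python simulation (not Sqoop or Hive code). It assumes the imported table uses Hive's default text layout (\001 between columns, newline between rows) and that Hive's line-based text reader ends a row at \n, \r or \r\n. The helper names `sanitize`, `to_hive_text` and `hive_row_count` are hypothetical. `sanitize` approximates what the Sqoop user guide describes for the two options: --hive-drop-import-delims drops \n, \r and \01 from string fields, while --hive-delims-replacement substitutes a user-defined string for them.

```python
import re

# Assumption: the Hive table uses the default text delimiters.
FIELD_DELIM = "\x01"                 # Ctrl-A between columns
ROW_DELIM = "\n"                     # newline between rows
HIVE_DELIMS = ("\n", "\r", "\x01")   # characters the Sqoop Hive-delimiter options act on


def sanitize(value, replacement=None):
    """Approximate --hive-drop-import-delims (replacement=None) or
    --hive-delims-replacement <replacement> for one string field."""
    for ch in HIVE_DELIMS:
        value = value.replace(ch, "" if replacement is None else replacement)
    return value


def to_hive_text(rows, clean=False, replacement=None):
    """Serialize rows the way a delimited text import lays them out."""
    lines = []
    for row in rows:
        fields = [sanitize(f, replacement) if clean else f for f in row]
        lines.append(FIELD_DELIM.join(fields))
    return ROW_DELIM.join(lines) + ROW_DELIM


def hive_row_count(text):
    """Count rows as a line-based reader does: \\n, \\r and \\r\\n each end a line."""
    lines = re.split(r"\r\n|\r|\n", text)
    if lines and lines[-1] == "":  # the final terminator leaves an empty tail
        lines.pop()
    return len(lines)


# Three source rows, two of which contain embedded line breaks.
source_rows = [
    ["1", "plain note"],
    ["2", "line one\nline two"],
    ["3", "a\r\nb"],
]

print(len(source_rows))                                      # 3 rows imported
print(hive_row_count(to_hive_text(source_rows)))              # 5 rows seen by count(*)
print(hive_row_count(to_hive_text(source_rows, clean=True)))  # 3 after dropping delimiters
print(hive_row_count(to_hive_text(source_rows, clean=True, replacement=" ")))  # 3
```

Dropping the characters is the simplest fix but silently joins text (for example "line oneline two"); a replacement string such as a space keeps the words apart, which is why the reply mentions --hive-delims-replacement as an alternative.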

http://incubator.apache.org/sqoop/docs/1.4.0-incubating/SqoopUserGuide.html#id1765770

Thanks

Bhavesh Shah answered Nov 15 '22 11:11
