I have created a table in HIVE
CREATE TABLE IF NOT EXISTS daily_firstseen_analysis (
firstSeen STRING,
category STRING,
circle STRING,
specId STRING,
language STRING,
osType STRING,
count INT)
PARTITIONED BY (day STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS orc;
count(*) is not giving me correct result for this table
hive> select count(*) from daily_firstseen_analysis;
OK
75
Time taken: 0.922 seconds, Fetched: 1 row(s)
While the number of rows in this table is 959 rows
hive> select * from daily_firstseen_analysis;
....
Time taken: 0.966 seconds, Fetched: 959 row(s)
it gives data with 959 rows
hive> ANALYZE TABLE daily_firstseen_analysis PARTITION(day) COMPUTE STATISTICS noscan;
Partition logdata.daily_firstseen_analysis{day=20140521} stats: [numFiles=6, numRows=70, totalSize=4433, rawDataSize=37202]
Partition logdata.daily_firstseen_analysis{day=20140525} stats: [numFiles=6, numRows=257, totalSize=4937, rawDataSize=136385]
Partition logdata.daily_firstseen_analysis{day=20140523} stats: [numFiles=6, numRows=211, totalSize=5059, rawDataSize=112140]
Partition logdata.daily_firstseen_analysis{day=20140524} stats: [numFiles=6, numRows=280, totalSize=5257, rawDataSize=148808]
Partition logdata.daily_firstseen_analysis{day=20140522} stats: [numFiles=6, numRows=141, totalSize=4848, rawDataSize=74938]
OK
Time taken: 5.098 seconds
I am using hive with version Hive 0.13.0.2.1.2.0-402
NOTE: I found this issue in count(*) if We are inserting into a table in more than one time. Tables created with single insert do not have this issue
I execute ANALYZE TABLE ...
at first is OK, but raise error when i try again.so i try:
hive> REFRESH TABLE daily_firstseen_analysis;
hive> SELECT COUNT(*) FROM daily_firstseen_analysis;
this is explain
I had the same problem, and using ANALYZE fixed it. Running these commands in order should give you the correct count:
hive> ANALYZE TABLE daily_firstseen_analysis PARTITION(day) COMPUTE STATISTICS;
hive> SELECT COUNT(*) FROM daily_firstseen_analysis;
i.e. you have to use the analyze command before the count. You have half the answer within your question.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With