I'm trying to write a Pig Latin script that returns the count of a dataset I've filtered.
Here's the script so far:
/* scans by title */
scans = LOAD '/hive/scans/*' USING PigStorage(',') AS (thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray);
productscans = FILTER scans BY (title MATCHES 'proactiv');
scancount = FOREACH productscans GENERATE COUNT($0);
DUMP scancount;
For some reason, I get the error:
Could not infer the matching function for org.apache.pig.builtin.COUNT as multiple or none of them fit. Please use an explicit cast.
What am I doing wrong here? I'm assuming it has something to do with the type of the field I'm passing in, but I can't seem to resolve this.
TIA, Jason
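COUNT expects a bag, but inside your FOREACH, $0 is a single long field (thetime), so Pig can't find a matching COUNT signature. Is this what you're looking for? Group by ALL to bring everything into one bag, then count the items: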
scans = LOAD '/hive/scans/*' USING PigStorage(',') AS (thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray);
productscans = FILTER scans BY (title MATCHES 'proactiv');
grouped = GROUP productscans ALL;
count = FOREACH grouped GENERATE COUNT(productscans);
dump count;
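One more thing that may be worth checking: MATCHES applies a Java regular expression to the whole string, so 'proactiv' only matches titles that are exactly that word. If the intent is to count titles that contain it, a sketch along these lines may be closer. The '.*proactiv.*' pattern is my assumption about what you want, and the alias name scancount is just borrowed from your question:

```pig
/* same load as above */
scans = LOAD '/hive/scans/*' USING PigStorage(',') AS (thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray);

/* ASSUMPTION: substring match wanted; .* on both sides lets the regex cover the whole title */
productscans = FILTER scans BY (title MATCHES '.*proactiv.*');

/* GROUP ... ALL puts every filtered tuple into a single bag named productscans */
grouped = GROUP productscans ALL;

/* COUNT now receives a bag, which is the argument type it expects */
scancount = FOREACH grouped GENERATE COUNT(productscans);
DUMP scancount;
```

Also note that COUNT skips tuples whose first field is null. If thetime can be null in your data and you still want those rows counted, COUNT_STAR(productscans) is the variant that includes them.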