
PIG - Scalar has more than one row in the output

I have the following data set for a movie database:

Ratings: UserID, MovieID, Rating
Movies: MovieID, Genre
Users: UserID, Gender, Age

I wrote a Pig script to find female users aged 20 to 30 who have rated the highest-rated movie. This is the code I have so far:

users_input = load '/users.dat' USING PigStorage('\u003B') as (UserID: long, gender: chararray, age: int, occupation: int, zip: long);
movies_input = load '/movies.dat' USING PigStorage('\u003B') as (MovieID: long, title: chararray, genre: chararray);
ratings_input = load '/ratings.dat' USING PigStorage('\u003B') as (UserID: long, MovieID: long, rating: int, timestamp: chararray);

movie_filter = filter movies_input by (genre matches '.*Action.*') OR (genre matches '.*War.*');

temp = COGROUP movie_filter by MovieID, ratings_input by MovieID;

temp1 = FILTER temp BY COUNT(movie_filter) > 0;

temp2 = FOREACH temp1 GENERATE group, AVG(ratings_input.rating) AS ratings;

temp3 = ORDER temp2 BY ratings DESC;

temp4 = LIMIT temp3 1;

temp5 = FOREACH temp4 GENERATE ratings;

temp6 = FILTER temp3 BY (temp5.ratings == ratings);

female_users = filter users_input by gender == 'F';
age_users = filter female_users by age >=20 AND age <=30;
age_use = FOREACH age_users GENERATE UserID;

MovID = FOREACH temp6 GENERATE group;

all_users_records = FILTER ratings_input BY (MovID.group == MovieID);

all_users = FOREACH all_users_records GENERATE UserID;

female_aged_records = FILTER all_users BY (UserID == age_use.UserID);

female_aged_users = FOREACH female_aged_records GENERATE UserID;

store all_users into '/output_pig' using PigStorage();

When I execute this, I get the error: "Scalar has more than one row in the output. 1st : (11), 2nd :(24)"

Could anyone please help me? Thanks in advance.

Maddy asked Mar 20 '14 02:03



2 Answers

As others have noted, this is not a very helpful error message. You've probably got a dot where you need a double semi-colon.

jhofman answered Oct 19 '22 23:10



@jhofman, I think you mean a double colon '::' (Pig's disambiguate operator) instead of a dot.

With that change, the Pig script should look like this:

...
temp2 = FOREACH temp1 GENERATE group, AVG(ratings_input :: rating) AS ratings;
...
temp6 = FILTER temp3 BY (temp5 :: ratings == ratings);
...
all_users_records = FILTER ratings_input BY (MovID :: group == MovieID);

all_users = FOREACH all_users_records GENERATE UserID;

female_aged_records = FILTER all_users BY (UserID == age_use :: UserID);
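
Some background on where the error likely comes from: when a relation is referenced as relation.field inside an expression on another relation, Pig treats it as a scalar, and that only works if the relation holds exactly one row. temp5 qualifies because of the LIMIT 1, but age_use (and MovID, if several movies tie for the top average) can hold many rows, which would be consistent with the "1st : (11), 2nd :(24)" message. If the error persists, one alternative is to replace those scalar comparisons with JOINs. The following is a minimal sketch, assuming the relations from the question (temp6, ratings_input, age_use) are already defined; the names top_movies, top_ratings, top_raters, female_aged and result are hypothetical and introduced only for illustration.

```pig
-- Assumes temp6, ratings_input and age_use are defined as in the question.
-- top_movies, top_ratings, top_raters, female_aged, result are hypothetical names.

-- One row per movie that shares the highest average rating.
top_movies = FOREACH temp6 GENERATE group AS MovieID;

-- Keep only the ratings given to those movies (JOIN instead of a scalar comparison).
top_ratings = JOIN ratings_input BY MovieID, top_movies BY MovieID;
top_raters = FOREACH top_ratings GENERATE ratings_input::UserID AS UserID;

-- Keep only the raters who are also in the female, 20-30 user set.
female_aged = JOIN top_raters BY UserID, age_use BY UserID;
female_aged_users = FOREACH female_aged GENERATE top_raters::UserID AS UserID;

-- A user may appear more than once if several movies tie, so deduplicate.
result = DISTINCT female_aged_users;

STORE result INTO '/output_pig' USING PigStorage();
```

Note that this sketch stores the filtered female users, whereas the STORE line in the question writes all_users. Here '::' is used in its usual role, disambiguating fields that come out of a JOIN.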

noob333 answered Oct 19 '22 22:10
