
Remove rows and "ValueError: Arrays were different lengths"

My dataframe has subcategories: under each category (cat, dog, bird), summary statistics are listed. I need to remove the rows whose stats value is count or freq, and keep only the rows with sd and mean values. Some values are NaN.

My code raises a ValueError.


 var    stats    A     B     C
 cat     mean    2     3     4
 NaN     sd      2     1     3
 NaN     count   5     2     6
 NaN     freq    3     1     19
 dog     mean    8     1     2
 NaN     sd      2     1     3
 NaN     count   4     6     1
 NaN     freq    3     1     19   
 bird    mean    2     3     4
 NaN     sd      2     1     3
 NaN     count   5     2     6
 NaN     freq    NaN   NaN   NaN 

My code:

rows = ['count', 'freq']
df = [df.stats != rows]

Expected outcome

 var    stats    A     B     C
 cat     mean    2     3     4
 NaN     sd      2     1     3
 dog     mean    8     1     2
 NaN     sd      2     1     3   
 bird    mean    2     3     4
 NaN     sd      2     1     3


File "pandas/_libs/lib.pyx", line 805, in pandas._libs.lib.vec_compare 
ValueError: Arrays were different lengths: 819 vs 9

I am not sure how to check the array lengths, but in my Excel spreadsheet all columns and rows have the same length. Is this error caused by NaN/empty cells in my data?


Lumos asked Oct 09 '17 22:10


1 Answer

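!= will not work here: it compares the column element-wise against the list, so the two must have the same length. Use pd.Series.isin to build a boolean mask, then use that mask to filter your dataframe.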

m = ~df.stats.isin(['count', 'freq'])
m
0      True
1      True
2     False
3     False
4      True
5      True
6     False
7     False
8      True
9      True
10    False
11    False
Name: stats, dtype: bool

df[m]
    var stats    A    B    C
0   cat  mean  2.0  3.0  4.0
1   NaN    sd  2.0  1.0  3.0
4   dog  mean  8.0  1.0  2.0
5   NaN    sd  2.0  1.0  3.0
8  bird  mean  2.0  3.0  4.0
9   NaN    sd  2.0  1.0  3.0
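
For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the sample table from the question and applies the isin mask. The dataframe construction and the names `rows`, `mask` and `result` are assumptions made for illustration; the real data presumably comes from the Excel file.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the sample data shown in the question.
df = pd.DataFrame({
    "var": ["cat", np.nan, np.nan, np.nan,
            "dog", np.nan, np.nan, np.nan,
            "bird", np.nan, np.nan, np.nan],
    "stats": ["mean", "sd", "count", "freq"] * 3,
    "A": [2, 2, 5, 3, 8, 2, 4, 3, 2, 2, 5, np.nan],
    "B": [3, 1, 2, 1, 1, 1, 6, 1, 3, 1, 2, np.nan],
    "C": [4, 3, 6, 19, 2, 3, 1, 19, 4, 3, 6, np.nan],
})

rows = ["count", "freq"]

# isin tests each value in the column for membership in the list,
# so the list can have any length; ~ inverts the resulting mask.
mask = ~df["stats"].isin(rows)

# Boolean indexing keeps only the rows where the mask is True.
result = df[mask]
print(result)
```

Only the stats column is tested here, so the NaN values in var, A, B and C have no effect on which rows are kept.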
cs95 answered Nov 03 '22 00:11
