How to use .loc when the values of DataFrame elements are lists [python pandas]

Question

I have a DataFrame df where each element of df.trajec is a list.

For example, df.loc['smith', 'trajec'] = ['a', 'b', 'c', 'a', 'b']

type(df.loc['smith', 'trajec']) = list

In this case, I found that the following command does not work:

aaa = ['a', 'b', 'c', 'a', 'b']
df.loc[df.trajec == aaa]

It gives me the following error message:

ValueError: Arrays were different lengths: 8886 vs 5

Is there any way to select the subset of df where df.trajec is equal to the list aaa?

jezrael · Accepted Answer

You need apply to create a boolean mask:

import pandas as pd

df = pd.DataFrame({'trajec': [['a', 'b', 'c', 'a', 'b'],
                              ['a', 'b'],
                              ['a', 'c', 'b']]},
                  index=['smith', 'smith1', 'smith2'])

print (df)
                 trajec
smith   [a, b, c, a, b]
smith1           [a, b]
smith2        [a, c, b]

aaa = ['a', 'b', 'c', 'a', 'b']
mask = df.trajec.apply(lambda x: x == aaa)
print (mask)
smith      True
smith1    False
smith2    False
Name: trajec, dtype: bool

# loc can be omitted if you need to filter all columns
print (df[mask])
                trajec
smith  [a, b, c, a, b]

# apply the mask and return only the column `trajec`
print (df.loc[mask, 'trajec'])
smith    [a, b, c, a, b]
Name: trajec, dtype: object
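
As background on why the original df.trajec == aaa fails: pandas appears to treat the list on the right-hand side as an array to compare element by element against the column, so it requires both sides to have the same length. The sketch below reproduces that with the sample frame from this answer (the exact error text may differ between pandas versions) and then applies the per-row mask instead. The names raised and matched are only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"trajec": [["a", "b", "c", "a", "b"], ["a", "b"], ["a", "c", "b"]]},
    index=["smith", "smith1", "smith2"],
)
aaa = ["a", "b", "c", "a", "b"]

# Direct comparison: the 5-item list is compared element-wise with the
# 3-row column, so the lengths do not match and pandas raises ValueError.
try:
    df.trajec == aaa
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# Per-row comparison: each cell (a list) is compared with the whole list aaa.
mask = df.trajec.apply(lambda x: x == aaa)
matched = df.loc[mask]
print(matched)
```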

Another way to build the mask is a list comprehension:

mask = [x == aaa for x in df.trajec.values]
print (mask)
[True, False, False]

print (df[mask])
                trajec
smith  [a, b, c, a, b]
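
If this filter is needed in several places, the list comprehension can be wrapped in a small helper. This is only a sketch: list_equals_mask is a hypothetical name, not a pandas function, and it assumes each cell is a plain Python list or a scalar such as None. It returns the mask as a Series carrying the column's index, so the mask aligns by label rather than by position when used with .loc.

```python
import pandas as pd

def list_equals_mask(series, target):
    """Hypothetical helper: True where a cell equals the whole target list."""
    # Cells that are not lists (for example None) simply compare unequal.
    return pd.Series(
        [cell == target for cell in series],
        index=series.index,
        dtype=bool,
    )

# Sample data; the None cell stands in for a missing trajectory.
df = pd.DataFrame(
    {"trajec": [["a", "b", "c", "a", "b"], ["a", "b"], None]},
    index=["smith", "smith1", "smith2"],
)
aaa = ["a", "b", "c", "a", "b"]

mask = list_equals_mask(df["trajec"], aaa)
result = df.loc[mask, "trajec"]
print(result)
```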

Tags:

python

pandas

SUNDONG
