I'm working with pivot tables in pandas, and when doing the groupby (to count distinct observations),
aggfunc={"person":{lambda x: len(x.unique())}}
gives me the following error:
'DataFrame' object has no attribute 'unique'
Any ideas how to fix it?
The message 'DataFrame' object has no attribute 'str' tells us that the DataFrame we are handling does not have the str attribute. str is an accessor available on Series and Index objects. We can get a Series from a DataFrame by selecting a single column by name.
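A minimal sketch of that difference, using a hypothetical one-column DataFrame (the column name `name` and its values are made up for illustration):

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({"name": ["Ann", "bob"]})

# The DataFrame itself has no .str accessor, so df.str would raise AttributeError.
has_str = hasattr(df, "str")

# Selecting a single column returns a Series, which does have .str.
upper = df["name"].str.upper()
```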
You can get the unique values of one or several columns of a pandas DataFrame using Series.unique() or pandas.unique(). Series.unique() returns the unique values of a single column; pandas.unique() takes a 1-D array-like, so it can be applied to the combined values of multiple columns.
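As a rough sketch of both variants, assuming a small made-up DataFrame with columns `A` and `B`:

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({"A": [1, 2, 2], "B": [2, 3, 3]})

# Unique values of a single column (a Series).
unique_a = df["A"].unique()

# Unique values across several columns: flatten them to 1-D first,
# because pandas.unique() expects a 1-D array-like.
unique_ab = pd.unique(df[["A", "B"]].to_numpy().ravel())
```

Both calls return the values in order of first appearance rather than sorted.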
The unique function in pandas finds the unique values of a Series, which is a single column of a DataFrame. It works on a Series of strings, integers, tuples, or mixed elements.
If you try to call concat() on a DataFrame object, you will get AttributeError: 'DataFrame' object has no attribute 'concat'. Instead, pass a list of the columns or DataFrames to concatenate to pandas.concat() and specify the axis to concatenate along.
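For illustration, a short sketch with two made-up DataFrames (`left` and `right` are hypothetical names):

```python
import pandas as pd

# Hypothetical data, for illustration only.
left = pd.DataFrame({"A": [1, 2]})
right = pd.DataFrame({"B": [3, 4]})

# concat is a top-level pandas function, not a DataFrame method.
side_by_side = pd.concat([left, right], axis=1)  # columns placed next to each other
stacked = pd.concat([left, left], axis=0, ignore_index=True)  # rows appended
```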
One very easy way to get the unique combinations of more than one column from a DataFrame is the following:
unique_A_B_combos = df[['A', 'B']].value_counts().index.values
DataFrames do not have that method; columns in DataFrames do:
df['A'].unique()
Or, to get the names with the number of observations (using the DataFrame given by closedloop):
>>> df.groupby('person').person.count()
Out[80]:
person
0 2
1 3
Name: person, dtype: int64
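Note that count() gives the number of rows per group. If the goal is the number of distinct values per group, as in the question, one possible sketch uses nunique() on the grouped column. The DataFrame below is made up for illustration and is not the one from the other answer:

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({"c0": [0, 1, 0, 1, 1], "person": [0, 0, 1, 1, 1]})

# Number of rows per group (duplicates included).
rows_per_group = df.groupby("c0")["person"].count()

# Number of distinct persons per group.
distinct_per_group = df.groupby("c0")["person"].nunique()
```

Here group 1 has three rows but only two distinct persons, so the two results differ.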
Rather than removing duplicates during the pivot table process, use the df.drop_duplicates() function to selectively drop duplicates.
For example, if you are pivoting using index='c0' and columns='c1', then this simple step yields the correct counts.
In this example the 5th row is a duplicate of the 4th (ignoring the non-pivoted c_other column):
import pandas as pd
data = {'c0':[0,1,0,1,1], 'c1':[0,0,1,1,1], 'person':[0,0,1,1,1], 'c_other':[1,2,3,4,5]}
df = pd.DataFrame(data)
df2 = df.drop_duplicates(subset=['c0','c1','person'])
pd.pivot_table(df2, index='c0',columns='c1',values='person', aggfunc='count')
This correctly outputs
c1 0 1
c0
0 1 1
1 1 1
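An alternative sketch that skips the deduplication step is to pass "nunique" as the aggfunc, so that each cell counts the distinct `person` values directly. This is one possible approach on the same example data, not necessarily what the answer's author intended:

```python
import pandas as pd

# Same example data as in the answer, repeated so the snippet runs on its own.
data = {'c0': [0, 1, 0, 1, 1], 'c1': [0, 0, 1, 1, 1],
        'person': [0, 0, 1, 1, 1], 'c_other': [1, 2, 3, 4, 5]}
df = pd.DataFrame(data)

# Count distinct persons per (c0, c1) cell without dropping rows first.
table = pd.pivot_table(df, index='c0', columns='c1',
                       values='person', aggfunc='nunique')
```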
df[['col1', 'col2']].nunique()
Try this instead of writing a separate function.
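A small sketch of what that call returns, assuming a made-up DataFrame with the same column names:

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({"col1": [1, 1, 2], "col2": ["a", "b", "c"]})

# DataFrame.nunique() returns a Series with the distinct-value count per column.
counts = df[["col1", "col2"]].nunique()
```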