
Pandas df.at() raising AttributeError: 'BlockManager' object has no attribute 'T'

Tags: python, pandas

I have a relatively large DataFrame. I'm trying to iterate over each row and update a column based on the value of another column (basically repeating a lookup until nothing more can be updated).

I have the following:

# df is the large DataFrame (1K to 10K+ rows x 51 columns)

has_update = True
while has_update:
   has_update = False

   for_procdf = df.loc[df['Incident Group ID'] == '-']

   for i, row in for_procdf.iterrows():
       #Check if the row's parent ticket id is an existing ticket id in the bigger df
       resultRow = df.loc[df['Ticket ID'] == row['Parent Ticket ID']]
       resultCount = len(resultRow.index)
       if resultCount == 1:
           IncidentGroupID = resultRow.iloc[0]['Incident Group ID']
           if IncidentGroupID != '-':
               df.at[i, "Incident Group ID"] = IncidentGroupID
               has_update = True

When I execute the script, an error occurs with the following traceback:

Traceback (most recent call last):
  File "./sdm.etl.py", line 76, in <module>
    main()
  File "./sdm.etl.py", line 28, in main
    fillIncidentGroupID(sdmdf.df)
  File "./sdm.etl.py", line 47, in fillIncidentGroupID
    df.at[i, "Incident Group ID"] = IncidentGroupID
  File "/usr/local/lib/python3.6/site-packages/pandas/core/indexing.py", line 2159, in __setitem__
    self.obj._set_value(*key, takeable=self._takeable)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/frame.py", line 2580, in _set_value
    series = self._get_item_cache(col)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/generic.py", line 2490, in _get_item_cache
    res = self._box_item_values(item, values)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/frame.py", line 3096, in _box_item_values
    return self._constructor(values.T, columns=items, index=self.index)
AttributeError: 'BlockManager' object has no attribute 'T'

However, reproducing a similar scenario on a small DataFrame raises no error:

>>> qdf = pd.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30], [10, 13, 17]], index=[0,1,2,3], columns=['Ab 1', 'Bc 2', 'Cd 3'])
>>> qdf
   Ab 1  Bc 2  Cd 3
0     0     2     3
1     0     4     1
2    10    20    30
3    10    13    17
>>>
>>> qdf1 = qdf.loc[qdf['Ab 1'] == 0]
>>> qdf1
   Ab 1  Bc 2  Cd 3
0     0     2     3
1     0     4     1
>>>
>>> for i, row in qdf1.iterrows():
...     qdf.at[i, 'Ab 1'] = 10
...
>>>
>>> qdf
   Ab 1  Bc 2  Cd 3
0    10     2     3
1    10     4     1
2    10    20    30
3    10    13    17

What seems to be the problem with my implementation?

JCm asked Jan 29 '19 07:01


2 Answers

It turns out Nihal was right: the error is caused by a duplicate column name. My DataFrame is so large that I had accidentally ended up with a duplicate column name without noticing. Everything works fine now. A little time away from the code, some rest, and a meal helped me spot the duplicate column. Cheers!

Below are the columns of my DataFrame. "RCA Group ID" appears a second time near the end.

['Incident Group ID', 'RCA Group ID', 'Parent Ticket ID', 'Ticket ID', ..., 'RCA Group ID', 'Is Sector Down', 'Relationship Type']
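A quick way to confirm this kind of problem is to check whether the column labels are unique. The sketch below uses a small made-up frame (the single row of data and the shortened column list are assumptions for illustration, not the real dataset). It only shows how to detect the duplicate; the exact exception raised by df.at may differ between pandas versions.

```python
import pandas as pd

# Hypothetical miniature of the frame above: 'RCA Group ID' appears twice.
df = pd.DataFrame(
    [["-", "r1", "T0", "T1", "r1"]],
    columns=[
        "Incident Group ID",
        "RCA Group ID",
        "Parent Ticket ID",
        "Ticket ID",
        "RCA Group ID",
    ],
)

# False whenever at least one column label is repeated.
print(df.columns.is_unique)

# Labels that occur more than once (every repeat after the first is flagged).
dupes = df.columns[df.columns.duplicated()].tolist()
print(dupes)

# Selecting a duplicated label yields a DataFrame, not a Series.
selected = df["RCA Group ID"]
print(type(selected).__name__, selected.shape)
```

Running a check like this right after loading the data would probably have surfaced the duplicate long before the loop failed.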
JCm answered Nov 16 '22 07:11


"the error is caused by a duplicate column name"

That was true in my case.

You can use the following function to quickly determine which column names are duplicates.

import pandas as pd

def get_duplicate_cols(df: pd.DataFrame) -> pd.Series:
    return pd.Series(df.columns).value_counts()[lambda x: x > 1]
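
As a usage sketch, the function can be applied to a small frame with a repeated label. The sample data and the choice to keep only the first occurrence of each duplicate are assumptions for illustration; whether dropping or renaming is appropriate depends on what the duplicated column actually holds.

```python
import pandas as pd

def get_duplicate_cols(df: pd.DataFrame) -> pd.Series:
    # Count each column label and keep only labels that occur more than once.
    return pd.Series(df.columns).value_counts()[lambda x: x > 1]

# Hypothetical frame in which the label 'b' is repeated.
sample = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["a", "b", "b"])

dup_counts = get_duplicate_cols(sample)
print(dup_counts)

# One possible fix: keep only the first occurrence of each label.
deduped = sample.loc[:, ~sample.columns.duplicated()].copy()
print(list(deduped.columns))

# With unique labels, scalar assignment through .at works as expected.
deduped.at[0, "a"] = 10
print(deduped)
```

The result of get_duplicate_cols is indexed by column label, so an empty Series means every label is unique.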


Serhii Kushchenko answered Nov 16 '22 07:11