
Pandas concat dataframes with different columns: AttributeError: 'NoneType' object has no attribute 'is_extension'

I am trying to concatenate two dataframes with different column names along axis 0. I found a similar question, "How to use join_axes in the column-wise axis concatenation using pandas DataFrame?", but its solution does not work for me because the column names of my two dataframes are not the same. Since my original data is too large to post here, the following example illustrates what I am trying to do:

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.randint(0,100,size=(1, 4)), columns=list('ABCD'))
df2 = pd.DataFrame(np.random.randint(0,100,size=(1, 4)), columns=list('EFGH'))

#df1
    A   B   C   D
0   26  39  7   44

#df2
    E   F   G   H
0   12  44  26  64

pd.concat([df1,df2],axis=0).reset_index(drop=True)
# desired output looks like this
      A     B    C     D     E     F     G     H
0  26.0  39.0  7.0  44.0   NaN   NaN   NaN   NaN
1   NaN   NaN  NaN   NaN  12.0  44.0  26.0  64.0

The above code works perfectly. However, once I pass my own dataframes in place of df1 and df2, using exactly the same syntax as above, I get an error.

# my real dfs are called data1 & data2, I tried setting ignore_index=True and ignore_index=False
pd.concat([data1, data2],axis=0, ignore_index=True)

results in the following error:

Error:

 ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-194-dbee1fd0bdea> in <module>
    ----> 1 pd.concat([data1, data2],axis=0, ignore_index=True)

    ~\AppData\Local\Continuum\anaconda3\envs\tensorflow-gpu\lib\site-packages\pandas\core\reshape\concat.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, sort, copy)
        224                        verify_integrity=verify_integrity,
        225                        copy=copy, sort=sort)
    --> 226     return op.get_result()
        227 
        228 

    ~\AppData\Local\Continuum\anaconda3\envs\tensorflow-gpu\lib\site-packages\pandas\core\reshape\concat.py in get_result(self)
        421             new_data = concatenate_block_managers(
        422                 mgrs_indexers, self.new_axes, concat_axis=self.axis,
    --> 423                 copy=self.copy)
        424             if not self.copy:
        425                 new_data._consolidate_inplace()

    ~\AppData\Local\Continuum\anaconda3\envs\tensorflow-gpu\lib\site-packages\pandas\core\internals.py in concatenate_block_managers(mgrs_indexers, axes, concat_axis, copy)
       5414                 values = values.view()
       5415             b = b.make_block_same_class(values, placement=placement)
    -> 5416         elif is_uniform_join_units(join_units):
       5417             b = join_units[0].block.concat_same_type(
       5418                 [ju.block for ju in join_units], placement=placement)

    ~\AppData\Local\Continuum\anaconda3\envs\tensorflow-gpu\lib\site-packages\pandas\core\internals.py in is_uniform_join_units(join_units)
       5438         # no blocks that would get missing values (can lead to type upcasts)
       5439         # unless we're an extension dtype.
    -> 5440         all(not ju.is_na or ju.block.is_extension for ju in join_units) and
       5441         # no blocks with indexers (as then the dimensions do not fit)
       5442         all(not ju.indexers for ju in join_units) and

    ~\AppData\Local\Continuum\anaconda3\envs\tensorflow-gpu\lib\site-packages\pandas\core\internals.py in <genexpr>(.0)
       5438         # no blocks that would get missing values (can lead to type upcasts)
       5439         # unless we're an extension dtype.
    -> 5440         all(not ju.is_na or ju.block.is_extension for ju in join_units) and
       5441         # no blocks with indexers (as then the dimensions do not fit)
       5442         all(not ju.indexers for ju in join_units) and

    AttributeError: 'NoneType' object has no attribute 'is_extension'

I do not quite understand what this error message is trying to tell me. I've been trying to use fillna on both dataframes such that there should be no 'NoneType' anymore:

data2 = data2.fillna(999)
data1 = data1.fillna(999)

However, I still get the same error message.

The two dataframes I am using are quite large, so unfortunately I can't post the entire example here. The contents of my two dataframes are just integers, floats and strings, so nothing fancy is going on that would suggest an obvious cause of the error. Any idea what might cause this error, or what I could check to narrow down the problem?

Thank you very much!

AaronDT asked Feb 14 '19 13:02



1 Answer

It turns out the problem was simply duplicate column names in one of my dataframes. Getting rid of those duplicates solved the problem, and the code above now works flawlessly.
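For anyone hitting the same error, here is a minimal sketch of one way to find and drop duplicate column names before concatenating. The frames below are hypothetical stand-ins (the real data1 and data2 were never posted), and keeping the first column of each duplicated name is an assumption; depending on the data, renaming the duplicates may be the better choice.

```python
import pandas as pd

# Hypothetical stand-ins for data1 and data2; the real frames are not shown in the post.
data1 = pd.DataFrame([[1, 2, 3]], columns=["A", "B", "A"])  # "A" appears twice
data2 = pd.DataFrame([[4, 5]], columns=["E", "F"])

# Index.duplicated() flags every repeat of a label after its first occurrence.
dup_mask = data1.columns.duplicated()
duplicate_names = data1.columns[dup_mask].tolist()
print("duplicate column names:", duplicate_names)

# Assumption: keep only the first column for each duplicated label.
data1_unique = data1.loc[:, ~dup_mask]

# With unique column names in both frames, the concat from the question can be retried.
result = pd.concat([data1_unique, data2], axis=0, ignore_index=True)
print(result)
```

A quicker yes/no check is `data1.columns.is_unique`, which is False whenever a column name is repeated.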

AaronDT answered Sep 22 '22 17:09
