Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

pandas- adding a series to a dataframe causes NaN values to appear

I have a dataframe that looks something like this:

d = {'Col_1' : pd.Series(['A', 'A', 'A', 'B']),
     'Col_2' : pd.Series(['B', 'C', 'B', 'D']),
     'Col_3' : pd.Series([np.nan, 'D', 'C', np.nan]),
     'Col_4' : pd.Series([np.nan, np.nan, 'D', np.nan]),
     'Col_5' : pd.Series([np.nan, np.nan, 'E', np.nan]),}
df = pd.DataFrame(d)

Col_1  Col_2  Col_3  Col_4  Col_5
  A      B      NaN    NaN    NaN
  A      C      D      NaN    NaN
  A      B      C      D      E
  B      D      NaN    NaN    NaN

My Goal is to end up with something along the lines of:

Col_1  Col_2  Col_3  Col_4  Col_5  ConCat
  A      B      NaN    NaN    NaN    A:B
  A      C      D      NaN    NaN    A:C:D
  A      B      C      D      E      A:B:C:D:E
  B      D      NaN    NaN    NaN    B:D

I've successfully created a dataframe that looks like the desired output from:

rows = df.values
df_1 = pd.DataFrame([':'.join(word for word in rows if word is not np.nan) for rows in rows])

    0
0  A:B
1  A:C:D
2  A:B:C:D:E
3  B:D

But now when I attempt to place it into the original dataframe, I get:

df['concatenated'] = df_1

Col_1  Col_2  Col_3  Col_4  Col_5  concatenated
  A      B      NaN    NaN    NaN    NaN
  A      C      D      NaN    NaN    NaN
  A      B      C      D      E      NaN
  B      D      NaN    NaN    NaN    NaN

What's strange is that when creating a simplified example, it works as expected. Below if the full code of what I'm doing. The original data comes to me transposed from what the original dataframe above looks like.

df_caregiver_type = pd.concat([df_caregiver_type[col].order().reset_index(drop=True) for col in df_caregiver_type], axis=1, ignore_index=False).T
df_caregiver_type.rename(columns=lambda x: 'Col_' + str(x), inplace=True)
rows = df_caregiver_type.values
df_caregiver_type1 = pd.DataFrame([':'.join(word for word in rows if word is not np.nan) for rows in rows])
df_caregiver_type['concatenated'] = df_caregiver_type1
df_caregiver_type = df_caregiver_type.T
df_caregiver_type

Update I'm thinking I'm getting an error due to the first row of the full code. It's from a separate, but related question: pandas: sort each column individually

like image 509
DataSwede Avatar asked Jun 12 '14 15:06

DataSwede


2 Answers

For your full dataset, change the last step from df['concatenated'] = df_1 to df['concatenated'] = df_1.values will solve the issue, I think it a bug and I am very sure I have seen it in SO before.

Or just: df['concatenated'] = [':'.join(word for word in row if word is not np.nan) for row in rows]

like image 181
CT Zhu Avatar answered Sep 28 '22 01:09

CT Zhu


>>> d = {'Col_1' : pd.Series(['A', 'A', 'A', 'B']),
...      'Col_2' : pd.Series(['B', 'C', 'B', 'D']),
...      'Col_3' : pd.Series([np.nan, 'D', 'C', np.nan]),
...      'Col_4' : pd.Series([np.nan, np.nan, 'D', np.nan]),
...      'Col_5' : pd.Series([np.nan, np.nan, 'E', np.nan]),}
>>> df = pd.DataFrame(d)
>>> 
>>> rows = df.values
>>> df_1 = pd.DataFrame([':'.join(word for word in rows if word is not np.nan) for rows in rows])
>>> 
>>> df['concatenated'] = df_1[0]
>>> df
  Col_1 Col_2 Col_3 Col_4 Col_5 concatenated
0     A     B   NaN   NaN   NaN          A:B
1     A     C     D   NaN   NaN        A:C:D
2     A     B     C     D     E    A:B:C:D:E
3     B     D   NaN   NaN   NaN          B:D
>>> 
like image 29
Vor Avatar answered Sep 28 '22 02:09

Vor