Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Must have equal len keys and value when setting with an iterable

Tags:

python

pandas

I have two dataframes as follows:

leader:
    0 11
    1 8
    2 5
    3 9
    4 8
    5 6
    [6065 rows x 2 columns]

DatasetLabel:    
    Unnamed: 0      0    1  ....    7     8    9  10  11  12  
    0               A    J  ....    1     2    5 NaN NaN NaN  
    1               B    K  ....    3     4   NaN  NaN NaN NaN  

    [4095 rows x 14 columns]

The Information dataset column names 0 to 6 are DatasetLabel about data and 7 to 12 are indexes that refer to the first column of leader Dataframe.

I want to create dataset where instead of the indexes in DatasetLabel Dataset I have the value of each index from the leader dataset, which is leader.iloc[index,1]

How can I do it using python features?

The output should look like:

 DatasetLabel:    
        Unnamed: 0      0    1  ....    7     8    9  10  11  12  
        0               A    J  ....    8     5    6 NaN NaN NaN  
        1               B    K  ....    9     8   NaN  NaN NaN NaN  

I have come up with the following, but I get an error:

    for column in DatasetLabel.ix[:,8:13]:
        DatasetLabel[DatasetLabel[column].notnull ()]=leader.iloc[DatasetLabel[DatasetLabel[column].notnull ()][column].values,1]

Error:

ValueError: Must have equal len keys and value when setting with an iterable
like image 511
user3806649 Avatar asked Dec 28 '17 00:12

user3806649


1 Answers

You can use apply to index into leader and exchange values with DatasetLabel, although it's not very pretty.

One issue is that Pandas won't let us index with NaN. Converting to str provides a workaround. But that creates a second issue, namely, column 9 is of type float (because NaN is float), so 5 becomes 5.0. Once it's a string, that's "5.0", which will fail to match the index values in leader. We can remove the .0, and then this solution will work - but it's a bit of a hack.

With DatasetLabel as:

   Unnamed:0  0  1  7  8    9  10  11  12
0          0  A  J  1  2  5.0 NaN NaN NaN
1          1  B  K  3  4  NaN NaN NaN NaN

And leader as:

   0   1
0  0  11
1  1   8
2  2   5
3  3   9
4  4   8
5  5   6

Then:

cols = ["7","8","9","10","11","12"]
updated = DatasetLabel[cols].apply(
    lambda x: leader.loc[x.astype(str).str.split(".").str[0], 1].values, axis=1)

updated
     7    8    9  10  11  12
0  8.0  5.0  6.0 NaN NaN NaN
1  9.0  8.0  NaN NaN NaN NaN

Now we can concat the unmodified columns (which we'll call original) with updated:

original_cols = DatasetLabel.columns[~DatasetLabel.columns.isin(cols)]
original = DatasetLabel[original_cols]
pd.concat([original, updated], axis=1)

Output:

   Unnamed:0  0  1    7    8    9  10  11  12
0          0  A  J  8.0  5.0  6.0 NaN NaN NaN
1          1  B  K  9.0  8.0  NaN NaN NaN NaN

Note: It may be clearer to use concat here, but here's another, cleaner way of merging original and updated, using assign:

DatasetLabel.assign(**updated)
like image 198
andrew_reece Avatar answered Oct 30 '22 22:10

andrew_reece