I have a df that looks like this:
user_index movie_index genre_index cast_index
3590 1514 10|12|17|35 46|534
63 563 4|2|1|8 9|27
and was generated from:
import pandas as pd
df = pd.DataFrame({'user_index': [3590, 63], 'movie_index': [1514, 563],
                   'genre_index': ['10|12|17|35', '4|2|1|8'],
                   'cast_index': ['46|534', '9|27']})
I need to split every cell on '|' (converting each cell to a list) and add a value to each element, to get a df like the one below (here, 5 is added element-wise in column 'genre_index' and 2 is added element-wise in column 'user_index'):
user_index movie_index genre_index cast_index
[3592] [1514] [15,17,22,40] [46,534]
[65] [563] [9,7,6,13] [9,27]
To achieve this, I wrote a function that takes a column as an argument, splits it, and adds a value element-wise (I don't take the whole df as an argument because the added value differs per column). It looks like this:
def df_convertion(input_series, offset):
    column = input_series.str.split('|', expand=False).apply(lambda x: x + offset)
    return column
But it doesn't work as desired (I tried it on the 'genre_index' column) and raises this error:
TypeError: can only concatenate list (not "int") to list
Any help in fixing it would be very appreciated!
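This is one of those rare times I'll suggest using apply. The error comes from x + offset: str.split gives you a list of strings for each cell, and Python can't add an int to a list, so each piece has to be converted to int and offset individually. Also, see whether you can use some other representation for your data, since lists inside cells are awkward to work with.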
offset_dct = {'user_index': 2, 'genre_index': 5}
df = df.fillna('').astype(str).apply(lambda x: [
    [int(z) + offset_dct.get(x.name, 0) for z in y.split('|')] for y in x])
df
cast_index genre_index movie_index user_index
0 [46, 534] [15, 17, 22, 40] [1514] [3592]
1 [9, 27] [9, 7, 6, 13] [563] [65]
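If you'd rather keep the one-column-at-a-time shape of your original function, here is a minimal sketch of how it could be repaired. The name split_and_offset and the offsets dict are my own, made up for illustration, and I'm assuming every cell holds only integers separated by '|' (no missing values).

```python
import pandas as pd

def split_and_offset(input_series, offset=0):
    # Hypothetical helper: cast to str first so integer columns
    # (user_index, movie_index) can be split the same way as the others.
    parts = input_series.astype(str).str.split('|')
    # Convert each piece to int and add the offset element by element,
    # instead of adding the offset to the list itself.
    return parts.apply(lambda items: [int(item) + offset for item in items])

df = pd.DataFrame({'user_index': [3590, 63], 'movie_index': [1514, 563],
                   'genre_index': ['10|12|17|35', '4|2|1|8'],
                   'cast_index': ['46|534', '9|27']})

# Assumed per-column offsets; columns not listed get an offset of 0.
offsets = {'user_index': 2, 'genre_index': 5}

result = pd.DataFrame({col: split_and_offset(df[col], offsets.get(col, 0))
                       for col in df.columns})
print(result)
```

This does the same per-element work as the apply above, just one column per call, so you can pass a different offset for each column.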