I have:
import pandas as pd

df = pd.DataFrame({'col1': ['asdf', 'xy', 'q'], 'col2': [1, 2, 3]})
   col1  col2
0  asdf     1
1    xy     2
2     q     3
I'd like to take the "combinatoric product" of each letter from the strings in col1, with each elementwise int in col2.  I.e.:
  col1  col2
0    a    1
1    s    1
2    d    1
3    f    1
4    x    2
5    y    2
6    q    3
Current method:
from itertools import product
pieces = []
for _, s in df.iterrows():
    letters = list(s.col1)
    prods = list(product(letters, [s.col2]))
    pieces.append(pd.DataFrame(prods))
pd.concat(pieces)
Any more efficient workarounds?
You can do this with list + str.join for the letters and ndarray.repeat for the numbers:
pd.DataFrame(
{
     'col1' : list(''.join(df.col1)), 
     'col2' : df.col2.values.repeat(df.col1.str.len(), axis=0)
})
  col1  col2
0    a     1
1    s     1
2    d     1
3    f     1
4    x     2
5    y     2
6    q     3
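For anyone who wants to run this outside an interactive session, here is a minimal self-contained sketch of the same idea. The function name explode_chars is only an illustrative helper, not part of pandas, and the sketch assumes col1 contains no missing values (str.join would fail on NaN).

```python
import numpy as np
import pandas as pd


def explode_chars(df):
    """Return one row per character of col1, with col2 repeated alongside."""
    # Concatenate every string, then split the result into single characters.
    letters = list(''.join(df['col1']))
    # Repeat each col2 value once per character of its col1 string.
    counts = df['col1'].str.len().to_numpy(dtype='int64')
    repeated = np.repeat(df['col2'].to_numpy(), counts)
    return pd.DataFrame({'col1': letters, 'col2': repeated})


df = pd.DataFrame({'col1': ['asdf', 'xy', 'q'], 'col2': [1, 2, 3]})
result = explode_chars(df)
print(result)
```

The two columns stay aligned because the characters and the repeated numbers are both produced in row order.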
This generalises to any number of columns with little change:
i = list(''.join(df.col1))
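# Pass the column by keyword; newer pandas rejects the positional axis argument.
j = df.drop(columns='col1').values.repeat(df.col1.str.len(), axis=0)
# Index.drop keeps the original column order (difference would sort the names).
df = pd.DataFrame(j, columns=df.columns.drop('col1'))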
df.insert(0, 'col1', i)
df
  col1 col2
0    a    1
1    s    1
2    d    1
3    f    1
4    x    2
5    y    2
6    q    3
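One thing to watch with the generalised version is that going through .values on mixed columns yields an object array, so the other columns can lose their dtypes. As an alternative sketch (not the code above, and explode_chars_any is a made-up helper name), you can repeat row positions with iloc instead, which keeps each column's dtype. It again assumes the string column has no missing values.

```python
import numpy as np
import pandas as pd


def explode_chars_any(df, col='col1'):
    """One row per character of df[col]; every other column is repeated as-is."""
    lens = df[col].str.len().to_numpy(dtype='int64')
    # Repeat row positions rather than labels, so duplicate index labels are harmless.
    positions = np.arange(len(df)).repeat(lens)
    out = df.iloc[positions].reset_index(drop=True)
    # Overwrite the string column with the individual characters.
    out[col] = list(''.join(df[col]))
    return out


wide = pd.DataFrame({'col1': ['asdf', 'xy', 'q'],
                     'col2': [1, 2, 3],
                     'col3': [0.5, 1.5, 2.5]})
result = explode_chars_any(wide)
print(result)
print(result.dtypes)
```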
Performance
df = pd.concat([df] * 100000, ignore_index=True)
# MaxU's solution
%%timeit
df.col1.str.extractall(r'(.)') \
           .reset_index(level=1, drop=True) \
           .join(df['col2']) \
           .reset_index(drop=True)
1 loop, best of 3: 1.98 s per loop
# piRSquared's solution
%%timeit
pd.DataFrame(
     [[x] + b for a, *b in df.values for x in a],
     columns=df.columns
)
1 loop, best of 3: 1.68 s per loop
# Wen's solution
%%timeit
v = df.col1.apply(list)
pd.DataFrame({'col1':np.concatenate(v.values),'col2':df.col2.repeat(v.apply(len))})
1 loop, best of 3: 835 ms per loop
# Alexander's solution
%%timeit
pd.DataFrame([(letter, i) 
              for letters, i in zip(df['col1'], df['col2']) 
              for letter in letters],
             columns=df.columns)
1 loop, best of 3: 316 ms per loop
# This answer's solution
%%timeit
pd.DataFrame(
{
     'col1' : list(''.join(df.col1)), 
     'col2' : df.col2.values.repeat(df.col1.str.len(), axis=0)
})
10 loops, best of 3: 124 ms per loop
I tried timing Vaishali's, but it took too long on this dataset.
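The numbers above come from IPython's %%timeit. If you want to reproduce the comparison in a plain script, something like the following sketch with the standard timeit module should work. The function names are illustrative, the frame here uses 1000 copies rather than 100000 to keep it quick, and your timings will differ by machine and pandas version.

```python
import timeit

import pandas as pd

df = pd.DataFrame({'col1': ['asdf', 'xy', 'q'], 'col2': [1, 2, 3]})
# Smaller than the benchmark above so the script finishes quickly.
big = pd.concat([df] * 1000, ignore_index=True)


def join_repeat(d):
    # list + str.join for the letters, ndarray.repeat for the numbers.
    return pd.DataFrame({
        'col1': list(''.join(d.col1)),
        'col2': d.col2.values.repeat(d.col1.str.len(), axis=0),
    })


def comprehension(d):
    # Nested list comprehension over (string, number) pairs.
    return pd.DataFrame(
        [(letter, i) for letters, i in zip(d['col1'], d['col2']) for letter in letters],
        columns=d.columns,
    )


for fn in (join_repeat, comprehension):
    # Best of 3 single runs, mirroring the "best of 3" that %%timeit reports.
    best = min(timeit.repeat(lambda: fn(big), number=1, repeat=3))
    print(f'{fn.__name__}: {best * 1000:.1f} ms')
```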
pd.DataFrame([(letter, i) 
              for letters, i in zip(df['col1'], df['col2']) 
              for letter in letters],
             columns=df.columns)
Trick from the list :-)
df.col1 = df.col1.apply(list)
df
Out[489]: 
           col1  col2
0  [a, s, d, f]     1
1        [x, y]     2
2           [q]     3
pd.DataFrame({'col1': np.concatenate(df.col1.values), 'col2': df.col2.repeat(df.col1.apply(len))})
Out[490]: 
  col1  col2
0    a     1
0    s     1
0    d     1
0    f     1
1    x     2
1    y     2
2    q     3
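Note that the result above keeps the repeated original index (0, 0, 0, 0, 1, 1, 2). If you want the 0..6 index from the question's expected output, a reset_index(drop=True) at the end should do it. A self-contained sketch of that variant, which leaves df itself unmodified:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['asdf', 'xy', 'q'], 'col2': [1, 2, 3]})

# Split each string into a list of characters without overwriting df.col1.
chars = df['col1'].apply(list)

out = pd.DataFrame({
    # Flatten the per-row lists into one array of characters.
    'col1': np.concatenate(chars.values),
    # Series.repeat carries the original index, hence the duplicates.
    'col2': df['col2'].repeat(chars.apply(len)),
}).reset_index(drop=True)  # replace the duplicated index with 0..n-1

print(out)
```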
In [86]: df.col1.str.extractall(r'(.)') \
           .reset_index(level=1, drop=True) \
           .join(df['col2']) \
           .reset_index(drop=True)
Out[86]:
   0  col2
0  a     1
1  s     1
2  d     1
3  f     1
4  x     2
5  y     2
6  q     3
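In the output above the letter column is named 0, because the capture group is unnamed. One way to get col1 back is a named group, since extractall uses group names as column names. A sketch of that variant:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['asdf', 'xy', 'q'], 'col2': [1, 2, 3]})

out = (
    df['col1'].str.extractall(r'(?P<col1>.)')  # named group -> column 'col1'
      .reset_index(level=1, drop=True)         # drop the per-row match counter
      .join(df['col2'])                        # align col2 on the original index
      .reset_index(drop=True)                  # final 0..n-1 index
)
print(out)
```

Calling .rename(columns={0: 'col1'}) on the original result would work as well.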