
Loop pandas data frame

Tags: python, pandas

I have the data frame below and want to loop over it:

df = name
      a
      b
      c
      d

I have tried the code below:

for index, row in df.iterrows():
    for line in df['name']:
        print(index, line)

But the result I want is a data frame like this, with every name paired with every name:

df = name    name1
       a       a
       a       b
       a       c
       a       d
       b       a
       b       b
       b       c
       b       d
       etc.

Is there any way to do this? I know it's a stupid question, but I'm new to Python.

kj14 asked Apr 25 '26 10:04


2 Answers

One way is to use pandas.DataFrame.explode:

df["name1"] = [df["name"] for _ in df["name"]]
df.explode("name1")

Output:

  name name1
0    a     a
0    a     b
0    a     c
0    a     d
1    b     a
1    b     b
1    b     c
1    b     d
2    c     a
2    c     b
2    c     c
2    c     d
3    d     a
3    d     b
3    d     c
3    d     d
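
Note that the output above keeps the original index label on every expanded row (0, 0, 0, 0, 1, ...). If a clean 0..15 index is wanted, a minimal sketch could look like the following. The sample frame and the variable name `out` are assumptions for illustration, and plain lists are used in place of the Series objects from the snippet above.

```python
import pandas as pd

# Sample frame matching the question (assumed for illustration)
df = pd.DataFrame({"name": ["a", "b", "c", "d"]})

# Give every row the full list of names, then expand each list into rows
out = df.assign(name1=[df["name"].tolist()] * len(df)).explode("name1")

# explode repeats the original index label for each expanded row,
# so renumber the rows afterwards
out = out.reset_index(drop=True)
print(out)
```

Newer pandas versions also accept `ignore_index=True` in `explode`, which should make the separate `reset_index` call unnecessary.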
Chris answered Apr 27 '26 22:04


The fastest solution uses numpy (thank you, @Ch3steR):

# repeat gives a a a a b b b b ..., tile gives a b c d a b c d ...
df = pd.DataFrame({'name':np.repeat(df['name'],len(df)),
                   'name1':np.tile(df['name'],len(df))})
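
A self-contained variant of the same idea, as a rough sketch: here the column is first converted with `to_numpy()` so that no index is carried over from the original Series and the result gets a plain 0..n*n-1 index. The sample frame and the names `names`, `n` and `pairs` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Sample frame matching the question (assumed for illustration)
df = pd.DataFrame({"name": ["a", "b", "c", "d"]})

names = df["name"].to_numpy()  # plain array, so no index is carried over
n = len(names)

pairs = pd.DataFrame({
    "name": np.repeat(names, n),   # a a a a b b b b ...
    "name1": np.tile(names, n),    # a b c d a b c d ...
})
print(pairs)
```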

Alternatively, use itertools.product with the DataFrame constructor:

from itertools import product
df = pd.DataFrame(product(df['name'], df['name']), columns=['name','name1'])
# older pandas versions
#df = pd.DataFrame(list(product(df['name'], df['name'])), columns=['name','name1'])
print(df)
   name name1
0     a     a
1     a     b
2     a     c
3     a     d
4     b     a
5     b     b
6     b     c
7     b     d
8     c     a
9     c     b
10    c     c
11    c     d
12    d     a
13    d     b
14    d     c
15    d     d

Another idea is to use a cross join, which in the timings below is faster than both the itertools.product and explode approaches (only the numpy solution is faster):

df1 = df.assign(new=1)
df = df1.merge(df1, on='new', suffixes=('','1')).drop('new', axis=1)
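
If the installed pandas is 1.2 or newer (an assumption about the environment), `merge` should also accept `how="cross"`, which avoids the helper column. A minimal sketch, with the sample frame and the name `pairs` chosen for illustration:

```python
import pandas as pd

# Sample frame matching the question (assumed for illustration)
df = pd.DataFrame({"name": ["a", "b", "c", "d"]})

# how="cross" pairs every left row with every right row;
# the suffixes turn the two overlapping 'name' columns into name / name1
pairs = df.merge(df, how="cross", suffixes=("", "1"))
print(pairs)
```

This was not part of the timings below, so its speed relative to the other approaches is untested here.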

Performance:

from itertools import product

import numpy as np
import pandas as pd

df = pd.DataFrame({'name':range(1000)})
# print(df)


In [17]: %%timeit
    ...: df["name1"] = [df["name"] for _ in df["name"]]
    ...: df.explode("name1")
    ...: 
    ...: 
18.9 s ± 1.18 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [18]: %%timeit
    ...: pd.DataFrame(product(df['name'], df['name']), columns=['name','name1'])
    ...: 
1.01 s ± 62.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [19]: %%timeit
    ...: df1 = df.assign(new=1)
    ...: df1.merge(df1, on='new', suffixes=('','1')).drop('new', axis=1)
    ...: 
    ...: 
245 ms ± 21.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [20]: %%timeit
    ...: pd.DataFrame({'name':np.repeat(df['name'],len(df)), 'name1':np.tile(df['name'],len(df))})
    ...: 
30.2 ms ± 1.43 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
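
Before trusting timings like these, it can be worth checking that the approaches really return the same rows. The following is only a sketch on the small sample frame; the helper `as_rows` and the variable names are hypothetical and not part of the benchmark above.

```python
from itertools import product

import numpy as np
import pandas as pd

# Sample frame matching the question (assumed for illustration)
df = pd.DataFrame({"name": ["a", "b", "c", "d"]})


def as_rows(frame):
    """Return the frame's rows as plain tuples, ignoring the index."""
    return list(frame.itertuples(index=False, name=None))


# itertools.product approach
by_product = pd.DataFrame(list(product(df["name"], df["name"])),
                          columns=["name", "name1"])

# cross join through a constant helper column
df1 = df.assign(new=1)
by_merge = df1.merge(df1, on="new", suffixes=("", "1")).drop("new", axis=1)

# numpy repeat / tile approach
names = df["name"].to_numpy()
by_numpy = pd.DataFrame({"name": np.repeat(names, len(names)),
                         "name1": np.tile(names, len(names))})

# True only if all three produce identical rows in identical order
same = as_rows(by_product) == as_rows(by_merge) == as_rows(by_numpy)
print(same)
```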
jezrael answered Apr 27 '26 23:04