Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Convert a dataframe to a list of tuples

I have a table pandas DF which looks like

Slave start_addr0 end_addr0 start_addr1 end_addr1 start_addr2 end_addr2
0 0 10000000 1FFFFFFF NaN NaN NaN NaN
1 1 20000000 2007FFFF 40000000 40005FFF NaN NaN
2 1 20000000 2007FFFF 20100000 201FFFFF NaN NaN
3 2 20200000 202FFFFF 20080000 20085FFF 40006000 400FFFFF
4 3 0 0FFFFFFF NaN NaN NaN NaN
5 4 20300000 203FFFFF NaN NaN NaN NaN
6 5 20400000 204FFFFF NaN NaN NaN NaN

For each slave number I need to convert it to a list of ranges (tuples). For example,

Slave1_list = ( (20000000, 2007FFFF), (40000000, 40005FFF), (20100000, 201FFFFF))

The number of slaves (rows) and address-pairs (columns) can vary.

Thanks

EDIT:

Run the following code to load sample data into dataframe:

import pandas as pd
import io

f = io.StringIO('''Slave|start_addr0|end_addr0|start_addr1|end_addr1|start_addr2|end_addr2
0|10000000|1FFFFFFF|NaN|NaN|NaN|NaN
1|20000000|2007FFFF|40000000|40005FFF|NaN|NaN
1|20000000|2007FFFF|20100000|201FFFFF|NaN|NaN
2|20200000|202FFFFF|20080000|20085FFF|40006000|400FFFFF
3|0|0FFFFFFF|NaN|NaN|NaN|NaN
4|20300000|203FFFFF|NaN|NaN|NaN|NaN
5|20400000|204FFFFF|NaN|NaN|NaN|NaN
''')
df = pd.read_csv(f, sep='|', engine='python', index_col=None)
like image 365
Mihael Fugenfirov Avatar asked Nov 15 '22 21:11

Mihael Fugenfirov


1 Answers

You can try:

One option via wide_to_long:


df = df.reset_index()
result = pd.wide_to_long(df, stubnames=['start_addr', 'end_addr'], i=['index', 'Slave'], j='add_num', sep='').dropna(
).reset_index([0, -1], drop=True).apply(tuple, 1).groupby(level=0).agg(list)

An option via groupby:

k = df.set_index('Slave').stack().reset_index()
result = k.groupby(k.index//2).agg({'Slave': 'first', 0 : tuple}).groupby('Slave').agg({0 : set})

Explanation:

df.set_index('Slave').stack().reset_index() will remove the NaN values and stack the dataframe.

k.groupby(k.index//2) will group alternate rows and perform the required aggregations(tuples are formed in this step)

.groupby('Slave').agg({0 : set}) -> Last groupby is to capture the unique tuples for each slave.

OUTPUT:

                                                                            0
Slave                                                                        
0                                                      {(10000000, 1FFFFFFF)}
1      {(40000000.0, 40005FFF), (20100000.0, 201FFFFF), (20000000, 2007FFFF)}
2      {(20080000.0, 20085FFF), (40006000.0, 400FFFFF), (20200000, 202FFFFF)}
3                                                             {(0, 0FFFFFFF)}
4                                                      {(20300000, 203FFFFF)}
5                                                      {(20400000, 204FFFFF)}

NOTE: I'm assuming for every start_addr there exists an end_addr.

like image 163
Nk03 Avatar answered Dec 22 '22 04:12

Nk03