I have an example data set that's much smaller than my actual data set. The real data is a text file, and I want to read it into a pandas DataFrame and do something with it:
import pandas as pd
d = {
'one': ['title1', 'R2G', 'title2', 'K5G', 'title2','R14G', 'title2','R2T','title3', 'K10C', 'title4', 'W7C', 'title4', 'R2G', 'title5', 'K8C']
}
df = pd.DataFrame(d)
Example dataset looks like this:
df
Out[20]:
one
0 title1
1 R2G
2 title2
3 K5G
4 title2
5 R14G
6 title2
7 R2T
8 title3
9 K10C
10 title4
11 W7C
12 title4
13 R2G
14 title5
15 K8C
I added a second column called 'value':
df.insert(1,'value','')
df
Out[22]:
one value
0 title1
1 R2G
2 title2
3 K5G
4 title2
5 R14G
6 title2
7 R2T
8 title3
9 K10C
10 title4
11 W7C
12 title4
13 R2G
14 title5
15 K8C
I want to first move every other row over to the 'value' column:
one value
0 title1 R2G
1 title2 K5G
2 title2 R14G
3 title2 R2T
4 title3 K10C
5 title4 W7C
6 title4 R2G
7 title5 K8C
I then want to group by the title name, since there might be more than one value for the same title:
one value
0 title1 R2G
1 title2 K5G, R14G, R2T
2 title3 K10C
3 title4 W7C, R2G
4 title5 K8C
How can this be achieved?
To select rows at a regular interval, slice with a step: df.loc[start:stop:step], where start is the first row label to take, stop is the last row label to take (inclusive), and step is the number of positions to advance after each selection. For example, a step of 2 selects alternate rows.
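As a minimal sketch of step slicing, assuming the default integer index and a shortened copy of the question's data (the names titles and values are only illustrative):

```python
import pandas as pd

# Shortened copy of the question's single-column data.
df = pd.DataFrame({'one': ['title1', 'R2G', 'title2', 'K5G', 'title2', 'R14G']})

# Step of 2 starting at the first row: the title rows.
titles = df.loc[::2, 'one']

# Step of 2 starting at label 1: the value rows.
values = df.loc[1::2, 'one']

print(titles.tolist())
print(values.tolist())
```

The two slices keep their original index labels (0, 2, 4 and 1, 3, 5), so they have to be realigned, for example via .values, before being combined into one frame.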
DataFrame.transpose() swaps the index and the columns of a DataFrame. It reflects the DataFrame over its main diagonal by writing rows as columns and vice versa. The T attribute is shorthand for transpose().
You can collect DataFrame rows into one list per group by calling DataFrame.groupby() on the column of interest, selecting the column you want from each group, and then using apply(list) to get the list for every group.
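A small sketch of the list-per-group idea, assuming the data has already been arranged into 'one' and 'value' columns (the frame name pairs is made up for this example):

```python
import pandas as pd

# Assumed two-column layout: one title and one value per row.
pairs = pd.DataFrame({
    'one': ['title1', 'title2', 'title2', 'title2', 'title3'],
    'value': ['R2G', 'K5G', 'R14G', 'R2T', 'K10C'],
})

# Collect the values of each title into a Python list.
grouped = pairs.groupby('one')['value'].apply(list).reset_index()
print(grouped)
```

This leaves real lists in the 'value' column, which is handy for further processing; joining them into a single string is a separate step.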
Construct a new df by slicing the column using iloc and a step arg:
In [185]:
new_df = pd.DataFrame({'one':df['one'].iloc[::2].values, 'value':df['one'].iloc[1::2].values})
new_df
Out[185]:
one value
0 title1 R2G
1 title2 K5G
2 title2 R14G
3 title2 R2T
4 title3 K10C
5 title4 W7C
6 title4 R2G
7 title5 K8C
You can then groupby on 'one' and apply ','.join to the 'value' column to join the values of each group:
In [188]:
new_df.groupby('one')['value'].apply(','.join).reset_index()
Out[188]:
one value
0 title1 R2G
1 title2 K5G,R14G,R2T
2 title3 K10C
3 title4 W7C,R2G
4 title5 K8C
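If the separator should match the ', ' shown in the question, or if the titles should stay in the order they first appear rather than being sorted, something along these lines may help. It is a sketch that rebuilds new_df by hand; sort=False is an addition here, not part of the answer above:

```python
import pandas as pd

# Hand-built copy of new_df from the previous step.
new_df = pd.DataFrame({
    'one': ['title1', 'title2', 'title2', 'title2', 'title3', 'title4', 'title4', 'title5'],
    'value': ['R2G', 'K5G', 'R14G', 'R2T', 'K10C', 'W7C', 'R2G', 'K8C'],
})

# sort=False keeps groups in order of first appearance;
# ', '.join produces comma-plus-space separated strings.
result = new_df.groupby('one', sort=False)['value'].agg(', '.join).reset_index()
print(result)
```

With this data the titles are already in sorted order, so sort=False changes nothing visible; it only matters when the real file lists titles in some other order.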
Alternatively, you can reshape the single column into two columns and then aggregate each group by joining its values into a string.
import pandas as pd
d = {
'one': ['title1', 'R2G', 'title2', 'K5G', 'title2','R14G', 'title2','R2T','title3', 'K10C', 'title4', 'W7C', 'title4', 'R2G', 'title5', 'K8C']
}
df = pd.DataFrame(d)
# because you have simple alternating pattern, you can just reshape
df = pd.DataFrame(df.values.reshape(-1, 2), columns = ['one', 'value'])
# group by 'one' and aggregate 'value' by joining the strings
df = df.groupby('one')['value'].apply(', '.join).reset_index()
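Since the real data lives in a text file, here is a rough end-to-end sketch. It assumes one entry per line with no header, and uses io.StringIO as a stand-in for the actual file, whose path is not given in the question:

```python
import io
import pandas as pd

# Stand-in for the real text file: one entry per line, no header (assumption).
raw = io.StringIO("title1\nR2G\ntitle2\nK5G\ntitle2\nR14G\n")

# Read every line into a single column named 'one'.
df = pd.read_csv(raw, header=None, names=['one'])

# The reshape only works if titles and values strictly alternate.
if len(df) % 2 != 0:
    raise ValueError("expected an even number of lines (title/value pairs)")

# Turn consecutive pairs of rows into (title, value) rows.
pairs = pd.DataFrame(df['one'].to_numpy().reshape(-1, 2), columns=['one', 'value'])

# Join all values that share a title.
result = pairs.groupby('one')['value'].apply(', '.join).reset_index()
print(result)
```

For an actual file, the StringIO object would be replaced by the file path passed to pd.read_csv.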