Merging content of two rows in Pandas

I have a DataFrame in which I would like to merge the contents of two rows into the same cell, separated by an underscore. If this is the original DF:

0   eye-right   eye-right   hand
1   location    location    position
2   12          27.7        2
3   14          27.6        2.2

I would like it to become:

0   eye-right_location   eye-right_location   hand_position
1   12                   27.7                 2
2   14                   27.6                 2.2

Ultimately I would like to turn the merged row 0 into the header and reset the index of the entire DataFrame.

asked Jan 22 '19 15:01 by guyts


2 Answers

You can set your column labels, slice via iloc, then reset_index:

print(df)
#            0          1         2
# 0  eye-right  eye-right      hand
# 1   location   location  position
# 2         12       27.7         2
# 3         14       27.6       2.2

df.columns = (df.iloc[0] + '_' + df.iloc[1])
df = df.iloc[2:].reset_index(drop=True)

print(df)
#   eye-right_location eye-right_location hand_position
# 0                 12               27.7             2
# 1                 14               27.6           2.2
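The same three steps can be wrapped in a small reusable function, which helps if the labels are spread over more than two rows or need a different separator. This is a minimal sketch: flatten_header_rows is a hypothetical helper name (not part of pandas), and it assumes the label rows sit at the top of the frame. The header cells are cast to str first so that numbers or NaN in those rows don't break the join.

```python
import pandas as pd


def flatten_header_rows(df: pd.DataFrame, n_rows: int = 2, sep: str = "_") -> pd.DataFrame:
    """Hypothetical helper: merge the first n_rows of each column into one label.

    Assumes the label rows are the first n_rows of df.
    """
    # Cast to str so numbers or NaN in the header rows can still be joined.
    header = df.iloc[:n_rows].astype(str)
    # apply() runs once per column; each call receives that column's header cells.
    labels = header.apply(lambda col: sep.join(col))
    # Drop the header rows and renumber the remaining rows from 0.
    body = df.iloc[n_rows:].reset_index(drop=True)
    body.columns = labels.tolist()
    return body


# Assumed reconstruction of the question's data (column labels 0, 1, 2).
raw = pd.DataFrame([
    ["eye-right", "eye-right", "hand"],
    ["location", "location", "position"],
    [12, 27.7, 2],
    [14, 27.6, 2.2],
])

flat = flatten_header_rows(raw)
print(flat)
```

As with the three-line version, the remaining columns keep dtype object until they are converted explicitly.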

answered Sep 30 '22 04:09 by jpp


I like jpp's answer a lot. Short and sweet. Perfect for quick analysis.

Just one quibble: the resulting DataFrame is generically typed. Because the first two rows held strings, every column has dtype object, and slicing those rows off does not change that. You can see this with the info method.
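A quick way to confirm this, as a sketch that assumes the question's frame was built from a list of rows mixing strings and numbers (the construction below is a reconstruction, not the asker's actual data). df.info() reports the same thing; df.dtypes is used here because it is easy to inspect programmatically.

```python
import pandas as pd

# Assumed reconstruction of the question's frame: label rows on top of numeric rows.
df = pd.DataFrame([
    ["eye-right", "eye-right", "hand"],
    ["location", "location", "position"],
    [12, 27.7, 2],
    [14, 27.6, 2.2],
])

# The two lines from the answer above: build the labels, then drop the label rows.
df.columns = df.iloc[0] + "_" + df.iloc[1]
df = df.iloc[2:].reset_index(drop=True)

# Slicing off the string rows does not re-infer dtypes, so every column stays object.
print(df.dtypes)
```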

For data analysis, it's often preferable that columns have specific numeric types. This can be tidied up with one more line:

import pandas as pd

df.columns = df.iloc[0] + '_' + df.iloc[1]
df = df.iloc[2:].reset_index(drop=True)
df = df.apply(pd.to_numeric)

The third line applies pandas' to_numeric function to each column in turn, leaving a DataFrame whose columns have numeric dtypes (int64 or float64, depending on the values) rather than object.

While not essential for simple usage, as soon as you start performing math on DataFrames, or start using very large data sets, column types become something you'll need to pay attention to.
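One practical wrinkle with the apply(pd.to_numeric) line: by default pd.to_numeric raises a ValueError if any cell can't be parsed. Passing errors="coerce" (a documented to_numeric option, which DataFrame.apply forwards as a keyword argument) turns such cells into NaN instead. A minimal sketch, where the "n/a" cell is invented purely for illustration:

```python
import pandas as pd

# Hypothetical data: the body rows after flattening, with one unparseable cell.
df = pd.DataFrame(
    [["12", "27.7", "2"], ["14", "n/a", "2.2"]],
    columns=["eye-right_location", "eye-right_location", "hand_position"],
)

# With the default errors="raise", pd.to_numeric would raise ValueError on "n/a".
# errors="coerce" converts unparseable cells to NaN instead.
typed = df.apply(pd.to_numeric, errors="coerce")
print(typed.dtypes)
```

Coercing silently hides bad cells, so it is worth checking typed.isna().sum() afterwards to see how many values were dropped.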

answered Sep 30 '22 04:09 by Jonathan Eunice