
How to efficiently combine two columns into one column/ combine strings?

I have two columns (A and Date) like the ones below, and I need to combine them into a single column like column C. The dataset has more than 900,000 rows.

[Image: sample of columns A and Date, with the desired combined column C]

I ran into two main problems.

  1. The data type of column "Date" is timestamp, so combining it with a string raises an error:

TypeError: unsupported operand type(s) for +: 'Timestamp' and 'str'.

  2. The code is far too slow. I wrote a for loop to do the combination, as below:

    for i in range(0,911462): df['Combine'][i]=df['Date'][i]+df['A'][i]

I guess this is because the for loop does the combination row by row, so every single combination spends a lot of time on system I/O.

Is there a more efficient way to do this?

Vane Leung asked Jan 04 '23 20:01



2 Answers

You have to explicitly cast the Timestamp to a string, e.g. with strftime:

In [11]: df = pd.DataFrame([[pd.Timestamp("2017-01-01"), 'a'], [pd.Timestamp("2017-01-02"), 'b']], columns=["A", "B"])

In [12]: df["A"].dt.strftime("%Y-%m-%d") + df["B"]
Out[12]:
0    2017-01-01a
1    2017-01-02b
dtype: object
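
Applied to the column names from the question, this might look like the sketch below. The sample rows are made up for illustration, and the "%Y-%m-%d" format is an assumption about how the dates should be rendered. The point is that both `.dt.strftime` and `+` operate on the whole column at once, so no Python-level loop over the 900,000 rows is needed.

```python
import pandas as pd

# Hypothetical sample mirroring the question's columns "A" and "Date".
df = pd.DataFrame({
    "A": ["XX", "YY", "ZZ"],
    "Date": pd.to_datetime(["2016-01-01", "2016-01-15", "2016-12-01"]),
})

# Format the whole Date column as strings, then concatenate element-wise.
df["Combine"] = df["Date"].dt.strftime("%Y-%m-%d") + df["A"]

print(df["Combine"].tolist())
# ['2016-01-01XX', '2016-01-15YY', '2016-12-01ZZ']
```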
Andy Hayden answered Jan 08 '23 09:01



Try astype; it can cast objects such as Timestamp to strings:

import pandas as pd
df = pd.DataFrame({'A':['XX','YY','ZZ','AA'], 'Date':[pd.Timestamp("2016-01-01"),pd.Timestamp('2016-01-15'),pd.Timestamp('2016-12-01'),pd.Timestamp('2016-07-12')]})
df['Combine'] = df['Date'].astype(str) + '_'+df['A']
df

df will be:

    A   Date        Combine
0   XX  2016-01-01  2016-01-01_XX
1   YY  2016-01-15  2016-01-15_YY
2   ZZ  2016-12-01  2016-12-01_ZZ
3   AA  2016-07-12  2016-07-12_AA
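
One thing to watch, sketched below with made-up rows: if the timestamps carry a time of day, astype(str) keeps it in the resulting string, whereas strftime lets you fix the format explicitly. The exact text astype(str) produces may vary between pandas versions, so treat this as an illustration rather than a guarantee.

```python
import pandas as pd

# Hypothetical sample: same columns as above, but with a time of day.
df = pd.DataFrame({
    "A": ["XX", "YY"],
    "Date": pd.to_datetime(["2016-01-01 12:30:00", "2016-01-15 08:00:00"]),
})

# astype(str) keeps the time part when one is present.
as_str = df["Date"].astype(str) + "_" + df["A"]

# strftime pins the format, so only the date is kept.
date_only = df["Date"].dt.strftime("%Y-%m-%d") + "_" + df["A"]

print(as_str.tolist())
print(date_only.tolist())
```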
Tiny.D answered Jan 08 '23 09:01
