I have two columns (A and Date) like below and need to combine them into one column, like column C. This dataset has more than 900,000 rows.
I ran into two main problems.
1. "Date" is a Timestamp, so combining it with the string column raises an error: TypeError: unsupported operand type(s) for +: 'Timestamp' and 'str'.
2. The code is far too slow. I wrote a for loop to do the combination:
for i in range(0, 911462):
    df['Combine'][i] = df['Date'][i] + df['A'][i]
I guess this is because the for loop does the combination row by row, so every single combination costs a lot of time on system I/O.
Is there any method to do this job more efficiently?
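A rough way to test the row-by-row hypothesis is to time a loop against a single column-wise expression on synthetic data of a similar shape. This is a sketch under assumptions: the data is generated, the row count is reduced so the loop finishes quickly, and the helper names make_frame, combine_loop and combine_vectorized are hypothetical. The loop formats each Timestamp with strftime, because adding a Timestamp to a string raises the TypeError quoted above.

```python
import time

import pandas as pd


def make_frame(n_rows: int) -> pd.DataFrame:
    """Hypothetical helper: synthetic frame with the question's columns A and Date."""
    labels = ["XX", "YY", "ZZ", "AA"]
    return pd.DataFrame({
        "A": [labels[i % len(labels)] for i in range(n_rows)],
        "Date": pd.date_range("2016-01-01", periods=n_rows, freq="D"),
    })


def combine_loop(df: pd.DataFrame) -> pd.Series:
    """Row by row: a label lookup and one strftime call per row, all in Python."""
    combined = []
    for i in range(len(df)):  # assumes the default RangeIndex from make_frame
        combined.append(df.at[i, "Date"].strftime("%Y-%m-%d") + df.at[i, "A"])
    return pd.Series(combined, index=df.index)


def combine_vectorized(df: pd.DataFrame) -> pd.Series:
    """Column-wise: format the whole Date column once, then concatenate the columns."""
    return df["Date"].dt.strftime("%Y-%m-%d") + df["A"]


df = make_frame(20_000)
for fn in (combine_loop, combine_vectorized):
    start = time.perf_counter()
    fn(df)
    print(f"{fn.__name__}: {time.perf_counter() - start:.3f} s")
```

Exact timings depend on the machine and pandas version. Since the frame is already in memory, the per-row cost of the loop is mostly Python-level overhead (indexing and one method call per row) rather than disk or system I/O.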
You have to explicitly cast the Timestamp to a string, e.g. with strftime:
In [11]: df = pd.DataFrame([[pd.Timestamp("2017-01-01"), 'a'], [pd.Timestamp("2017-01-02"), 'b']], columns=["A", "B"])
In [12]: df["A"].dt.strftime("%Y-%m-%d") + df["B"]
Out[12]:
0 2017-01-01a
1 2017-01-02b
dtype: object
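Applied to the question's column names, the same strftime approach can also be written with Series.str.cat, which joins two string Series element-wise with a separator. A minimal sketch, assuming Date has a datetime64 dtype and that an underscore separator is wanted:

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["XX", "YY", "ZZ", "AA"],
    "Date": pd.to_datetime(["2016-01-01", "2016-01-15", "2016-12-01", "2016-07-12"]),
})

# Format the timestamps explicitly, then join them with column A using "_".
df["Combine"] = df["Date"].dt.strftime("%Y-%m-%d").str.cat(df["A"], sep="_")
print(df)
```

An explicit format string keeps the output stable regardless of how pandas would otherwise render the timestamps, for example if some values carry a time of day.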
Try astype, which can cast objects like Timestamp to strings:
import pandas as pd
df = pd.DataFrame({'A':['XX','YY','ZZ','AA'], 'Date':[pd.Timestamp("2016-01-01"),pd.Timestamp('2016-01-15'),pd.Timestamp('2016-12-01'),pd.Timestamp('2016-07-12')]})
df['Combine'] = df['Date'].astype(str) + '_'+df['A']
df
The resulting df is:
A Date Combine
0 XX 2016-01-01 2016-01-01_XX
1 YY 2016-01-15 2016-01-15_YY
2 ZZ 2016-12-01 2016-12-01_ZZ
3 AA 2016-07-12 2016-07-12_AA