As an example, I have the following dataframe:
Date Balance
2013-04-01 03:50:00 A
2013-04-01 04:00:00 A
2013-04-01 04:15:00 B
2013-04-01 04:15:00 B
2013-04-01 04:25:00 A
2013-04-01 04:25:00 A
2013-04-01 04:35:00 B
2013-04-01 04:40:00 B
2013-04-02 04:55:00 B
2013-04-02 04:56:00 A
2013-04-02 04:57:00 A
2013-04-03 10:30:00 A
2013-04-03 16:35:00 A
2013-04-03 20:40:00 A
My goal is to add a 'Counter' column that tracks the running balance of A's and B's: every A increases the counter by one, and every B decreases it by one. If consecutive rows share the same timestamp (same Date), each of those rows should show the balance after all of them have been applied. For example, two A's at the same time raise the balance by two on both rows; the same reasoning applies to consecutive B's or to a mix of A's and B's at the same time. The dataframe should therefore end up looking like this:
Date Balance Counter
2013-04-01 03:50:00 A 1
2013-04-01 04:00:00 A 2
2013-04-01 04:15:00 B 0
2013-04-01 04:15:00 B 0
2013-04-01 04:25:00 A 2
2013-04-01 04:25:00 A 2
2013-04-01 04:35:00 B 1
2013-04-01 04:40:00 B 0
2013-04-02 04:55:00 B -1
2013-04-02 04:56:00 A 0
2013-04-02 04:57:00 A 1
2013-04-03 10:30:00 A 2
2013-04-03 16:35:00 A 3
2013-04-03 20:40:00 A 4
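To pin down the rule described above, here is a minimal, non-vectorized sketch that could be used to check a faster solution on a small sample. The function name reference_counter and the shortened time strings are illustrative only, and the sketch assumes that rows with the same date are adjacent:

```python
from itertools import groupby


def reference_counter(dates, labels):
    """Slow reference: rows sharing a date all get the balance after that date."""
    out = []
    balance = 0
    rows = list(zip(dates, labels))
    # itertools.groupby clusters consecutive rows that share the same date
    for _, group in groupby(rows, key=lambda row: row[0]):
        group = list(group)
        # Apply every +1 (A) and -1 (B) of this timestamp at once
        balance += sum(1 if label == 'A' else -1 for _, label in group)
        out.extend([balance] * len(group))
    return out


dates = ['03:50', '04:00', '04:15', '04:15', '04:25', '04:25', '04:35']
labels = ['A', 'A', 'B', 'B', 'A', 'A', 'B']
print(reference_counter(dates, labels))  # [1, 2, 0, 0, 2, 2, 1]
```

This is far too slow for millions of rows, but it gives an unambiguous target to compare against.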
The main problem is that the dataframe has more than 2 million rows, so looping over it is very time-consuming. Is there a vectorized approach to this problem?
Edit: I was able to put together a solution that works as long as consecutive rows do not share the same date. Could anyone help me figure out the rest?
import numpy as np
import pandas as pd

d = {'Date': ['2013-04-01 03:50:00', '2013-04-01 04:00:00', '2013-04-01 04:15:00',
              '2013-04-01 04:15:00', '2013-04-01 04:25:00', '2013-04-01 04:25:00',
              '2013-04-01 04:35:00', '2013-04-01 04:40:00', '2013-04-02 04:55:00',
              '2013-04-02 04:56:00', '2013-04-02 04:57:00', '2013-04-03 10:30:00',
              '2013-04-03 16:35:00', '2013-04-03 20:40:00'],
     'Balance': ['A', 'A', 'B', 'B', 'A', 'A', 'B', 'B', 'B', 'A', 'A', 'A', 'A', 'A']}
df = pd.DataFrame(data=d)
# +1 for every 'A', -1 for every 'B'
df['plus_minus'] = np.where(df.Balance == 'A', 1, -1)
# Running balance; only correct when consecutive rows have different dates
df['Counter'] = df['plus_minus'].cumsum()
The count() method in pandas counts non-NA cells along an axis. With axis=0 (the default) it returns one count per column; with axis=1 it returns one count per row. It does not return the number of rows directly: for that, use len(df) or df.shape[0], where df is the dataframe.
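As a small illustration of what count() returns along each axis, the sketch below uses a made-up three-row frame (the name sample and its columns are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in column 'x'
sample = pd.DataFrame({'x': [1.0, np.nan, 3.0], 'y': ['a', 'b', 'c']})

per_column = sample.count(axis=0)  # non-NA cells in each column: x -> 2, y -> 3
per_row = sample.count(axis=1)     # non-NA cells in each row: 2, 1, 2
n_rows = len(sample)               # number of rows, regardless of missing values: 3
```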
In collections, you'll find a class specially designed to count several different objects in one go. This class is conveniently called Counter. Counter is a subclass of dict that's specially designed for counting hashable objects in Python. It's a dictionary that stores objects as keys and counts as values. To count with Counter, you typically provide a sequence or iterable of hashable objects as an argument to the class's constructor.
Updating object counts: once you have a Counter instance in place, you can use .update() to update it with new objects and counts. Rather than replacing values like its dict counterpart, the .update() implementation provided by Counter adds the new counts to the existing ones. It also creates new key-count pairs when necessary.
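A short sketch of Counter on labels like the ones in the question; the list balance and the name net are made up for illustration:

```python
from collections import Counter

balance = ['A', 'A', 'B', 'B', 'A']

counts = Counter(balance)        # A: 3, B: 2
counts.update(['B', 'C'])        # adds to the existing count of 'B', creates 'C'
net = counts['A'] - counts['B']  # final net balance only

print(counts['A'], counts['B'], counts['C'], net)  # 3 3 1 0
```

Note that Counter only yields totals. It does not give the row-by-row running balance the question asks for, which is why a cumulative sum is used there.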
One approach would be to group by the Date and sum the plus_minus values. The cumulative sum of those per-date totals gives you the net balance at the end of each datetime, and then we can reindex by the Date to broadcast the result back to every row of the main frame:
df['plus_minus'] = np.where(df.Balance == 'A', 1, -1)
by_dt = df["plus_minus"].groupby(df["Date"]).sum().cumsum()
df["Counter2"] = by_dt.reindex(df.Date).values
gives me
Date Balance Counter plus_minus Counter2
0 2013-04-01 03:50:00 A 1 1 1
1 2013-04-01 04:00:00 A 2 1 2
2 2013-04-01 04:15:00 B 0 -1 0
3 2013-04-01 04:15:00 B 0 -1 0
4 2013-04-01 04:25:00 A 2 1 2
5 2013-04-01 04:25:00 A 2 1 2
6 2013-04-01 04:35:00 B 1 -1 1
7 2013-04-01 04:40:00 B 0 -1 0
8 2013-04-02 04:55:00 B -1 -1 -1
9 2013-04-02 04:56:00 A 0 1 0
10 2013-04-02 04:57:00 A 1 1 1
11 2013-04-03 10:30:00 A 2 1 2
12 2013-04-03 16:35:00 A 3 1 3
13 2013-04-03 20:40:00 A 4 1 4
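Two possible variants of the same idea, sketched on the first seven rows of the example. The names step, Counter_last and Counter_map are made up for illustration, and the first variant assumes the rows are already sorted by Date:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Date': pd.to_datetime([
        '2013-04-01 03:50:00', '2013-04-01 04:00:00', '2013-04-01 04:15:00',
        '2013-04-01 04:15:00', '2013-04-01 04:25:00', '2013-04-01 04:25:00',
        '2013-04-01 04:35:00',
    ]),
    'Balance': list('AABBAAB'),
})

# +1 for every 'A', -1 for every 'B'
step = pd.Series(np.where(df['Balance'] == 'A', 1, -1), index=df.index)

# Variant 1: plain running sum, then give every row of a timestamp the value
# reached on that timestamp's last row (assumes rows are sorted by Date).
df['Counter_last'] = step.cumsum().groupby(df['Date']).transform('last')

# Variant 2: per-date totals, cumulative sum, then look each Date up with map
# instead of reindex.
by_dt = step.groupby(df['Date']).sum().cumsum()
df['Counter_map'] = df['Date'].map(by_dt)

print(df['Counter_last'].tolist())  # [1, 2, 0, 0, 2, 2, 1]
```

Both stay vectorized, so they should scale to a few million rows much better than a Python loop, though it is worth timing them on the real data.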