So I'm importing some data from a database into a DataFrame object using Pandas. The data's format is as follows:
time   info  from  to  frequency
19:00  ...   A     X   20
19:00  ...   B     Z   9
21:00  ...   A     Y   2
21:00  ...   A     Z   5
23:55  ...   A     X   8
Now, I have two problems I need to solve:
1. Compute the sum of the frequencies of every movement from point to point, independently of time, so that from point A to point X this sum would be 28. As you may guess, "time" and "info" are disposable columns; I do not need them in this situation.
2. Since I can guarantee that all points in "from" are the same as in "to", I would like to have the sums mentioned above in the form of a matrix of some sort.
I was already solving both of these with the following code:
import pandas as pd

def make_matrix(df: pd.DataFrame) -> dict:
    # Get grouped version, discarding time and info...
    grouped = df.groupby(['from', 'to'])['frequency'].sum()
    # Fill dictionary acting as matrix...
    D = {}
    for (_from, _to), freq in grouped.items():
        # Create the inner dictionary the first time a "from" point appears
        if _from not in D:
            D[_from] = {}
        D[_from][_to] = int(freq)
    return D
For context, the groupby line turns the example DataFrame into:
from  to  frequency
A     X   28
      Y   2
      Z   5
B     Z   9
Thing is, I'm pretty sure there's a better way of doing this, but I can't find it elsewhere on Stack Overflow or Google, since this is a pretty particular situation.
I'm also looking for a better way because this dictionary ends up without the None/0 value for every movement from a point to that same point.
I was thinking there should be an easier way to merge these columns without ending up with the grouped pd.Series format. Having to iterate over each tuple, like (A,X), (A,Y), (A,Z) and so on, and having to artificially add None for the trivial case of (X,X) in the dictionary felt very hacky...
Edit 1: I'm adding the desired matrix output... It should be something like this:
     A     B     ...  X   Y   Z
A    null  0     ...  28  2   5
B    0     null  ...  0   0   9
.
.
.
X    0     0     ...  0   0   0
Y    0     0     ...  0   0   0
Z    0     0     ...  0   0   0
Additionally, if there were another row such as from X to A with frequency 25, the matrix position M[X][A] would hold 25 instead of 0.
Edit 2: It is possible I'm indexing wrong and it should be the transposed matrix instead of the example one. Either way you get the problem: it is a non-symmetric square matrix.
(df.groupby(['from', 'to'])['frequency'].sum()
.unstack(fill_value=0)
)
Try this; it should give you the output you are after. The outer parentheses are needed so the chained call can continue on the next line.
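To check the snippet above against the question's numbers, here is a minimal, self-contained sketch. The DataFrame is rebuilt by hand from the sample rows in the question (the elided "info" column is left out, which is an assumption about what matters here), and the variable name `matrix` is only illustrative.

```python
import pandas as pd

# Sample rows copied from the question; the elided "info" column is omitted.
df = pd.DataFrame({
    "time": ["19:00", "19:00", "21:00", "21:00", "23:55"],
    "from": ["A", "B", "A", "A", "A"],
    "to": ["X", "Z", "Y", "Z", "X"],
    "frequency": [20, 9, 2, 5, 8],
})

# Sum the frequencies per (from, to) pair, then pivot the "to" level
# into columns; pairs that never occur are filled with 0.
matrix = (
    df.groupby(["from", "to"])["frequency"].sum()
      .unstack(fill_value=0)
)

print(matrix)
```

The result only contains the points that actually occur as origins (rows) and destinations (columns), so it is not square yet.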
You could try:
(df.groupby(['from', 'to'])['frequency'].sum()
.unstack(fill_value=0)
)
Output:
to     X  Y  Z
from
A     28  2  5
B      0  0  9
Now, if you want every point to appear as both a row and a column, so that the matrix is square, you can use reindex:
all_cols = sorted(set(df['from']).union(set(df['to'])))
(df.groupby(['from', 'to'])['frequency'].sum()
.unstack(fill_value=0)
.reindex(all_cols, fill_value=0)
.reindex(all_cols, fill_value=0, axis=1)
)
Output:
to    A  B   X  Y  Z
from
A     0  0  28  2  5
B     0  0   0  0  9
X     0  0   0  0  0
Y     0  0   0  0  0
Z     0  0   0  0  0
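The question also asks for null on the diagonal, which the reindexed output above fills with 0. A minimal sketch of one possible way to blank it out, assuming NaN is an acceptable stand-in for null and that float values are fine (NaN forces the conversion). The sample data is rebuilt from the question without the elided "info" column, and the names `points` and `diagonal` are illustrative.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question; the elided "info" column is omitted.
df = pd.DataFrame({
    "time": ["19:00", "19:00", "21:00", "21:00", "23:55"],
    "from": ["A", "B", "A", "A", "A"],
    "to": ["X", "Z", "Y", "Z", "X"],
    "frequency": [20, 9, 2, 5, 8],
})

# Every point that appears as an origin or a destination.
points = sorted(set(df["from"]) | set(df["to"]))

# Square matrix of summed frequencies, 0 where a pair never occurs.
matrix = (
    df.groupby(["from", "to"])["frequency"].sum()
      .unstack(fill_value=0)
      .reindex(index=points, columns=points, fill_value=0)
)

# Boolean mask that is True only on the diagonal (a point to itself).
diagonal = np.eye(len(points), dtype=bool)

# Convert to float first so NaN can be stored, then blank the diagonal.
matrix = matrix.astype(float).mask(diagonal)

print(matrix)
```

If a nested dictionary like the one in the question is still needed, `matrix.to_dict(orient="index")` should produce one keyed by origin and then by destination.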