
Python: Attempting to make a matrix out of DataFrame from pandas

So I'm importing some data from a database into a DataFrame object using Pandas. The data's format is as follows:

time   info    from    to    frequency
19:00  ...      A      X        20
19:00  ...      B      Z         9
21:00  ...      A      Y         2
21:00  ...      A      Z         5
23:55  ...      A      X         8

Now, I have two problems I need to solve:

  1. Compute the sum of the frequencies of every movement from point to point, independently of time, so that from point A to point X this sum would be 28. As you may guess, "time" and "info" are disposable columns; I do not need them in this situation.

  2. Since I can guarantee that all points in "from" are the same as in "to", I would like to have such sums mentioned above in form of a matrix of some sort.

I was already solving both of these with the following code:

import pandas as pd

def make_matrix(df: pd.DataFrame) -> dict:

    # Get grouped version, discarding time and info...
    grouped = df.groupby(['from', 'to'])['frequency'].sum()

    # Fill dictionary acting as matrix...
    D = {}
    for (_from, _to), freq in grouped.items():
        # Create the inner dict the first time _from is seen, then store the sum
        D.setdefault(_from, {})[_to] = int(freq)

    return D

For context, the first line turns the example DataFrame into:

from    to    frequency
   A     X        28
         Y         2
         Z         5
   B     Z         9

Thing is, I'm pretty sure there's a better way of doing this, but I can't find it on Stack Overflow or Google, since this is a pretty particular situation.

Also I'm looking for a better way because this dictionary ends up without the None/0 value for every instance of point X to the same point X.

I was thinking there should be an easier way to merge these columns without them ending up in the format of grouped: pd.Series, since having to iterate over each tuple like (A,X),(A,Y),(A,Z) and such as well as having to artificially add None to the trivial case of (X,X) in the dictionary felt very hacky...

Edit 1: I'm adding the desired matrix output... It should be something like this:

    A     B    ...    X    Y    Z
A  null   0    ...   28    2    5
B   0    null  ...    0    0    9
.
.
.
X   0     0    ...    0    0    0
Y   0     0    ...    0    0    0
Z   0     0    ...    0    0    0

Additionally, if there were another tuple, such as from X to A with frequency 25, then matrix position M[X][A] would hold 25 instead of 0.

Edit 2: It is possible I'm indexing wrong and it should be the transposed matrix instead of the example one. Either way you get the problem: it is a non-symmetric square matrix.

Kvothe asked Jan 24 '26 16:01


2 Answers

(df.groupby(['from', 'to'])['frequency'].sum()
   .unstack(fill_value=0)
)

Try this; it should give the output you want.
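
To try this against the sample rows from the question, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the table in the question, and the "info" values are placeholders because the question elides them; the variable name `matrix` is just for illustration.

```python
import pandas as pd

# Sample rows copied from the question; "info" holds placeholder values
df = pd.DataFrame({
    "time": ["19:00", "19:00", "21:00", "21:00", "23:55"],
    "info": ["..."] * 5,
    "from": ["A", "B", "A", "A", "A"],
    "to": ["X", "Z", "Y", "Z", "X"],
    "frequency": [20, 9, 2, 5, 8],
})

# Sum frequency per (from, to) pair, then pivot the "to" level into columns
matrix = (
    df.groupby(["from", "to"])["frequency"].sum()
      .unstack(fill_value=0)
)
print(matrix)
```

Note that this only contains the origins and destinations that actually occur in the data, so the result is not necessarily square.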

its-akanksha answered Jan 27 '26 08:01


You could try:

(df.groupby(['from', 'to'])['frequency'].sum()
   .unstack(fill_value=0)
)

Output:

to     X  Y  Z
from          
A     28  2  5
B      0  0  9

Now, if you want every point to appear as both a row and a column, you can use reindex:

all_cols = sorted(set(df['from']).union(set(df['to'])) )

(df.groupby(['from', 'to'])['frequency'].sum()
   .unstack(fill_value=0)
   .reindex(all_cols, fill_value=0)
   .reindex(all_cols, fill_value=0, axis=1)
)

Output:

to    A  B   X  Y  Z
from                
A     0  0  28  2  5
B     0  0   0  0  9
X     0  0   0  0  0
Y     0  0   0  0  0
Z     0  0   0  0  0
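
The question's desired output shows null on the diagonal rather than 0. One possible way to get that, sketched under a few assumptions: `square_matrix` is a hypothetical helper name, the two reindex calls are merged into one using the `index=` and `columns=` keywords, and the diagonal is filled with NaN, which forces a float dtype.

```python
import numpy as np
import pandas as pd

def square_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: from/to frequency sums as a square matrix."""
    # Every point that appears as an origin or a destination
    points = sorted(set(df["from"]) | set(df["to"]))

    counts = (
        df.groupby(["from", "to"])["frequency"].sum()
          .unstack(fill_value=0)
          # Reindex rows and columns in one call; missing pairs become 0
          .reindex(index=points, columns=points, fill_value=0)
    )

    # NaN cannot live in an integer column, so work on a float copy.
    # This overwrites any genuine X -> X sums, if the data has them.
    values = counts.to_numpy(dtype=float, copy=True)
    np.fill_diagonal(values, np.nan)
    return pd.DataFrame(values, index=counts.index, columns=counts.columns)

# Sample rows from the question (time and info omitted, they are not used)
df = pd.DataFrame({
    "from": ["A", "B", "A", "A", "A"],
    "to": ["X", "Z", "Y", "Z", "X"],
    "frequency": [20, 9, 2, 5, 8],
})
result = square_matrix(df)
print(result)
```

If integer values matter more than having null on the diagonal, skipping the last step and keeping the zeros is simpler.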
Quang Hoang answered Jan 27 '26 08:01


