Python: Attempting to make a matrix out of DataFrame from pandas

Question

So I'm importing some data from a database into a DataFrame object using Pandas. The data's format is as follows:

time   info    from    to    frequency
19:00  ...      A      X        20
19:00  ...      B      Z         9
21:00  ...      A      Y         2
21:00  ...      A      Z         5
23:55  ...      A      X         8

Now, I have two problems I need to solve:

1. Compute the sum of the frequencies of every movement from point to point, independently of time, so that from point A to point X this sum would be 28. As you may guess, "time" and "info" are disposable columns; I do not need them in this situation.
2. Since I can guarantee that all points in "from" are the same as in "to", I would like to have the sums mentioned above in the form of a matrix of some sort.

I was already solving both of these with the following code:

import pandas as pd

def make_matrix(df: pd.DataFrame) -> dict:

    # Get grouped version, discarding time and info...
    grouped = df.groupby(['from', 'to'])['frequency'].sum()

    # Fill dictionary acting as matrix...
    D = {}
    for (_from, _to), freq in grouped.items():
        # setdefault creates the inner dict the first time _from is seen,
        # so the first (from, to) pair of each origin is not dropped
        D.setdefault(_from, {})[_to] = int(freq)

    return D

For context, the first line turns the example DataFrame into:

from    to    frequency
   A     X        28
         Y         2
         Z         5
   B     Z         9

Thing is, I'm pretty sure there's a better way of doing this, but I can't find it elsewhere in StackOverflow or Google, since this is a pretty particular situation.

Also I'm looking for a better way because this dictionary ends up without the None/0 value for every instance of point X to the same point X.

I was thinking there should be an easier way to merge these columns without them ending up in the format of grouped: pd.Series, since having to iterate over each tuple like (A,X),(A,Y),(A,Z) and such as well as having to artificially add None to the trivial case of (X,X) in the dictionary felt very hacky...

Edit 1: I'm adding the desired matrix output... It should be something like this:

    A     B    ...    X    Y    Z
A  null   0    ...   28    2    5
B   0    null  ...    0    0    9
.
.
.
X   0     0    ...    0    0    0
Y   0     0    ...    0    0    0
Z   0     0    ...    0    0    0

Additionally, if there were another tuple, such as from X to A with frequency 25, the matrix position M[X][A] would hold 25 instead of 0.

Edit 2: It is possible I'm indexing wrong and it should be the transposed matrix instead of the example one. Either way you get the problem: it is a non-symmetric square matrix.

its-akanksha · Accepted Answer

(df.groupby(['from', 'to'])['frequency'].sum()
   .unstack(fill_value=0)
)

Try this one; it should give you the right output.
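For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample rows from the question and applies the same groupby/unstack chain. The variable name `matrix` and the "..." placeholder values in the "info" column are just assumptions for illustration.

```python
import pandas as pd

# Sample rows copied from the question; "info" holds placeholder values.
df = pd.DataFrame({
    "time": ["19:00", "19:00", "21:00", "21:00", "23:55"],
    "info": ["..."] * 5,
    "from": ["A", "B", "A", "A", "A"],
    "to": ["X", "Z", "Y", "Z", "X"],
    "frequency": [20, 9, 2, 5, 8],
})

# Sum the frequencies per (from, to) pair, then move the "to" level
# of the resulting index into columns. Missing pairs become 0.
matrix = (
    df.groupby(["from", "to"])["frequency"].sum()
      .unstack(fill_value=0)
)

print(matrix)
```

Note that this only produces rows for points that appear in "from" and columns for points that appear in "to", so the result is not necessarily square.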

Quang Hoang · Answer

You could try:

(df.groupby(['from', 'to'])['frequency'].sum()
   .unstack(fill_value=0)
)

Output:

to     X  Y  Z
from          
A     28  2  5
B      0  0  9
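As a side note, the same table can probably also be built with `pivot_table`, which does the grouping and the reshaping in one call. This is an alternative to the groupby/unstack chain above, not something from the original answer; the name `table` is made up for the example.

```python
import pandas as pd

# Only the columns that matter for the sums, taken from the question's sample.
df = pd.DataFrame({
    "from": ["A", "B", "A", "A", "A"],
    "to": ["X", "Z", "Y", "Z", "X"],
    "frequency": [20, 9, 2, 5, 8],
})

# index -> rows, columns -> columns, aggfunc="sum" adds up duplicates
# such as the two A -> X rows; fill_value=0 fills pairs that never occur.
table = df.pivot_table(
    index="from",
    columns="to",
    values="frequency",
    aggfunc="sum",
    fill_value=0,
)

print(table)
```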

Now, if you want every point to appear as both a row and a column, you can use reindex:

all_cols = sorted(set(df['from']).union(df['to']))

(df.groupby(['from', 'to'])['frequency'].sum()
   .unstack(fill_value=0)
   .reindex(all_cols, fill_value=0)
   .reindex(all_cols, fill_value=0, axis=1)
)

Output:

to    A  B   X  Y  Z
from                
A     0  0  28  2  5
B     0  0   0  0  9
X     0  0   0  0  0
Y     0  0   0  0  0
Z     0  0   0  0  0
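The question also asks for null on the diagonal (A to A, B to B, and so on) and originally stored the result as a nested dictionary. One possible way to get both from the square table is sketched below. It assumes NaN is an acceptable stand-in for null, which turns the values into floats; `square`, `with_null_diagonal` and `as_dict` are names invented for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "from": ["A", "B", "A", "A", "A"],
    "to": ["X", "Z", "Y", "Z", "X"],
    "frequency": [20, 9, 2, 5, 8],
})

all_points = sorted(set(df["from"]) | set(df["to"]))

# Same square table as above; reindex accepts rows and columns in one call.
square = (
    df.groupby(["from", "to"])["frequency"].sum()
      .unstack(fill_value=0)
      .reindex(index=all_points, columns=all_points, fill_value=0)
)

# Copy the values as floats (NaN cannot live in an integer array),
# then overwrite the diagonal in place on that copy.
values = square.to_numpy(dtype=float, copy=True)
np.fill_diagonal(values, np.nan)
with_null_diagonal = pd.DataFrame(
    values, index=square.index, columns=square.columns
)

# Nested dict keyed as D[from][to], like the dictionary in the question.
as_dict = square.to_dict(orient="index")

print(with_null_diagonal)
```

If the integer dtype matters, keeping `square` as is and treating the diagonal as a special case when reading it may be simpler.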
