 

Fill in a DataFrame with two for loops and an if condition in Python

I have two DataFrames, one looks something like this:

df1:

x    y    Counts
a    b    1
a    c    3
b    c    2
c    d    1

The other one uses the unique values from the first two columns of df1 as both its index and its columns:

df2:

   a  b  c  d
a
b
c
d

What I would like to do is fill in the second DataFrame with values from the first one, so that the cell at the intersection of an index label and a column label holds the Counts value from the df1 row containing that pair, e.g.:

   a    b   c   d
a   0   1   3   0
b   1   0   2   0
c   3   2   0   1
d   0   0   1   0

When I try to use two for loops with a double if condition, the computer hangs (the real DataFrame contains more than 1000 rows).

The piece of code I am trying to run (which is apparently too heavy for my computer):

for i in df2.index:
    for j in df2.columns:
        if (i==df1.x.any() and j==df1.y.any()):
            df2.loc[i,j]=df1.Counts

Note that in my real data the list of unique values (i.e., the index and columns of the second DataFrame) is longer than the number of rows in the first DataFrame; in this example they just happen to coincide.
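For reference, a minimal sketch of what the loop above appears to be aiming for, assuming each row of df1 should land in cell (x, y) of df2 and, as in the expected output, in the mirrored cell (y, x) as well. It loops over the rows of df1 once instead of visiting every cell of df2. The sample data is copied from the tables above; the name `labels` is illustrative only.

```python
import pandas as pd

# Edge list from the question: one row per word pair and its count
df1 = pd.DataFrame({"x": list("aabc"), "y": list("bccd"), "Counts": [1, 3, 2, 1]})

# All unique labels from both columns, used as index and columns of df2
labels = sorted(set(df1["x"]) | set(df1["y"]))
df2 = pd.DataFrame(0, index=labels, columns=labels)

# One pass over the rows of df1 rather than over every cell of df2
for row in df1.itertuples(index=False):
    df2.loc[row.x, row.y] = row.Counts
    df2.loc[row.y, row.x] = row.Counts  # mirror, so the matrix is symmetric

print(df2)
```

This does one assignment pair per row of df1, so the work grows with the number of pairs rather than with the square of the number of labels.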

If it is of any relevance, the first DataFrame represents combinations of words in the first and second columns and their occurrences in the text. The occurrences are the weights of the edges. I am trying to create a matrix so that I can plot a graph via igraph. I chose to create a DataFrame first and then pass its values to igraph as an array. As far as I could understand, python-igraph cannot use a DataFrame to plot a graph, only a numpy array. I tried some of the solutions suggested for similar issues, but nothing has worked so far.

Any suggestions to improve my question are welcome (it's my first question here).

NellyM asked Feb 06 '23 12:02



2 Answers

You can do something like this:

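import pandas as pd

# df1 is the first DataFrame from the question (columns x, y, Counts)
df1 = pd.DataFrame({'x': list('aabc'), 'y': list('bccd'), 'Counts': [1, 3, 2, 1]})

# Pivot the pairs into a matrix: rows from x, columns from y
df3 = df1.pivot(index='x', columns='y', values='Counts')
print(df3)
print()
# Use every label seen in either axis for both the index and the columns
new = sorted(set(df3.columns.tolist() + df3.index.tolist()))
df3 = df3.reindex(index=new, columns=new).fillna(0).astype(int)
print(df3)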

output:

y    b    c    d
x               
a  1.0  3.0  NaN
b  NaN  2.0  NaN
c  NaN  NaN  1.0

y  a  b  c  d
x            
a  0  1  3  0
b  0  0  2  0
c  0  0  0  1
d  0  0  0  0
Mohammad Yusuf answered Feb 08 '23 00:02



Stack df2 and fillna with df1:

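import numpy as np
import pandas as pd

# Unique labels from both key columns become the index and the columns
idx = pd.Index(np.unique(df1[['x', 'y']]))
df2 = pd.DataFrame(index=idx, columns=idx)

# Stack to one value per (row, column) pair, fill from df1, then reshape back
df2.stack(dropna=False).fillna(df1.set_index(['x', 'y']).Counts) \
    .unstack().fillna(0).astype(int)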

   a  b  c  d
a  0  1  3  0
b  0  0  2  0
c  0  0  0  1
d  0  0  0  0
piRSquared answered Feb 08 '23 02:02
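
Both answers above fill only the (x, y) direction, so the cells below the diagonal stay 0, whereas the matrix requested in the question is symmetric. A minimal sketch of one way to close that gap adds the pivoted table to its transpose. It assumes each unordered pair occurs only once in df1 and no row has x equal to y, since otherwise counts would be summed or doubled. The names `labels`, `directed` and `symmetric` are illustrative and do not come from either answer.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({"x": list("aabc"), "y": list("bccd"), "Counts": [1, 3, 2, 1]})

# Every label that appears in either column
labels = sorted(set(df1["x"]) | set(df1["y"]))

# One-directional matrix: rows from x, columns from y, missing pairs as 0
directed = (
    df1.pivot(index="x", columns="y", values="Counts")
       .reindex(index=labels, columns=labels)
       .fillna(0)
       .astype(int)
)

# Adding the transpose mirrors each count across the diagonal
symmetric = directed + directed.T
print(symmetric)

# Plain NumPy array of the values, e.g. to pass on to a graph library
matrix = symmetric.to_numpy()
```

With the sample data, `symmetric` should reproduce the expected table from the question, and `matrix` holds the same values as a NumPy array.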
