fill in dataframe with two for loops and if condition in python

Question

I have two DataFrames, one looks something like this:

df1:

x    y    Counts
a    b    1
a    c    3
b    c    2
c    d    1

The other one uses the list of unique values from the first two columns as both its index and its columns:

df2:

   a  b  c  d
a
b
c
d

What I would like to do is fill in the second DataFrame with values from the first one: the cell at the intersection of an index label and a column label should hold the Counts value from the matching row of the first DataFrame, e.g.:

   a    b   c   d
a   0   1   3   0
b   1   0   2   0
c   3   2   0   1
d   0   0   1   0

When I try to use two for loops with a double if-condition, the computer hangs (the real DataFrame contains more than 1000 rows).

The piece of code I am trying to run (and which is apparently too heavy for the computer to handle):

for i in df2.index:
    for j in df2.columns:
        if (i==df1.x.any() and j==df1.y.any()):
            df2.loc[i,j]=df1.Counts

Important to note: the list of unique values (i.e., the index and columns of the second DataFrame) is longer than the number of rows in the first DataFrame; in my example they happen to coincide.

If it is of any relevance, the first DataFrame represents combinations of words in the first and second columns and their occurrences in the text. The occurrences are the weights of the edges. I am trying to create a matrix in order to plot a graph via igraph. I chose to first create a DataFrame and then pass its values to igraph as an array. As far as I understand, python-igraph cannot plot a graph from a DataFrame, only from a NumPy array. I tried some of the solutions suggested for similar issues, but nothing has worked so far.

Any suggestions to improve my question are warmly welcomed (it's my first question here).

Mohammad Yusuf · Accepted Answer

You can do something like this:

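import pandas as pd

# df2 here is the long-format frame with columns x, y, Counts (df1 in the question)
# df = pd.read_clipboard()
# df2 = df.copy()
df3 = df2.pivot(index='x', columns='y', values='Counts')
print(df3)
print()
# Use the union of row and column labels for both axes
new = sorted(set(df3.columns.tolist() + df3.index.tolist()))
df3 = df3.reindex(index=new, columns=new).fillna(0).astype(int)
print(df3)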

output:

y    b    c    d
x               
a  1.0  3.0  NaN
b  NaN  2.0  NaN
c  NaN  NaN  1.0

y  a  b  c  d
x            
a  0  1  3  0
b  0  0  2  0
c  0  0  0  1
d  0  0  0  0
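
Note that the pivoted result above only fills the (x, y) direction, while the desired matrix in the question is symmetric. One possible way to mirror it is sketched below. It assumes each unordered pair appears in only one row of df1 (otherwise the counts of mirrored pairs would be added together). The names `directed`, `labels`, `sym` and `matrix` are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "x": ["a", "a", "b", "c"],
    "y": ["b", "c", "c", "d"],
    "Counts": [1, 3, 2, 1],
})

# Pivot to a wide frame, then put the same sorted labels on both axes
directed = df1.pivot(index="x", columns="y", values="Counts")
labels = sorted(set(directed.index) | set(directed.columns))
directed = directed.reindex(index=labels, columns=labels).fillna(0).astype(int)

# Add the transpose so that cell (y, x) mirrors cell (x, y)
sym = directed + directed.T

# Plain NumPy array, e.g. for a library that expects an adjacency matrix
matrix = sym.to_numpy()
print(sym)
```

With the sample data, `sym` should match the matrix the question asks for, and `matrix` holds the same values as a NumPy array.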

piRSquared · Answer

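Stack df2, fill the missing values from df1, then unstack:

import numpy as np
import pandas as pd

# Build an empty square frame over every label that appears in x or y
idx = pd.Index(np.unique(df1[['x', 'y']]))
df2 = pd.DataFrame(index=idx, columns=idx)

# Align on the (x, y) MultiIndex, then reshape back to a matrix
df2.stack(dropna=False).fillna(df1.set_index(['x', 'y']).Counts) \
    .unstack().fillna(0).astype(int)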

   a  b  c  d
a  0  1  3  0
b  0  0  2  0
c  0  0  0  1
d  0  0  0  0
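
For comparison with the nested loops in the question: `df1.x.any()` collapses the whole column to a single value, so it cannot locate a matching row, and `df1.Counts` assigns the entire column rather than one number. If a loop is preferred, a single pass over the rows of df1 is enough. The following is only a sketch, assuming the symmetric matrix from the question is wanted; `labels` and `count` are illustrative names.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "x": ["a", "a", "b", "c"],
    "y": ["b", "c", "c", "d"],
    "Counts": [1, 3, 2, 1],
})

# Square frame of zeros over every label seen in either column
labels = sorted(set(df1["x"]) | set(df1["y"]))
df2 = pd.DataFrame(0, index=labels, columns=labels)

# One pass over the rows of df1 instead of a loop over every cell of df2
for x, y, count in df1[["x", "y", "Counts"]].itertuples(index=False):
    df2.loc[x, y] = count
    df2.loc[y, x] = count  # mirror the entry to keep the matrix symmetric

print(df2)
```

This does work proportional to the number of rows in df1 rather than to the number of cells in df2. Dropping the mirrored assignment gives the one-directional result shown in the answers.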

Tags: python, pandas, dataframe

NellyM
