I'm an R user and I cannot figure out the pandas equivalent of match(). I need to use this function to iterate over a bunch of files, grab a key piece of info from each, and merge it back into the current data structure on 'url'. In R I'd do something like this:
logActions <- read.csv("data/logactions.csv")
logActions$class <- NA
files <- dir("data/textContentClassified/", full.names = TRUE)
for (i in seq_along(files)) {
  tmp <- read.csv(files[i])
  logActions$class[match(logActions$url, tmp$url)] <-
    tmp$class[match(tmp$url, logActions$url)]
}
I don't think I can use merge() or join(), as each would overwrite logActions$class on every pass. I can't use update() or combine_first() either, as neither has the indexing capabilities I need. I also tried writing a match() function based on this SO post, but cannot figure out how to get it to work with DataFrame objects. Apologies if I'm missing something obvious.
Here's some Python code that summarizes my ineffectual attempts to do something like match() in pandas:
import pandas as pd
import numpy as np

left = pd.DataFrame({'url': ['foo.com', 'foo.com', 'bar.com'], 'action': [0, 1, 0]})
left["class"] = np.nan
right1 = pd.DataFrame({'url': ['foo.com'], 'class': [0]})
right2 = pd.DataFrame({'url': ['bar.com'], 'class': [1]})

# Doesn't work:
left.join(right1, on='url')       # aligns 'url' against right1's integer index
pd.merge(left, right1, on='url')  # returns a new frame instead of filling 'class'

# Also doesn't work the way I need it to: combine_first() aligns on the
# index here, not on 'url'
left = left.combine_first(right1)
left = left.combine_first(right2)
left

# Also does something funky and doesn't really work the way match() does:
left = left.set_index('url', drop=False)
right1 = right1.set_index('url', drop=False)
right2 = right2.set_index('url', drop=False)
left = left.combine_first(right1)
left = left.combine_first(right2)
left
The desired output is:
       url  action  class
0  foo.com       0      0
1  foo.com       1      0
2  bar.com       0      1
BUT I need to be able to call this over and over again, so that I can iterate over each file.
Note the existence of pandas.match, which does precisely what R's match does.
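For reference, pd.match(to_match, values) returned zero-based integer positions (-1 where there is no match), versus R's one-based positions with NA. It has since been removed from modern pandas; a rough equivalent today, assuming the values being matched against are unique, is Index.get_indexer:

import pandas as pd

values = pd.Index(['foo.com', 'bar.com', 'tmp'])
to_match = ['bar.com', 'nosuch.com', 'foo.com']
# Position of each element of to_match in values, -1 when absent
print(values.get_indexer(to_match))  # [ 1 -1  0]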
Edit:
If the url values in all the right dataframes are unique, you can turn each right dataframe into a Series of klass values indexed by url, then look up the klass of every url in left by indexing into that Series.
import pandas as pd
import numpy as np

left = pd.DataFrame({'url': ['foo.com', 'bar.com', 'foo.com', 'tmp', 'foo.com'],
                     'action': [0, 1, 0, 2, 4]})
left["klass"] = np.nan
right1 = pd.DataFrame({'url': ['foo.com', 'tmp'], 'klass': [10, 20]})
right2 = pd.DataFrame({'url': ['bar.com'], 'klass': [30]})

# Look up every url of left in the right frame (reindex yields NaN for
# missing urls), realign positionally, and keep values already filled in
left["klass"] = left.klass.combine_first(
    right1.set_index('url').klass.reindex(left.url).reset_index(drop=True))
left["klass"] = left.klass.combine_first(
    right2.set_index('url').klass.reindex(left.url).reset_index(drop=True))
print(left)
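Applied to your loop over files, the same trick runs once per file. A minimal sketch, assuming each classified CSV has unique url values plus a class column, that logActions keeps its default integer index, and reusing the paths from the question:

import os
import pandas as pd
import numpy as np

logActions = pd.read_csv("data/logactions.csv")
logActions["class"] = np.nan

folder = "data/textContentClassified"
for fname in os.listdir(folder):
    tmp = pd.read_csv(os.path.join(folder, fname))
    # Series of class values indexed by url; reindex returns NaN for
    # urls this file does not cover, so earlier matches are preserved
    lookup = tmp.set_index('url')['class'].reindex(logActions['url'])
    logActions['class'] = logActions['class'].combine_first(
        lookup.reset_index(drop=True))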
Is this what you want?
import pandas as pd
import numpy as np

left = pd.DataFrame({'url': ['foo.com', 'foo.com', 'bar.com'], 'action': [0, 1, 0]})
left["class"] = np.nan
right1 = pd.DataFrame({'url': ['foo.com'], 'class': [0]})
right2 = pd.DataFrame({'url': ['bar.com'], 'class': [1]})

# Stack the right frames, then do a single merge on 'url'
pd.merge(left.drop("class", axis=1), pd.concat([right1, right2]), on="url")
Output:

       url  action  class
0  foo.com       0      0
1  foo.com       1      0
2  bar.com       0      1
If the class column in left is not all NaN, you can combine_first it with the result.
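For instance, a minimal sketch assuming left keeps its default integer index; how="left" keeps the merged rows aligned one-to-one with left:

lookup = pd.concat([right1, right2])
merged = pd.merge(left.drop("class", axis=1), lookup, on="url", how="left")
# Values already present in left's class column win; NaNs are filled
# from the merged lookup
left["class"] = left["class"].combine_first(merged["class"])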