check if pair of values is in pair of columns in pandas

Tags:

python

pandas

dataframe

Basically, I have latitude and longitude (on a grid) in two different columns. I am getting fed two-element lists (could be numpy arrays) of a new coordinate set and I want to check if it is a duplicate before I add it.

For example, my data:

df = pd.DataFrame([[4,8, 'wolf', 'Predator', 10],
              [5,6,'cow', 'Prey', 10],
              [8, 2, 'rabbit', 'Prey', 10],
              [5, 3, 'rabbit', 'Prey', 10],
              [3, 2, 'cow', 'Prey', 10],
              [7, 5, 'rabbit', 'Prey', 10]],
              columns = ['lat', 'long', 'name', 'kingdom', 'energy'])

newcoords1 = [4,4]
newcoords2 = [7,5]

Is it possible to write one if statement to tell me whether there is already a row with that latitude and longitude? In pseudo code:

if newcoords1 in df['lat', 'long']:
    print('yes! ' + str(newcoords1))

(In the example, newcoords1 should be false and newcoords2 should be true.)

Sidenote: (newcoords1[0] in df['lat']) & (newcoords1[1] in df['long']) doesn't work because that checks them independently, but I need to know if that combination appears in a single row.
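
To make the sidenote concrete, here is a minimal sketch (not part of the original question) that contrasts an independent per-column check with a same-row check. The test pair (5, 2) is a hypothetical value chosen because 5 occurs in lat and 2 occurs in long, but never in the same row. Note also that `in` applied directly to a pandas Series tests the index labels, not the values, so the sketch uses `.values` for the independent check.

```python
import pandas as pd

df = pd.DataFrame([[4, 8, 'wolf', 'Predator', 10],
                   [5, 6, 'cow', 'Prey', 10],
                   [8, 2, 'rabbit', 'Prey', 10],
                   [5, 3, 'rabbit', 'Prey', 10],
                   [3, 2, 'cow', 'Prey', 10],
                   [7, 5, 'rabbit', 'Prey', 10]],
                  columns=['lat', 'long', 'name', 'kingdom', 'energy'])

# Hypothetical pair: each value exists in its column, but not together in one row.
x, y = 5, 2

# Independent check: asks "is x anywhere in lat?" and "is y anywhere in long?"
independent = (x in df['lat'].values) and (y in df['long'].values)

# Same-row check: both conditions must hold for the same row.
same_row = bool(((df['lat'] == x) & (df['long'] == y)).any())

# `in` on a Series looks at the index labels (0..5 here), not the values.
label_check = 7 in df['lat']

print(independent, same_row, label_check)
```

The independent check reports a match even though no row holds (5, 2), which is the false positive described above.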

Thank you in advance!

732

asked Aug 23 '16 19:08

seth127

3 Answers

You can do it this way:

In [140]: df.query('@newcoords2[0] == lat and @newcoords2[1] == long')
Out[140]:
   lat  long    name kingdom  energy
5    7     5  rabbit    Prey      10

In [146]: df.query('@newcoords2[0] == lat and @newcoords2[1] == long').empty
Out[146]: False

The following line returns the number of matching rows:

In [147]: df.query('@newcoords2[0] == lat and @newcoords2[1] == long').shape[0]
Out[147]: 1

Or, using a NumPy approach:

In [103]: df[(df[['lat','long']].values == newcoords2).all(axis=1)]
Out[103]:
   lat  long    name kingdom  energy
5    7     5  rabbit    Prey      10

This shows whether at least one matching row was found:

In [113]: (df[['lat','long']].values == newcoords2).all(axis=1).any()
Out[113]: True

In [114]: (df[['lat','long']].values == newcoords1).all(axis=1).any()
Out[114]: False

Explanation:

In [104]: df[['lat','long']].values == newcoords2
Out[104]:
array([[False, False],
       [False, False],
       [False, False],
       [False, False],
       [False, False],
       [ True,  True]], dtype=bool)

In [105]: (df[['lat','long']].values == newcoords2).all(axis=1)
Out[105]: array([False, False, False, False, False,  True], dtype=bool)
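
As a sketch of how the NumPy comparison above could be wrapped into the single `if` statement the question asks for: the helper name `pair_exists` and its `cols` parameter are assumptions made for illustration, not something from the answer itself.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[4, 8, 'wolf', 'Predator', 10],
                   [5, 6, 'cow', 'Prey', 10],
                   [8, 2, 'rabbit', 'Prey', 10],
                   [5, 3, 'rabbit', 'Prey', 10],
                   [3, 2, 'cow', 'Prey', 10],
                   [7, 5, 'rabbit', 'Prey', 10]],
                  columns=['lat', 'long', 'name', 'kingdom', 'energy'])


def pair_exists(frame, coords, cols=('lat', 'long')):
    """Return True if some row of `frame` matches `coords` in both columns.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Broadcast the 2-element pair against the (n, 2) array of column values,
    # require both columns to match within a row, then ask whether any row did.
    values = frame[list(cols)].to_numpy()
    return bool((values == np.asarray(coords)).all(axis=1).any())


newcoords1 = [4, 4]
newcoords2 = [7, 5]

if pair_exists(df, newcoords2):
    print('yes! ' + str(newcoords2))
```

With the example data, only `newcoords2` passes the check, so only that pair is printed.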

117

answered Oct 20 '22 00:10

MaxU - stop WAR against UA

For people like me who came here searching for how to check whether several pairs of values are in a pair of columns within a big dataframe, here is an answer.

Suppose you have a list newscoord = [newscoord1, newscoord2, ...] and you want to extract the rows of df that match the elements of this list. Then, for the example above:

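# Build one string key per row; the separator avoids collisions such as (1, 23) vs (12, 3)
v = pd.Series( [ str(i) + '_' + str(j) for i,j in df[['lat', 'long']].values ] )
w = [ str(i) + '_' + str(j) for i,j in newscoord ]

df[ v.isin(w) ]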

This gives the same output as @MaxU's answer, but it can extract several rows at once.

On my computer, for a df with 10,000 rows, it takes 0.04s to run.

Of course, if your elements are already strings, it is simpler to use join instead of concatenation.

Furthermore, if the order of elements in the pair does not matter, you have to sort first:

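# np.sort orders each pair along the last axis, so (a, b) and (b, a) yield the same key
v = pd.Series( [ str(i) + '_' + str(j) for i,j in np.sort( df[['lat','long']] ) ] )
w = [ str(i) + '_' + str(j) for i,j in np.sort( newscoord ) ]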

Note that if v is not converted into a series and one uses np.isin(v,w), or if w is converted into a series, it takes more run time once newscoord reaches thousands of elements.

Hope it helps.
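
For comparison, here is a sketch of a different technique for the same multi-pair lookup: it builds a `pd.MultiIndex` from the two columns and calls its `isin` method with tuples, instead of using string keys. This is an alternative, not the method of the answer above, and the contents of `newscoord` are hypothetical example pairs.

```python
import pandas as pd

df = pd.DataFrame([[4, 8, 'wolf', 'Predator', 10],
                   [5, 6, 'cow', 'Prey', 10],
                   [8, 2, 'rabbit', 'Prey', 10],
                   [5, 3, 'rabbit', 'Prey', 10],
                   [3, 2, 'cow', 'Prey', 10],
                   [7, 5, 'rabbit', 'Prey', 10]],
                  columns=['lat', 'long', 'name', 'kingdom', 'energy'])

# Hypothetical list of pairs to look up (not from the original post).
newscoord = [[4, 4], [7, 5], [5, 6]]

# Treat each (lat, long) row as one composite key.
keys = pd.MultiIndex.from_frame(df[['lat', 'long']])

# MultiIndex.isin expects tuples; the result is a boolean NumPy array, one entry per row.
mask = keys.isin([tuple(pair) for pair in newscoord])

matches = df[mask]
print(matches)
```

Because the keys stay as tuples of numbers, this variant does not depend on a string separator; the order of elements within a pair still matters here.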

answered Oct 19 '22 22:10

Michaël

x, y = newcoords1

>>> df[(df.lat == x) & (df.long == y)].empty
True  # Coordinates are not in the dataframe, so you can add it.

x, y = newcoords2

>>> df[(df.lat == x) & (df.long == y)].empty
False  # Coordinates already exist.
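
Building on the boolean mask above, a minimal sketch of the "check before adding" step might look like the following. The function name `add_if_new` and the sample rows (including the 'fox' entry) are hypothetical and only serve the illustration.

```python
import pandas as pd


def add_if_new(frame, row):
    """Return `frame` with `row` appended unless its (lat, long) pair already exists.

    Hypothetical helper: `row` is assumed to be a dict with the same keys as the columns.
    """
    x, y = row['lat'], row['long']
    # Both conditions must hold in the same row for the pair to count as a duplicate.
    already_there = ((frame['lat'] == x) & (frame['long'] == y)).any()
    if already_there:
        return frame
    return pd.concat([frame, pd.DataFrame([row])], ignore_index=True)


df = pd.DataFrame([[4, 8, 'wolf', 'Predator', 10],
                   [5, 6, 'cow', 'Prey', 10],
                   [8, 2, 'rabbit', 'Prey', 10],
                   [5, 3, 'rabbit', 'Prey', 10],
                   [3, 2, 'cow', 'Prey', 10],
                   [7, 5, 'rabbit', 'Prey', 10]],
                  columns=['lat', 'long', 'name', 'kingdom', 'energy'])

# (4, 4) is not present yet, so this row is appended.
df = add_if_new(df, {'lat': 4, 'long': 4, 'name': 'fox', 'kingdom': 'Predator', 'energy': 10})

# (7, 5) already exists, so the frame is returned unchanged.
df = add_if_new(df, {'lat': 7, 'long': 5, 'name': 'cow', 'kingdom': 'Prey', 'energy': 10})

print(len(df))
```

Using `.any()` on the mask avoids building an intermediate filtered frame just to test `.empty`, though both express the same check.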

answered Oct 20 '22 00:10

Alexander
