
Python Merge Two Numpy Arrays Based on Condition

How can I merge the following two arrays, by looking up a value from array A in array B?

Array A:

array([['GG', 'AB', IPv4Network('1.2.3.41/26')],
       ['GG', 'AC', IPv4Network('1.2.3.42/25')],
       ['GG', 'AD', IPv4Network('1.2.3.43/24')],
       ['GG', 'AE', IPv4Network('1.2.3.47/23')],
       ['GG', 'AF', IPv4Network('1.2.3.5/24')]],
      dtype=object)

and Array B:

array([['123456', 'A1', IPv4Address('1.2.3.5'), nan],
       ['987654', 'B1', IPv4Address('1.2.3.47'), nan]],
      dtype=object)  

The goal is to create Array C: for each row of Array B, look up its IPv4Address in Array A, take the second value of the matching row of Array A, and append it to the row of Array B:

Array C:

array([['123456', 'A1', IPv4Address('1.2.3.5'), nan, 'AF'],
       ['987654', 'B1', IPv4Address('1.2.3.47'), nan, 'AE']],
      dtype=object) 

The IP addresses are of this type: https://docs.python.org/3/library/ipaddress.html#ipaddress.ip_network

How can I achieve this?

Edit:

Please note that the merge is conditional on the IPs matching, so the resulting Array C will have the same number of rows as Array B, with one additional column. The suggested duplicate links do not answer this question.

teebeetee asked Dec 12 '18 09:12




1 Answer

This should do what you asked for; at least the output is exactly what you wanted. I made some minor assumptions to deal with your dummy data (both arrays hold IPv4Network values, and rows are matched on exact equality), but that should not matter too much.

Code:

import numpy as np
import ipaddress as ip

array_A = np.array([['GG', 'AB', ip.ip_network('192.168.0.0/32')],
                    ['GG', 'AC', ip.ip_network('192.168.0.0/31')],
                    ['GG', 'AD', ip.ip_network('192.168.0.0/30')],
                    ['GG', 'AE', ip.ip_network('192.168.0.0/29')],
                    ['GG', 'AF', ip.ip_network('192.168.0.0/28')]],
                   dtype=object)

array_B = np.array([['123456', 'A1', ip.ip_network('192.168.0.0/28'), np.nan],
                    ['987654', 'B1', ip.ip_network('192.168.0.0/29'), np.nan]],
                   dtype=object)


def merge_by_ip(A, B):
    # Result array: one row per row of B, with 5 columns
    # (the 4 columns of B plus the matched value from A).
    C = np.empty([len(B), 5], dtype=object)
    for n in range(len(B)):
        for a in A:
            # Condition: the network in this row of A equals the network in row n of B.
            if a[2] == B[n][2]:
                # Store the row of B with the second value of a appended.
                C[n] = np.append(B[n], a[1])
    # Rows of B without a match in A stay filled with None.
    return C


print(merge_by_ip(array_A, array_B))

Output:

[['123456' 'A1' IPv4Network('192.168.0.0/28') nan 'AF']
 ['987654' 'B1' IPv4Network('192.168.0.0/29') nan 'AE']]
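
The code above swaps the question's IPv4Address values in Array B for networks. A possible variant that keeps plain addresses in Array B is sketched below. It assumes Array A can be built with ip_interface instead of ip_network, so that the host part of values such as '1.2.3.5/24' is preserved; that choice, and the name merge_by_host_address, are assumptions and not part of the original question or answer.

```python
import ipaddress as ip
import numpy as np

# Assumption: A stores interfaces (address plus prefix), so host bits are kept.
array_A = np.array([['GG', 'AB', ip.ip_interface('1.2.3.41/26')],
                    ['GG', 'AC', ip.ip_interface('1.2.3.42/25')],
                    ['GG', 'AD', ip.ip_interface('1.2.3.43/24')],
                    ['GG', 'AE', ip.ip_interface('1.2.3.47/23')],
                    ['GG', 'AF', ip.ip_interface('1.2.3.5/24')]],
                   dtype=object)

array_B = np.array([['123456', 'A1', ip.ip_address('1.2.3.5'), np.nan],
                    ['987654', 'B1', ip.ip_address('1.2.3.47'), np.nan]],
                   dtype=object)


def merge_by_host_address(A, B):
    # One row per row of B, with one extra column for the value taken from A.
    C = np.empty((len(B), B.shape[1] + 1), dtype=object)
    for n, b in enumerate(B):
        C[n, :-1] = b  # copy the row of B unchanged
        for a in A:
            # Compare the host address of the interface with the address in B.
            if a[2].ip == b[2]:
                C[n, -1] = a[1]
                break  # keep the first match
    # Rows without a match keep None in the last column.
    return C


result = merge_by_host_address(array_A, array_B)
print(result)
```

Unlike the function above, this sketch stops at the first matching row of A and always keeps the row of B, even when nothing matches.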

Note:

This solution has O(m * n) complexity, which isn't necessary: there are out-of-the-box (Pandas) and custom (e.g. dict-based) ways to merge with lower complexity.
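
As an illustration of the dict-based route, here is a minimal sketch, not a definitive implementation. It reuses the dummy data from the code above; the function name merge_by_ip_dict is hypothetical. It builds the lookup table once and then does one dict lookup per row of B, so the cost is roughly O(m + n).

```python
import ipaddress as ip
import numpy as np

array_A = np.array([['GG', 'AB', ip.ip_network('192.168.0.0/32')],
                    ['GG', 'AC', ip.ip_network('192.168.0.0/31')],
                    ['GG', 'AD', ip.ip_network('192.168.0.0/30')],
                    ['GG', 'AE', ip.ip_network('192.168.0.0/29')],
                    ['GG', 'AF', ip.ip_network('192.168.0.0/28')]],
                   dtype=object)

array_B = np.array([['123456', 'A1', ip.ip_network('192.168.0.0/28'), np.nan],
                    ['987654', 'B1', ip.ip_network('192.168.0.0/29'), np.nan]],
                   dtype=object)


def merge_by_ip_dict(A, B):
    # Build the lookup once: network -> second column of A.
    # IPv4Network objects are hashable, so they can serve as dict keys.
    lookup = {row[2]: row[1] for row in A}
    # One dict lookup per row of B; None where A has no matching network.
    extra = np.array([lookup.get(row[2]) for row in B], dtype=object)
    # Append the looked-up values as a new last column.
    return np.column_stack([B, extra])


result = merge_by_ip_dict(array_A, array_B)
print(result)
```

One difference from the nested-loop version: a row of B with no match keeps its own values and gets None in the new column, instead of the whole row being left as None.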

mrk answered Sep 21 '22 23:09