Convert X and Y arrays into a frequencies grid

Tags:

I would like to convert two arrays (x and y) into a frequency n x n matrix (n = 5), indicating each cell the number of point that contains. It consists on resampling both variables into five intervals and count the existing number of points per cell.

I have tried using pandas pivot_table but don't know the way of referencing to each axis coordinate. X and Y arrays are two dependent variables that contain values between 0 and 100.

I would really appreciate some one's aid. Thank you very much in advance.

This is an example of the code:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Arrays example. They are always float type and ranging 0-100. (n_size array = 15)
x = 100 * np.random.random(15)
y = 100 * np.random.random(15)

# Df created for trying to pivot and counting values per cell
df = pd.DataFrame({'X':x,'Y':y})

# Plot the example data:
df.plot(x = 'X',y = 'Y', style = 'o')

This is what I have: enter image description here

This is the objetive matrix, saved as a df: enter image description here

355

asked Oct 23 '20 16:10

Miguel Gonzalez

Video Answer

2 Answers

If you do not explicitly need to use pandas (which you don't, if it's just about a frequency matrix), consider using numpy.histogram2d:

# Sample data
x = 100*np.random.random(15)
y = 100*np.random.random(15)

Construct your bins (since your x and y bins are the same, one set is enough)

bins = np.linspace(0, 100, 5+1)

# bins = array([  0.,  20.,  40.,  60.,  80., 100.])

Now use the histogram function:

binned, binx, biny = np.histogram2d(x, y, bins = [bins, bins])

# To get the result you desire, transpose
objmat = binned.T

Note: x-values are binned along the first dimension(axis 0), which visually means 'vertical'. Hence the transpose.

Plotting:

fig, ax = plt.subplots()
ax.grid()
ax.set_xlim(0, 100)
ax.set_ylim(0, 100)

ax.scatter(x, y)
for i in range(objmat.shape[0]):
    for j in range(objmat.shape[1]):
        c = int(objmat[::-1][j,i])
        ax.text((bins[i]+bins[i+1])/2, (bins[j]+bins[j+1])/2, str(c), fontdict={'fontsize' : 16, 'ha' : 'center', 'va' : 'center'})

Result: enter image description here

177

answered Oct 18 '22 12:10

amzon-ex

You could use GroupBy.size matching group axes to the center of each grid. Then you can use Axes.text to draw them

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(20)


max_val = 100
n = 5
len_group = max_val // 5
x = max_val * np.random.random(15)
y = max_val * np.random.random(15)


# Df created for trying to pivot and counting values per cell
df = pd.DataFrame({'X':x,'Y':y})



x_groups = df['X'] // len_group * len_group + len_group / 2
y_groups = df['Y'] // len_group * len_group + len_group / 2

fig, ax= plt.subplots(figsize=(13, 6))

ax.set_ylim(0, max_val)
ax.set_xlim(0, max_val)

df.plot(x = 'X',y = 'Y', style = 'o', ax=ax)
for i, val in df.groupby([x_groups, y_groups]).size().items():
    ax.text(*i, val,fontdict={'fontsize' : 20, 'ha' : 'center',  'va':'center'})
plt.grid()

enter image description here

answered Oct 18 '22 12:10

ansev

Related questions
                            
                                Numpy: 1D array with various shape
                            
                                Python: URLError: <urlopen error [Errno 10060]
                            
                                Histogram from data which is already binned, I have bins and frequency values
                            
                                Upload a file to a python flask server using curl
                            
                                Get minimum value field name using aggregation in django
                            
                                Python: Frequency of occurrences
                            
                                How to inspect variables after Traceback?
                            
                                shuffle a large list of items without loading in memory
                            
                                Fisher's exact test for bigger than 2 by 2 contingency table
                            
                                How to find out if argparse argument has been actually specified on command line?
                            
                                Are there really only 4 Matplotlib Line Styles?
                            
                                Why does date + timedelta become date, not datetime?
                            
                                List comprehensions in Jinja
                            
                                Add Tensorflow pre-processing to existing Keras model (for use in Tensorflow Serving)
                            
                                How to insert trailing spaces in a doctest, so that it doesn't fail even when actual and expected result look the same?
                            
                                How can I use map with multi-index in pandas?
                            
                                Conda and Visual Studio Code debugging
                            
                                ValueError: continuous format is not supported
                            
                                How do I omit matplotlib printed output in Python / Jupyter notebook? [duplicate]
                            
                                Get overridden functions of subclass

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Convert X and Y arrays into a frequencies grid

Tags:

python

pandas

matrix

numpy

pivot-table