This is what I have thus far:
Stats2003 = np.loadtxt('/DataFiles/2003.txt')
Stats2004 = np.loadtxt('/DataFiles/2004.txt')
Stats2005 = np.loadtxt('/DataFiles/2005.txt')
Stats2006 = np.loadtxt('/DataFiles/2006.txt')
Stats2007 = np.loadtxt('/DataFiles/2007.txt')
Stats2008 = np.loadtxt('/DataFiles/2008.txt')
Stats2009 = np.loadtxt('/DataFiles/2009.txt')
Stats2010 = np.loadtxt('/DataFiles/2010.txt')
Stats2011 = np.loadtxt('/DataFiles/2011.txt')
Stats2012 = np.loadtxt('/DataFiles/2012.txt')
Stats = Stats2003, Stats2004, Stats2004, Stats2005, Stats2006, Stats2007, Stats2008, Stats2009, Stats2010, Stats2011, Stats2012
I am trying to calculate euclidean distance between each of these arrays with every other array but am having difficulty doing so.
I have the output I would like by calculating the distance like:
dist1 = np.linalg.norm(Stats2003-Stats2004)
dist2 = np.linalg.norm(Stats2003-Stats2005)
dist11 = np.linalg.norm(Stats2004-Stats2005)
etc but I would like to make these calculations with a loop.
I am displaying the calculations into a table using Prettytable.
Can anyone point me in the right direction? I haven't found any previous solutions that have worked.
Look at scipy.spatial.distance.cdist.
From the documentation:
Computes distance between each pair of the two collections of inputs.
So you could do something like the following:
import numpy as np
from scipy.spatial.distance import cdist
# start year to stop year
years = range(2003,2013)
# this will yield an n_years X n_features array
features = np.array([np.loadtxt('/Datafiles/%s.txt' % year) for year in years])
# compute the euclidean distance from each year to every other year
distance_matrix = cdist(features,features,metric = 'euclidean')
If you know the start year, and you aren't missing data for any years, then it's easy to determine which two years are being compared at coordinate (m,n) in the distance matrix.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With