Splitting a large ndarray

Question

I'm newish to Python and even newer to pandas and numpy. I'm trying to format a GPS RINEX file so that it is split by satellite (32 in total). Each file (i.e. satellite) should then be formatted by epoch (30-second intervals), with the data for each of the 7 signals displayed in the corresponding columns. For example:

SV1
2014-11-07 00:00:00 L1    L2    P1    P2    C1    S1    S2 
2014-11-07 00:00:30 L1    L2    P1    P2    C1    S1    S2 
2014-11-07 00:01:00 L1    L2    P1    P2    C1    S1    S2

The code, in particular the function, which I'm working on is:

def read_data_chunk(self, RINEXfile, CHUNK_SIZE = 10000):
    obss = np.empty((CHUNK_SIZE, TOTAL_SATS, len(self.obs_types)), dtype=np.float64) * np.NaN
    llis = np.zeros((CHUNK_SIZE, TOTAL_SATS, len(self.obs_types)), dtype=np.uint8)
    signal_strengths = np.zeros((CHUNK_SIZE, TOTAL_SATS, len(self.obs_types)), dtype=np.uint8)
    epochs = np.zeros(CHUNK_SIZE, dtype='datetime64[us]')
    flags = np.zeros(CHUNK_SIZE, dtype=np.uint8)

    i = 0
    while True:
        hdr = self.read_epoch_header(RINEXfile)
        #print hdr
        if hdr is None:
            break
        epoch, flags[i], sats = hdr
        epochs[i] = np.datetime64(epoch)
        sat_map = np.ones(len(sats)) * -1
        for n, sat in enumerate(sats):
            if sat[0] == 'G':
                sat_map[n] = int(sat[1:]) - 1
        obss[i], llis[i], signal_strengths[i] = self.read_obs(RINEXfile, len(sats), sat_map)
        i += 1
        if i >= CHUNK_SIZE:
            break

    print "obss.ndim: {0}".format(obss.ndim)
    print "obss.shape: {0}" .format(obss.shape)
    print "obss.size: {0}".format(obss.size)
    print "obss.dtype: {0}".format(obss.dtype)
    print "obss.itemsize: {0}".format(obss.itemsize)
    print "obss: {0}".format(obss)

    y = np.split(obss, 32, 1)
    print "y.ndim: {0}".format(y[3].ndim)
    print "y.shape: {0}" .format(y[3].shape)
    print "y.size: {0}".format(y[3].size)
    print "y_0: {0}".format(y[3])

    return obss[:i], llis[:i], signal_strengths[:i], epochs[:i], flags[:i]

The print statements are there just to understand the dimensions involved. They produce:

obss.ndim: 3
obss.shape: (10000L, 32L, 7L)
obss.size: 2240000
obss.dtype: float64
obss.itemsize: 8
y.ndim: 3
y.shape: (10000L, 1L, 7L)
y.size: 70000

The exact problem I'm having is how to manipulate the array so that it is split into its 32 parts (i.e. one per satellite). Below is an example of the output so far:

sats = np.rollaxis(obss, 1, 0) 
sat = sats[5] #sv6 
sat.shape: (10000L, 7L) 
sat.ndim: 2 
sat.size: 70000 
sat.dtype: float64 
sat.itemsize: 8 
sat: [[ -7.28308440e+06 -5.66279406e+06 2.38582902e+07 ..., 2.38582906e+07 4.70000000e+01 4.20000000e+01]
 [ -7.32362993e+06 -5.69438797e+06 2.38505736e+07 ..., 2.38505742e+07 4.70000000e+01 4.20000000e+01]
 [ -7.36367675e+06 -5.72559325e+06 2.38429526e+07 ..., 2.38429528e+07 4.60000000e+01 4.20000000e+01]

The output above is for the 6th satellite ("sat") and shows the signals for the first 3 epochs. I tried the code below to write each satellite to its own file, but the resulting text files contained only the output shown underneath:

Code:

for i in range(32): 
    sat = obss[:, i] 
    open(((("sv{0}").format(sat)),'w').writelines(sat))

Output in text file:

ø ø ø ø ø ø ø

So obviously there's something wrong with the manipulation of the array that I'm overlooking. The read_data_chunk function is called from the read_data function:

def read_data(self, RINEXfile): 
    obs_data_chunks = [] 
    while True: 
        obss, _, _, epochs, _ = self.read_data_chunk(RINEXfile) 
        if obss.shape[0] == 0: 
            break 

        obs_data_chunks.append(
            pd.Panel(
                np.rollaxis(obss, 1, 0),
                items=['G%02d' % d for d in range(1, 33)],
                major_axis=epochs,
                minor_axis=self.obs_types,
            ).dropna(axis=0, how='all').dropna(axis=2, how='all'))

    print "obs_data_chunks: {0}".format(obs_data_chunks) 
    self.data = pd.concat(obs_data_chunks, axis=1)

The next thing I tried was working with obs_data_chunks from the code above, as I figured it might be the right object to manipulate. The final print statement gives:

obs_data_chunks: [<class 'pandas.core.panel.Panel'> 
Dimensions: 32 (items) x 2880 (major_axis) x 7 (minor_axis) 
Items axis: G01 to G32 
Major_axis axis: 2014-04-27 00:00:00 to 2014-04-27 23:59:30 
Minor_axis axis: L1 to S2]

I tried to figure out how to deal with the obs_data_chunks array using:

odc = np.rollaxis(obs_data_chunks, 1) 
odc_temp = odc[5]

but received an error: AttributeError: 'list' object has no attribute 'ndim'

askewchan · Accepted Answer

It depends on what exactly you want to do with these 32 satellite subsets. As far as I can tell, the way you currently have obss, with shape (10000, 32, 7), you already have it "split" in a way. Here's how you can access them:

Slice along the 'satellite' dimension, which is axis=1:

sat = obss[:, 0]  # all the data for satellite 0, with shape (10000, 7)
sat = obss[:, i]  # for any i from 0 through 31.
sats = obss[:, :3] # the first three satellites
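
To make the slicing concrete, here is a minimal sketch on a small toy array. The shape (4 epochs, 3 satellites, 2 observables) is an assumption chosen only to keep the example readable; the real obss is (10000, 32, 7). It also shows that these slices are views onto obss rather than copies, so taking them costs no extra memory.

```python
import numpy as np

# Toy stand-in for obss (assumed shape): 4 epochs, 3 satellites, 2 observables
obss = np.arange(24, dtype=np.float64).reshape(4, 3, 2)

# Every epoch for the satellite at index 1; the satellite axis is dropped
sat = obss[:, 1]
print(sat.shape)

# Satellites 0 and 1 together; the result is still 3-D
first_two = obss[:, :2]
print(first_two.shape)

# Basic slicing returns a view, so no data is copied
print(np.shares_memory(sat, obss))
```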

If you find that you are mainly indexing by satellite, you can move its axis to the front with np.rollaxis:

sats = np.rollaxis(obss, 1)
sats.shape
# (32, 10000, 7)
sat = sats[i]  # satellite i, equivalent to obss[:, i]
sat = sats[:3] # first three satellites
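
As a small check of the claim above, the sketch below rolls the satellite axis to the front on a toy array (the shape is an assumption for illustration) and confirms that sats[i] matches obss[:, i]. It also shows np.moveaxis, which I believe gives the same result here and which some people find easier to read.

```python
import numpy as np

# Toy stand-in for obss (assumed shape): 4 epochs, 3 satellites, 2 observables
obss = np.arange(24, dtype=np.float64).reshape(4, 3, 2)

# Move the satellite axis (axis 1) to the front
sats = np.rollaxis(obss, 1)
print(sats.shape)

# np.moveaxis(source=1, destination=0) is an equivalent spelling for this case
sats_alt = np.moveaxis(obss, 1, 0)
print(np.array_equal(sats, sats_alt))

# Indexing the rolled array matches slicing the original along axis 1
print(np.array_equal(sats[2], obss[:, 2]))

# The rolled array is a view on the same memory, not a copy
print(np.shares_memory(sats, obss))
```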

If you want to loop through the satellites, as you would in your y = np.split(obss, 32, 1) example, an easier way to do that is:
```
for i in range(32):
    sat = obss[:, i]
    ...
```
or, if you roll the axis for sats, you can just do:
```
sats = np.rollaxis(obss, 1)
for sat in sats:
    ...
```
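
For comparison, here is a sketch of what np.split gives you versus plain indexing, again on a toy array whose shape is only an assumption. Each piece from np.split keeps a length-1 satellite axis, which is why y[3].shape in the question was (10000, 1, 7) rather than (10000, 7).

```python
import numpy as np

# Toy stand-in for obss (assumed shape): 4 epochs, 3 satellites, 2 observables
obss = np.arange(24, dtype=np.float64).reshape(4, 3, 2)

# np.split returns a list with one array per satellite, each keeping a length-1 axis
pieces = np.split(obss, 3, axis=1)
print(len(pieces), pieces[1].shape)

# Plain indexing drops that axis and gives a 2-D array per satellite
sat = obss[:, 1]
print(sat.shape)

# Dropping the length-1 axis from the split piece gives the same data
print(np.array_equal(pieces[1][:, 0], sat))
```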
Finally, if you really want a list of the satellites, you can do
```
sats = np.rollaxis(obss, 1)
satlist = list(sats)
```
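
If the goal is one text file per satellite, the attempt in the question most likely failed because writelines was handed the array's raw bytes rather than formatted text, and because the file name was built from the array instead of the loop index. Below is a minimal sketch of one way to do it, not a definitive implementation. The function name write_satellite_files, the sv{n}.txt naming, the number format, and the toy data are all assumptions for illustration. It skips epochs where a satellite has only NaN values.

```python
import os
import tempfile
import numpy as np

def write_satellite_files(obss, epochs, out_dir):
    """Hypothetical helper: write one text file per satellite, one line per epoch."""
    paths = []
    for i in range(obss.shape[1]):
        sat = obss[:, i]                       # shape (n_epochs, n_obs_types)
        seen = ~np.isnan(sat).all(axis=1)      # epochs with at least one value
        if not seen.any():
            continue                           # satellite never observed: no file
        path = os.path.join(out_dir, "sv{0}.txt".format(i + 1))
        with open(path, "w") as fh:
            for epoch, row in zip(epochs[seen], sat[seen]):
                # Timestamp as "YYYY-MM-DD HH:MM:SS", then the observables
                stamp = np.datetime_as_string(epoch, unit="s").replace("T", " ")
                values = " ".join("{0:14.3f}".format(v) for v in row)
                fh.write("{0} {1}\n".format(stamp, values))
        paths.append(path)
    return paths

# Toy data (assumed): 3 epochs at 30 s spacing, 4 satellites, 7 observables
epochs = np.datetime64("2014-11-07T00:00:00") + np.arange(3) * np.timedelta64(30, "s")
obss = np.full((3, 4, 7), np.nan)
obss[:, 1] = np.arange(21, dtype=np.float64).reshape(3, 7)  # only satellite 2 has data

out_dir = tempfile.mkdtemp()
paths = write_satellite_files(obss, epochs, out_dir)
print(paths)
```

With the arrays returned by read_data_chunk, the call would be along the lines of write_satellite_files(obss, epochs, some_directory), where some_directory is whatever output folder you choose.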
