load csv file to numpy and access columns by name

I have a CSV file with headers, test.csv:

"A","B","C","D","E","F","timestamp"
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291111964948E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291113113366E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291120650486E12

I simply want to load it as a matrix/ndarray with 3 rows and 7 columns, and I also want to access the column vectors by column name. If I use genfromtxt (as shown below), I get an ndarray with 3 rows (one per line) and no columns.

import numpy as np

r = np.genfromtxt('test.csv', delimiter=',', dtype=None, names=True)
print(r)
print(r.shape)

[ (611.88243, 9089.5601000000006, 5133.0, 864.07514000000003, 1715.3747599999999, 765.22776999999996, 1291111964948.0)
 (611.88243, 9089.5601000000006, 5133.0, 864.07514000000003, 1715.3747599999999, 765.22776999999996, 1291113113366.0)
 (611.88243, 9089.5601000000006, 5133.0, 864.07514000000003, 1715.3747599999999, 765.22776999999996, 1291120650486.0)]
(3,)

I can get column vectors from column names like this:

print(r['A'])
[ 611.88243  611.88243  611.88243]

If I use loadtxt, I get an array with 3 rows and 7 columns, but I cannot access columns by name (as shown below).

np.loadtxt("test.csv", delimiter=",", skiprows=1)

I get

  [ [611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291111964948E12]
    [611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291113113366E12]
    [611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291120650486E12] ]

Is there any approach in Python that meets both requirements together (access columns by column name, as with np.genfromtxt, and have a matrix, as with np.loadtxt)?

asked Jun 10 '14 14:06

user2481422

2 Answers

Using NumPy alone, the options you show are your only options. Either use an ndarray of homogeneous dtype with shape (3,7), or a structured array of (potentially) heterogeneous dtype and shape (3,).

If you really want a data structure with labeled columns and shape (3,7) (and lots of other goodies), you could use a pandas DataFrame:

In [67]: import pandas as pd
In [68]: df = pd.read_csv('test.csv'); df
Out[68]: 
           A          B     C          D           E          F     timestamp
0  611.88243  9089.5601  5133  864.07514  1715.37476  765.22777  1.291112e+12
1  611.88243  9089.5601  5133  864.07514  1715.37476  765.22777  1.291113e+12
2  611.88243  9089.5601  5133  864.07514  1715.37476  765.22777  1.291121e+12    

In [70]: df['A']
Out[70]: 
0    611.88243
1    611.88243
2    611.88243
Name: A, dtype: float64

In [71]: df.shape
Out[71]: (3, 7)
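
As a minimal sketch of how the DataFrame can cover both requirements at once, the snippet below reads the question's data from an inline string rather than from test.csv (an assumption, so that it runs without a file on disk). The names CSV_TEXT, matrix, and col_a are illustrative only and are not part of the original answer.

```python
import io

import pandas as pd

# Inline copy of the question's test.csv (assumption: used instead of a file
# so the example is self-contained).
CSV_TEXT = '''"A","B","C","D","E","F","timestamp"
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291111964948E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291113113366E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291120650486E12
'''

df = pd.read_csv(io.StringIO(CSV_TEXT))

# Plain 2D ndarray with the same values (every column is float64 here).
matrix = df.to_numpy()

# Column vector looked up by header name, returned as a 1D ndarray.
col_a = df["A"].to_numpy()

print(matrix.shape)  # (3, 7)
print(col_a.shape)   # (3,)
```

Here df keeps the labels while matrix is an ordinary ndarray, so code that expects a plain (3, 7) array can take matrix and label-based lookups can go through df.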

A pure NumPy/Python alternative would be to use a dict to map the column names to indices:

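import csv

import numpy as np

filename = 'test.csv'

# Read the header row and map each column name to its index.
with open(filename, newline='') as f:
    reader = csv.reader(f)
    columns = next(reader)
    colmap = dict(zip(columns, range(len(columns))))

# Load the numeric rows (skipping the header) as a 2D matrix.
arr = np.matrix(np.loadtxt(filename, delimiter=",", skiprows=1))
print(arr[:, colmap['A']])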

yields

[[ 611.88243]
 [ 611.88243]
 [ 611.88243]]

This way, arr is a NumPy matrix, with columns that can be accessed by label using the syntax

arr[:, colmap[column_name]]
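
The same name-to-index mapping also works with a regular 2D ndarray instead of np.matrix. The following is a sketch under that assumption (it is a variation, not the code from the answer above), and it again reads the data from an inline string so that it is self-contained. CSV_TEXT, header, and col_a are illustrative names.

```python
import csv
import io

import numpy as np

# Inline copy of the question's test.csv (assumption: used instead of a file).
CSV_TEXT = '''"A","B","C","D","E","F","timestamp"
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291111964948E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291113113366E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291120650486E12
'''

# Parse only the header row; csv.reader strips the surrounding quotes.
header = next(csv.reader(io.StringIO(CSV_TEXT)))
colmap = {name: index for index, name in enumerate(header)}

# Load the numeric rows (header skipped) as an ordinary 2D ndarray.
arr = np.loadtxt(io.StringIO(CSV_TEXT), delimiter=",", skiprows=1)

# With an ndarray, selecting one column gives a 1D vector of shape (3,),
# whereas np.matrix would keep it as a (3, 1) column.
col_a = arr[:, colmap["A"]]

print(arr.shape)    # (3, 7)
print(col_a.shape)  # (3,)
```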

answered Oct 23 '22 03:10

unutbu

Because your data is homogeneous (all the elements are floating point values), you can create a 2D array that is a view of the data returned by genfromtxt. For example,

In [42]: r = np.genfromtxt("test.csv", delimiter=',', names=True)

Create a numpy array that is a "view" of r. This is a regular numpy array, but it is created using the data in r:

In [43]: a = r.view(np.float64).reshape(len(r), -1)

In [44]: a.shape
Out[44]: (3, 7)

In [45]: a[:, 0]
Out[45]: array([ 611.88243,  611.88243,  611.88243])

In [46]: r['A']
Out[46]: array([ 611.88243,  611.88243,  611.88243])

r and a refer to the same block of memory:

In [47]: a[0, 0] = -1

In [48]: r['A']
Out[48]: array([  -1.     ,  611.88243,  611.88243])
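
A related option, not used in the answer above, is the helper structured_to_unstructured from numpy.lib.recfunctions (available in NumPy 1.16 and later, as far as I know). The sketch below assumes the same all-float data, loaded from an inline string for self-containment. Whether the result shares memory with r depends on the field layout, so the example only checks shape and values.

```python
import io

import numpy as np
from numpy.lib import recfunctions as rfn

# Inline copy of the question's test.csv (assumption: used instead of a file).
CSV_TEXT = '''"A","B","C","D","E","F","timestamp"
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291111964948E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291113113366E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291120650486E12
'''

# Structured array of shape (3,) with one float64 field per column.
r = np.genfromtxt(io.StringIO(CSV_TEXT), delimiter=",", names=True)

# Regular 2D float array built from the fields of r.
a = rfn.structured_to_unstructured(r)

print(r.shape)  # (3,)
print(a.shape)  # (3, 7)

# Name-based access goes through r, positional access through a.
print(np.allclose(a[:, 0], r["A"]))  # True
```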

answered Oct 23 '22 03:10

Warren Weckesser

Tags: python, arrays, csv, numpy
