Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Read specific columns in csv using python

Tags:

python

sql

csv

I have a csv file that look like this:

+-----+-----+-----+-----+-----+-----+-----+-----+
| AAA | bbb | ccc | DDD | eee | FFF | GGG | hhh |
+-----+-----+-----+-----+-----+-----+-----+-----+
|   1 |   2 |   3 |   4 |  50 |   3 |  20 |   4 |
|   2 |   1 |   3 |   5 |  24 |   2 |  23 |   5 |
|   4 |   1 |   3 |   6 |  34 |   1 |  22 |   5 |
|   2 |   1 |   3 |   5 |  24 |   2 |  23 |   5 |
|   2 |   1 |   3 |   5 |  24 |   2 |  23 |   5 |
+-----+-----+-----+-----+-----+-----+-----+-----+

...

How can I only read the columns "AAA,DDD,FFF,GGG" in python and skip the headers? The output I want is a list of tuples that looks like this: [(1,4,3,20),(2,5,2,23),(4,6,1,22)]. I'm thinking to write these data to a SQLdatabase later.

I referred to this post:Read specific columns from a csv file with csv module?. But I don't think it is helpful in my case. Since my .csv is pretty big with whole bunch of columns, I hope I can tell python the column names I want, so python can read the specific columns row by row for me.

like image 896
Ni Yan Avatar asked Nov 19 '13 03:11

Ni Yan


4 Answers

I realize the answer has been accepted, but if you really want to read specific named columns from a csv file, you should use a DictReader (if you're not using Pandas that is).

import csv
from StringIO import StringIO

columns = 'AAA,DDD,FFF,GGG'.split(',')


testdata ='''\
AAA,bbb,ccc,DDD,eee,FFF,GGG,hhh
1,2,3,4,50,3,20,4
2,1,3,5,24,2,23,5
4,1,3,6,34,1,22,5
2,1,3,5,24,2,23,5
2,1,3,5,24,2,23,5
'''

reader = csv.DictReader(StringIO(testdata))

desired_cols = (tuple(row[col] for col in columns) for row in reader)

Output:

>>> list(desired_cols)
[('1', '4', '3', '20'),
 ('2', '5', '2', '23'),
 ('4', '6', '1', '22'),
 ('2', '5', '2', '23'),
 ('2', '5', '2', '23')]
like image 87
JaminSore Avatar answered Oct 06 '22 01:10

JaminSore


def read_csv(file, columns, type_name="Row"):
  try:
    row_type = namedtuple(type_name, columns)
  except ValueError:
    row_type = tuple
  rows = iter(csv.reader(file))
  header = rows.next()
  mapping = [header.index(x) for x in columns]
  for row in rows:
    row = row_type(*[row[i] for i in mapping])
    yield row

Example:

>>> import csv
>>> from collections import namedtuple
>>> from StringIO import StringIO
>>> def read_csv(file, columns, type_name="Row"):
...   try:
...     row_type = namedtuple(type_name, columns)
...   except ValueError:
...     row_type = tuple
...   rows = iter(csv.reader(file))
...   header = rows.next()
...   mapping = [header.index(x) for x in columns]
...   for row in rows:
...     row = row_type(*[row[i] for i in mapping])
...     yield row
... 
>>> testdata = """\
... AAA,bbb,ccc,DDD,eee,FFF,GGG,hhh
... 1,2,3,4,50,3,20,4
... 2,1,3,5,24,2,23,5
... 4,1,3,6,34,1,22,5
... 2,1,3,5,24,2,23,5
... 2,1,3,5,24,2,23,5
... """
>>> testfile = StringIO(testdata)
>>> for row in read_csv(testfile, "AAA GGG DDD".split()):
...   print row
... 
Row(AAA='1', GGG='20', DDD='4')
Row(AAA='2', GGG='23', DDD='5')
Row(AAA='4', GGG='22', DDD='6')
Row(AAA='2', GGG='23', DDD='5')
Row(AAA='2', GGG='23', DDD='5')
like image 38
Roger Pate Avatar answered Oct 05 '22 23:10

Roger Pate


import csv

DESIRED_COLUMNS = ('AAA','DDD','FFF','GGG')

f = open("myfile.csv")
reader = csv.reader(f)

headers = None
results = []
for row in reader:
    if not headers:
        headers = []
        for i, col in enumerate(row):
        if col in DESIRED_COLUMNS:
            # Store the index of the cols of interest
            headers.append(i)

    else:
        results.append(tuple([row[i] for i in headers]))

print results
like image 30
Brad M Avatar answered Oct 06 '22 00:10

Brad M


Context: For this type of work you should use the amazing python petl library. That will save you a lot of work and potential frustration from doing things 'manually' with the standard csv module. AFAIK, the only people who still use the csv module are those who have not yet discovered better tools for working with tabular data (pandas, petl, etc.), which is fine, but if you plan to work with a lot of data in your career from various strange sources, learning something like petl is one of the best investments you can make. To get started should only take 30 minutes after you've done pip install petl. The documentation is excellent.

Answer: Let's say you have the first table in a csv file (you can also load directly from the database using petl). Then you would simply load it and do the following.

from petl import fromcsv, look, cut, tocsv    

    #Load the table
    table1 = fromcsv('table1.csv')
    # Alter the colums
    table2 = cut(table1, 'Song_Name','Artist_ID')
    #have a quick look to make sure things are ok.  Prints a nicely formatted table to your console
    print look(table2)
    # Save to new file
    tocsv(table2, 'new.csv')
like image 27
PeteBeat Avatar answered Oct 06 '22 01:10

PeteBeat