Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to search for a 'text' or 'number' in a csv file with Python AND if exists print only first and second column values to a new csv file

I want to do the following using Python.

Step-1: Read a specific third column on a csv file using Python.
Step-2: Create a list with values got from step-1
Step-3: Take the value of index[0], search in csv file, if present print the values of column 1 and 2 only to a new csv file(There are 6 columns). If Not presents just ignore and goto next search.

file1.csv:

Country,Location,number,letter,name,pup-name,null
a,ab,1,qw,abcd,test1,3
b,cd,1,df,efgh,test2,4
c,ef,2,er,fgh,test3,5
d,gh,3,sd,sds,test4,
e,ij,5,we,sdrt,test5,
f,kl,6,sc,asdf,test6,
g,mn,7,df,xcxc,test7,
h,op,8,gb,eretet,test8,
i,qr,8,df,hjjh,test9,

Python script written for this:

import csv
import time
from collections import defaultdict

columns = defaultdict(list)

with open('file1.csv') as f:
    reader = csv.reader(f)
    reader.next()
    for row in reader:
        for (i,v) in enumerate(row):
            columns[i].append(v)
#print(columns[2])

b=(columns[2])
for x in b[:]:
    time.sleep(1)
    print x

Output of above script:

MacBook-Pro:test_usr$ python csv_file.py 
1
1
2
3
5
6
7
8
8
MacBook-Pro:test_usr$

I am able to do the steps 1 and 2.

Please guide me on doing Step-3. That is how to search for text/string in csv file and if present how to extract only specific column values to a new csv file?

Output file should look like:

a,ab
b,cd
c,ef
d,gh
e,ij
f,kl
g,mn
h,op
i,qr

Note : Search string will be from another csv file. Please don't suggest the direct answer for printing values of column 1 and 2 directly.

FINAL CODE is looks this:

import csv
import time
from collections import defaultdict

columns = defaultdict(list)

with open('file1.csv') as f:
    reader = csv.reader(f)
    reader.next()
    for row in reader:
        for (i,v) in enumerate(row):
            columns[i].append(v)
            b=(columns[2])

            for x in b[:]:
                with open('file2.csv') as f, open('file3.csv', 'a') as g:
                    reader = csv.reader(f)
                    #next(reader, None) # discard the header
                    writer = csv.writer(g)
                    for row in reader:
                        if row[2] == x:
                            writer.writerow(row[:2])

file1.csv:

Country,Location,number,letter,name,pup-name,null
a,ab,1,qw,abcd,test1,3
b,cd,1,df,efgh,test2,4
c,ef,2,er,fgh,test3,5
d,gh,3,sd,sds,test4,
e,ij,5,we,sdrt,test5,
f,kl,6,sc,asdf,test6,
g,mn,7,df,xcxc,test7,
h,op,8,gb,eretet,test8,
i,qr,8,df,hjjh,test9,

file2.csv:

count,name,number,Type,status,Config Version,,IP1,port
1,bob,1,TRAFFIC,end,1.2,,1.1.1.1,1
2,john,1,TRAFFIC,end,2.1,,1.1.1.2,2
4,foo,2,TRAFFIC,end,1.1,,1.1.1.3,3
5.333333333,test,3,TRAFFIC,end,3.1,,1.1.1.4,4
6.833333333,raa,5,TRAFFIC,end,5.1,,1.1.1.5,5
8.333333333,kaa,6,TRAFFIC,end,7.1,,1.1.1.6,6
9.833333333,thaa,7,TRAFFIC,end,9.1,,1.1.1.7,7
11.33333333,paa,8,TRAFFIC,end,11.1,,1.1.1.8,8
12.83333333,maa,8,TRAFFIC,end,13.1,,1.1.1.9,9

If I run the above script, output of file3.csv:

1,bob
2,john
1,bob
2,john
1,bob
2,john
1,bob
2,john
1,bob
2,john
1,bob
2,john
1,bob
2,john
1,bob
2,john
1,bob
2,john
1,bob
2,john
1,bob
2,john
1,bob
2,john
.
.
.

Its goes like this in loop

But output should be like this:

count,name
1,bob,
2,john,
4,foo,
5.333333333,test,
6.833333333,raa,
8.333333333,kaa,
9.833333333,thaa,
11.33333333,paa,
12.83333333,maa,
like image 747
rcubefather Avatar asked Oct 12 '25 12:10

rcubefather


2 Answers

I think you should reconsider your approach. You can achieve your goal simply by iterating over the CSV file, without creating intermediate dicts and lists..., and since you want to work with specific columns, you'll make your life easier and your code more readable by using DictReader and DictWriter

import csv
import time

search_string = "whatever"

with open('file1.csv', 'rb') as f, open('file2.csv', 'wb') as g:
    reader = csv.DictReader(f)
    c1, c2, c3, *_ = reader.fieldnames
    writer = csv.DictWriter(g, fieldnames=(c1, c2))
    for row in reader:
        if row[c3] == search_string:
            writer.writerow({c1:row[c1], c2:row[c2]})

Keep in mind that csv module will always return strings. You have to handle data-type conversions yourself, if you need them (I've left that out form above).

If you don't want to use DictReader/DictWriter, I suppose it is a little more verbose, and don't want a header in your output file:

with open('file1.csv') as f, open('file2.csv', 'w') as g:
    reader = csv.reader(f)
    next(reader, None) # discard the header
    writer = csv.writer(g)
    for row in reader:
        if row[2] == search_string:
            writer.writerow(row[:2])
like image 172
juanpa.arrivillaga Avatar answered Oct 15 '25 04:10

juanpa.arrivillaga


That is how to search for text/string in csv file and if present how to extract only specific column values to a new csv file?

This is two questions.

First question: to search for text in a file, the simplest answer would be to read the file text into memory and look for the text. If you want to look for the text in a specific column of the csv you're reading in, you can use a DictReader to make life easy:

for row in reader:
  if search_target in row[header]:
    # found it!

Second question: One way to write specific columns to a new csv would be as follows:

keys = ["Country", "Location"]
new_rows = [{key: row[key] for key in keys} for row in reader]
writer = csv.DictWriter(somefile, keys)
writer.writerows(new_rows)
like image 34
Jon Kiparsky Avatar answered Oct 15 '25 04:10

Jon Kiparsky