Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to drop a specific column of csv file while reading it using pandas?

I need to remove a column with label name at the time of loading a csv using pandas. I am reading csv as follows and want to add parameters inside it to do so. Thanks.

pd.read_csv("sample.csv")

I know this to do after reading csv:

df.drop('name', axis=1)
like image 831
Anon George Avatar asked Feb 21 '18 05:02

Anon George


People also ask

How do I drop a column in a CSV file in Python?

There are 2 ways to remove a column entirely from a CSV in python. Let us now focus on the techniques : With pandas library — drop() or pop() Without pandas library.

How do I remove a column from a CSV file?

Question Detail Do select [column_names] from [table] store this in a CSV file. Now you will have only required columns in the output CSV. You can use alter SQL command to delete a column but I never tried it. Another option is open the CSV using excel command and use delete column to delete the entire column.


3 Answers

If you know the column names prior, you can do it by setting usecols parameter

When you know which columns to use

Suppose you have csv file with columns ['id','name','last_name'] and you want just ['name','last_name']. You can do it as below:

import pandas as pd
df = pd.read_csv("sample.csv", usecols = ['name','last_name'])

when you want first N columns

If you don't know the column names but you want first N columns from dataframe. You can do it by

import pandas as pd
df = pd.read_csv("sample.csv", usecols = [i for i in range(n)])

Edit

When you know name of the column to be dropped

# Read column names from file
cols = list(pd.read_csv("sample_data.csv", nrows =1))
print(cols)

# Use list comprehension to remove the unwanted column in **usecol**
df= pd.read_csv("sample_data.csv", usecols =[i for i in cols if i != 'name'])
like image 192
Sociopath Avatar answered Oct 20 '22 04:10

Sociopath


Get the column headers from your CSV using pd.read_csv with nrows=1, then do a subsequent read with usecols to pull everything but the column(s) you want to omit.

headers = [*pd.read_csv('sample.csv', nrows=1)]
df = pd.read_csv('sample.csv', usecols=[c for c in headers if c != 'name']))

Alternatively, you can do the same thing (read only the headers) very efficiently using the CSV module,

import csv

with open("sample.csv", 'r') as f:
    header = next(csv.reader(f))
    # For python 2, use
    # header = csv.reader(f).next()

df = pd.read_csv('sample.csv', usecols=list(set(header) - {'name'}))
like image 11
cs95 Avatar answered Oct 20 '22 04:10

cs95


The only parameter to read_csv() that you can use to select the columns you use is usecols. According to the documentation, usecols accepts list-like or callable. Because you only know the columns you want to drop, you can't use a list of the columns you want to keep. So use a callable:

pd.read_csv("sample.csv", 
            usecols=lambda x: x != 'name'
            )

And you could of course say x not in ['unwanted', 'column', 'names'] if you had a list of column names you didn't want to use.

like image 10
jalopezp Avatar answered Oct 20 '22 04:10

jalopezp