I need to remove a column with label name at the time of loading a csv using pandas
. I am reading csv as follows and want to add parameters inside it to do so. Thanks.
pd.read_csv("sample.csv")
I know this to do after reading csv:
df.drop('name', axis=1)
There are 2 ways to remove a column entirely from a CSV in python. Let us now focus on the techniques : With pandas library — drop() or pop() Without pandas library.
Question Detail Do select [column_names] from [table] store this in a CSV file. Now you will have only required columns in the output CSV. You can use alter SQL command to delete a column but I never tried it. Another option is open the CSV using excel command and use delete column to delete the entire column.
If you know the column names prior, you can do it by setting usecols
parameter
When you know which columns to use
Suppose you have csv file with columns ['id','name','last_name']
and you want just ['name','last_name']
. You can do it as below:
import pandas as pd
df = pd.read_csv("sample.csv", usecols = ['name','last_name'])
when you want first N columns
If you don't know the column names but you want first N columns from dataframe. You can do it by
import pandas as pd
df = pd.read_csv("sample.csv", usecols = [i for i in range(n)])
Edit
When you know name of the column to be dropped
# Read column names from file
cols = list(pd.read_csv("sample_data.csv", nrows =1))
print(cols)
# Use list comprehension to remove the unwanted column in **usecol**
df= pd.read_csv("sample_data.csv", usecols =[i for i in cols if i != 'name'])
Get the column headers from your CSV using pd.read_csv
with nrows=1
, then do a subsequent read with usecols
to pull everything but the column(s) you want to omit.
headers = [*pd.read_csv('sample.csv', nrows=1)]
df = pd.read_csv('sample.csv', usecols=[c for c in headers if c != 'name']))
Alternatively, you can do the same thing (read only the headers) very efficiently using the CSV module,
import csv
with open("sample.csv", 'r') as f:
header = next(csv.reader(f))
# For python 2, use
# header = csv.reader(f).next()
df = pd.read_csv('sample.csv', usecols=list(set(header) - {'name'}))
The only parameter to read_csv()
that you can use to select the columns you use is usecols
. According to the documentation, usecols
accepts list-like or callable. Because you only know the columns you want to drop, you can't use a list of the columns you want to keep. So use a callable:
pd.read_csv("sample.csv",
usecols=lambda x: x != 'name'
)
And you could of course say x not in ['unwanted', 'column', 'names']
if you had a list of column names you didn't want to use.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With