I have a CSV file. Is it possible to have usecols take all columns except the last one when using read_csv, without listing every column needed?
For example, if I have a 13-column file, I can do usecols=[0,1,...,10,11]. Doing usecols=[:-1] gives me a syntax error.
Is there another alternative? I'm using pandas 0.17.
This can be done with the pandas read_csv() function. Pass the CSV file as the first argument and a list of the specific columns as the usecols keyword argument. It returns only the data from those columns of the CSV file.
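As a minimal sketch of that call, the example below reads an in-memory CSV rather than a file on disk. The column names a, b, c and the sample values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical three-column CSV held in memory so the example is self-contained.
csv_text = "a,b,c\n1,2,3\n4,5,6\n"

# Only the columns listed in usecols are parsed and returned.
df = pd.read_csv(io.StringIO(csv_text), usecols=["a", "b"])

print(df.columns.tolist())
```

usecols also accepts integer positions, so usecols=[0, 1] would select the same two columns here.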
usecols is supposed to provide a filter before reading the whole DataFrame into memory; if used properly, there should never be a need to delete columns after reading.
We can also exclude columns from a pandas DataFrame after it has been loaded by using the loc indexer, which selects rows and columns by label. For example, loc can be applied to a given DataFrame to exclude the columns named name, city, and cost.
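One way this could look, assuming a small made-up DataFrame with an extra id column alongside name, city, and cost: build a boolean mask over the column labels and pass it to loc.

```python
import pandas as pd

# Hypothetical data; only the column names name, city, cost come from the text above.
df = pd.DataFrame(
    {
        "id": [1, 2],
        "name": ["a", "b"],
        "city": ["x", "y"],
        "cost": [10, 20],
    }
)

excluded = ["name", "city", "cost"]

# Keep every row (:) and only the columns whose label is NOT in `excluded`.
remaining = df.loc[:, ~df.columns.isin(excluded)]

print(remaining.columns.tolist())
```

Note that this filters after the whole file has been parsed, so it does not save memory at read time the way usecols does.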
Starting from version 0.20, the usecols parameter of read_csv accepts a callable filter, e.g. a lambda expression. Hence, if you know the names of the columns you want to skip, you can do the following:
columns_to_skip = ['foo', 'bar']
df = pd.read_csv(file, usecols=lambda x: x not in columns_to_skip)
Here's the documentation reference.
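A self-contained version of the same idea, for anyone who wants to try it without a file: the CSV content and the extra column baz are invented for the example, and it assumes pandas 0.20 or later.

```python
import io

import pandas as pd

# Hypothetical CSV; 'foo' and 'bar' match the names used in the answer above.
csv_text = "foo,bar,baz\n1,2,3\n4,5,6\n"
columns_to_skip = ["foo", "bar"]

# The callable is evaluated against each column name; columns for which it
# returns True are kept.
df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=lambda name: name not in columns_to_skip,
)

print(df.columns.tolist())
```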
You can read just a single row using nrows=1 to get the column names, and then re-read the full CSV, skipping the last column by slicing the column array from the first read:
cols = pd.read_csv(file, nrows=1).columns
df = pd.read_csv(file, usecols=cols[:-1])
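A runnable variant of the two-pass approach is sketched below. It reads the header with nrows=0 instead of nrows=1, which should be enough to get the column names, and it uses made-up in-memory data in place of a real file.

```python
import io

import pandas as pd

# Hypothetical four-column CSV standing in for the real file.
csv_text = "a,b,c,d\n1,2,3,4\n5,6,7,8\n"

# First pass: parse only the header row to learn the column names.
header = pd.read_csv(io.StringIO(csv_text), nrows=0).columns

# Second pass: read the data, keeping every column except the last one.
df = pd.read_csv(io.StringIO(csv_text), usecols=list(header[:-1]))

print(df.columns.tolist())
```

Because this relies only on nrows and a list passed to usecols, it does not depend on the callable form of usecols added in 0.20.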