Goal
I want to read a CSV into a Dask DataFrame without getting an “Unnamed: 0” column.
Code
import dask.dataframe as dd

mydtype = {'col1': 'object',
           'col2': 'object',
           'col3': 'object',
           'col4': 'float32'}
do = dd.read_csv('/folder/somecsvname.csv',
                 dtype=mydtype,
                 low_memory=False,
                 parse_dates=['col3'],
                 )
Result Columns
Tried solutions
index_col=False
ERROR message: ValueError: Keywords 'index' and 'index_col' not supported. Use dd.read_csv(...).set_index('my-index') instead
index_col=0
ERROR message: ValueError: Keywords 'index' and 'index_col' not supported. Use dd.read_csv(...).set_index('my-index') instead
do = dd.read_csv('/folder/somecsvname.csv',
                 dtype=mydtype,
                 low_memory=False,
                 parse_dates=['col3'],
                 ).set_index('col3')
index_col=None
ERROR message: ValueError: Keywords 'index' and 'index_col' not supported. Use dd.read_csv(...).set_index('my-index') instead
index_col=None, header=0
ERROR message: ValueError: Keywords 'index' and 'index_col' not supported. Use dd.read_csv(...).set_index('my-index') instead
The problem is that this column (Unnamed: 0) is present in the original csv file. It's best to address it upstream, at the time this file is generated. If that's not possible, then the best you can do with dask.dataframe is:
ddf = dd.read_csv(my_file)
ddf = ddf.drop('Unnamed: 0', axis=1)
Here's a reproducible example:
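import dask.dataframe as dd
import pandas as pd

# to_csv writes the index as an unnamed first column by default
df = pd.DataFrame(range(5))
df.to_csv('abc.csv')

# that column is read back as 'Unnamed: 0', so drop it explicitly
ddf = dd.read_csv('abc.csv')
ddf = ddf.drop('Unnamed: 0', axis=1)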
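As for addressing it upstream: if the file is generated with pandas `to_csv`, as in the reproducible example, the extra column is most likely the DataFrame index being written out. Here is a minimal sketch of that case, assuming you control the code that writes the file. The temporary file names are purely illustrative.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame(range(5))

with tempfile.TemporaryDirectory() as tmp:
    # Illustrative file names, not taken from the question
    with_index = os.path.join(tmp, "with_index.csv")
    without_index = os.path.join(tmp, "without_index.csv")

    # Default: the index is written as a first column with an empty header
    df.to_csv(with_index)
    # index=False omits the index, so no unnamed column ends up in the file
    df.to_csv(without_index, index=False)

    cols_with_index = list(pd.read_csv(with_index).columns)
    cols_without_index = list(pd.read_csv(without_index).columns)

print(cols_with_index)     # includes 'Unnamed: 0'
print(cols_without_index)  # no 'Unnamed: 0'
```

A file written with `index=False` should then load through `dd.read_csv` without the extra column, so the `drop` step would no longer be needed.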