How to read a CSV into a Dask DataFrame so that it does not have an "Unnamed: 0" column?

Question

Goal

I want to read a CSV file into a Dask DataFrame without getting an "Unnamed: 0" column.

CODE

import dask.dataframe as dd

mydtype = {'col1': 'object',
           'col2': 'object',
           'col3': 'object',
           'col4': 'float32'}

do = dd.read_csv('/folder/somecsvname.csv',
                 dtype=mydtype,
                 low_memory=False,
                 parse_dates=['col3'],
                 )

Result Columns

Unnamed: 0
col1
col2
col3
col4

Tried solutions

1. Works with pandas but not with Dask: pd.read_csv add column named "Unnamed: 0"
2. Works with pandas but not with Dask: How to get rid of "Unnamed: 0" column in a pandas DataFrame?
Code added to the read call: index_col=False. Error message: ValueError: Keywords 'index' and 'index_col' not supported. Use dd.read_csv(...).set_index('my-index') instead
Code added to the read call: index_col=0. Error message: ValueError: Keywords 'index' and 'index_col' not supported. Use dd.read_csv(...).set_index('my-index') instead
Code recommended by the previous two error messages. Problem: this only sets a column as the index, and the 'Unnamed: 0' column is still generated:

do = dd.read_csv('/folder/somecsvname.csv', 
                 dtype = mydtype, 
                 low_memory=False,
                 parse_dates=['col3'],
                ).set_index('col3')

Code added to the read call: index_col=None. Error message: ValueError: Keywords 'index' and 'index_col' not supported. Use dd.read_csv(...).set_index('my-index') instead
Code added to the read call: index_col=None, header=0. Error message: ValueError: Keywords 'index' and 'index_col' not supported. Use dd.read_csv(...).set_index('my-index') instead

SultanOrazbayev · Accepted Answer

The problem is that this column (Unnamed: 0) is present in the original csv file. It's best to address it upstream, at the time this file is generated. If that's not possible, then the best you can do with dask.dataframe is:

ddf = dd.read_csv(my_file)
ddf = ddf.drop('Unnamed: 0', axis=1)

Here's a reproducible example:

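import dask.dataframe as dd
import pandas as pd

# Writing with the default index=True stores the index as an unnamed first column
df = pd.DataFrame(range(5))
df.to_csv('abc.csv')

# Reading the file back yields an 'Unnamed: 0' column, which is then dropped
ddf = dd.read_csv('abc.csv')
ddf = ddf.drop('Unnamed: 0', axis=1)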
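For the upstream fix mentioned above, here is a minimal sketch that assumes the file is produced with pandas (the question does not say how it was generated). The column names and temporary file names are made up for illustration. It writes the same frame twice, once with the default settings and once with index=False, and compares the columns that come back:

```python
import os
import tempfile

import pandas as pd

# Hypothetical data; only the column names echo the question.
df = pd.DataFrame({"col1": ["a", "b"], "col4": [1.0, 2.0]})

with tempfile.TemporaryDirectory() as tmp:
    with_index = os.path.join(tmp, "with_index.csv")
    without_index = os.path.join(tmp, "without_index.csv")

    # Default (index=True): the index is written as a first column
    # whose header field is empty.
    df.to_csv(with_index)
    # index=False: only the data columns are written.
    df.to_csv(without_index, index=False)

    # Read both files back and keep just the column names.
    cols_with_index = list(pd.read_csv(with_index).columns)
    cols_without_index = list(pd.read_csv(without_index).columns)

print(cols_with_index)     # ['Unnamed: 0', 'col1', 'col4']
print(cols_without_index)  # ['col1', 'col4']
```

If the file is written with index=False, there is no extra column to drop when it is later read with dd.read_csv.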

Tags: python, pandas, csv, dask, dask-dataframe

sogu
