When reading a CSV with pandas, it is possible to specify the index column. Is this also possible with Dask at read time, as opposed to setting the index afterwards?
For example, using pandas:
df = pandas.read_csv(filename, index_col=0)
Ideally, with Dask this would be:
df = dask.dataframe.read_csv(filename, index_col=0)
I have tried
df = dask.dataframe.read_csv(filename).set_index(?)
but the index column does not have a name (and this seems slow).
To create an index from a column in a pandas DataFrame, use the set_index() method. For example, to make the column "Year" the index, write <code>df.set_index("Year")</code>. By default, set_index() returns the modified DataFrame as a result rather than changing df in place.
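As a minimal sketch of the set_index() behaviour described above (the sample data and the "Sales" column are made up for illustration; only the "Year" column name comes from the text):

```python
import pandas as pd

# Hypothetical sample data; "Year" follows the example in the text.
df = pd.DataFrame({"Year": [2019, 2020, 2021], "Sales": [10, 12, 15]})

# set_index returns a new DataFrame; the original df is left unchanged.
indexed = df.set_index("Year")

print(indexed.index.name)  # Year
print(list(df.columns))    # ['Year', 'Sales']
```

Because the result is returned, it has to be assigned to a name (or chained) to be of any use.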
This schema allows your CSV database to accept record additions, deletions, and updates. Additions are made at the end of the file. To delete a record, overwrite the first character of the record with a unique marker such as 0x0 and delete the corresponding entry from the index file.
No, these need to be two separate steps. If you try to do it in one call, Dask tells you so in a helpful error message.
In [1]: import dask.dataframe as dd
In [2]: df = dd.read_csv('*.csv', index='my-index')
ValueError: Keyword 'index' not supported dd.read_csv(...).set_index('my-index') instead
But this won't be any slower or faster than doing it the other way.
I know I'm a bit late, but this is the first result on Google, so it should get answered.
If you write your dataframe with:
# index=True is the default
my_pandas_df.to_csv('path')
# so this is equivalent
my_pandas_df.to_csv('path', index=True)
And import with Dask:
import dask.dataframe as dd
my_dask_df = dd.read_csv('path').set_index('Unnamed: 0')
It will use column 0 as your index (which is unnamed because of how pandas.DataFrame.to_csv() writes the index). To see the name that column is given, inspect the columns:
my_dask_df = dd.read_csv('path')
my_dask_df.columns
which returns
Index(['Unnamed: 0', 'col 0', 'col 1',
...
'col n'],
dtype='object', length=...)
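To see where 'Unnamed: 0' comes from, the round trip can be reproduced with pandas alone. This is a sketch with made-up data; "row_id" is a hypothetical name, and index_label is the to_csv parameter that gives the index column an explicit header.

```python
import os
import tempfile

import pandas as pd

tmpdir = tempfile.mkdtemp()
df = pd.DataFrame({"col 0": [1, 2], "col 1": [3, 4]})

# Default: the index is written under an empty header,
# which the reader labels 'Unnamed: 0'.
unnamed_path = os.path.join(tmpdir, "unnamed.csv")
df.to_csv(unnamed_path)
unnamed_cols = list(pd.read_csv(unnamed_path).columns)
print(unnamed_cols)  # ['Unnamed: 0', 'col 0', 'col 1']

# With index_label, the index column gets a real header
# ("row_id" is a hypothetical name chosen for this sketch).
named_path = os.path.join(tmpdir, "named.csv")
df.to_csv(named_path, index_label="row_id")
named_cols = list(pd.read_csv(named_path).columns)
print(named_cols)  # ['row_id', 'col 0', 'col 1']
```

If you control how the file is written, naming the index this way gives you a stable column name to pass to set_index later.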
Now you can write: df = pandas.read_csv(filename, index_col='column_name')
(where column_name is the name of the column you want to set as the index).