dask dataframe from python list of tuples

I am really new to Dask. I want to create a Dask dataframe from a Python list of tuples. In pandas, you can use DataFrame.from_records to convert a list of tuples to a dataframe. What function gives me the same functionality in Dask? My data looks a bit like this:

[(21262, 'booking', 'NULL'), (21262, 'booking', 'NULL'), (21262, 'booking', 'NULL'), (21262, 'booking', ''), (21262, 'booking', 'NULL')]

I am currently using this code to perform the task. Is this the correct way of doing it?

import pandas as pd
import dask
import dask.dataframe as dd

names = ['id', 'status', 'reg_entry']
dfs = dask.delayed(pd.DataFrame.from_records)(cursor.fetchall(), columns=names)

df = dd.from_delayed(dfs)
Ali. K asked Oct 16 '25 07:10


1 Answer

You can try creating a dask dataframe from an existing pandas dataframe (to be able to use all pandas constructors):

import pandas as pd
import dask.dataframe as dd
df = pd.DataFrame([(21262, 'booking', 'NULL'), (21262, 'booking', 'NULL'), (21262, 'booking', 'NULL'), (21262, 'booking', ''), (21262, 'booking', 'NULL')], columns=['id', 'status', 'reg_entry'])
ddf = dd.from_pandas(df, npartitions=2)
Tina Iris answered Oct 17 '25 21:10