I have a Dask dataframe created from a Parquet file on HDFS. When I set the index using the set_index API, it fails with the error below.
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/dask/dataframe/shuffle.py", line 64, in set_index
    divisions, sizes, mins, maxes = base.compute(divisions, sizes, mins, maxes)
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/dask/base.py", line 206, in compute
    results = get(dsk, keys, **kwargs)
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/distributed/client.py", line 1949, in get
    results = self.gather(packed, asynchronous=asynchronous)
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/distributed/client.py", line 1391, in gather
    asynchronous=asynchronous)
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/distributed/client.py", line 561, in sync
    return sync(self.loop, func, *args, **kwargs)
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/distributed/utils.py", line 241, in sync
    six.reraise(*error[0])
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/six.py", line 693, in reraise
    raise value
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/distributed/utils.py", line 229, in f
    result[0] = yield make_coro()
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/tornado/gen.py", line 1055, in run
    value = future.result()
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/tornado/concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "", line 4, in raise_exc_info
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/tornado/gen.py", line 1063, in run
    yielded = self.gen.throw(*exc_info)
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/distributed/client.py", line 1269, in _gather
    traceback)
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/six.py", line 692, in reraise
    raise value.with_traceback(tb)
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/dask/dataframe/io/parquet.py", line 144, in _read_parquet_row_group
    open=open, assign=views, scheme=scheme)
TypeError: read_row_group_file() got an unexpected keyword argument 'scheme'
Can someone point me to the cause of this error and how to fix it?
Upgrade fastparquet to version 0.1.3.
Dask 0.15.4, the version used in your example, includes this commit, which passes a new scheme argument to read_row_group_file(). fastparquet versions before 0.1.3 do not accept that argument, so the call raises the TypeError shown in your traceback.
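To confirm that a version mismatch is the cause, a quick check along the following lines may help. This is only a sketch: the helper names parse_version, needs_upgrade and accepts_keyword are made up for illustration, and reading fastparquet.__version__ assumes the installed release exposes that attribute (the code falls back gracefully if it does not). Since your traceback goes through distributed, the check is most useful when run in the environment the workers use, not only on the client.

```python
import inspect


def parse_version(text):
    """Turn a dotted version such as '0.1.3' into a tuple of ints, e.g. (0, 1, 3).

    Parsing stops at the first component that does not start with a digit,
    so a suffix like '.post1' is ignored.
    """
    parts = []
    for piece in text.split("."):
        leading = ""
        for ch in piece:
            if not ch.isdigit():
                break
            leading += ch
        if not leading:
            break
        parts.append(int(leading))
    return tuple(parts)


def needs_upgrade(installed, minimum="0.1.3"):
    """Return True if the installed version is older than the minimum."""
    return parse_version(installed) < parse_version(minimum)


def accepts_keyword(func, name):
    """Return True if func has a parameter called name or takes **kwargs."""
    params = inspect.signature(func).parameters
    has_var_kwargs = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    return name in params or has_var_kwargs


if __name__ == "__main__":
    try:
        import fastparquet
    except ImportError:
        print("fastparquet is not installed in this environment")
    else:
        # Assumption: the package exposes __version__; fall back if it does not.
        version = getattr(fastparquet, "__version__", None)
        if version is None:
            print("could not determine the fastparquet version")
        elif needs_upgrade(version):
            print("fastparquet", version, "is older than 0.1.3; upgrade it")
        else:
            print("fastparquet", version, "should accept the scheme argument")
```

accepts_keyword is a generic way to test any callable for the kind of signature mismatch behind this TypeError; it is not tied to fastparquet's module layout, which is why the script itself only compares version numbers.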