
PicklingError when copying a very large cassandra table using cqlsh

When I try to import a CSV file into a Cassandra table using the command:

copy images from 'images.csv'

I get the error:

'PicklingError: Can't pickle <class 'cqlshlib.copyutil.ImmutableDict'>: attribute lookup cqlshlib.copyutil.ImmutableDict failed'

I have successfully imported all of my other tables, but this one is not working. The only difference with this one is that it contains large binary blobs for images.

Here is a sample row from the CSV file:

b267ba01-5420-4be5-b962-7e563dc245b0,,0x89504e...[large binary blob]...426082,0,7e700538-cce3-495f-bfd2-6a4fa968bdf6,pentium_e6600,01fa819e-3425-47ca-82aa-a3eec319a998,0,7e700538-cce3-495f-bfd2-6a4fa968bdf6,,,png,0

And here is the file that causes the error: https://www.dropbox.com/s/5mrl6nuwelpf3lz/images.csv?dl=0

Here is my schema:

CREATE TABLE dealtech.images (
    id uuid PRIMARY KEY,
    attributes map<text, text>,
    data blob,
    height int,
    item_id uuid,
    name text,
    product_id uuid,
    scale double,
    seller_id uuid,
    text_bottom int,
    text_top int,
    type text,
    width int
)

The tables were exported using Cassandra 2.x, and I am currently using Cassandra 3.0.9 to import them.

Robert Nelson asked May 16 '17 15:05



1 Answer

I ran into this same issue with Apache Cassandra 3.9, although my datasets were fairly small (46 rows in one table, 262 rows in another).

PicklingError: Can't pickle <class 'cqlshlib.copyutil.link'>: attribute lookup cqlshlib.copyutil.link failed

PicklingError: Can't pickle <class 'cqlshlib.copyutil.attribute'>: attribute lookup cqlshlib.copyutil.attribute failed

Here, link and attribute are types that I defined.
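
For context, the message itself comes from Python's pickle module, which cqlsh (a Python program) presumably uses to pass data between its COPY worker processes. pickle stores a class by its module and name rather than by value, so a class that cannot be found under that name fails with this kind of "attribute lookup ... failed" message. The sketch below reproduces the same kind of failure in plain Python. It is not cqlsh's code, and the `_hidden` name, the `try_pickle` helper, and the field names are made up for illustration.

```python
import pickle
from collections import namedtuple

# Hypothetical stand-in for a class that is built at runtime. The class calls
# itself "link", but it is bound to a different module-level name, so looking
# up "link" in this module fails.
_hidden = namedtuple("link", ["url", "label"])


def try_pickle(obj):
    """Return None on success, or the PicklingError message on failure."""
    try:
        pickle.dumps(obj)
        return None
    except pickle.PicklingError as exc:
        return str(exc)


# pickle records the class as "<module>.link" and checks that the name resolves.
error = try_pickle(_hidden("http://example.com", "home"))
print(error)
```

An ordinary dict or a class defined under its own name at module level pickles without complaint, which would be consistent with only some tables (those with maps or custom types) hitting the error.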

The COPY commands were part of a .cql script that was run inside a Docker container as part of its setup process.

I read in a few places that people were seeing this PicklingError on Windows (it seemed to be related to NTFS), but the Docker container in this case was running Alpine Linux.

The fix was to add these options to the end of my COPY commands:

WITH MINBATCHSIZE=1 AND MAXBATCHSIZE=1 AND PAGESIZE=10;

COPY documentation: http://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlshCopy.html

I did not see the PicklingError when running these .cql scripts containing COPY commands locally, so it seems to be an issue that only rears its head in a low-memory situation.

Related issues:

  • Pickling Error running COPY command: CQLShell on Windows
  • Cassandra multiprocessing can't pickle _thread.lock objects
bpgriner answered Oct 09 '22 04:10
