Cassandra buffered read of millions of columns

Question

I've got a Cassandra cluster with a small number of rows (< 100). Each row has about 2 million columns. I need to read a full row (all 2 million columns), but things start failing all over the place before the read finishes. I'd like to do some kind of buffered read.

Ideally I'd like to do something like this using Pycassa (no, this isn't the proper way to call get; it's just to give you the idea):

results = {}
start = 0
while True:
    # Fetch blocks of size 500
    buffer = column_family.get(key, column_offset=start, column_count=500)
    if len(buffer) == 0:
        break

    # Merge these results into the main one
    results.update(buffer)

    # Update the offset
    start += len(buffer)

Pycassa (and by extension Cassandra) doesn't let you do this. Instead, you need to specify column names for column_start and column_finish. This is a problem since I don't actually know what the start or end column names will be. The special value "" can indicate the start or end of the row, but that doesn't help for any of the positions in the middle.

So how can I accomplish a buffered read of all the columns in a single row? Thanks.

Chris K · Accepted Answer

From the pycassa 1.0.8 documentation, it would appear that you could use something like the following [pseudocode]:

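results = {}
startColumn = ""
while True:
    # Fetch blocks of size 100, beginning at startColumn (inclusive)
    buffer = column_family.get(key, column_start=startColumn, column_finish="", column_count=100)
    # Merge the block; the repeated startColumn simply overwrites itself
    results.update(buffer)
    # A short block means the end of the row was reached
    if len(buffer) < 100:
        break
    # The last column name of this block becomes the start of the next one
    startColumn = list(buffer)[-1]

Remember that each subsequent call gives you only 99 new results, because it also returns startColumn, which you've already seen. I'm not fluent in Python yet, so treat the line that pulls the last column name out of buffer as a sketch; it relies on get returning the columns as an ordered dict keyed by column name.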
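The paging idea can be tried without a cluster. What follows is a minimal sketch, not pycassa itself: FakeColumnFamily is a hypothetical in-memory stand-in that only imitates the inclusive column_start / column_finish / column_count behaviour of get described in this thread, and read_wide_row is an illustrative helper name.

```python
from bisect import bisect_left, bisect_right
from collections import OrderedDict


class FakeColumnFamily:
    """Hypothetical in-memory stand-in for a column family (NOT pycassa)."""

    def __init__(self, rows):
        # rows: {key: {column_name: value}}; columns are stored sorted by name
        self._rows = {}
        for key, cols in rows.items():
            items = sorted(cols.items())
            self._rows[key] = ([name for name, _ in items], items)

    def get(self, key, column_start="", column_finish="", column_count=100):
        names, items = self._rows[key]
        # "" means "from the first column" / "to the last column"
        lo = bisect_left(names, column_start) if column_start else 0
        hi = bisect_right(names, column_finish) if column_finish else len(items)
        return OrderedDict(items[lo:hi][:column_count])


def read_wide_row(cf, key, page_size=100):
    """Collect every column of one row, page_size columns per request."""
    if page_size < 2:
        # With a page of 1 the repeated start column is all we would ever get
        raise ValueError("page_size must be at least 2")
    results = OrderedDict()
    start = ""
    while True:
        page = cf.get(key, column_start=start, column_finish="",
                      column_count=page_size)
        results.update(page)  # the repeated start column overwrites itself
        if len(page) < page_size:
            return results    # short page: end of the row
        start = next(reversed(page))  # last column name seen so far


row = {"col%07d" % i: i for i in range(1050)}
cf = FakeColumnFamily({"row1": row})
columns = read_wide_row(cf, "row1", page_size=100)
```

The page_size check matters: because the start column is returned again on every follow-up request, a page size of 1 would never make progress.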

user1987428 · Answer

In v1.7.1+ of pycassa you can use xget, a generator version of get that pages through the columns for you, and read a row as wide as 2**63-1 columns.

for col in cf.xget(key, column_count=2**63-1):
    # do something with the column.
    pass
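To illustrate what a paging generator of this kind does, here is a minimal sketch under stated assumptions. It is not pycassa's implementation of xget: iter_columns and make_fetch are hypothetical names, and fetch(start, count) stands for any call that returns up to count (name, value) pairs in column order, beginning at start inclusive.

```python
from bisect import bisect_left


def iter_columns(fetch, buffer_size=1024):
    """Lazily yield (name, value) pairs, requesting buffer_size columns at a time."""
    if buffer_size < 2:
        raise ValueError("buffer_size must be at least 2")
    start = ""
    while True:
        page = fetch(start, buffer_size)
        # Follow-up pages begin with the column we resumed from; skip it
        skip = 1 if start and page and page[0][0] == start else 0
        for name, value in page[skip:]:
            yield name, value
        if len(page) < buffer_size:
            return  # short page: the row is exhausted
        start = page[-1][0]


def make_fetch(columns):
    """Hypothetical stand-in for a remote slice query over one sorted row."""
    items = sorted(columns.items())
    names = [name for name, _ in items]

    def fetch(start, count):
        lo = bisect_left(names, start) if start else 0
        return items[lo:lo + count]

    return fetch


row = {"col%04d" % i: i for i in range(250)}
# Only one buffer of columns is held in memory at any time
total = sum(value for _, value in iter_columns(make_fetch(row), buffer_size=100))
```

Unlike collecting everything into a dict, a generator lets the caller process 2 million columns while holding only one buffer at a time.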

Tags:

python

cassandra

pycassa

Chris Eberle
