Get an accurate count of items in a bucket

The Couchbase admin console (I'm using version 5.0, community) shows a count of items in each bucket. I'm wondering whether that count is only a rough estimate rather than an exact count of the items in the bucket. Here's the behavior I'm seeing that leads me to this reasoning:

  • When I use XDCR to replicate a bucket to a backup node, the count in the backup bucket after the XDCR has finished will be significantly higher than the count of documents in the source bucket, sometimes by tens of thousands (in a bucket that contains hundreds of millions of documents).
  • When I use the Java DCP client to clone a bucket to a table in a different database, the other database shows numbers of records that are close, but off by possibly even a few million (again, in a bucket with hundreds of millions of documents).

How can I get an accurate count of the exact number of items in a bucket, so that I can be sure, after my DCP or XDCR processes have completed, that all documents have made it to the new location?

Murphy Randle asked Jul 20 '18 15:07



2 Answers

The counts can differ for a number of reasons, and without more details it is hard to say which one applies. The common cases are:

"The Couchbase admin console (I'm using version 5.0, community) shows a count of items in each bucket."

The admin console count is accurate, but the page does not update automatically, so a refresh is required.

"When I use the Java DCP client to clone a bucket to a table in a different database, the other database shows numbers of records that are close, but off by possibly even a few million (again, in a bucket with hundreds of millions of documents)."

DCP will include tombstones (deleted documents) and possibly multiple mutations for the same document, which could explain why the DCP count is off.
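To illustrate why a raw tally of DCP events can exceed the number of live documents, here is a minimal sketch in Python. It does not use the Java DCP client: the `(key, kind)` event tuples and the `live_keys` helper are hypothetical stand-ins for whatever your DCP listener records, and the sketch assumes that events for a given key arrive in order. The idea is to keep only the latest state per key and to drop keys whose last event is a deletion.

```python
from typing import Iterable, Set, Tuple

# Hypothetical event shape: (document key, "mutation" or "deletion").
# Real DCP events carry more fields; this is only an illustration.
Event = Tuple[str, str]


def live_keys(events: Iterable[Event]) -> Set[str]:
    """Return the keys whose most recent event is a mutation."""
    live: Set[str] = set()
    for key, kind in events:
        if kind == "mutation":
            live.add(key)       # repeated mutations of one key count once
        elif kind == "deletion":
            live.discard(key)   # a tombstone removes the key from the live set
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return live


events = [
    ("beer::1", "mutation"),
    ("beer::1", "mutation"),   # second mutation of the same document
    ("beer::2", "mutation"),
    ("beer::2", "deletion"),   # tombstone
    ("beer::3", "mutation"),
]
print(len(events), len(live_keys(events)))  # prints "5 2"
```

If the target table is populated with upserts keyed on the document ID and deletions are applied as deletes, its row count should correspond to the size of this live set rather than to the number of events received.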

As for N1QL: if the query is a simple SELECT COUNT(*) FROM bucketName, then, depending on the Couchbase Server version, it will use the bucket stats directly.

In other words, as mentioned above, the bucket stats obtained via the REST interface or by asking the Data service directly will be accurate.

Paddy answered Oct 11 '22 12:10



The most accurate approach is to read the bucket info directly, with something like:

curl http://hostname:8091/pools/default/buckets/beer-sample/ -u user:password | jq '.basicStats | {itemCount: .itemCount }'

The result is immediate, with no need for indexing:

{
  "itemCount": 7303
}

or, to get just the number rather than a JSON object:

curl http://centos:8091/pools/default/buckets/beer-sample/ -u roi:password | jq '.basicStats.itemCount'
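If you want to script the comparison between a source and a backup cluster, the same REST call can be made from code. The following is a minimal sketch in Python using only the standard library; the function names `item_count_from_payload` and `fetch_item_count` are illustrative assumptions, not part of any Couchbase SDK, and the endpoint and `basicStats.itemCount` field are the ones used in the curl commands above.

```python
import base64
import json
import urllib.request


def item_count_from_payload(payload: str) -> int:
    """Extract basicStats.itemCount from a bucket-info JSON document."""
    return int(json.loads(payload)["basicStats"]["itemCount"])


def fetch_item_count(base_url: str, bucket: str, user: str, password: str) -> int:
    """GET <base_url>/pools/default/buckets/<bucket> with basic auth; return itemCount."""
    request = urllib.request.Request(f"{base_url}/pools/default/buckets/{bucket}")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request) as response:
        return item_count_from_payload(response.read().decode("utf-8"))


# Offline demonstration of the parsing step with a trimmed-down sample payload.
sample = '{"name": "beer-sample", "basicStats": {"itemCount": 7303}}'
print(item_count_from_payload(sample))  # prints 7303
```

Calling `fetch_item_count("http://hostname:8091", "beer-sample", "user", "password")` against each cluster and comparing the two integers would then give a quick check once replication has finished and writes to the source have stopped.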
Roi Katz answered Oct 11 '22 14:10
