How to efficiently insert and fetch UUIDs in Core Data


I am looking for an efficient way to store and search UUIDs in Core Data. These UUIDs are generated by many iOS devices in a distributed system. Each of those devices may store about 20-50k UUIDs.

It is obvious that storing UUIDs as String in Core Data will hurt the efficiency of indexing them. But after some research I found that storing UUIDs as Binary Data in Core Data (and indexing them) may be less efficient than storing them as String.

Since SQLite does not support a BINARY-like or VARBINARY-like data type, I guess that any Binary Data attribute in Core Data is stored as a BLOB in SQLite. And since BLOB may be the slowest data type to index, that would hurt performance.

So, is there a more efficient way to store UUIDs in Core Data?

Cable W asked Jul 05 '12 02:07


1 Answer

Store them as an ASCII string, and make the field an index.

EDIT

Egads, I happened to be doing some poking about, and came across this. What a shameful answer. I must have been in a bit of a mood that day. If I could, I'd just delete it and move on. However, that's not possible, so I'll provide a snip of an update.

First, the only way to know what is "efficient" is to measure, considering program time and space as well as source code complexity and programmer effort.

Fortunately, this one is pretty easy.

I wrote a very simple OSX application. The model consists of a single attribute: identifier.

None of this matters if you do not mark your attribute as indexed. Indexing makes creating the store take a whole lot longer, but it makes queries much faster.

Also, note that creating a predicate for a binary attribute is exactly the same as creating one for a string:

    fetchRequest.predicate = [NSPredicate predicateWithFormat:@"identifier == %@", identifier];

The application is very simple. First, it creates N objects and assigns a UUID to each one's identifier attribute. It saves the managed object context (MOC) every 500 objects. We then store all identifiers in an array and randomly shuffle them. The whole Core Data stack is then torn down completely to remove it all from memory.
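The answer doesn't say how the identifiers were shuffled. One standard way is a Fisher-Yates shuffle; here is a plain C sketch over raw 16-byte UUIDs. The `uuid_bytes` type and `shuffle_uuids` name are hypothetical, and `rand()` is used only for brevity.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical holder for one raw 16-byte UUID. */
typedef struct {
    uint8_t bytes[16];
} uuid_bytes;

/* Fisher-Yates shuffle: walk from the last slot down, swapping each slot
   with a randomly chosen slot at or before it. rand() keeps the sketch
   short; it is not a high-quality generator and has slight modulo bias. */
void shuffle_uuids(uuid_bytes *ids, size_t count) {
    if (count < 2) {
        return;
    }
    for (size_t i = count - 1; i > 0; i--) {
        size_t j = (size_t)rand() % (i + 1);
        uuid_bytes tmp = ids[i];
        ids[i] = ids[j];
        ids[j] = tmp;
    }
}
```

Every UUID ends up in exactly one slot, so the subsequent fetch loop still visits each stored object once, just in an order unrelated to insertion order.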

Next, we build the stack again, iterate over the identifiers, and do a simple fetch for each one: a fetch request is constructed with a simple predicate that fetches that one object. All of this is done inside an autoreleasepool to keep each fetch as pristine as possible (I acknowledge that there will be some interaction with the Core Data caches). That's not so important, since we are just comparing the different techniques.

  • Binary: the identifier is the 16 raw bytes of the UUID.
  • UUID String: a 36-byte string, the result of calling [uuid UUIDString]; it looks like this: B85E91F3-4A0A-4ABB-A049-83B2A8E6085E.
  • Base64 String: a 24-byte string, the result of base64-encoding the 16-byte UUID binary data; for the same UUID it looks like this: uF6R80oKSrugSYOyqOYIXg==.
  • Count: the number of objects in that run.
  • SQLite size: the size of the actual SQLite file, in bytes.
  • WAL size: how big the WAL (write-ahead logging) file gets, in bytes; just FYI.
  • Create: the number of seconds to create the database, including saving.
  • Query: the total number of seconds to query every object, one fetch per object.
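To make the 16/36/24 byte counts concrete, here is a plain C sketch (also valid Objective-C) that builds both text forms from the raw bytes of the example UUID. It only illustrates the encodings: on Apple platforms `-[NSUUID UUIDString]` and `-[NSData base64EncodedStringWithOptions:]` do this for you, and the names `uuid_to_string` and `uuid_to_base64` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Canonical UUID text: 32 uppercase hex digits in 8-4-4-4-12 groups,
   i.e. 36 characters plus a terminating NUL. */
void uuid_to_string(const uint8_t uuid[16], char out[37]) {
    size_t pos = 0;
    for (size_t i = 0; i < 16; i++) {
        /* Hyphens go before bytes 4, 6, 8 and 10. */
        if (i == 4 || i == 6 || i == 8 || i == 10) {
            out[pos++] = '-';
        }
        /* Two hex digits per byte; the NUL snprintf writes is overwritten next. */
        snprintf(&out[pos], 3, "%02X", uuid[i]);
        pos += 2;
    }
    out[pos] = '\0';
    assert(strlen(out) == 36);
}

static const char kBase64Alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Standard padded base64 of the 16 raw bytes: five full 3-byte groups give
   20 characters, the leftover byte gives 2 more plus "==", so 24 in total. */
void uuid_to_base64(const uint8_t uuid[16], char out[25]) {
    size_t pos = 0;
    size_t i = 0;
    for (; i + 3 <= 16; i += 3) {
        uint32_t n = ((uint32_t)uuid[i] << 16) | ((uint32_t)uuid[i + 1] << 8) | uuid[i + 2];
        out[pos++] = kBase64Alphabet[(n >> 18) & 0x3F];
        out[pos++] = kBase64Alphabet[(n >> 12) & 0x3F];
        out[pos++] = kBase64Alphabet[(n >> 6) & 0x3F];
        out[pos++] = kBase64Alphabet[n & 0x3F];
    }
    /* 16 = 5 * 3 + 1, so exactly one byte is left over. */
    uint32_t n = (uint32_t)uuid[i] << 16;
    out[pos++] = kBase64Alphabet[(n >> 18) & 0x3F];
    out[pos++] = kBase64Alphabet[(n >> 12) & 0x3F];
    out[pos++] = '=';
    out[pos++] = '=';
    out[pos] = '\0';
    assert(strlen(out) == 24);
}
```

Called on the bytes B8 5E 91 F3 4A 0A 4A BB A0 49 83 B2 A8 E6 08 5E, these produce B85E91F3-4A0A-4ABB-A049-83B2A8E6085E and uF6R80oKSrugSYOyqOYIXg==, the two strings shown above.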

    Data Type     | Count (N) | SQLite Size (B) | WAL Size (B) | Create (s) | Query (s)
    --------------+-----------+-----------------+--------------+------------+----------
    Binary        |   100,000 |       5,758,976 |    5,055,272 |     2.6013 |    9.2669
    Binary        | 1,000,000 |      58,003,456 |    4,783,352 |    59.0179 |   96.1862
    UUID String   |   100,000 |      10,481,664 |    4,148,872 |     3.6233 |    9.9160
    UUID String   | 1,000,000 |     104,947,712 |    5,792,752 |    68.5746 |   93.7264
    Base64 String |   100,000 |       7,741,440 |    5,603,232 |     3.0207 |    9.2446
    Base64 String | 1,000,000 |      77,848,576 |    4,931,672 |    63.4510 |   94.5147

The first thing to note here is that the actual database size is much larger than the bytes stored (1,600,000 and 16,000,000 bytes of raw binary identifiers for 100,000 and 1,000,000 objects), which is to be expected for a database. The amount of extra storage will depend somewhat on the size of your actual objects; this model stores only the identifier, so the percentage of overhead will be higher.
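Dividing the file sizes in the table by the object counts makes that overhead concrete. A small C sketch of the arithmetic; the function names are illustrative, and the inputs are just the measured sizes and payload widths (16, 36 or 24 bytes) from the table above.

```c
#include <assert.h>

/* Average SQLite file bytes per stored object. */
double bytes_per_row(long long file_bytes, long long count) {
    assert(count > 0);
    return (double)file_bytes / (double)count;
}

/* Average bytes per object beyond the raw identifier payload. */
double overhead_per_row(long long file_bytes, long long count, int payload_bytes) {
    return bytes_per_row(file_bytes, count) - (double)payload_bytes;
}
```

For the 100,000-object Binary run this gives about 57.6 bytes per row, roughly 41.6 of them beyond the 16-byte identifier; the 1,000,000-object UUID String run comes to about 104.9 bytes per row for a 36-byte payload.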

Second, on speed: for reference, running the same 1,000,000 queries but using the object ID in the fetch request took about 82 seconds. Note the stark difference between that and calling existingObjectWithID:error:, which took a whopping 0.3065 seconds.

You should profile your own database, including judicious use of Instruments on the running code. I imagine the numbers would be somewhat different over multiple runs, but they are so close that it's not necessary for this analysis.

However, based on these numbers, let's look at efficiency measurements for the code execution.

  • As expected, storing the raw UUID binary data is more efficient in terms of space.
  • The creation time is pretty close (the difference appearing to be based on the time to create the strings and the extra storage space required).
  • The query times seem almost identical, with the binary attribute appearing to be a tiny bit slower at 1,000,000 objects. I think this was the original concern: doing a query on a binary attribute.
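Normalizing the Query column to time per fetch shows how small those gaps are. A C sketch of that arithmetic, with hypothetical function names and the table's totals as inputs:

```c
#include <assert.h>

/* Average microseconds per fetch: total query seconds spread over all fetches. */
double micros_per_fetch(double total_seconds, long long fetches) {
    assert(fetches > 0);
    return total_seconds * 1e6 / (double)fetches;
}

/* How much slower a is than b, as a percentage of b. */
double percent_slower(double a, double b) {
    assert(b > 0.0);
    return (a - b) / b * 100.0;
}
```

With the measured numbers, every run lands between about 92 and 100 microseconds per fetch, and the 1,000,000-object Binary run is about 2.6% slower than the UUID String run. For comparison, existingObjectWithID:error: works out to roughly 0.3 microseconds per object.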

Binary wins space by a lot, and it can be considered a close draw on both creation time and query time. If we just consider those, storing the binary data is the clear winner.

How about source code complexity and programmer time?

Well, if you are using a modern version of iOS and OSX, there is virtually no difference, especially with a simple category on NSUUID.

However, there is one consideration for you, and that's ease of using the data in the database. When you store binary data, it's hard to get a good visual on the data.

So if, for some reason, you want the data in the database to be stored in a form that is easier for humans to work with, storing it as a string is the better choice. In that case, you may want to consider base64 encoding (or some other encoding, though remember the raw bytes are already in base-256 encoding).

FWIW, here's an example category to provide easier access to the UUID as both NSData and base64 string:

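    #import <Foundation/Foundation.h>

    // The category name "Encoding" is illustrative; the original snippet does not name it.
    @interface NSUUID (Encoding)
    - (NSData *)data;
    - (NSString *)base64String;
    - (instancetype)initWithBase64String:(NSString *)string;
    - (instancetype)initWithString:(NSString *)string;
    @end

    @implementation NSUUID (Encoding)

    // The 16 raw bytes of the UUID.
    - (NSData *)data {
        uuid_t rawuuid;
        [self getUUIDBytes:rawuuid];
        return [NSData dataWithBytes:rawuuid length:sizeof(rawuuid)];
    }

    // The 24-character base64 form, e.g. uF6R80oKSrugSYOyqOYIXg==
    - (NSString *)base64String {
        uuid_t rawuuid;
        [self getUUIDBytes:rawuuid];
        // No copy needed: the bytes are only read before this method returns.
        NSData *data = [NSData dataWithBytesNoCopy:rawuuid length:sizeof(rawuuid) freeWhenDone:NO];
        return [data base64EncodedStringWithOptions:0];
    }

    // Returns nil unless the string decodes to exactly 16 bytes.
    - (instancetype)initWithBase64String:(NSString *)string {
        NSData *data = [[NSData alloc] initWithBase64EncodedString:string options:0];
        if (data.length == sizeof(uuid_t)) {
            return [self initWithUUIDBytes:data.bytes];
        }
        return self = nil;
    }

    // Accepts either the 36-character UUIDString form or the base64 form.
    // Pick the initializer up front: a failed init returns nil, so calling a
    // second initializer on that result would always yield nil.
    - (instancetype)initWithString:(NSString *)string {
        if (string.length == 36) {
            return [self initWithUUIDString:string];
        }
        return [self initWithBase64String:string];
    }

    @end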
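For readers outside Foundation, here is a plain C sketch of the decoding side, in the spirit of initWithBase64String: above. It is stricter than the category (it requires exactly the 24-character padded form rather than just checking the decoded length), and the names `b64_value` and `uuid_from_base64` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Map one base64 character (standard alphabet) to its 6-bit value, or -1. */
static int b64_value(char c) {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Decode a 24-character padded base64 string into 16 UUID bytes.
   Returns 1 on success, 0 if the input cannot decode to exactly 16 bytes. */
int uuid_from_base64(const char *text, uint8_t out[16]) {
    if (strlen(text) != 24 || text[22] != '=' || text[23] != '=') {
        return 0;
    }
    size_t pos = 0;
    /* Five full 4-character quanta decode to 15 bytes. */
    for (size_t i = 0; i < 20; i += 4) {
        int a = b64_value(text[i]);
        int b = b64_value(text[i + 1]);
        int c = b64_value(text[i + 2]);
        int d = b64_value(text[i + 3]);
        if (a < 0 || b < 0 || c < 0 || d < 0) {
            return 0;
        }
        uint32_t n = ((uint32_t)a << 18) | ((uint32_t)b << 12) | ((uint32_t)c << 6) | (uint32_t)d;
        out[pos++] = (uint8_t)(n >> 16);
        out[pos++] = (uint8_t)(n >> 8);
        out[pos++] = (uint8_t)n;
    }
    /* The final "XY==" quantum carries the 16th byte. */
    int a = b64_value(text[20]);
    int b = b64_value(text[21]);
    if (a < 0 || b < 0) {
        return 0;
    }
    out[pos] = (uint8_t)((a << 2) | (b >> 4));
    return 1;
}
```

Decoding uF6R80oKSrugSYOyqOYIXg== this way gives back the 16 bytes starting B8 5E 91 F3, while shorter strings or strings with characters outside the alphabet are rejected.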
Jody Hagins answered Mar 13 '23 14:03