
Why must relationships be optional when using Core Data with CloudKit?

Below is one of the requirements for using Core Data with CloudKit, from Apple's documentation:

All relationships must be optional. Due to operation size limitations, relationship changes may not be saved atomically.

Attempting to use a non-optional relationship with CloudKit results in this error:

Thread 1: Fatal error: Unresolved error Error Domain=NSCocoaErrorDomain Code=134060 "A Core Data error occurred." UserInfo={NSLocalizedFailureReason=CloudKit integration requires that all relationships be optional, the following are not: Some_Managed_Object: some_attribute}, ["NSLocalizedFailureReason": CloudKit integration requires that all relationships be optional, the following are not: Some_Managed_Object: some_attribute]

I wonder, doesn't that completely defeat the purpose of using relationships?

For example, suppose I have two entities: Account and Transfer. Since a transfer is always associated with a source account and a destination account, Transfer should have two non-optional relationships with Account. But due to the above requirement, these relationships have to be optional.

The doc gives an explanation: "(It's because) relationship changes may not be saved atomically". That seems to suggest that, during the sync between CloudKit and Core Data, a relationship may be incomplete and the incomplete relationship is exposed to app code. That seems like a serious issue to me, because:

  1. In my example above, the two relationships are non-optional by nature. Changing them to optional makes the model meaningless.

  2. Even where a relationship should be optional, an incomplete relationship is syntactically valid but may cause unexpected inconsistencies.

So I wonder how this is supposed to work in real apps. It seems quite broken to me. Am I misunderstanding something? Could it be that using CloudKit to sync Core Data is only applicable to a small set of apps which only use optional relationships? (If so, I wonder how other Core Data apps sync their data among devices.)


On a related note: like many others, I searched hard for details on the sync and conflict resolution algorithms used by CloudKit and Core Data. The only information I could find is:

  • https://developer.apple.com/forums/thread/121196

In an eventually consistent distributed system you can never "know" that you have existing data or devices in the cloud. Your application will simply "find out at some point" that this data exists and needs to be designed to handle that

  • https://mjtsai.com/blog/2019/06/04/syncing-core-data-with-cloudkit-and-nspersistentcloudkitcontainer/

Yup, Core Data CloudKit implements to-many relationships using CRDTs!

  • https://developer.apple.com/videos/play/wwdc2019/202/

Conflict resolution is implemented automatically by NSPersistentCloudKitContainer using a last writer wins merge policy.

While I roughly understand each of those pieces of information, they don't give a direct answer to two questions: 1) Are data changes synced between CloudKit and Core Data atomically? and, more importantly, 2) Is incomplete data exposed to app code during the sync?

My guess is 1) No and 2) Yes. But it's hard for me to understand how to write a real app if incomplete data changes are exposed to app code during the sync. Could it be that, to use CloudKit to sync Core Data, the model has to be designed to work fine with incomplete relationships?

I would greatly appreciate it if anyone could share how you understand it.

rayx asked Nov 14 '22 23:11


2 Answers

Could it be that, to use CloudKit to sync Core Data, the model has to be designed to work fine with incomplete relationships?

That is basically it: the model, and the code which works with the model, need to meet this criterion.

When CloudKit delivers changed records from a zone, an operation is not guaranteed to contain the complete object graph in a single “delivery” (see: recordZoneFetchResultBlock), so the Core Data team decided to prioritize saving partial datasets over atomic ones (as noted). I can’t speak for them, but I assume they chose this direction for performance and complexity reasons.

Take a device which is a new client or hasn’t been connected in a while and needs to consume 1,000 records. The delivery of that data may be broken up into 2 trips (fetch result block calls): the first containing 700 records (with its own partial transfer change token) and the second containing the last 300 (and the up-to-date store change token). CloudKit makes no promises about complete or ordered delivery of what is needed to complete a graph in either of those trips (there are circumstances where sending the full graph in one trip might not even be possible), so required relationships could be left unfulfilled during incremental saves (see this answer). Otherwise, Core Data would need to hold every single record from a cloud store in memory before committing anything to disk in order to maintain that integrity.
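To make the two-trip scenario concrete, here is a toy simulation in plain Swift. It is only a sketch of the idea, not how NSPersistentCloudKitContainer is implemented: `Record`, `LocalStore`, and the record IDs are hypothetical names invented for illustration, and `parentID` stands in for a to-one relationship.

```swift
// A simplified stand-in for a synced record (hypothetical type).
// `parentID` models a to-one relationship to another record.
struct Record {
    let id: String
    let parentID: String?
}

// A minimal local store that commits each batch as soon as it arrives,
// instead of waiting for the whole object graph.
struct LocalStore {
    private(set) var records: [String: Record] = [:]

    mutating func commit(_ batch: [Record]) {
        for record in batch {
            records[record.id] = record
        }
    }

    // IDs of records whose related record has not arrived yet.
    var dangling: [String] {
        records.values
            .filter { record in
                guard let parentID = record.parentID else { return false }
                return records[parentID] == nil
            }
            .map { $0.id }
            .sorted()
    }
}

var store = LocalStore()

// First trip: the transfer arrives before the account it points to.
store.commit([Record(id: "transfer-1", parentID: "account-1")])
let afterFirstTrip = store.dangling

// Second trip: the account arrives and the relationship can be resolved.
store.commit([Record(id: "account-1", parentID: nil)])
let afterSecondTrip = store.dangling

print(afterFirstTrip, afterSecondTrip)
```

Between the two commits the store holds a transfer that points at an account it does not have yet. A required relationship would make that intermediate save invalid, which is why the relationship has to be optional.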

Unfortunately, this means your code needs to check that each relationship is valid before accessing or doing work on it. If you need to guarantee a relationship client side because there’s no other way to decouple the object graph functionality, you might need to dive into the CloudKit framework and either build a query operation to confirm the relationship in CloudKit’s dataset or a fetch operation to import that data atomically instead of relying on the automatic behaviors.
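As a rough sketch of what "check before use" can look like, the snippet below wraps the two optional relationships from the question's Account/Transfer example in a single accessor. Everything here is an assumption made for illustration: the class names, the `endpoints` property, and the model built in code (an app would normally use an .xcdatamodeld file, and a CloudKit-backed model would also need inverse relationships, which are left out to keep this short).

```swift
import CoreData

// Hypothetical entities modeled on the question's Account/Transfer example.
@objc(Account)
final class Account: NSManagedObject {
    @NSManaged var name: String?
}

@objc(Transfer)
final class Transfer: NSManagedObject {
    @NSManaged var amount: Double
    // Optional in the model, as the CloudKit integration requires.
    @NSManaged var source: Account?
    @NSManaged var destination: Account?
}

extension Transfer {
    // Both endpoints, or nil while either relationship is still missing
    // (for example, because the related record has not been imported yet).
    var endpoints: (source: Account, destination: Account)? {
        guard let source = source, let destination = destination else { return nil }
        return (source, destination)
    }
}

// The model is built in code only to keep this sketch self-contained.
func makeModel() -> NSManagedObjectModel {
    let account = NSEntityDescription()
    account.name = "Account"
    account.managedObjectClassName = "Account"
    let name = NSAttributeDescription()
    name.name = "name"
    name.attributeType = .stringAttributeType
    account.properties = [name]

    let transfer = NSEntityDescription()
    transfer.name = "Transfer"
    transfer.managedObjectClassName = "Transfer"
    let amount = NSAttributeDescription()
    amount.name = "amount"
    amount.attributeType = .doubleAttributeType
    amount.defaultValue = 0.0

    // Builds an optional to-one relationship pointing at Account.
    func optionalToOne(_ name: String) -> NSRelationshipDescription {
        let relationship = NSRelationshipDescription()
        relationship.name = name
        relationship.destinationEntity = account
        relationship.minCount = 0
        relationship.maxCount = 1
        relationship.isOptional = true
        relationship.deleteRule = .nullifyDeleteRule
        return relationship
    }
    transfer.properties = [amount, optionalToOne("source"), optionalToOne("destination")]

    let model = NSManagedObjectModel()
    model.entities = [account, transfer]
    return model
}

// In-memory container so that nothing is written to disk.
func makeContainer() -> NSPersistentContainer {
    let container = NSPersistentContainer(name: "Sketch", managedObjectModel: makeModel())
    let description = NSPersistentStoreDescription()
    description.type = NSInMemoryStoreType
    container.persistentStoreDescriptions = [description]
    container.loadPersistentStores { _, error in
        precondition(error == nil, "Failed to load the in-memory store")
    }
    return container
}
```

Code that needs both accounts would then go through `endpoints` (for example with `if let`) and skip the transfer or show a placeholder when it returns nil, rather than force-unwrapping `source` or `destination`.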

Dandy answered Feb 06 '23 18:02


Well, Core Data works much like a relational database, and CloudKit can be perceived as a NoSQL database. Apple is trying their best to bridge the gap. The complaints about relationships and constraints are easier to understand once you understand the design considerations behind the NoSQL databases that are currently popular.

Simply put, the reason is distributed scalability and performance. Relationships are one of the key reasons why relational databases cannot be used in many cloud environments that handle large amounts of data: they are not very "distributed" in nature.

Phuah Yee Keat answered Feb 06 '23 17:02