I am binding a large collection (250,000+ records) to a DataGrid. For this to perform well, it must use both UI Virtualization and Data Virtualization. After some research I figured out how to get both virtualizations to work. But as soon as I do a sort, by clicking on a column header in DataGrid, it abandons data virtualization and attempts to read the entire dataset into memory.
Instead, I want it to pass the sort command to the underlying collection so that the database performs the sort before retrieving the data from disk. Is there a way to do this?
I'm answering my own question here in hopes of helping others dealing with this same issue. The information is spread across multiple articles and the Stack Overflow community was immensely helpful in figuring it out.
First, the basics. UI Virtualization means that the control (DataGrid in this case) only creates UI objects for what can be seen on the screen (plus a few more to enable rapid scrolling). It's built into DataGrid and enabled by default. So, there's not much you have to do to enable it. See this article for details.
Data Virtualization means reading in only the data that corresponds to what's visible on the screen. The rest is left in the database. There are lots of references to data virtualization but I found it difficult to find the right article. This is the one from Microsoft.
In my case, I'm doing random-access virtualization. The summary is that my collection should implement IList and INotifyCollectionChanged. Optionally, I can also implement IItemsRangeInfo and ISelectionInfo if they will help.
So far, so good. I created a test collection to emulate random access to data from a database. In this case, it created row data algorithmically from the index so that I could test with arbitrarily large virtual collections and eliminate database performance as a factor in these tests. Implementing IList and INotifyCollectionChanged works. I can create a collection with a billion records and the DataGrid responds near-instantaneously. You can grab the scroll bar and move from beginning to end without any noticeable delay.
Two hints that help when building collections intended for Data Virtualization. IList inherits from IEnumerable, and with a large, random-access collection you don't want any callers to enumerate the collection. However, DataGrid does call GetEnumerator() once during initialization. You can satisfy this by returning an enumerator over an empty collection. I created a singleton empty collection class for this purpose.
The other IList method you don't want to be called is CopyTo. I simply have that method throw an InvalidOperationException.
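As a rough illustration of the two hints above, here is a minimal sketch of such a test collection. It is not my production class: the string rows, the `RaiseReset()` helper, and the `IndexOf` parsing are assumptions made to keep the example short, and a real collection would fetch row objects from the database in the indexer.

```csharp
using System;
using System.Collections;
using System.Collections.Specialized;

// Hypothetical random-access collection: each row is computed from its index,
// standing in for a database lookup.
public sealed class VirtualCollection : IList, INotifyCollectionChanged
{
    private static readonly object[] s_empty = new object[0];
    private readonly int m_count;

    public VirtualCollection(int count) { m_count = count; }

    public event NotifyCollectionChangedEventHandler CollectionChanged;

    // Illustrative helper: call when the underlying data changes.
    public void RaiseReset() =>
        CollectionChanged?.Invoke(this,
            new NotifyCollectionChangedEventArgs(NotifyCollectionChangedAction.Reset));

    public int Count => m_count;

    public object this[int index]
    {
        get
        {
            if (index < 0 || index >= m_count)
                throw new ArgumentOutOfRangeException(nameof(index));
            return "Row " + index; // a real collection would query the database here
        }
        set => throw new NotSupportedException();
    }

    // Enumerating would touch every row, so hand back an empty enumerator.
    public IEnumerator GetEnumerator() => s_empty.GetEnumerator();

    // Copying would also touch every row, so refuse outright.
    public void CopyTo(Array array, int index) =>
        throw new InvalidOperationException("CopyTo is not supported on a virtual collection.");

    // Maps a row back to its index without scanning the collection.
    public int IndexOf(object value)
    {
        if (value is string s && s.StartsWith("Row ", StringComparison.Ordinal) &&
            int.TryParse(s.Substring(4), out int i) && i >= 0 && i < m_count)
            return i;
        return -1;
    }

    public bool Contains(object value) => IndexOf(value) >= 0;
    public bool IsReadOnly => true;
    public bool IsFixedSize => true;
    public bool IsSynchronized => false;
    public object SyncRoot => this;

    public int Add(object value) => throw new NotSupportedException();
    public void Clear() => throw new NotSupportedException();
    public void Insert(int index, object value) => throw new NotSupportedException();
    public void Remove(object value) => throw new NotSupportedException();
    public void RemoveAt(int index) => throw new NotSupportedException();
}
```

Because nothing is stored per row, `new VirtualCollection(1000000000)` costs the same as a collection of ten rows; only the rows the control actually asks for are ever materialized.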
This all works. However, as soon as you click on a column header to perform a sort, the control attempts to make a copy of the whole collection. With a billion records I get an out-of-memory error. It seems like implementing IBindingList should fix this, since it provides the sorting methods that DataGrid needs. However, implementing IBindingList disables Data Virtualization altogether, causing the control to attempt to read all data during initialization.
The answer is in the documentation for CollectionView. When a control such as DataGrid or ListView binds to a collection, it uses a CollectionView as an intermediary. The idea is that there's a shared collection (model in MVVM terms) and that sorting and filtering are implemented in the CollectionView rather than the collection itself. That way, if the same collection appears in multiple controls, sorting one doesn't affect others. The various CollectionView implementations accomplish this by making a shadow copy of the bound collection and sorting the shadow. It works well in small collections but it's a disaster for Data Virtualization.
The data binding code selects the view according to the interfaces implemented by the collection being bound. A collection that implements IList is bound by ListCollectionView. If that collection also implements INotifyCollectionChanged then the ListCollectionView will perform data virtualization (until sorting or filtering is invoked). A collection that implements IBindingList (including IBindingListView) is bound by BindingListCollectionView, which does not perform Data Virtualization.
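One way to check which view a given collection gets is to ask `CollectionViewSource.GetDefaultView` and look at the runtime type. The sketch below is only a diagnostic aid; the `ViewSelectionDemo` class and `ViewTypeFor` method are names made up for this example, and it assumes a project that references the WPF assemblies.

```csharp
using System.Collections.Generic;
using System.ComponentModel;
using System.Windows.Data;

public static class ViewSelectionDemo
{
    // Returns the type name of the default view WPF picks for a source collection.
    public static string ViewTypeFor(object source) =>
        CollectionViewSource.GetDefaultView(source).GetType().Name;

    // Compares an IList source with an IBindingList source.
    public static string[] Describe() => new[]
    {
        ViewTypeFor(new List<int>()),        // List<T> implements IList
        ViewTypeFor(new BindingList<int>())  // BindingList<T> implements IBindingList
    };
}
```

Running the same check against your own collection class is a quick way to confirm that adding or removing an interface changed the view that data binding selects.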
To add sorting to Data Virtualization you have to subclass ListCollectionView, capture the sorting requests, pass them on to your collection class, and stop ListCollectionView from making shadow copies. This is surprisingly easy though I had to consult the source code to ListCollectionView to figure it out. Here's the code:
    using System.Collections.Specialized;
    using System.Windows.Data;

    class VirtualListCollectionView : ListCollectionView
    {
        // The virtualized collection that performs the actual sort
        readonly VirtualCollection m_collection;

        public VirtualListCollectionView(VirtualCollection collection)
            : base(collection)
        {
            m_collection = collection;
        }

        protected override void RefreshOverride()
        {
            // Hand the sort to the collection instead of building a sorted shadow copy
            m_collection.SetSortInternal(SortDescriptions);

            // Notify listeners that everything has changed
            OnCollectionChanged(new NotifyCollectionChangedEventArgs(NotifyCollectionChangedAction.Reset));

            // The implementation of ListCollectionView saves the current item before updating the sort
            // and restores it after updating the sort. However, DataGrid, which is the primary client
            // of this view, does not use the current values. So, we simply set it to "beforeFirst"
            SetCurrent(null, -1);
        }
    }
The key is overriding "RefreshOverride()". That's where the unwanted shadow copy would be made. Instead, the override passes the sort requirements to the associated collection. The special "SetSortInternal()" method on the custom class does not generate an INotifyCollectionChanged event. That's important because the event would cause a recursive call to RefreshOverride().
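What SetSortInternal() does with the sort descriptions depends on your data source. As one possibility, assuming a SQL backend whose column names match the bound property names, the collection could translate them into an ORDER BY clause for its next query. The `SortTranslator` class, the `ToOrderBy` method, and the bracket-style quoting are all assumptions made for this sketch, not part of WPF.

```csharp
using System.Collections.Generic;
using System.ComponentModel; // SortDescription and ListSortDirection

public static class SortTranslator
{
    // Builds an ORDER BY clause from WPF sort descriptions.
    // Assumes each PropertyName is also the database column name.
    public static string ToOrderBy(IEnumerable<SortDescription> sorts)
    {
        var parts = new List<string>();
        foreach (SortDescription sd in sorts)
        {
            string direction =
                sd.Direction == ListSortDirection.Descending ? "DESC" : "ASC";
            // Bracket-quote the column name and escape any closing bracket.
            parts.Add("[" + sd.PropertyName.Replace("]", "]]") + "] " + direction);
        }
        return parts.Count == 0
            ? string.Empty
            : "ORDER BY " + string.Join(", ", parts);
    }
}
```

For example, sorting ascending by `Name` and then descending by `Age` yields `ORDER BY [Name] ASC, [Age] DESC`. SetSortInternal() would store that clause and discard any cached rows, without raising CollectionChanged, for the reason given above.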
Next you have to make data binding use your custom CollectionView class rather than the default. There are two ways of accomplishing this. One is to create the VirtualListCollectionView yourself (either in XAML or code-behind) and bind to the view instead of the collection (by assigning it to DataGrid.ItemsSource). The other way is to implement ICollectionViewFactory on your collection and let it create its own view.
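The second option can be sketched as follows. To keep the example self-contained it uses a stand-in class, `FactoryBackedCollection`, derived from ObservableCollection<string> and returning a plain ListCollectionView; both are placeholders. In the setup described here, the virtual collection itself would implement the interface and return `new VirtualListCollectionView(this)`.

```csharp
using System.Collections.ObjectModel;
using System.ComponentModel;
using System.Windows.Data;

// Stand-in collection that supplies its own view to the binding engine.
public class FactoryBackedCollection : ObservableCollection<string>, ICollectionViewFactory
{
    // Asked for by data binding when this collection is used as a binding source.
    public ICollectionView CreateView()
    {
        // Placeholder: substitute the custom view here, e.g.
        // return new VirtualListCollectionView(this);
        return new ListCollectionView(this);
    }
}
```

With the factory in place, assigning the collection to DataGrid.ItemsSource should be enough for the custom view to be used. With the first option you would instead assign the view itself, along the lines of `dataGrid.ItemsSource = new VirtualListCollectionView(collection);`.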
In this framework, the CollectionView delegates sorting and filtering to the underlying collection class (the IList implementation). Therefore, the collection class becomes part of the view (or ViewModel, using MVVM terminology) and there should be a 1:1 relationship between them. The shared collection (or Model, using MVVM terminology) is the underlying database. To emphasize this, I have experimented with merging both into the same class. It can be done but it gets tricky because both classes implement IList. It's easier to have two objects, each with a reference to the other.