I implemented a data virtualization solution using some ideas from CodePlex, the blog of Bea Stollnitz, and Vincent Van Den Berghe's paper (same link). However, I needed a different approach, so I decided to write my own solution.
I am using a DataGrid to display about a million rows with this solution, with UI virtualization enabled as well. My solution works, but in certain situations I see some weird behavior in how the DataGrid requests data from its source.
About the solution
I ended up writing a list that does all the heavy work. It is a generic class named VirtualList<T>. It implements the ICollectionViewFactory interface, so the collection view creation mechanism can create a VirtualListCollectionView<T> instance to wrap it. This class inherits from ListCollectionView. I did not follow the suggestions to write my own ICollectionView implementation; inheriting seems to work fine as well.
VirtualList<T> splits the whole data set into pages. It gets the total item count, and every time the DataGrid requests a row via the list indexer, it loads the appropriate page or returns it from the cache. Pages are recycled internally, and a DispatcherTimer disposes of unused pages during idle time.
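To make the paging idea concrete, here is a minimal, language-neutral sketch in Python. It is not the actual WPF class: the load_page callback, the injectable clock, and the evict_idle method are hypothetical names, and eviction is triggered by the caller here rather than by a DispatcherTimer.

```python
import time
from collections import OrderedDict
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


class VirtualList(Generic[T]):
    """Paged, cached, read-only list (illustrative model only)."""

    def __init__(self, count: int, page_size: int,
                 load_page: Callable[[int, int], List[T]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._count = count
        self._page_size = page_size
        self._load_page = load_page  # hypothetical: (start_index, length) -> rows
        self._clock = clock
        # page number -> (rows, last access time); least recently used first
        self._pages: "OrderedDict[int, tuple]" = OrderedDict()

    def __len__(self) -> int:
        return self._count

    def __getitem__(self, index: int) -> T:
        if not 0 <= index < self._count:
            raise IndexError(index)
        page_no, offset = divmod(index, self._page_size)
        if page_no in self._pages:
            # Cache hit: take the page out so it can be re-inserted as newest.
            rows, _ = self._pages.pop(page_no)
        else:
            # Cache miss: load exactly one page (the last page may be shorter).
            start = page_no * self._page_size
            length = min(self._page_size, self._count - start)
            rows = self._load_page(start, length)
        self._pages[page_no] = (rows, self._clock())
        return rows[offset]

    def evict_idle(self, max_age: float) -> int:
        """Drop pages not touched for more than max_age seconds; return the count."""
        now = self._clock()
        stale = [n for n, (_, used) in self._pages.items() if now - used > max_age]
        for n in stale:
            del self._pages[n]
        return len(stale)
```

In this model a periodic timer would simply call evict_idle with whatever idle threshold suits the application.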
Data request patterns
The first thing I learned is that VirtualList<T> should implement IList (non-generic). Otherwise the ItemsControl treats it as an IEnumerable and queries/enumerates all the rows. This is logical, since the DataGrid is not type safe, so it cannot use the IList<T> interface.
The DataGrid frequently asks for the row at index 0. It seems to be used for visual item measurement (according to the call stack), so I simply cache this one.
The caching mechanism inside the DataGrid uses a predictable pattern to query the rows it shows. First it asks for the visible rows from top to bottom (twice for every row). Then it queries a few rows before the visible area (how many depends on the size of the visible area), starting with the first visible row and going in descending order, that is, from bottom to top. After that it requests the same number of rows after the visible area, starting with the last visible row and going from top to bottom.
For example, if the visible row indexes are 4, 5, 6, the data requests would be: 4, 4, 5, 5, 6, 6, 4, 3, 2, 1, 6, 7, 8, 9.
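The observed pattern can be written down as a small function. This is only a model of what I saw in the debugger, sketched in Python; the function name and the buffer_rows parameter (how many extra rows are requested on each side) are my own assumptions, not anything the DataGrid exposes.

```python
from typing import List


def request_pattern(first_visible: int, last_visible: int,
                    buffer_rows: int, total_rows: int) -> List[int]:
    """Reproduce the observed order of row requests for one visible range."""
    # 1. Visible rows, top to bottom, each requested twice.
    requests = [i for i in range(first_visible, last_visible + 1) for _ in range(2)]
    # 2. Rows before the visible area, descending, starting at the first visible row.
    lowest = max(first_visible - buffer_rows, 0)
    requests.extend(range(first_visible, lowest - 1, -1))
    # 3. Rows after the visible area, ascending, starting at the last visible row.
    highest = min(last_visible + buffer_rows, total_rows - 1)
    requests.extend(range(last_visible, highest + 1))
    return requests


print(request_pattern(4, 6, buffer_rows=3, total_rows=1_000_000))
# [4, 4, 5, 5, 6, 6, 4, 3, 2, 1, 6, 7, 8, 9]
```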
If my page size is properly set, I can serve all these requests from the current and previously loaded page.
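A quick way to check a page size against a request sequence is to list the distinct pages it touches. The sketch below is in Python, and the helper name pages_touched is hypothetical; the sequence is the one observed for visible rows 4 to 6.

```python
from typing import Iterable, List


def pages_touched(indexes: Iterable[int], page_size: int) -> List[int]:
    """Return the distinct page numbers hit by the requests, in first-touch order."""
    seen: List[int] = []
    for index in indexes:
        page_no = index // page_size
        if page_no not in seen:
            seen.append(page_no)
    return seen


requests = [4, 4, 5, 5, 6, 6, 4, 3, 2, 1, 6, 7, 8, 9]
print(pages_touched(requests, page_size=10))  # [0]
print(pages_touched(requests, page_size=5))   # [0, 1]
print(pages_touched(requests, page_size=3))   # [1, 2, 0, 3]
```

With a page size of 3, the same requests bounce between four pages, so two cached pages would no longer be enough.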
If CanSelectMultipleItems is True and the user selects multiple items using the SHIFT key or a mouse drag, the DataGrid enumerates all the rows from the beginning of the list to the end of the selection. This enumeration happens via the IEnumerable interface regardless of whether IList is implemented.
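To put a number on why this enumeration hurts, here is a back-of-the-envelope calculation in Python. The function name and the page size of 100 are assumptions for illustration only; my real page size may differ.

```python
def pages_loaded_by_enumeration(selection_end: int, page_size: int) -> int:
    """Pages that must be loaded to enumerate rows 0..selection_end inclusive."""
    rows_enumerated = selection_end + 1
    return -(-rows_enumerated // page_size)  # ceiling division


# Selection ending halfway through a million rows, assumed page size of 100:
print(pages_loaded_by_enumeration(499_999, page_size=100))  # 5000
```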
If the selected row is not visible and the currently visible area is "far" from the selected row, the DataGrid sometimes starts requesting all the items from the selected row to the end of the visible area, including all the rows in between, which are not even visible. I could not figure out the exact pattern of this behavior. Maybe my implementation is the reason for it.
My questions
Why does the DataGrid request non-visible rows, given that those rows will be requested again when they become visible?
Why is it necessary to request every row two or three times?
Can anyone tell me how to stop the DataGrid from using IEnumerable, other than by turning off multiple item selection?
I at least found some way to fool the VirtualList. You can read it here.
If you have found another solution (that is even better than mine), please tell me!