When you create a custom aggregate function you need to specified the enumeration format:
Format Enumeration is used by SqlUserDefinedTypeAttribute and SqlUserDefinedAggregateAttribute to indicate the serialization format of a user-defined type (UDT) or aggregate.
and when UserDefined
format is used, your class need to implement IBinarySerialize Interface and override its read
and write
methods.
My question is what exactly these methods need to do?
Looking at the examples, I guess they should be able to read/write the aggregation result?
For example, I am trying to create a SQL CLR function that concatenates distinct numbers. In the T-SQL I can have from 1 to 255 distinct numbers (TINYINT value). I need to create a string from them (using delimiter), but sorting the numbers as well. The function seems to work, but I am not exactly sure I have override the methods as expected:
[Serializable]
[
Microsoft.SqlServer.Server.SqlUserDefinedAggregate
(
Microsoft.SqlServer.Server.Format.UserDefined,
IsInvariantToNulls = true,
IsInvariantToDuplicates = true,
IsInvariantToOrder = false,
MaxByteSize = 1024
)
]
public class ConcatenateAnswersPos : Microsoft.SqlServer.Server.IBinarySerialize
{
private List<byte> intermediateResult;
public void Init()
{
intermediateResult = new List<byte>();
}
public void Accumulate(SqlByte value)
{
intermediateResult.Add((byte)value);
}
public void Merge(ConcatenateAnswersPos other)
{
intermediateResult.AddRange(other.intermediateResult);
}
public SqlString Terminate()
{
if (intermediateResult != null)
{
intermediateResult.Sort();
return new SqlString(string.Join(";", intermediateResult));
}
else
{
return new SqlString("");
}
}
public void Read(BinaryReader r)
{
if (r == null) throw new ArgumentNullException("r");
intermediateResult = new List<byte>();
string[] answers = r.ReadString().Split(';');
foreach (string answer in answers)
{
intermediateResult.Add(Convert.ToByte(answer));
}
}
public void Write(BinaryWriter w)
{
if (w == null) throw new ArgumentNullException("w");
intermediateResult.Sort();
w.Write(string.Join(";", intermediateResult));
}
}
Any particular instance of the User-Defined Aggregate (UDA) is not guaranteed to exist throughout the entire lifetime of the query. It needs to have a storable representation. When you only use value types (as noted in the "enumeration format" link in the question), the provided Read
and Write
methods understand how to serialize and deserialize the UDA, in which case you would use Format.Native
. But when you start using reference types (String, collections, custom types, etc) then it is up to you to define how those values get serialized and deserialized, in which case you need to use Format.UserDefined
and override the Read
and Write
methods so you can be in control of those operations.
The values that need to be serialized are anything that is needed to bring the new instance of the UDA back to the exact state that it was in prior to it getting serialized. This means: do not rely upon the Init() method running (it runs once per group!) or variable initializers (they run once per instantiation, and the UDA can be re-used for multiple groups without re-creating!). So you need to serialize all base values, even if they are not directly related to the final output.
That said, you should, at the very least, do the optimizations noted in @Damien_The_Unbeliever's answer:
Don't do the sort in the Write
method. You are already doing it in the Terminate
method (the appropriate place), so it is useless to do twice, not to mention very inefficient.
Store the count of the collection and then the individual elements
Beyond that:
When you say that your UDA "concatenates distinct numbers" if you really meant "distinct" then you need to check each number as it comes in to see if it is in the list already. I suspect this is your desire since you have IsInvariantToDuplicates
set to true. You would do this both in the Accumulate
method:
if (!intermediateResult.Contains(value.Value))
{
intermediateResult.Add(value.Value);
}
and in the Merge
method (called when parallelism is involved):
foreach (byte _NewValue in other.intermediateResult)
{
if (!intermediateResult.Contains(_NewValue))
{
intermediateResult.Add(_NewValue);
}
}
Please note that I changed your cast -- (byte)value
-- in the Accumulate
method into using the Value
property. All of the SqlTypes
(e.g. SqlByte
, SqlString
, SqlInt32
, etc.) have a Value
property that returns the .NET type that you would expect. This means that there is no need to call ToString()
on a SqlString
as many people seem to do.
I would be cautious with a MaxByteSize
of 1024. This concern would be partially mitigated by implementing @Damien's suggestions given that saving "165;207" in a string (current method) is technically 14 bytes (7 characters * 2 bytes per char) whereas saving the count and individual bytes is only 6 bytes (4 bytes for the Int32 to store the count
+ 2 individual bytes). And this disparity is just for storing 2 values. Storing 200? Yeesh!
You do not have the IsNullIfEmpty property specified. You need to specify this, especially considering that your Terminate
method is returning an empty string if the internal collection is null
. You should add IsNullIfEmpty = false
since you don't want to return NULL
if this is never called.
The extra logic in the Terminate
method to handle a null
collection is probably not necessary. The collection is initialized in the Init
and Read
methods, so I am not sure how it could ever be null
by the time Terminate
is called.
If you want an example of creating a User-Defined Aggregate with Format.UserDefined
, then take a look at Getting The Most Out of SQL Server 2005 UDTs and UDAs (free registration required). I wrote that prior to SQL Server 2008 coming out which allowed for more than 8000 bytes to be serialized, so you can ignore (for the moment) the aspects regarding compressing the data to serialize.
Also, if you want to learn more about SQLCLR in general, I am writing a series on it for SQL Server Central: Stairway to SQLCLR (same site as the first linked article).
I would say that you're doing more work in your methods than you need to. All you need to do is write enough in the Write
method so that your Read
method can reconstruct your internal state. Since your internal state is just a List<byte>
, there no need to treat everything as strings:
public void Read(BinaryReader r)
{
if (r == null) throw new ArgumentNullException("r");
var count= r.ReadInt32();
intermediateResult = new List<byte>(count);
for (int i=0;i<count;i++)
{
intermediateResult.Add(r.ReadByte());
}
}
public void Write(BinaryWriter w)
{
if (w == null) throw new ArgumentNullException("w");
w.Write(intermediateResult.Count);
foreach(byte b in intermediateResult)
{
w.Write(b);
}
}
And as I suggested in the comments, I've also removed the Sort
from the Write
method since there'll always be a final Sort
call in Terminate
before the data you've constructed is passed to consumers of your aggregate.
We store the Count
first in the data so that we know how many times to call ReadByte
in the Read
method. This also allows a (probably pointless) optimization that we can tell the List<byte>
constructor exactly how much items it needs space for.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With