I need unique guid for every row i'm transforming from source.
below is sample script; code Guid.NewGuid() returns same always for all rows
@Person =
EXTRACT SourceId int,
AreaCode string,
AreaDetail string,
City string
FROM "/Staging/Person"
USING Extractors.Tsv(nullEscape:"#NULL#");
@rs1 =
SELECT
Guid.NewGuid() AS PersonId,
AreaCode,
AreaDetail,
City
FROM @Person;
OUTPUT @rs1
TO "/Datamart/DimUser.tsv"
USING Outputters.Tsv(quoting:false, dateTimeFormat:null);
Please note that U-SQL is a declarative language and as such will snapshot known non-deterministic functions such as Guid.NewGuid()
or DateTime.Now
to one value per script.
While you can work around that by wrapping such functions into a C# function, this practice is highly discouraged, since you are making the script non-deterministic, which can lead to script failures if a node in the execution has to be retried and does not produce a repeatable result!
So how can you provide a unique number?
The options are:
ROW_NUMBER() OVER ()
on the data that you read. If you already have data that you need to guarantee uniqueness against, either add the time ticks of the time the job is run, or get the highest existing value, or get a large enough interval bump, depending on your requirements.Here is a sample that uses the time ticks plus ROW_NUBER()
to make sure that the id is unique for each row everytime you run the script since as mentioned above, U-SQL will evaluate DateTime.Now
once per script invocation:
@data =
SELECT *
FROM (VALUES
( "John", "Doe" ),
( "Paul", "Miller" ),
( "Tracy", "Smith" ),
( "Jane", "Doe")
) AS T(firstname, lastname);
@res =
SELECT DateTime.Now.Ticks+ROW_NUMBER() OVER () AS id,
firstname, lastname
FROM @data;
OUTPUT @res
TO "/output/data.csv"
USING Outputters.Csv();
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With