
Having some trouble understanding Linq's INTO keyword

Tags:

c#

linq

1)

The into keyword creates a temporary identifier for storing the results of a join, group, or select clause.

I assume the into keyword can only be used as part of a group, join, or select clause?

2)

a) I've read that when into is used as part of a group or select clause, it splices the query into two halves, and because of that, range variables declared in the first half of the query ALWAYS go out of scope in the second half. Correct?

b) But when into is used as part of a join clause, range variables NEVER go out of scope within the query (unless the query also contains group...into or select...into). I assume this is because into does not splice the query in two halves when used with a join clause?

c) A query expression consists of a from clause followed by an optional query body (from, where, let clauses) and must end with either a select or a group clause.

d) If into indeed splices the query into two halves, is the group clause in the following example part of the body?

        var result = from c1 in a1
                     group c1 by c1.name into GroupResult
                     select ...

Thank you.


Reply to Ufuk:

a)

After a group by you get a sequence of like this IEnumerable<Key,IEnumerable<Foo>>

Doesn't a GroupBy operator return a result of type IEnumerable<IGrouping<Key,Foo>> and not IEnumerable<Key,IEnumerable<Foo>>?

b) Couldn't we argue that group...by...into or join...into do splice the query, in the sense that the first half of the query, at least conceptually, must run before the second half can run?
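One rough way to look at the difference asked about in b) is through the method calls these clauses map to. As far as I know, the C# compiler translates group...into followed by a select into a GroupBy call followed by a Select, and join...into followed by a select into a single GroupJoin call. The sketch below uses hypothetical left and right tuple arrays, and also shows that GroupBy yields IGrouping<TKey, TElement> items, as suggested in a):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data: (Key, Value) tuples on both sides.
var left = new[] { (Key: 1, Value: "L1"), (Key: 2, Value: "L2") };
var right = new[] { (Key: 1, Value: "R1"), (Key: 1, Value: "R2") };

// group ... by ... into: GroupBy runs first and yields IGrouping<TKey, TElement> items;
// the next step (Select) only receives each grouping, not the original element.
IEnumerable<IGrouping<int, (int Key, string Value)>> groups = right.GroupBy(r => r.Key);
var grouped = groups.Select(g => g.Key + ":" + string.Join("/", g.Select(r => r.Value)))
                    .ToList();

// join ... on ... into: a single GroupJoin call; its result selector receives
// both the left element and the sequence of its matching right elements.
var groupJoined = left.GroupJoin(right,
                                 l => l.Key,
                                 r => r.Key,
                                 (l, rs) => l.Value + ":" + rs.Count())
                      .ToList();

Console.WriteLine(string.Join(", ", grouped));     // 1:R1/R2
Console.WriteLine(string.Join(", ", groupJoined)); // L1:2, L2:0
```

In the GroupBy pipeline the element variable is gone once the groups exist, whereas GroupJoin hands the left element and its group to the same lambda, which matches how the left range variable stays in scope after join...into.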

Reply to Robotsushi:

The more I think about it, the more I get the feeling that my question is pretty pointless, since it has no practical value whatsoever. Still...

When you say it gets split, do you mean the scope of the variables gets split or the generated SQL query gets split?

Here is the quote:

In many cases the range variables on one side of this divide cannot be mixed with the range variables on the other side. The into keyword that is part of this group-by clause is used to link or splice the two halves of this query. As such, it marks the boundary in the midst of the query over which range variables typically cannot climb. The range variables above the into keyword go out of scope in the last part of this query.

My question is whether both halves are still considered a single query, and as such whether the entire query still consists of just three parts. If that is the case, then in my code example (under d)) the group clause is part of the body. But if both halves are considered two queries, then each of the two queries will consist of three parts.


Second reply to Robotsushi:

This chunk of your query is evaluated as one data pull.

I'm not familiar with the term "data pull", so I'm going to guess that what you were trying to say is that the first half of the query executes/evaluates as a unit, and then the second half of the query takes the results from the first half and uses them in its execution/evaluation? In other words, conceptually we have two queries?
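Under LINQ to Objects (System.Linq.Enumerable), one way to observe the "first half runs as a unit" behavior is to log when source elements are pulled and when the second half projects a group. This is a minimal sketch assuming an in-memory source; Source and Project are hypothetical helpers, and a database-backed provider would translate the query differently:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var log = new List<string>();

// Hypothetical source that records each element as it is pulled.
IEnumerable<string> Source()
{
    foreach (var name in new[] { "ann", "bob", "ann" })
    {
        log.Add("pull " + name);
        yield return name;
    }
}

// Hypothetical projection that records when the second half handles a group.
string Project(IGrouping<string, string> g)
{
    log.Add("project " + g.Key);
    return g.Key + ":" + g.Count();
}

var query = from c1 in Source()
            group c1 by c1 into GroupResult
            select Project(GroupResult);

var results = query.ToList(); // nothing runs until the query is enumerated here

Console.WriteLine(string.Join(", ", results)); // ann:2, bob:1
Console.WriteLine(string.Join(", ", log));
// pull ann, pull bob, pull ann, project ann, project bob
```

Every element is pulled before the first group is projected, because GroupBy has to see the whole source before it can hand out any group.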

user702769 asked Sep 11 '11 17:09


2 Answers

group... by ...into

A group by has to provide a different kind of sequence after the operation.

You have a sequence like this:

IEnumerable<Foo>

After a group by you get a sequence like this

IEnumerable<Key,IEnumerable<Foo>>

Now your items are in nested sequences and you don't have direct access to them. That's why the identifiers in the first part are out of scope. Since the first part is out of scope, you are left with the identifier after the into: the first query has ended and a new query can begin. The second part of the query works on a totally different sequence from the first one. It's a continuation.

from foo in foolist
group foo by foo.name into grouped
//foo is out of scope, you are working on a different sequence now
//and you have a ready to use range variable (grouped) for your second query
select new { Name = grouped.Key, Count = grouped.Count() }
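The nested sequences can still be reached after the continuation by opening each group with another from clause. A sketch in which foolist is replaced by a hypothetical in-memory array of anonymous objects (the name and size members are assumed):

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for foolist; only name and size members are assumed.
var foolist = new[]
{
    new { name = "a", size = 1 },
    new { name = "b", size = 2 },
    new { name = "a", size = 3 },
};

var result = (from foo in foolist
              group foo by foo.name into grouped
              // 'foo' is out of scope here; to reach the elements again,
              // open the nested sequence with a new range variable:
              from item in grouped
              select grouped.Key + item.size).ToList();

Console.WriteLine(string.Join(", ", result)); // a1, a3, b2
```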

join ... on ... into

On the other hand, a group join is not that kind of operation. It operates on two sequences, whereas group by operates on one. It provides the matching elements from the right sequence for each element of the left sequence.

IEnumerable<Left> and IEnumerable<Right>

After the operation you can still use the identifier from the left sequence, but the identifier from the right sequence is out of scope. That's because the join now gives you a sequence of the matching right elements instead, so again you don't have direct access to them. The outcome of a group join is like:

IEnumerable<Left,IEnumerable<Right>>

When you use a group join, only the right range variable goes out of scope. The left one remains, so you are still working on the same sequence. You haven't provided a projection yet, so you can't continue into a second query.

from left in leftList
join right in rightList
    on left.Key equals right.Key into joinedRights
// left is still your range variable, you are still enumerating leftList
// right is out of scope; its matches are available through joinedRights
// you have to provide a projection here but you won't have a ready to use range variable
// that's why it's not a continuation.
select new { left, Rights = joinedRights }
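To compare the shape of a group join with that of a plain join, here is a runnable sketch; leftList and rightList are hypothetical arrays of (Key, Name) tuples:

```csharp
using System;
using System.Linq;

// Hypothetical sample data standing in for leftList and rightList.
var leftList = new[] { (Key: 1, Name: "L1"), (Key: 2, Name: "L2") };
var rightList = new[] { (Key: 1, Name: "R1"), (Key: 1, Name: "R2") };

// Plain join: one flat row per matching (left, right) pair; both range variables stay in scope.
var flat = (from left in leftList
            join right in rightList on left.Key equals right.Key
            select left.Name + "-" + right.Name).ToList();

// Group join: one row per left element, with its matches as a nested sequence
// (possibly empty); only 'left' and 'joinedRights' are in scope after the into.
var nested = (from left in leftList
              join right in rightList on left.Key equals right.Key into joinedRights
              select left.Name + "[" + string.Join(",", joinedRights.Select(r => r.Name)) + "]").ToList();

Console.WriteLine(string.Join(" ", flat));   // L1-R1 L1-R2
Console.WriteLine(string.Join(" ", nested)); // L1[R1,R2] L2[]
```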
Ufuk Hacıoğulları answered Oct 07 '22 00:10


1) Correct. To be more specific, into provides a reference to the results of a join, group, or select clause that would otherwise be out of scope.

2) I don't think your query is split as a result of using into, as its most common usage is:

The use of into in a group clause is only necessary when you want to perform additional query operations on each group.
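To illustrate that quote, here is a sketch with hypothetical data of a query that simply ends with a group clause, next to one that uses into to keep working with the groups:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data.
var words = new[] { "ant", "bee", "wasp", "moth", "fly" };

// Without into: the query simply ends with the group clause.
IEnumerable<IGrouping<int, string>> byLength = from w in words
                                               group w by w.Length;

// With into: the groups themselves can be filtered, ordered and projected further.
var bigGroups = (from w in words
                 group w by w.Length into g
                 where g.Count() > 1
                 orderby g.Key descending
                 select g.Key + ":" + string.Join("/", g)).ToList();

Console.WriteLine(byLength.Count());             // 2
Console.WriteLine(string.Join(", ", bigGroups)); // 4:wasp/moth, 3:ant/bee/fly
```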

Added Response

I've read that when into is used as a part of group or select clauses, it splices the query in two halves and because of that range variables declared in first half of the query ALWAYS go out of scope in the second half of the query. Correct?

This chunk of your query is evaluated as one data pull. The group clause has to process the whole source sequence before evaluation of your LINQ query can continue:

from c1 in a1
group c1 by c1.name into GroupResult

So in the following select:

select ...

The variables from the first part of the query will already have been evaluated; however, since you include the into keyword, you can work with the results of that part in the select, because they are stored in the GroupResult variable.
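The same scoping rule applies to select...into. A minimal sketch with hypothetical data; the commented-out where clause refers to range variables declared before into, which the compiler is expected to reject:

```csharp
using System;
using System.Linq;

// Hypothetical sample data.
var numbers = new[] { 1, 2, 3, 4 };

var result = (from x in numbers
              let label = "n" + x
              select x * 10 into tens
              // x and label were declared before 'into', so they are out of scope here;
              // uncommenting the next line should fail to compile:
              // where x > 1 && label != ""
              where tens > 15
              select tens).ToList();

Console.WriteLine(string.Join(", ", result)); // 20, 30, 40
```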

But when into is used as part of a join clause, range variables NEVER go out of scope within the query (unless the query also contains group...into or select...into). I assume this is because into does not splice the query in two halves when used with a join clause?

The query is still evaluated in two parts; however, GroupResult gives you access to what was declared before the group keyword.
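A sketch of the join case, using hypothetical customer and order tuples: after join...into, the left range variable c stays in scope for later clauses, while a subsequent select...into removes it:

```csharp
using System;
using System.Linq;

// Hypothetical sample data: customers as (Id, Name), orders as (CustomerId, Amount).
var customers = new[] { (Id: 1, Name: "Ann"), (Id: 2, Name: "Bob"), (Id: 3, Name: "Cy") };
var orders = new[] { (CustomerId: 1, Amount: 5), (CustomerId: 1, Amount: 7), (CustomerId: 3, Amount: 2) };

var result = (from c in customers
              join o in orders on c.Id equals o.CustomerId into custOrders
              // 'o' is out of scope from here on, but 'c' is still a usable range variable
              where custOrders.Any()
              orderby c.Name descending
              select new { Name = c.Name, Total = custOrders.Sum(x => x.Amount) } into summary
              // after select ... into, 'c' and 'custOrders' are out of scope as well
              select summary.Name + "=" + summary.Total).ToList();

Console.WriteLine(string.Join(", ", result)); // Cy=2, Ann=12
```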

A query expression consists of a from clause followed by an optional query body (from, where, let clauses) and must end with either a select or a group clause.

This is a definition, not a question.

If into indeed splices the query into two halves, is the group clause in the following example part of the body?

The group is part of the first half of the query.

The LINQ query shown would generate a single SQL statement, in case you were curious.
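In the C# specification's grammar, a query body ends with a select or group clause and may be followed by a query continuation (into, an identifier, and another query body), and the specification describes a continuation as equivalent to nesting the first part as an inner query. Under that reading the group clause belongs to the body of the first part. A sketch with a hypothetical a1 array standing in for the question's source:

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for the question's a1; only a Name property is assumed.
var a1 = new[] { new { Name = "x" }, new { Name = "y" }, new { Name = "x" } };

// Continuation form, as in the question.
var continued = (from c1 in a1                         // from clause
                 group c1 by c1.Name into GroupResult  // group clause ends the first query body
                 select GroupResult.Key + GroupResult.Count()) // the continuation's own body
                .ToList();

// The same query with the first part written out as an explicit inner query.
var nested = (from GroupResult in (from c1 in a1
                                   group c1 by c1.Name)
              select GroupResult.Key + GroupResult.Count())
             .ToList();

Console.WriteLine(string.Join(", ", continued)); // x2, y1
Console.WriteLine(string.Join(", ", nested));    // x2, y1
```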

2nd Update

I'm not familiar with the term "data pull", so I'm going to guess that what you were trying to say is that the first half of the query executes/evaluates as a unit, and then the second half of the query takes the results from the first half and uses them in its execution/evaluation? In other words, conceptually we have two queries?

Yes, there are two different parts of the query.

BentOnCoding answered Oct 07 '22 00:10