
Concat String by Group

I want to group records by tag_id and build a single string by combining the client_id values in each group.

Here are examples of my documents:

{
  "_id" : ObjectId("59e955e633d64c81875bfd2f"),
  "tag_id" : 1,
  "client_id" : "10001"
}
{
  "_id" : ObjectId("59e955e633d64c81875bfd30"),
  "tag_id" : 1,
  "client_id" : "10002"
}

I'd like to have this output:

{
  "_id" : 1,
  "client_id" : "10001,10002"
}
shiyiwan asked Oct 20 '17 03:10



1 Answer

You can do it with the aggregation framework as a "two step" operation: first accumulate the items into an array via $push within a $group pipeline stage, then use $concat with $reduce on the resulting array in a final projection stage ($addFields here):

db.collection.aggregate([
  { "$group": {
    "_id": "$tag_id",
    "client_id": { "$push": "$client_id" }
  }},
  { "$addFields": {
    "client_id": {
      "$reduce": {
        "input": "$client_id",
        "initialValue": "",
        "in": {
          "$cond": {
            "if": { "$eq": [ "$$value", "" ] },
            "then": "$$this",
            "else": {
              "$concat": ["$$value", ",", "$$this"]
            }
          }
        }
      }
    }
  }}
])

We also apply $cond here so that the initial empty string is not concatenated with a comma, which would leave a leading comma in the result. The output then looks like a proper delimited list.
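To see what the $reduce and $cond combination does, the same fold can be sketched in plain JavaScript outside the database. This is only an illustration under that assumption: joinWithComma is a hypothetical helper name, not part of MongoDB.

```javascript
// Hypothetical helper mirroring the $reduce expression above:
// "value" plays the role of $$value and "item" the role of $$this.
function joinWithComma(items) {
  return items.reduce(
    // Like $cond: skip the separator while the accumulator is still empty.
    (value, item) => (value === "" ? item : value + "," + item),
    "" // same as "initialValue": ""
  );
}

const joined = joinWithComma(["10001", "10002"]);
console.log(joined); // 10001,10002
```

Without the conditional, the first step would produce ",10001" and the final string would start with a stray comma.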

FYI, there is a JIRA issue, SERVER-29339, which asks for $reduce to be implemented as an accumulator expression to allow its use directly in a $group pipeline stage. This is not likely to happen any time soon, but in theory it would replace $push in the above and make the operation a single pipeline stage. Sample proposed syntax is on the JIRA issue.

If you don't have $reduce (it requires MongoDB 3.4), then just post-process the cursor:

db.collection.aggregate([
  { "$group": {
    "_id": "$tag_id",
    "client_id": { "$push": "$client_id" }
  }},
]).map( doc =>
  Object.assign(
    doc,
    { "client_id": doc.client_id.join(",") }
  )
)

That in turn leads to the other alternative, which is to do this with mapReduce if you really must:

db.collection.mapReduce(
  function() {
    emit(this.tag_id,this.client_id);
  },
  function(key,values) {
    return [].concat.apply([],values.map(v => v.split(","))).join(",");
  },
  { "out": { "inline": 1 } }
)

This of course produces output in the mapReduce-specific form, with _id and value as the keys of each result, but the content is essentially the desired output.

We use [].concat.apply([],values.map(...)) because the output of the "reducer" is itself a "delimited string", and because mapReduce works incrementally with large results, that output can become "input" on another pass. So the reducer needs to expect this and treat it accordingly.
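The re-reduce behaviour can be checked in plain JavaScript without a server. The sketch below copies the reducer body into a standalone function (the name reduceClientIds is made up for illustration) and feeds a previous reducer output back in as one of the values.

```javascript
// Same body as the reduce function passed to mapReduce above.
function reduceClientIds(key, values) {
  return [].concat.apply([], values.map(v => v.split(","))).join(",");
}

// One pass over all emitted values for the key.
const singlePass = reduceClientIds(1, ["10001", "10002", "10003"]);

// Two passes: the first result is an already-delimited string that
// becomes an input value on the second pass.
const partial = reduceClientIds(1, ["10001", "10002"]);
const twoPass = reduceClientIds(1, [partial, "10003"]);

console.log(singlePass === twoPass); // true
```

Both paths give the same string, which is the property a reducer needs when its own output may be handed back to it.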

Neil Lunn answered Oct 11 '22 16:10