
Best way to simulate "group by" from bash?

Tags: bash, scripting

sort ip_addresses | uniq -c

This will print the count first, but other than that it should be exactly what you want.
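
If you would rather have the value first and the count second, one option (a sketch; it assumes the values contain no whitespace, which is true of IP addresses) is to swap the columns with awk:

sort ip_addresses | uniq -c | awk '{ print $2, $1 }'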


The quick and dirty method is as follows:

cat ip_addresses | sort -n | uniq -c

If you need to use the values in bash, you can assign the output of the whole command to a bash variable and then loop through the results.
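
For example, a minimal sketch (ip_addresses stands in for your input file):

counts=$(sort ip_addresses | uniq -c)

# each output line has the form "<count> <address>"
while read -r count addr; do
    echo "$addr seen $count times"
done <<< "$counts"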

PS: If the sort command is omitted, you will not get the correct results, as uniq only counts successive identical lines.
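
A quick demonstration of the difference:

printf 'a\nb\na\n' | uniq -c
      1 a
      1 b
      1 a

printf 'a\nb\na\n' | sort | uniq -c
      2 a
      1 b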


For summing up multiple fields based on a group of existing fields, use the example below (replace $1, $2, $3, $4 according to your requirements):

cat file

US|A|1000|2000
US|B|1000|2000
US|C|1000|2000
UK|1|1000|2000
UK|1|1000|2000
UK|1|1000|2000

awk 'BEGIN { FS=OFS=SUBSEP="|"}{arr[$1,$2]+=$3+$4 }END {for (i in arr) print i,arr[i]}' file

US|A|3000
US|B|3000
US|C|3000
UK|1|9000
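
The same program expanded with comments (behavior unchanged; note that for (i in arr) does not guarantee any particular output order):

awk '
BEGIN { FS = OFS = SUBSEP = "|" }       # input, output, and array-key separators are all "|"
{ arr[$1, $2] += $3 + $4 }              # group on fields 1 and 2; sum fields 3 and 4
END { for (i in arr) print i, arr[i] }  # each key already contains "|" via SUBSEP
' file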

The canonical solution is the one mentioned by another respondent:

sort | uniq -c

It is more concise than the equivalent Perl or awk.

You write that you don't want to use sort because the data is larger than the machine's main memory. Don't underestimate the implementation quality of the Unix sort command. It was used to handle very large volumes of data (think the original AT&T billing data) on machines with 128 KB (that's 131,072 bytes) of memory (a PDP-11). When sort encounters more data than a preset limit (often tuned close to the size of the machine's main memory), it sorts what it has read in main memory and writes it to a temporary file. It then repeats this with the next chunk of data, and finally performs a merge sort on the intermediate files. This allows sort to process data many times larger than the machine's main memory.
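
If you want to see or tune this behavior, GNU sort (an assumption; the flags vary by platform, so check your man page) exposes the buffer size and the temporary directory directly:

# -S 512M  caps the in-memory buffer; larger chunks mean fewer intermediate runs
# -T DIR   chooses where the intermediate files are written before the final merge
sort -S 512M -T /var/tmp ip_addresses | uniq -c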