
join omitting output lines when input sorted numerically

Tags: join, unix

I have two files, aa and bb:

 $ cat aa 
84 xxx
85 xxx
10101 sdf
10301 23

 $ cat bb
82 asd
83 asf
84 asdfasdf
10101 22232
10301 llll

I use the join command to join them:

 $ join aa bb
84 xxx asdfasdf

But I expected 84, 10101 and 10301 to all be joined. Why was only 84 joined?

qiuxiafei asked May 14 '12 14:05



3 Answers

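Use a lexicographical sort rather than a numeric sort. join compares keys as strings, and as a string 10101 sorts before 84. After matching 84, join reads 85 from aa, sees 10101 and 10301 in bb as smaller keys, and skips past them to the end of bb without pairing anything else.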

To do this as part of the process:

$ join <(sort aa) <(sort bb)

This gives the output:

10101 sdf 22232
10301 23 llll
84 xxx asdfasdf
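
To see the difference between the two orderings, it may help to sort just the keys from aa both ways. This is a minimal sketch, not part of the original answer; setting LC_ALL=C is my own assumption to pin the collation so the result does not depend on the locale.

```shell
# Keys from file aa, in the numeric order used in the question.
keys='84
85
10101
10301'

# Lexicographic (byte-wise) order, which is what join expects.
lexical=$(printf '%s\n' "$keys" | LC_ALL=C sort)

# Numeric order, which is how aa and bb are actually sorted.
numeric=$(printf '%s\n' "$keys" | LC_ALL=C sort -n)

printf 'lexical:\n%s\n\nnumeric:\n%s\n' "$lexical" "$numeric"
```

In the lexical listing the five-digit keys come first, because the character 1 sorts before 8.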

Charles Duffy answered Oct 18 '22 01:10


You failed to include the fact that an error message is output:

$ join aa bb
join: file 2 is not in sorted order
84 xxx asdfasdf
join: file 1 is not in sorted order

You can use a normal lexicographic sort for the join, then re-sort the result numerically on the first field:

join <(sort aa) <(sort bb) | sort -k1,1n
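
For shells without process substitution, the same pipeline can probably be written with intermediate files. The sketch below recreates aa and bb from the question in a scratch directory; the directory, the .sorted file names, and the use of LC_ALL=C are assumptions of mine rather than anything from the answer.

```shell
# Work in a scratch directory (mktemp -d is widely available but not POSIX).
tmp=$(mktemp -d)

# Recreate the two files from the question.
printf '%s\n' '84 xxx' '85 xxx' '10101 sdf' '10301 23' > "$tmp/aa"
printf '%s\n' '82 asd' '83 asf' '84 asdfasdf' '10101 22232' '10301 llll' > "$tmp/bb"

# Sort both inputs lexicographically; LC_ALL=C keeps sort and join
# in agreement about the collation order.
LC_ALL=C sort "$tmp/aa" > "$tmp/aa.sorted"
LC_ALL=C sort "$tmp/bb" > "$tmp/bb.sorted"

# Join on the first field, then restore numeric order on that field.
result=$(LC_ALL=C join "$tmp/aa.sorted" "$tmp/bb.sorted" | sort -k1,1n)
printf '%s\n' "$result"

# Clean up the scratch directory.
rm -r "$tmp"
```

Running sort and join under the same locale matters because join checks its input against the locale's collation order, so a mismatch can bring back the "not in sorted order" message.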

Dennis Williamson answered Oct 18 '22 01:10


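If the files are already in numeric order and you want to avoid sorting, you can zero-pad the keys with awk so that numeric and lexicographic order agree. The width of 5 below assumes no key is longer than five digits, and only the first two fields of each line are kept: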

join \
 <(awk '{printf("%05d %s\n", $1, $2)}' aa) \
 <(awk '{printf("%05d %s\n", $1, $2)}' bb) \
| awk '{print int($1),$2,$3}'

This generates output that preserves the original numeric order:

84 xxx asdfasdf
10101 sdf 22232
10301 23 llll
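
To make the padding step concrete, here is a small sketch that applies the same awk format to the lines of aa and then uses sort -c to confirm that the padded keys are in the order join expects. The variable name padded and the LC_ALL=C setting are my own additions for illustration.

```shell
# Zero-pad the first field of each line from aa to five digits.
padded=$(printf '%s\n' '84 xxx' '85 xxx' '10101 sdf' '10301 23' |
  awk '{ printf("%05d %s\n", $1, $2) }')
printf '%s\n' "$padded"

# sort -c only checks the order: it exits 0 when the input is already sorted.
if printf '%s\n' "$padded" | LC_ALL=C sort -c -k1,1; then
  echo "padded keys are sorted for join"
fi
```

With equal-width keys, string comparison and numeric comparison give the same order, which is why no sort is needed before the join.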

Avoiding sort can pay off on large inputs: sort runs in O(n log n) time, while the awk padding passes are linear in the size of the input.

tommy.carstensen answered Oct 18 '22 03:10