I think of xargs
as the map function of the UNIX shell. What is the filter
function?
EDIT: it looks like I'll have to be a bit more explicit.
Let's say I have to hand a program which accepts a single string as a parameter and returns with an exit code of 0 or 1. This program will act as a predicate over the strings that it accepts.
For example, I might decide to interpret the string parameter as a filepath, and define the predicate to be "does this file exist". In this case, the program could be test -f
, which, given a string, exits with 0 if the file exists, and 1 otherwise.
I also have to hand a stream of strings. For example, I might have a file ~/paths
containing
/etc/apache2/apache2.conf
/foo/bar/baz
/etc/hosts
Now, I want to create a new file, ~/existing_paths
, containing only those paths that exist on my filesystem. In my case, that would be
/etc/apache2/apache2.conf
/etc/hosts
I want to do this by reading in the ~/paths
file, filtering those lines by the predicate test -f
, and writing the output to ~/existing_paths
. By analogy with xargs
, this would look like:
cat ~/paths | xfilter test -f > ~/existing_paths
It is the hypothesized program xfilter
that I am looking for:
xfilter COMMAND [ARG]...
Which, for each line L
of its standard input, will call COMMAND [ARG]... L
, and if the exit code is 0, it prints L
, else it prints nothing.
To be clear, I am not looking for:
I am looking for either:
xargs
, orIf map is xargs
, filter is... still xargs
.
Example: list files in the current directory and filter out non-executable files:
ls | xargs -I{} sh -c "test -x '{}' && echo '{}'"
This could be made handy trough a (non production-ready) function:
xfilter() {
xargs -I{} sh -c "$* '{}' && echo '{}'"
}
ls | xfilter test -x
Alternatively, you could use a parallel filter implementation via GNU Parallel:
ls | parallel "test -x '{}' && echo '{}'"
So, youre looking for the:
reduce( compare( filter( map(.. list()) ) ) )
what can be rewiritten as
list | map | filter | compare | reduce
The main power of bash
is a pipelining, therefore isn't need to have one special filter
and/or reduce
command. In fact nearly all unix commands could act in one (or more) functions as:
Imagine:
find mydir -type f -print | xargs grep -H '^[0-9]*$' | cut -d: -f 2 | sort -nr | head -1
^------list+filter------^ ^--------map-----------^ ^--filter--^ ^compare^ ^reduce^
Creating a test case:
mkdir ./testcase
cd ./testcase || exit 1
for i in {1..10}
do
strings -1 < /dev/random | head -1000 > file.$i.txt
done
mkdir emptydir
You will get a directory named testcase
and in this directory 10 files and one directory
emptydir file.1.txt file.10.txt file.2.txt file.3.txt file.4.txt file.5.txt file.6.txt file.7.txt file.8.txt file.9.txt
each file contains 1000 lines of random strings some lines are contains only numbers
now run the command
find testcase -type f -print | xargs grep -H '^[0-9]*$' | cut -d: -f 2 | sort -nr | head -1
and you will get the largest number-only line from each files like: 42
. (of course, this can be done more effectively, this is only for demo)
decomposed:
The find testcase -type f -print
will print every plain files so, LIST (and reduced only to files). ouput:
testcase/file.1.txt
testcase/file.10.txt
testcase/file.2.txt
testcase/file.3.txt
testcase/file.4.txt
testcase/file.5.txt
testcase/file.6.txt
testcase/file.7.txt
testcase/file.8.txt
testcase/file.9.txt
the xargs grep -H '^[0-9]*$'
as MAP will run a grep
command for each file from a list. The grep is usually using as filter, e.g: command | grep
, but now (with xargs) changes the input (filenames) to (lines containing only digits). Output, many lines like:
testcase/file.1.txt:1
testcase/file.1.txt:8
....
testcase/file.9.txt:4
testcase/file.9.txt:5
structure of lines: filename colon number
, want only numbers so calling a pure filter, what strips out the filenames from each line cut -d: -f2
. It outputs many lines like:
1
8
...
4
5
Now the reduce (getting the largest number), the sort -nr
sorts all number numerically and reverse order (desc), so its output is like:
42
18
9
9
...
0
0
and the head -1
print the first line (the largest number).
Of course, you can write your own list/filter/map/reduce functions directly with bash
programming constructions (loops, conditions and such), or you can employ any fullblown scripting language like perl
, special languages like awk
, sed
"language", or dc
(rpn) and such.
Having an special filter command such:
list | filter_command cut -d: -f 2
is simple doesn't needed, because you can use directly the
list | cut
You can have awk
do the filter
and reduce
function.
Filter:
awk 'NR % 2 { $0 = $0 " [EVEN]" } 1'
Reduce:
awk '{ p = p + $0 } END { print p }'
I totally understand your question here as a long time functional programmer and here is the answer: Bash/unix command pipelining isn't as clean as you'd hoped.
In the example above:
find mydir -type f -print | xargs grep -H '^[0-9]*$' | cut -d: -f 2 | sort -nr | head -1
^------list+filter------^ ^--------map-----------^ ^--filter--^ ^compare^ ^reduce^
a more pure form would look like:
find mydir | xargs -L 1 bash -c 'test -f $1 && echo $1' _ | grep -H '^[0-9]*$' | cut -d: -f 2 | sort -nr | head -1
^---list--^^-------filter---------------------------------^^------map----------^^--map-------^ ^reduce^
But, for example, grep also has a filtering capability: grep -q mypattern
which simply return 0 if it matches the pattern.
To get a something more like what you want, you simply would have to define a filter bash function and make sure to export it so it was compatible with xargs
But then you get into some problems. Like, test has binary and unary operators. How will your filter function handle this? Hand, what would you decide to output on true for these cases? Not insurmountable, but weird. Assuming only unary operations:
filter(){
while read -r LINE || [[ -n "${LINE}" ]]; do
eval "[[ ${LINE} $1 ]]" 2> /dev/null && echo "$LINE"
done
}
so you could do something like
seq 1 10 | filter "> 4"
5
6
7
8
9
As I wrote this I kinda liked it
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With