If xargs is map, what is filter?

I think of xargs as the map function of the UNIX shell. What is the filter function?

EDIT: it looks like I'll have to be a bit more explicit.

Let's say I have to hand a program which accepts a single string as a parameter and returns with an exit code of 0 or 1. This program will act as a predicate over the strings that it accepts.

For example, I might decide to interpret the string parameter as a filepath, and define the predicate to be "does this file exist". In this case, the program could be test -f, which, given a string, exits with 0 if the file exists, and 1 otherwise.

I also have to hand a stream of strings. For example, I might have a file ~/paths containing

/etc/apache2/apache2.conf
/foo/bar/baz
/etc/hosts

Now, I want to create a new file, ~/existing_paths, containing only those paths that exist on my filesystem. In my case, that would be

/etc/apache2/apache2.conf
/etc/hosts

I want to do this by reading in the ~/paths file, filtering those lines by the predicate test -f, and writing the output to ~/existing_paths. By analogy with xargs, this would look like:

cat ~/paths | xfilter test -f > ~/existing_paths

It is the hypothesized program xfilter that I am looking for:

xfilter COMMAND [ARG]...

Which, for each line L of its standard input, will call COMMAND [ARG]... L, and if the exit code is 0, it prints L, else it prints nothing.

To be clear, I am not looking for:

  • a way to filter a list of filepaths by existence. That was a specific example.
  • how to write such a program. I can do that.

I am looking for either:

  • a pre-existing implementation, like xargs, or
  • a clear explanation of why this doesn't exist
jameshfisher asked Jul 27 '14 08:07


4 Answers

If map is xargs, filter is... still xargs.

Example: list files in the current directory and filter out non-executable files:

ls | xargs -I{} sh -c "test -x '{}' && echo '{}'"

This can be wrapped in a (not production-ready) function:

xfilter() {
    xargs -I{} sh -c "$* '{}' && echo '{}'"
}
ls | xfilter test -x
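
The function above splices each line into the sh -c string, so a line containing a single quote would be re-parsed by the shell. A minimal sketch of a variant that avoids this, assuming one item per input line: the line is passed to the predicate as a separate argument. This is only an illustration, not a standard tool.

```shell
# Sketch of xfilter: run the predicate once per input line and print the
# line only if the predicate exits with status 0.
xfilter() {
    # The "|| [ -n ... ]" clause handles a final line without a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        # "$@" is the predicate command plus its fixed arguments.
        # stdin is redirected so the predicate cannot eat the remaining lines.
        if "$@" "$line" </dev/null; then
            printf '%s\n' "$line"
        fi
    done
}

# Keeps "/" and drops the path that is not a directory.
printf '%s\n' / /no/such/dir | xfilter test -d
```

Unlike xargs, this runs the predicate strictly one line at a time, so it is slower on large inputs.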

Alternatively, you could use a parallel filter implementation via GNU Parallel:

ls | parallel "test -x '{}' && echo '{}'"
mrucci answered Oct 24 '22 18:10


So, you're looking for:

 reduce(  compare(  filter( map(.. list()) ) ) )

which can be rewritten as

 list | map | filter | compare | reduce

The main strength of the shell is pipelining, so there is no need for a dedicated filter and/or reduce command. In fact, nearly every Unix command can play one (or more) of these roles:

  • list
  • map
  • filter
  • reduce

Imagine:

find mydir -type f -print | xargs grep -H '^[0-9]*$' | cut -d: -f 2 | sort -nr | head  -1
^------list+filter------^   ^--------map-----------^   ^--filter--^   ^compare^  ^reduce^

Creating a test case:

mkdir ./testcase
cd ./testcase || exit 1
for i in {1..10}
do
    strings -n 1 < /dev/urandom | head -n 1000 > "file.$i.txt"
done
mkdir emptydir

You will get a directory named testcase containing 10 files and one directory:

emptydir  file.1.txt  file.10.txt file.2.txt  file.3.txt  file.4.txt  file.5.txt  file.6.txt  file.7.txt  file.8.txt  file.9.txt

Each file contains 1000 lines of random strings; some lines contain only digits.

Now, from the parent directory of testcase, run the command:

find testcase -type f -print | xargs grep -H '^[0-9]*$' | cut -d: -f 2 | sort -nr | head -1

and you will get the largest digits-only line across all the files, e.g. 42. (Of course, this can be done more efficiently; this is only a demo.)

Decomposed:

find testcase -type f -print prints every regular file, so it acts as LIST (already filtered down to files only). Output:

testcase/file.1.txt
testcase/file.10.txt
testcase/file.2.txt
testcase/file.3.txt
testcase/file.4.txt
testcase/file.5.txt
testcase/file.6.txt
testcase/file.7.txt
testcase/file.8.txt
testcase/file.9.txt

xargs grep -H '^[0-9]*$' acts as MAP: it runs grep over the files in the list. grep is usually used as a filter (e.g. command | grep), but here, with xargs, it transforms the input (filenames) into lines containing only digits. The output is many lines like:

testcase/file.1.txt:1
testcase/file.1.txt:8
....
testcase/file.9.txt:4
testcase/file.9.txt:5

Each line has the structure filename:number. We want only the numbers, so we call a pure filter, cut -d: -f2, which strips the filename from each line. It outputs many lines like:

1
8
...
4
5

Now the reduce step (getting the largest number): sort -nr sorts all the numbers numerically in reverse (descending) order, so its output looks like:

42
18
9
9
...
0
0

and head -1 prints the first line (the largest number).

Of course, you can write your own list/filter/map/reduce functions directly with bash constructs (loops, conditionals and such), or you can use any full-blown scripting language like perl, special-purpose languages like awk or sed, or dc (RPN) and such.
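
As one illustration of writing the reduce step yourself, here is a minimal sketch that finds the largest number in a single pass instead of sorting everything. max_line is a hypothetical name, and the sketch assumes one number per input line.

```shell
# One-pass reduce: remember the largest value seen so far.
max_line() {
    # "+ 0" forces a numeric comparison; nothing is printed for empty input.
    awk 'NR == 1 || $1 + 0 > max { max = $1 + 0 } END { if (NR > 0) print max }'
}

# Could stand in for "sort -nr | head -1" at the end of a pipeline.
printf '%s\n' 1 8 42 4 5 | max_line
```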

Having a special filter command such as:

list | filter_command cut -d: -f 2

is simply not needed, because you can directly use

list | cut
jm666 answered Oct 24 '22 16:10


You can have awk do the filter and reduce function.

Filter:

awk 'NR % 2 == 0 { $0 = $0 " [EVEN]" } 1'

Reduce:

awk '{ p = p + $0 } END { print p }'
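
A filter in the strict sense keeps or drops whole lines. In awk, a bare condition with no action prints exactly the lines for which it is true. A minimal sketch combining such a filter with a sum as the reduce step (keep_even and total are hypothetical names, and the input is assumed to be one number per line):

```shell
# Filter: keep only lines whose first field is an even number.
keep_even() {
    awk '$1 % 2 == 0'
}

# Reduce: add up the first field of every line ("+ 0" prints 0 for empty input).
total() {
    awk '{ s += $1 } END { print s + 0 }'
}

# 2 + 4 + 6 = 12
printf '%s\n' 1 2 3 4 5 6 | keep_even | total
```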
konsolebox answered Oct 24 '22 16:10


I totally understand your question, as a long-time functional programmer, and here is the answer: Bash/Unix command pipelining isn't as clean as you'd hoped.

In the example above:

find mydir -type f -print | xargs grep -H '^[0-9]*$' | cut -d: -f 2 | sort -nr | head  -1
^------list+filter------^   ^--------map-----------^   ^--filter--^   ^compare^  ^reduce^

a more pure form would look like:

find mydir | xargs -L 1 bash -c 'test -f "$1" && echo "$1"' _ | xargs grep -H '^[0-9]*$' | cut -d: -f 2 | sort -nr | head -1
^--list--^   ^--------------------filter--------------------^   ^---------map----------^   ^---map----^   ^-----reduce-----^

But, for example, grep also has a filtering capability: grep -q mypattern, which simply returns 0 if it matches the pattern.

To get something more like what you want, you would simply have to define a bash filter function and make sure to export it (export -f) so that a bash process started by xargs can see it.
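
To illustrate grep -q as a predicate, here is a minimal sketch that reads filenames from standard input and keeps only those whose contents match a pattern. files_matching is a hypothetical name, and the sketch assumes one filename per line.

```shell
# Keep only the filenames (read from stdin) whose contents match PATTERN.
files_matching() {
    pattern=$1   # note: plain assignment, so this is a global variable
    while IFS= read -r file || [ -n "$file" ]; do
        # grep -q prints nothing; only its exit status is used.
        if grep -q -e "$pattern" -- "$file" 2>/dev/null; then
            printf '%s\n' "$file"
        fi
    done
}
```

For this particular predicate grep -l already does the same job; the loop is only meant to show the exit status being used as the filter condition.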

But then you run into some problems. For instance, test has both binary and unary operators. How will your filter function handle this? And what would you decide to output on true in these cases? Not insurmountable, but weird. Assuming only unary operations:

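filter(){
    # Read stdin line by line; the || clause also handles a last line without a trailing newline.
    while read -r LINE || [[ -n "${LINE}" ]]; do
        # Caution: eval runs the input as shell code, so only use this on trusted input.
        eval "[[ ${LINE} $1 ]]" 2> /dev/null && echo "$LINE"
    done
}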

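so you could do something like the following (using -gt rather than >, because inside [[ ]] the > operator compares strings, so 10 would be dropped):

seq 1 10 | filter "-gt 4"
5
6
7
8
9
10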
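
A sketch of an eval-free variant, assuming only numeric comparisons are needed. filter_num is a hypothetical name; it takes a test operator such as -gt or -le and a number to compare against.

```shell
# filter_num OP N: print the input lines L for which [ L OP N ] is true.
filter_num() {
    op=$1
    n=$2
    while IFS= read -r line || [ -n "$line" ]; do
        # Non-numeric lines make [ fail with an error, so they are dropped.
        if [ "$line" "$op" "$n" ] 2>/dev/null; then
            printf '%s\n' "$line"
        fi
    done
}
```

Usage would look like seq 1 10 | filter_num -gt 4. Because the line is passed as a single quoted argument, it is never executed as shell code.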

As I wrote this I kinda liked it

Christian Bongiorno answered Oct 24 '22 16:10