 

grep: match all characters up to (not including) first blank space

I have a text file that has the following format:

characters (that I want to keep) (space) characters (that I want to remove)

So for example:

foo garbagetext
hello moregarbage
keepthis removethis
(etc.)

So I was trying to use the grep command in Linux to keep only the characters in each line up to, but not including, the first blank space. I have tried numerous variations, such as:

grep '*[[:space:]]' text1.txt > text2.txt
grep '*[^\s]' text1.txt > text2.txt
grep '/^[^[[:space:]]]+/' text1.txt > text2.txt

I tried to piece these together from different examples, but had no luck: they all produce an empty text2.txt file. I am new to this. What am I doing wrong?

EDIT:

The parts I want to keep can include capital letters. I want to keep all characters up to, but not including, the first blank space in each line, and remove everything from that space onward.

EDIT 2:

The garbage text (that I want to remove) can contain anything, including spaces, special characters, etc. So for example:

AA rough, cindery lava [n -S] 

After running grep -o '[^ ]*' text1.txt > text2.txt, the line above becomes:

AA
rough,
cindery
lava
[n
-S]

in text2.txt. All I want to keep is AA.


SOLUTION (provided by Rohit Jain with further input by beny23):

 grep -o '^[^ ]*' text1.txt > text2.txt 
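A quick way to see what the ^ anchor changes is to feed the sample line from the second edit through both patterns. This sketch uses printf instead of text1.txt so it runs on its own, and it assumes GNU grep, which skips empty matches when -o is given.

```shell
#!/bin/sh
# Sample line from the second edit (the garbage contains spaces and brackets).
line='AA rough, cindery lava [n -S]'

# Unanchored: -o prints every run of non-space characters on its own line.
printf '%s\n' "$line" | grep -o '[^ ]*'

echo '---'

# Anchored: ^ only matches at the start of the line, so only "AA" is printed.
printf '%s\n' "$line" | grep -o '^[^ ]*'
```

With GNU grep, the first command prints AA, rough,, cindery, lava, [n and -S] on separate lines; the second prints only AA.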
asked Feb 03 '13 20:02 by lord_sneed




2 Answers

You are putting the quantifier * in the wrong place.
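To see what the first attempt actually asks for: in a POSIX basic regular expression, a * at the very start of a pattern has nothing to repeat, so grep treats it as a literal asterisk. '*[[:space:]]' therefore only matches lines that contain an asterisk followed by whitespace, which is presumably why the original file produced no output. The second input line below is made up to show what the pattern would match; recent GNU grep versions may also print a warning about the leading * on stderr.

```shell
#!/bin/sh
# Hypothetical input: a line in the question's format, plus one containing "* ".
input='foo garbagetext
* starred line'

# The leading * is a literal asterisk here, not a quantifier, so only the
# second line matches. 2>/dev/null hides any warning newer GNU grep prints.
printf '%s\n' "$input" | grep '*[[:space:]]' 2>/dev/null
```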

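Try this instead:

grep -o '^[^[:space:]]*' text1.txt > text2.txt

or, if you have GNU grep, more concisely:

grep -o '^\S*' text1.txt > text2.txt

Both [^[:space:]] and \S match a single non-whitespace character (\S is a GNU grep extension). Inside a bracket expression \s has no special meaning, so [^\s] would only exclude a backslash and the letter s. The anchor ^ makes the match start at the beginning of the line, and -o makes grep print only the matched text instead of the whole line.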
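A quick check on the sample line keepthis removethis shows two pitfalls at once. Inside a bracket expression POSIX treats the backslash literally, so [^\s] means "neither \ nor s" rather than "not whitespace", and without -o grep prints the entire matching line. This sketch assumes GNU grep, where \S is available as an extension.

```shell
#!/bin/sh
# One of the sample lines from the question.
line='keepthis removethis'

# [^\s] means "neither \ nor s", so the match stops at the s in "keepthis",
# not at the space: prints "keepthi".
printf '%s\n' "$line" | grep -o '^[^\s]*'

# Without -o, grep prints the whole line whenever the pattern matches:
# prints "keepthis removethis".
printf '%s\n' "$line" | grep '^\S*'

# With -o and a real non-whitespace class, only the first word is printed:
# both print "keepthis".
printf '%s\n' "$line" | grep -o '^[^[:space:]]*'
printf '%s\n' "$line" | grep -o '^\S*'
```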

answered Nov 07 '22 00:11 by Rohit Jain


I realize this has long since been answered with the grep solution, but for future generations I'd like to note that there are at least two other solutions for this particular situation, both of which are more efficient than grep.

Since you are not doing any complex pattern matching, just taking the first space-delimited column, you can use a column-oriented utility such as awk or cut.

Using awk

$ awk '{print $1}' text1.txt > text2.txt 

Using cut

$ cut -f1 -d' ' text1.txt > text2.txt 
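The three commands give the same result on the question's sample data, but they split lines differently, which matters if the input is less regular. awk's default field splitting skips leading blanks and treats any run of spaces or tabs as one separator; cut -d' ' splits on every single space; and grep -o '^[^ ]*' prints nothing for a line that starts with a space. The two input lines below are hypothetical, chosen only to show the difference.

```shell
#!/bin/sh
# Hypothetical input: one line with a leading space, one tab-separated line.
sample() {
    printf ' indented line\ntab\tseparated\n'
}

echo 'awk:'
sample | awk '{print $1}'     # "indented", then "tab"

echo 'cut:'
sample | cut -d' ' -f1        # an empty line, then the whole tab-separated line

echo 'grep:'
sample | grep -o '^[^ ]*'     # nothing for line 1, then the whole tab-separated line
```

If every line starts with a non-space character and fields are separated by single spaces, as in the question, all three agree.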

Benchmarks on a ~1.1 MB file

$ time grep -o '^[^ ]*' text1.txt > text2.txt

real    0m0.064s
user    0m0.062s
sys     0m0.001s

$ time awk '{print $1}' text1.txt > text2.txt

real    0m0.021s
user    0m0.017s
sys     0m0.004s

$ time cut -f1 -d' ' text1.txt > text2.txt

real    0m0.007s
user    0m0.004s
sys     0m0.003s

awk is about 3x faster than grep, and cut is about 3x faster than awk. For a single run on a file this small the difference is negligible, but if you're writing a script for reuse, or running this often on large files, you might appreciate the extra efficiency.
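The timings above come from the answerer's own ~1.1 MB file, which isn't available here. The sketch below is one possible way to rerun a similar comparison on synthetic data; the generated lines are an assumption, not the original input, and absolute times will vary by machine. It also checks that the three commands agree on this input, and it needs bash for the time keyword.

```shell
#!/usr/bin/env bash
set -eu

# Work in a throwaway directory that is removed on exit.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# Synthetic input (assumption): 30,000 lines of "wordN rough, cindery lava [n -S]",
# roughly 1.1 MB in total.
awk 'BEGIN { for (i = 1; i <= 30000; i++) printf "word%d rough, cindery lava [n -S]\n", i }' > text1.txt

time grep -o '^[^ ]*' text1.txt > grep.txt
time awk '{print $1}' text1.txt > awk.txt
time cut -f1 -d' ' text1.txt > cut.txt

# Every generated line starts with a non-space word followed by a single
# space, so all three outputs should match; cmp exits non-zero otherwise.
cmp grep.txt awk.txt
cmp awk.txt cut.txt
echo 'all three outputs are identical'
```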

answered Nov 07 '22 01:11 by Steve