I have the following example file, input.txt:
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut
enim ad minim veniam, quis nostrud exercitation ullamco laboris
nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor
in reprehenderit in voluptate velit esse cillum dolore eu fugiat
nulla pariatur. Excepteur sint occaecat cupidatat non proident,
sunt in culpa qui officia deserunt mollit anim id est laborum.
Now I can easily grep for a word and get its byte offset:
$ grep -ob incididunt /dev/null input.txt
input.txt:80:incididunt
Sadly, the line contents and the searched word have been lost. I only know the filename and the byte offset, 80. I want to print the whole line that contains that byte offset inside the file.
Ideally, I would have a script.sh that takes two parameters, a file name and a byte offset, and outputs the line in question:
$ ./script.sh input.txt 80
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut
More examples:
For the file=input.txt and the byte offset=130 the output should be:
enim ad minim veniam, quis nostrud exercitation ullamco laboris
For the file=input.txt and any byte offset from 195 up to 253 the output should be:
nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor
For the file=input.txt and the byte offset=400 the output should be:
sunt in culpa qui officia deserunt mollit anim id est laborum.
I have tried:
I can print from the byte offset up until the end of the line with GNU sed, however that misses the "eiusmod tempor" part. I can't think of a way to go "back" in the file, to fetch the part from the preceding newline up until that byte offset.
$ sed -z 's/.\{80\}\([^\n]*\).*/\1\n/' input.txt
incididunt ut labore et dolore magna aliqua. Ut
I can read character by character, remember the last newline, and print from the last newline up until the next. That will not work with the shell's read, as it omits newlines. I think I can get it to work using dd, but surely there must be a simpler solution.
set -- input.txt 80
exec 10<"$1"
pos=0
lastnewlinepos=0
for ((i=0;i<$2;++i)); do
IFS= read -r -u 10 -N 1 c
pos=$((pos+1))
# this will not work..., read omits newlines
if [ "$c" = $'\n' ]; then
lastnewlinepos="$pos"
fi
done
# as I know the last newline before the offset, it's ok to use this now
sed -z 's/.\{'"$lastnewlinepos"'\}\([^\n]*\).*/\1\n/' "$1"
How to print the whole line that "contains" the byte offset inside a file using bash and *nix specific tools?
With GNU awk, keep the number of bytes read so far in a variable, and as soon as it exceeds your byte offset, print the current line and exit. The comparison is strict because grep -b reports 0-based offsets. E.g.:
$ awk -b '{ nb += length + 1 } nb > 80 { print; exit }' file
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut
The keyword length is shorthand for length($0), which returns the length of the current line in bytes (thanks to -b). We need to add 1 to it because awk strips off the line terminator.
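To get closer to the script.sh the question asks for, the one-liner can be wrapped in a function that takes the file and the offset as parameters. This is only a sketch: the function name line_at_offset is made up for illustration, the offset is assumed to be 0-based as grep -b reports it (hence the strict comparison), and LC_ALL=C is used as a stand-in for gawk's -b on the assumption that awk then counts bytes rather than characters.

```shell
#!/bin/bash
# line_at_offset FILE OFFSET  (hypothetical helper name)
# Prints the line of FILE that contains the 0-based byte OFFSET.
line_at_offset() {
    local file=$1 offset=$2
    # LC_ALL=C: assumed to make length() count bytes, like gawk's -b
    LC_ALL=C awk -v off="$offset" '
        { nb += length($0) + 1 }   # bytes consumed so far, newline included
        nb > off { print; exit }   # first line that ends past the offset
    ' "$file"
}
```

It would be called as line_at_offset input.txt 80. An offset that points at a newline byte is attributed to the line that newline terminates.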
Please try the following. You can adjust the input and output to your needs; as written, it prints the offset of the word and the line containing it:
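#!/bin/bash
SEARCH_TERM="$1"
SEARCH_FILE="$2"
# Use the first match only; several offsets would break the arithmetic below.
OFFSET_OF_WORD=$(grep -ob -- "$SEARCH_TERM" "$SEARCH_FILE" | head -n 1 | cut -d: -f1)
[ -n "$OFFSET_OF_WORD" ] || exit 1
lastNewLinePos=0
lineNumber=0
# grep -b '$' matches every line and prints the byte offset where it starts.
for newLinePos in $(grep -b '$' "$SEARCH_FILE" | cut -d: -f1)
do
if (( OFFSET_OF_WORD >= lastNewLinePos && OFFSET_OF_WORD < newLinePos )); then
break
fi
lastNewLinePos=$newLinePos
lineNumber=$((lineNumber + 1))
done
# Printing after the loop also covers the last line, which has no following line start.
echo "Offset: $OFFSET_OF_WORD"
echo "Line: $(sed -n "${lineNumber}p" "$SEARCH_FILE")"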
EDIT: Tested with your given input and executed as
./getLineByOffset.sh incididunt input.txt
Edit 2: If you only know the offset, not the actual search term (pass the offset first, then the file):
#!/bin/bash
OFFSET_OF_WORD="$1"
SEARCH_FILE="$2"
lastNewLinePos=0
lineNumber=0
# grep -b '$' matches every line and prints the byte offset where it starts.
for newLinePos in $(grep -b '$' "$SEARCH_FILE" | cut -d: -f1)
do
if (( OFFSET_OF_WORD >= lastNewLinePos && OFFSET_OF_WORD < newLinePos )); then
break
fi
lastNewLinePos=$newLinePos
lineNumber=$((lineNumber + 1))
done
# Printing after the loop also covers the last line, which has no following line start.
echo "Offset: $OFFSET_OF_WORD"
echo "Line: $(sed -n "${lineNumber}p" "$SEARCH_FILE")"
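When only the offset is known, the same lookup can probably be done without a loop. The sketch below uses a different technique from the script above: it counts the newlines that occur before the offset and asks sed for the line after them. The function name print_line_at is hypothetical, the offset is assumed to be 0-based as grep -b reports it, and head -c is assumed to be available (it is in GNU and BSD head, but it is not required by POSIX).

```shell
#!/bin/bash
# print_line_at FILE OFFSET  (hypothetical helper name)
# Prints the line of FILE that contains the 0-based byte OFFSET.
print_line_at() {
    local file=$1 offset=$2 newlines
    # Count the newlines among the bytes that precede the offset.
    newlines=$(head -c "$offset" "$file" | wc -l)
    # That many complete lines come first, so the target is line newlines+1.
    sed -n "$((newlines + 1)){p;q;}" "$file"
}
```

It would be called as print_line_at input.txt 80. The file is read twice, once by head and once by sed, which should be fine for ordinary text files but is worth keeping in mind for very large inputs.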