
How to print the whole line that contains a specified byte offset in a file?

I have the following example file, input.txt:

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut
enim ad minim veniam, quis nostrud exercitation ullamco laboris
nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor
in reprehenderit in voluptate velit esse cillum dolore eu fugiat
nulla pariatur. Excepteur sint occaecat cupidatat non proident,
sunt in culpa qui officia deserunt mollit anim id est laborum.

I can easily grep for a word and get its byte offset:

$ grep -ob incididunt /dev/null input.txt 
input.txt:80:incididunt

Suppose, however, that the line contents and the searched word are no longer available: I only know the file name and the byte offset 80. I want to print the whole line of the file that contains that byte offset.

Ideally, a script.sh that takes two parameters, a file name and a byte offset, would output the matching line:

$ ./script.sh input.txt 80
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut

More examples:

For the file=input.txt and the byte offset=130 the output should be:

enim ad minim veniam, quis nostrud exercitation ullamco laboris

For the file=input.txt and any byte offset from 195 through 253 the output should be:

nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor

For the file=input.txt and the byte offset=400 the output should be:

sunt in culpa qui officia deserunt mollit anim id est laborum.

I have tried:

I can print from the byte offset up to the end of the line with GNU sed, but that misses the "eiusmod tempor" part. I can't think of a way to go "back" in the file and fetch the part between the preceding newline and the byte offset.

$ sed -z 's/.\{80\}\([^\n]*\).*/\1\n/' input.txt 
incididunt ut labore et dolore magna aliqua. Ut

I can read character by character, remember the last newline, and print from that newline up to the next one. That will not work with the shell's read, as it omits newlines. I think I could get it to work using dd, but surely there must be a simpler solution.

set -- input.txt 80
exec 10<"$1"
pos=0
lastnewlinepos=0
for ((i=0;i<"$2";++i)); do
        IFS= read -r -u 10 -N 1 c
        pos=$((pos+1))
        # this will not work..., read omits newlines
        if [ "$c" = $'\n' ]; then
                lastnewlinepos="$pos"
        fi
done
# as I know the last newline before the offset, it's ok to use this now
sed -z 's/.\{'"$lastnewlinepos"'\}\([^\n]*\).*/\1\n/' "$1"

How to print the whole line that "contains" the byte offset inside a file using bash and *nix specific tools?

KamilCuk asked May 15 '19 09:05


2 Answers

With GNU awk, keep the number of bytes read so far in a variable, and when it reaches your byte offset print the current line and exit. E.g.:

$ awk -b '{ nb += length + 1 } nb >= 80 { print; exit }' file
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut

The keyword length is a shorthand for length($0), which returns the length of the current line in bytes (thanks to -b). We need to add 1 to it as awk strips off the line terminator.
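
To match the script.sh interface asked for in the question, the one-liner can be wrapped in a shell function. This is only a sketch under a few assumptions: the name line_at_offset is made up for illustration; offsets are taken as 0-based, the way grep -b counts them, which is why the comparison is nb > off rather than nb >= off; and LC_ALL=C is used in place of gawk's -b, which should make length count bytes in other awk implementations as well.

```shell
#!/bin/bash
# line_at_offset FILE OFFSET   (hypothetical helper name)
# Prints the line of FILE that contains the 0-based byte OFFSET.
# Returns non-zero if OFFSET lies beyond the end of the file.
line_at_offset() {
    LC_ALL=C awk -v off="$2" '
        { nb += length($0) + 1 }           # bytes read so far, newline included
        nb > off { print; found = 1; exit }  # first line that ends past the offset
        END { exit !found }                # status 1 when no line matched
    ' "$1"
}
```

Called as line_at_offset input.txt 130, it should print the line starting with "enim ad minim". An offset that points at a newline is attributed to the line that the newline terminates.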

oguz ismail answered Nov 15 '22 03:11


Please try the following. You can adjust the input and output to your needs; as written, it prints the byte offset of the word and the line containing it:

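#!/bin/bash
SEARCH_TERM="$1"
SEARCH_FILE="$2"
# byte offset of the first match only; give up if the term is not found
OFFSET_OF_WORD="$(grep -ob -- "$SEARCH_TERM" "$SEARCH_FILE" | head -n 1 | cut -d':' -f1)"
[ -n "$OFFSET_OF_WORD" ] || exit 1

lastNewLinePos=0
lineNumber=0
# grep -b '$' prints the byte offset at which each line starts; the file size
# is appended as a final boundary so that the last line can match as well
for newLinePos in $(grep -b '$' "$SEARCH_FILE" | cut -d':' -f1) $(wc -c < "$SEARCH_FILE")
do
    if (( OFFSET_OF_WORD >= lastNewLinePos && OFFSET_OF_WORD < newLinePos )); then
        echo "Offset: $OFFSET_OF_WORD"
        echo "Line: $(sed -n "${lineNumber}p" "$SEARCH_FILE")"
        break
    fi
    lastNewLinePos=$newLinePos
    lineNumber=$((lineNumber + 1))
done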
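
The loop relies on what grep -b '$' prints: since '$' matches every line and -o is not given, each output line is prefixed with the byte offset at which that line starts, not with the position of its newline. A minimal sketch that isolates this step (line_starts is a hypothetical helper name, not part of the script above):

```shell
#!/bin/bash
# line_starts FILE   (hypothetical helper name)
# Prints, one per line, the 0-based byte offset at which each line of FILE begins.
line_starts() {
    grep -b '$' "$1" | cut -d':' -f1
}
```

For a file containing the three lines abc, defg and hi, it prints 0, 4 and 9. A byte offset belongs to the line whose start is the largest value not exceeding it.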

EDIT: Tested with your given input and executed as

./getLineByOffset.sh incididunt input.txt

Edit 2: If you only know the offset, not the actual search term, pass the offset as the first argument and the file as the second:

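#!/bin/bash
OFFSET_OF_WORD="$1"
SEARCH_FILE="$2"

lastNewLinePos=0
lineNumber=0
# grep -b '$' prints the byte offset at which each line starts; the file size
# is appended as a final boundary so that the last line can match as well
for newLinePos in $(grep -b '$' "$SEARCH_FILE" | cut -d':' -f1) $(wc -c < "$SEARCH_FILE")
do
    if (( OFFSET_OF_WORD >= lastNewLinePos && OFFSET_OF_WORD < newLinePos )); then
        echo "Offset: $OFFSET_OF_WORD"
        echo "Line: $(sed -n "${lineNumber}p" "$SEARCH_FILE")"
        break
    fi
    lastNewLinePos=$newLinePos
    lineNumber=$((lineNumber + 1))
done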
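
For comparison, the same lookup can be sketched without a loop by counting the newlines that precede the offset: the line holding the 0-based byte offset N is number (newlines in the first N bytes) + 1. The helper names below are hypothetical, and the sketch assumes head -c as provided by GNU coreutils. It is a different technique from the grep -b loop above, not a rewrite of it.

```shell
#!/bin/bash
# line_number_at_offset FILE OFFSET   (hypothetical helper name)
# Echoes the 1-based number of the line holding the 0-based byte OFFSET:
# the number of newlines among the OFFSET bytes before it, plus one.
line_number_at_offset() {
    local newlines
    newlines=$(head -c "$2" "$1" | wc -l)
    echo $(( newlines + 1 ))
}

# print_line_at_offset FILE OFFSET   (hypothetical helper name)
# Prints that line; prints nothing if OFFSET is at or past the end of FILE.
print_line_at_offset() {
    local n
    n=$(line_number_at_offset "$1" "$2")
    sed -n "${n}{p;q;}" "$1"   # print line n, then stop reading
}
```

Note the argument order here is file first, offset second, as in the question's script.sh example.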
Florian Schlag answered Nov 15 '22 03:11
