I am receiving the following error:
awk: cmd. line:1: (FILENAME=- FNR=798) warning: Invalid multibyte data detected. There may be a mismatch between your data and your locale.
The command I'm running is the following:
cat file.txt | awk 'length($0)<10000' > output-file.txt
The weird part is that if I pipe to other commands like awk '{ sub("\r$", ""); print }', it works just fine without an error.
Does anyone see why I would get this error? Or should I just ignore it?
Set the locale to C so that awk uses only the ASCII character set with single-byte encoding. To do this, pass LC_ALL=C in awk's environment:
LC_ALL=C awk 'length($0)<10000' file.txt >output-file.txt
Also, you don't need to use cat, as awk takes filename(s) as argument(s).
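To see what the C locale changes, here is a minimal sketch. The file name sample.txt and its contents are made up for illustration: the line holds "caf" followed by the single byte 0xE9, which is 'é' in Latin-1 but is not valid UTF-8, so it is the kind of data that can trigger the warning under a UTF-8 locale.

```shell
# Hypothetical sample: "caf" followed by byte 0xE9 (octal 351), invalid as UTF-8.
printf 'caf\351\n' > sample.txt

# In the C locale awk treats every byte as one character,
# so it never attempts to decode multibyte sequences.
len=$(LC_ALL=C awk '{ print length($0) }' sample.txt)
echo "$len"   # 4: three ASCII bytes plus the single 0xE9 byte
```

One consequence worth keeping in mind: with LC_ALL=C, length($0) counts bytes rather than characters, so the 10000 limit in the original command becomes a limit on bytes per line.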