 

Fix Mismatch Between Data And Locale In Awk Command

I am receiving the following error:

awk: cmd. line:1: (FILENAME=- FNR=798) warning: Invalid multibyte data detected. There may be a mismatch between your data and your locale.

The command I'm running is the following:

cat file.txt | awk 'length($0)<10000' > output-file.txt

The weird part is that if I pipe to other commands like awk '{ sub("\r$", ""); print }', it works just fine without an error.

Can anyone see why I get this error? Or should I just ignore it?

asked Oct 14 '16 18:10 by DomainsFeatured




1 Answer

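Set the locale to C, which uses the ASCII character set with a single-byte encoding. In that locale awk treats every byte as one character, so no byte sequence counts as invalid multibyte data. To do this for awk only, pass LC_ALL=C in awk's environment: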

LC_ALL=C awk 'length($0)<10000' file.txt >output-file.txt
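One side effect worth knowing: in the C locale, length() counts bytes rather than characters, so a line containing multibyte UTF-8 text measures longer than it would under a UTF-8 locale. For a 10000-character cutoff this only matters for lines close to the limit. A small sketch, assuming a POSIX sh with octal escapes in printf; the variable names sample and bytes are purely illustrative:

```shell
#!/bin/sh
# "é" encoded as UTF-8 is the two bytes 0xC3 0xA9 (octal \303 \251).
sample=$(printf 'caf\303\251')

# Under LC_ALL=C awk sees each byte as one character, so length() is 5, not 4.
bytes=$(printf '%s\n' "$sample" | LC_ALL=C awk '{ print length($0) }')
echo "C locale length: $bytes"
```

If byte counts are not what you want, the alternative is to keep a UTF-8 locale and fix or convert the invalid bytes in the data instead.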

Also, you don't need cat, since awk accepts one or more file names as arguments.
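Passing the file directly also lets awk know the real file name, so FILENAME and FNR identify each record. With `cat file.txt |` awk only sees standard input, which is why the warning above reports FILENAME=-. A minimal sketch, assuming `mktemp -d` is available (it is not strictly POSIX) and using two hypothetical sample files, where b.txt contains a Latin-1 byte (0xE9) that is not valid UTF-8:

```shell
#!/bin/sh
# Hypothetical sample files so the sketch runs on its own.
dir=$(mktemp -d)
printf 'short line\n' > "$dir/a.txt"
printf 'caf\351\n' > "$dir/b.txt"   # 0xE9 = Latin-1 "é", invalid as UTF-8

# awk opens the files itself, so FILENAME and FNR say where each record came from.
report=$(LC_ALL=C awk '{ print FILENAME ":" FNR ": " length($0) " bytes" }' \
    "$dir/a.txt" "$dir/b.txt")
printf '%s\n' "$report"

# Clean up the sample files.
rm -r "$dir"
```

The same FILENAME/FNR pair can be used to pull out the record named in a warning, for example line 798 of file.txt, and inspect its bytes.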

answered Oct 13 '22 01:10 by heemayl