How to check whether a character is a newline character in any encoding in C?
I have an assignment to write my own wc program. If I just use if (s[i] == '\n'), I get a different answer from the original wc when I run my program on its own executable.
Here is the code:
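    #include <ctype.h>   /* isblank */
    #include <errno.h>   /* errno */
    #include <unistd.h>  /* read */

    typedef struct
    {
        int newline;
        int word;
        int byte;
    } info;

    /* error() is a helper defined elsewhere (not shown) */
    info count(int descr)
    {
        info kol;
        kol.newline = 0;
        kol.word = 0;
        kol.byte = 0;
        int len = 512;
        char s[512];
        int n;
        errno = 0;
        int flag1 = 1;  /* set when the next byte starts a new line */
        int flag2 = 1;  /* set while between words */
        while ((n = read(descr, s, len)))
        {
            if (n == -1)
                error("Error while reading.", errno);
            errno = 0;
            kol.byte += n;
            for (int i = 0; i < n; i++)
            {
                /* a line is counted at its first byte, so a last line
                   without a trailing '\n' is counted as well */
                if (flag1)
                {
                    kol.newline++;
                    flag1 = 0;
                }
                if (isblank(s[i]) || s[i] == '\n')
                    flag2 = 1;
                else
                {
                    if (flag2)
                    {
                        kol.word++;
                        flag2 = 0;
                    }
                }
                if (s[i] == '\n')
                    flag1 = 1;
            }
        }
        return kol;
    }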
It works fine for all text files, but when I run it on the executable I get from compiling it, it doesn't give the same answer as wc.
Operating systems use different characters to mark the end of a line. On Linux, a line ends with "\n", also called a line feed (LF). On Windows, a line ends with "\r\n", a carriage return followed by a line feed (CRLF).
In ASCII, newline is X'0A'. In EBCDIC, newline is X'15'. (For example, ASCII code page ISO8859-1 and EBCDIC code page IBM-1047 translate back and forth between these characters.) Windows programs normally use a carriage return followed by a line feed character at the end of each line of a text file.
The way to check whether a character s[i] is a newline character is simply:

    if (s[i] == '\n')
If you're reading from a file that's been opened in text mode (including stdin), then whatever representation the underlying system uses to mark the end of a line will be translated to a single '\n' character.
You say you're trying to write your own wc program, and by comparing to '\n' you're getting different results than the system's wc. You haven't told us enough to guess why that's happening. Show us your code and tell us exactly what's happening.
You might run into problems if you're reading a file that's encoded differently (say, a Unix-format text file read on a Windows system). But then wc would have the same problem.