Am I correct in assuming that the only difference between "windows files" and "unix files" is the linebreak?
We have a system that has been moved from a windows machine to a unix machine and are having troubles with the format.
I need to automate the translation between unix/windows before the files get delivered to the system in our "transportsystem". I'll probably need something to determine the current format and something to transform it into the other format. If it's just the newline thats the big difference then I'm considering just reading the files with the java.io. As far as I know, they are able to handle both with readLine. And then just write each line back with
while (line = readline)
print(line + NewlineInOtherFormat)
....
samjudson:
This is only a difference in text files, where UNIX uses a single Line Feed (LF) to signify a new line, Windows uses a Carriage Return/Line Feed (CRLF) and Mac uses just a CR.
to which Cebjyre elaborates:
OS X uses LF, the same as UNIX - MacOS 9 and below did use CR though
Mo
There could also be a difference in character encoding for national characters. There is no "unix-encoding" but many linux-variants use UTF-8 as the default encoding. Mac OS (which is also a unix) uses its own encoding (macroman). I am not sure, what windows default encoding is.
McDowell
In addition to the new-line differences, the byte-order mark can cause problems if files are treated as Unicode on Windows.
Cheekysoft
However, another set of problems that you may come across can be related to single/multi-byte character encodings. If you see strange unexpected chars (not at end-of-line) then this could be the reason. Especially if you see square boxes, question marks, upside-down question marks, extra characters or unexpected accented characters.
Sadie
On unix, files that start with a . are hidden. On windows, it's a filesystem flag that you probably don't have easy access to. This may result in files that are supposed to be hidden now becoming visible on the client machines.
File permissions vary between the two. You will probably find, when you copy files onto a unix system, that the files now belong to the user that did the copying and have limited rights. You'll need to use chown/chmod to make sure the correct users have access to them.
There exists tools to help with the problem:
pauldoo
If you are just interested in the content of text files, then yes the line endings are different. Take a look at something like dos2unix, it may be of help here.
Cheekysoft
As pauldoo suggests, tools like dos2unix can be very useful. Note that these may be on your linux/unix system as fromdos or tofrodos, or perhaps even as the general purpose toolbox recode.
Help for java coding
Cheekysoft
When writing to files or reading from files (that you are in control of), it is often worth specifying the encoding to use, as most Java methods allow this. However, also ensuring that the system locale matches can save a lot of pain
Windows uses FAT and NTFS as file systems, while Linux uses a variety of file systems. Unlike Windows, Linux is bootable from a network drive. In contrast to Windows, everything is either a file or a process in Linux. Linux has two kinds of major partitions called data partitions and swap partitions.
Linux is an open source operating system whereas Windows OS is commercial. Linux has access to source code and alters the code as per user need whereas Windows does not have access to the source code. In Linux, the user has access to the source code of the kernel and alter the code according to his need.
- Unix is more stable and does not go down as often as Windows does, therefore requires less administration and maintenance. - Unix has greater built-in security and permissions features than Windows. - Unix possesses much greater processing power than Windows. - Unix is the leader in serving the Web.
This is only a difference in text files, where UNIX uses a single Line Feed (LF) to signify a new line, Windows uses a Carriage Return/Line Feed (CRLF) and Mac uses just a CR.
Binary files there should be no difference (i.e. a JPEG on a windows machine will be byte for byte the same as the same JPEG on a unix box.)
There could also be a difference in character encoding for national characters. There is no "unix-encoding" but many linux-variants use UTF-8 as the default encoding. Mac OS (which is also a unix) uses its own encoding (macroman). I am not sure, what windows default encoding is.
But this could be another source of trouble (apart from the different linebreaks).
What are your problems? The linebreak-related problems can be easily corrected with the programs dos2unix or unix2dos on the unix-machine
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With