numpy.savetxt() stop newline on final line

Tags: python, numpy

numpy.savetxt() seems to always put a newline at the end of files. Is there a nice way to avoid this behaviour? Replacing the newline character with something else doesn't help.

I don't think this is particular to my code, but the writing is being done like this (model is a 3D array):

np.savetxt(modelFile, model, delimiter=",", fmt='%.3f')
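
To make the behaviour concrete, here is a minimal sketch. The small 2D `model` below is a made-up stand-in (np.savetxt expects a 1D or 2D array), and writing to an in-memory io.StringIO buffer is only there so the output can be inspected:

```python
import io

import numpy as np

# Hypothetical stand-in for the real data.
model = np.array([[1.0, 2.0], [3.0, 4.0]])

# Default terminator: every row, including the last, ends with "\n".
default_buf = io.StringIO()
np.savetxt(default_buf, model, delimiter=",", fmt="%.3f")
print(repr(default_buf.getvalue()))

# Custom terminator: the output now ends with ";" instead.
custom_buf = io.StringIO()
np.savetxt(custom_buf, model, delimiter=",", fmt="%.3f", newline=";")
print(repr(custom_buf.getvalue()))
```

The newline string is appended after every row, so changing it only changes what the file ends with.
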
chris asked Feb 13 '15 05:02


2 Answers

I'm not really sure why it matters, or whether there is a way to prevent it on the numpy side (I didn't see anything in the docs...), but you can probably seek back in the file after writing and then truncate, e.g.:

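import numpy as np

NEWLINE_SIZE_IN_BYTES = -1  # -2 on Windows?
# On Python 3 this needs mode 'wb': a text-mode file rejects nonzero
# seeks relative to the end with io.UnsupportedOperation.
with open('data.dat', 'w') as fout:
    np.savetxt(fout, model, delimiter=",", fmt='%.3f')
    fout.seek(NEWLINE_SIZE_IN_BYTES, 2)  # 2 == os.SEEK_END
    fout.truncate()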

Note: to seek backwards, the offset must be negative.
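
Whether a negative seek from the end is accepted depends on how the file was opened. The following is a minimal sketch, assuming Python 3 and using plain write() calls in place of np.savetxt so that it runs without numpy; the variable names are illustrative only:

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.dat")

    # Text mode: a nonzero offset relative to the end is rejected.
    with open(path, "w") as fout:
        fout.write("1.000,2.000\n")
        try:
            fout.seek(-1, os.SEEK_END)
            text_mode_ok = True
        except io.UnsupportedOperation:
            text_mode_ok = False

    # Binary mode: the same seek is allowed, so the last byte can be cut.
    with open(path, "wb") as fout:
        fout.write(b"1.000,2.000\n")
        fout.seek(-1, os.SEEK_END)
        fout.truncate()

    with open(path, "rb") as fin:
        remaining = fin.read()

print(text_mode_ok, remaining)
```
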

mgilson answered Nov 16 '22 17:11


The solution

To answer the question: there is a nice way to avoid this behaviour, although it depends on what you consider nice. Basically, you either wrap the numpy.savetxt function in another function or use the chunk of code shown here wherever you need it.

What I've done is mix some of @mgilson's code with code provided in an answer to another similar question. In short, code that saves a file using numpy.savetxt and removes the final newline is the following:

import os

import numpy as np

with open('some_array.txt', 'w') as fout:
    NEWLINE_SIZE_IN_BYTES = 1  # 2 on Windows?
    np.savetxt(fout, some_array)  # Write the array as usual.
    fout.seek(0, os.SEEK_END)  # Go to the end of the file.
    # Go backwards one newline from the end of the file.
    fout.seek(fout.tell() - NEWLINE_SIZE_IN_BYTES, os.SEEK_SET)
    fout.truncate()  # Truncate the file at this point.

The definitions of os.SEEK_END and os.SEEK_SET can be found in the os module documentation, although they are just 2 and 0 respectively.

The logic behind the code

Some things to note here:

  • The file is opened in text mode, not in binary mode. This is important because writing to and reading from a file in text mode is platform dependent: the default encoding varies if you don't specify one (which we usually don't, as in the two answers to this question), and the newline character, for example, is written differently on Windows and Linux. From the documentation:

    Normally, files are opened in text mode, that means, you read and write strings from and to the file, which are encoded in a specific encoding. If encoding is not specified, the default is platform dependent (see open()). (...)

    (...) In text mode, the default when reading is to convert platform-specific line endings (\n on Unix, \r\n on Windows) to just \n. When writing in text mode, the default is to convert occurrences of \n back to platform-specific line endings.

  • In the next line of code, fout.seek(0, os.SEEK_END), we set the current position to the end of the file (see the reference for seek()). Seeking with offset 0 is the only operation relative to the end of the file that is legal in text mode, as the documentation quoted below explains.

  • Then, in the line fout.seek(fout.tell() - NEWLINE_SIZE_IN_BYTES, os.SEEK_SET), we are just telling Python:
    • Move the position back by the size of one newline (1 byte here) from the current position: fout.tell() - NEWLINE_SIZE_IN_BYTES, where tell() simply returns the current position, as you can see in the reference.
    • Count that offset from the beginning of the file: os.SEEK_SET.
  • The reason for doing it this way is that, in text mode, seek() only accepts offsets derived from tell(), as the seek() documentation says:

    If the file is opened in text mode (without ‘b’), only offsets returned by tell() are legal. Use of other offsets causes undefined behavior.

  • Finally, as may be obvious by now, the truncate() method cuts the file off at the current position, discarding everything after it.
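
The steps above can be tried without numpy. The following is a minimal sketch, assuming plain ASCII content like the numeric output of savetxt; the file name demo.txt and the use of os.linesep to get the size of the platform's line ending are illustrative choices, not part of the original snippet:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")

    with open(path, "w") as fout:  # text mode, default newline handling
        fout.write("1.000,2.000\n3.000,4.000\n")
        fout.seek(0, os.SEEK_END)       # jump to the end of the file
        end_position = fout.tell()      # position of the end, in bytes here
        # Step back over one platform line ending, counted from the start.
        fout.seek(end_position - len(os.linesep), os.SEEK_SET)
        fout.truncate()                 # drop everything after this point

    # Read the raw bytes back to see exactly what is left on disk.
    with open(path, "rb") as fin:
        raw = fin.read()

print(repr(raw))
```

Strictly speaking, tell() minus a constant is not itself a value returned by tell(). It works here because the content is single-byte ASCII, so treat it as an assumption for other encodings.
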

Another way in binary mode

I must admit I'm not sure whether doing this in text mode is better than in binary mode, although the other answers made me think so; see the other question.

Following @mgilson's code, we just need to open the file in binary mode. The modified working code is:

NEWLINE_SIZE_IN_BYTES = -1  # -2 on Windows?
with open('data.dat', 'wb') as fout:  # Note 'wb' instead of 'w'
    np.savetxt(fout, model, delimiter=",", fmt='%.3f')
    fout.seek(NEWLINE_SIZE_IN_BYTES, 2)
    fout.truncate()

Both of these ways work for me in versions of Python > 3.2.
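
As mentioned at the start of this answer, the snippet can be wrapped in a function. The following is a minimal sketch of such a wrapper built on the binary-mode variant. The name savetxt_no_trailing_newline is hypothetical, and the sketch assumes the default savetxt encoding (latin1) and that savetxt writes the newline string unchanged to a binary file, so its byte length is known:

```python
import os
import tempfile

import numpy as np


def savetxt_no_trailing_newline(path, array, newline="\n", **kwargs):
    """Hypothetical helper: np.savetxt, then drop the final line terminator."""
    with open(path, "wb") as fout:  # binary mode: no newline translation
        np.savetxt(fout, array, newline=newline, **kwargs)
        # Assumes the default savetxt encoding, latin1.
        terminator_size = len(newline.encode("latin1"))
        # Only cut if something at least as long as a terminator was written.
        if fout.tell() >= terminator_size:
            fout.seek(-terminator_size, os.SEEK_END)
            fout.truncate()


# Usage with a made-up 2D array.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.csv")
    model = np.array([[1.0, 2.0], [3.0, 4.0]])
    savetxt_no_trailing_newline(path, model, delimiter=",", fmt="%.3f")
    with open(path, "rb") as fin:
        written = fin.read()

print(repr(written))
```

Any other keyword arguments (delimiter, fmt, header and so on) are passed straight through to np.savetxt.
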

César Arroyo Cárdenas answered Nov 16 '22 16:11