Java modified UTF-8 strings in Python

Tags:

I am interfacing with a Java application via Python. I need to be able to construct byte sequences which contain UTF-8 strings. Java uses a modified UTF-8 encoding in DataInputStream.readUTF() which is not supported by Python (yet at least)

Can anybody point me in the right direction to construct Java modified UTF-8 strings in Python?

Update #1: To see a little more about the Java modified UTF-8, check out the readUTF() method from the DataInput interface on line 550 here, or here in the Java SE docs.

Update #2: I am trying to interface with a third-party JBoss web app which is using this modified UTF-8 format to read in strings via POST requests by calling DataInputStream.readUTF() (sorry for any confusion regarding normal Java UTF-8 string operation).

991

asked Sep 08 '09 09:09

QAZ

1 Answers

You can ignore Modified UTF-8 Encoding (MUTF-8) and just treat it as UTF-8. On the Python side, you can just handle it like this,

Convert the string into normal UTF-8 and stores bytes in a buffer.
Write the 2-byte buffer length (not the string length) as binary in big-endian.
Write the whole buffer.

I've done this in PHP and Java didn't complain about my encoding at all (at least in Java 5).

MUTF-8 is mainly used for JNI and other systems with null-terminated strings. The only difference from normal UTF-8 is how U+0000 is encoded. Normal UTF-8 use 1 byte encoding (0x00) and MUTF-8 uses 2 bytes (0xC0 0x80). First of all, you shouldn't have U+0000 (an invalid codepoint) in any Unicode text. Secondly, DataInputStream.readUTF() doesn't enforce the encoding so it happily accepts either one.

EDIT: The Python code should look like this,

Click to copy

def writeUTF(data, str):
    utf8 = str.encode('utf-8')
    length = len(utf8)
    data.append(struct.pack('!H', length))
    format = '!' + str(length) + 's'
    data.append(struct.pack(format, utf8))

answered Oct 12 '22 14:10

ZZ Coder

Related questions
                            
                                Java library that finds sentence boundaries
                            
                                Lazy Loading DTO fields in Spring
                            
                                Whats the best way to process an asynchronous queue continuously in Java?
                            
                                How to pass a system property using Wrapper.exe
                            
                                Using JAXB to unmarshall elements with varying/dynamic names
                            
                                Configure Hibernate to use Oracle's SYS_GUID() for Primary Key
                            
                                Emacs typeover skeleton-pair-insert-maybe
                            
                                Generating 128-bit keys with keytool
                            
                                Which NLP toolkit to use in JAVA? [closed]
                            
                                Maven assembly:assembly
                            
                                Eclipse plugin reuse outside eclipse
                            
                                Test Cases: Mocking Database using Spring beans
                            
                                Using FTP Proxy with apache commons-net
                            
                                how to add images in HSSFCell in apache POI?
                            
                                simple generic list in java
                            
                                Casting to a Comparable, then Comparing
                            
                                Development productivity of Ruby on Rails vs. Java in 2009
                            
                                Image acquisition hardware scanning from Java [closed]
                            
                                Search for file in directory with multiple directories
                            
                                How to get the hash algorithm name using the OID in Java?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Java modified UTF-8 strings in Python

Tags:

java

python

utf-8

QAZ

People also ask

1 Answers

ZZ Coder

Recent Activity

Donate For Us