How is Liquibase currently handling character encoding?

Question

Could you explain how Liquibase (e.g. version 3.3.2) currently handles character encoding?

1/ Changesets in XML format are declared in UTF-8. However, some changes can be declared inline within the XML (e.g. 'sql'), while others can be imported from external files (e.g. 'sqlFile').

For the first kind (inline), could you confirm that these changes must obviously be in the same character encoding as the XML changeset (so only UTF-8)? Is it possible to have changes in a character encoding other than UTF-8? If so, is it then mandatory to set a specific encoding in the XML declaration (e.g. encoding="ISO-8859-1") instead of UTF-8? And finally, how can we tell Liquibase that these changesets should be parsed with that specific encoding (e.g. via a Java system property)?

For the second kind (imported from a file), could you confirm that these changes can use a character encoding other than UTF-8? If so, could you confirm that we must set the "encoding" attribute of these changes to the appropriate character encoding? Is it then true that we could have an XML changeset declared as UTF-8 but changes in a different character encoding (e.g. encoding="ISO-8859-1")? And finally, do we need to tell Liquibase in any way to parse the changeset in a specific encoding (e.g. via a Java system property)?

2/ Changesets in SQL format are a different story. These files cannot carry any metadata telling Liquibase which character encoding to use when parsing them.

What character encoding does Liquibase use to parse these files: UTF-8 or some other encoding? Is it possible to have changes in a character encoding other than UTF-8? If so, how do we declare it, and how can we tell Liquibase which character encoding to use to parse these files (e.g. via a Java system property)?

As far as I know, several Java system properties can be set with Liquibase:

file.encoding,
liquibase.file.encoding,
liquibase.outputFileEncoding.

However, these Java system properties seem to influence the writing of changes more than their parsing.

Currently, most of our databases use ISO-8859-1 or windows-1252, but Liquibase seems to handle only UTF-8 changesets correctly. Your answers to these questions will help us a lot to understand:

what features related to character encoding Liquibase provides, and
what restrictions exist depending on the changeset format used.

Thank you in advance for your help,

Bertrand

Jens · Accepted Answer

I think files are read in Liquibase with the FileSystemResourceAccessor, and there is no encoding that you can set specifically. This means it will use whatever the underlying Java runtime uses: an InputStreamReader created without an explicit charset uses the default system encoding.

So you should be able to influence this by setting the encoding for the JVM with:

-Dfile.encoding=UTF-8
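To see why the default encoding matters, here is a small, self-contained Java sketch (not Liquibase code; `DefaultCharsetDemo` and `decode` are made-up names) that decodes the same ISO-8859-1 bytes once with ISO-8859-1 and once with UTF-8 through an InputStreamReader. A reader created without a charset argument would instead use `Charset.defaultCharset()`, which is what `-Dfile.encoding` influences on the Java 7/8 JVMs of the Liquibase 3.3.x era; note that since JDK 18 the default charset is UTF-8 regardless of platform (JEP 400).

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetDemo {
    // Decodes the bytes with the given charset, as a reader over a file would.
    static String decode(byte[] bytes, Charset charset) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(bytes), charset))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // "cafe" with an e-acute (U+00E9) stored as ISO-8859-1: that character is the single byte 0xE9.
        byte[] latin1Bytes = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);

        // new InputStreamReader(stream) without a charset falls back to this value.
        System.out.println("default charset: " + Charset.defaultCharset());

        // Matching charset: the text round-trips.
        System.out.println("as ISO-8859-1: " + decode(latin1Bytes, StandardCharsets.ISO_8859_1));

        // Mismatched charset: 0xE9 is malformed UTF-8 here and becomes U+FFFD.
        String asUtf8 = decode(latin1Bytes, StandardCharsets.UTF_8);
        System.out.println("as UTF-8 contains U+FFFD: " + (asUtf8.indexOf('\uFFFD') >= 0));
    }
}
```

The mismatch produces U+FFFD replacement characters rather than an exception, so a file decoded with the wrong charset can pass through silently with garbled text.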

XML files are parsed with a SAX parser (and the SAX parser may do further work to recognize the encoding).
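A minimal Java sketch of the SAX point: when a parser is handed raw bytes (an InputStream), it takes the encoding from the XML declaration itself, independently of `file.encoding`; without a declaration or BOM it would assume UTF-8, the XML default, and the 0xE9 byte below would be invalid. Whether Liquibase 3.3.2 passes the changelog to its parser as a byte stream or as an already-decoded Reader is an assumption worth checking: with a character stream, SAX ignores the encoding declaration. The `SaxEncodingDemo` class and its `<changeSet>`/`<sql>` document are hypothetical stand-ins, not a real Liquibase changelog.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEncodingDemo {
    // Parses an XML document from raw bytes and returns the text of its <sql> element.
    static String parseSqlText(byte[] xmlBytes) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        final StringBuilder text = new StringBuilder();
        // Passing an InputStream lets the parser pick the charset from the XML declaration.
        parser.parse(new ByteArrayInputStream(xmlBytes), new DefaultHandler() {
            private boolean inSql;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                inSql = "sql".equals(qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                inSql = false;
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inSql) {
                    text.append(ch, start, length);
                }
            }
        });
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<changeSet><sql>INSERT INTO t VALUES ('caf\u00e9')</sql></changeSet>";
        // Encode the document as ISO-8859-1, matching its declaration.
        byte[] latin1Bytes = xml.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(parseSqlText(latin1Bytes));
    }
}
```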

For changesets in SQL file format, it uses the UtfBomAwareReader. Although UtfBomAwareReader contains code that tries to identify the encoding, I think the SqlChangeLogParser does not use it (as of right now) and instead simply defaults to "UTF-8".
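For reference, here is a hedged Java sketch of the general BOM-sniffing idea. It is NOT the source of Liquibase's UtfBomAwareReader; `BomSniffer`, `openReader` and `readAll` are hypothetical names. It checks for a UTF-8 or UTF-16 byte-order mark, skips it, and otherwise falls back to a caller-supplied charset. Note that a plain ISO-8859-1 or windows-1252 file has no BOM, so BOM sniffing alone can never identify those encodings; it can only tell you to use the fallback.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BomSniffer {
    // Hypothetical helper (not Liquibase's UtfBomAwareReader): detects a UTF-8 or
    // UTF-16 BOM, skips it, and otherwise decodes with the fallback charset.
    static Reader openReader(InputStream in, Charset fallback) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // Read up to 3 bytes; read() may return fewer than requested.
        while (n < head.length) {
            int r = pin.read(head, n, head.length - n);
            if (r < 0) {
                break;
            }
            n += r;
        }

        Charset charset = fallback;
        int bomLength = 0;
        if (n >= 3 && (head[0] & 0xFF) == 0xEF && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF) {
            charset = StandardCharsets.UTF_8;
            bomLength = 3;
        } else if (n >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            charset = StandardCharsets.UTF_16BE;
            bomLength = 2;
        } else if (n >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            charset = StandardCharsets.UTF_16LE;
            bomLength = 2;
        }

        // Push back the bytes that were read but are not part of the BOM.
        if (n > bomLength) {
            pin.unread(head, bomLength, n - bomLength);
        }
        return new InputStreamReader(pin, charset);
    }

    // Reads the whole Reader into a String.
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int r;
        while ((r = reader.read(buf)) != -1) {
            sb.append(buf, 0, r);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] bom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
        byte[] body = "SELECT 'caf\u00e9';".getBytes(StandardCharsets.UTF_8);
        byte[] withBom = new byte[bom.length + body.length];
        System.arraycopy(bom, 0, withBom, 0, bom.length);
        System.arraycopy(body, 0, withBom, bom.length, body.length);

        // BOM present: decoded as UTF-8 even though the fallback is ISO-8859-1.
        try (Reader r = openReader(new ByteArrayInputStream(withBom), StandardCharsets.ISO_8859_1)) {
            System.out.println(readAll(r));
        }
        // No BOM: the fallback charset is used.
        byte[] latin1 = "SELECT 'caf\u00e9';".getBytes(StandardCharsets.ISO_8859_1);
        try (Reader r = openReader(new ByteArrayInputStream(latin1), StandardCharsets.ISO_8859_1)) {
            System.out.println(readAll(r));
        }
    }
}
```

UTF-32 BOMs are ignored in this sketch; a fuller implementation would need to look at four bytes before treating FF FE as UTF-16LE.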

This is to the best of my knowledge, so before you make a big design decision based on it, try to validate it yourself.

Andreas Covidiot · Answer

Set it before executing Liquibase, e.g. in a Windows environment:

set JAVA_OPTS="-Dfile.encoding=UTF-8"
liquibase.bat
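To confirm that the option actually reaches a JVM, a tiny Java check like the following (the class name `EncodingCheck` is hypothetical) can be compiled and run with the same option, e.g. `java -Dfile.encoding=UTF-8 EncodingCheck`. It only shows what that JVM sees; whether liquibase.bat forwards JAVA_OPTS to its own java invocation depends on the script shipped with your Liquibase version, so it is worth checking that script too.

```java
import java.nio.charset.Charset;

public class EncodingCheck {
    // Reports the file.encoding system property and the JVM's default charset,
    // which readers and writers use when no charset is passed explicitly.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

On a Java 8 JVM started with `-Dfile.encoding=UTF-8`, both values are expected to report UTF-8.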

Tags:

sql

encoding

liquibase

changeset

bgillis
