Problems in JCIFS with certain non-ascii characters

Tags: java, samba, jcifs

I am using JCIFS to access a file share with a lot of Japanese names on it, and I am running into issues when a name contains the ・ character.

For example:

the path 人事部/要員・コスト管理課/

The first part is fine, but the second part causes an issue. This may be related to the fact that “・” can be typed using the slash key, but I’m not sure. I have tried escaping the character, but that does not fix the issue. Do you have any clue what might be causing it?

user439407 asked May 02 '16 09:05


2 Answers

UPDATE for U+30FB (KATAKANA MIDDLE DOT):

As @sergey-tachenov pointed out, the issue is related to U+30FB (KATAKANA MIDDLE DOT), so that character is what needs to be handled. I would like to share some previous project experience and suggestions.

Suggestion-1:

On one of my projects we produced a manual in various languages and ran into the same type of issue. We used Lotus Notes. In that case we wrote filters that changed those characters or glyphs to a dot before Lotus Notes created the folder and file names that were later used as paths. That solved the problem. If you have that kind of option, the fix is easy.

Suggestion-2:

Various people have faced the same type of issue and tried different approaches. Some report:

  • Replacing it with a dot (.) solved the issue.
  • KATAKANA MIDDLE DOT (・) is a double-width character. If you want to use the Katakana (Japanese) middle dot, consider using the HALFWIDTH KATAKANA MIDDLE DOT (U+FF65) instead.
  • Switching to the regular bullet worked fine.
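The substitutions in the list above can be sketched in a few lines of Java. This is only an illustration: the class and method names are made up for this example, and the approach helps only when you control how the names are created (as in the Lotus Notes case). Renaming a string in code does not make an existing folder reachable under the new name.

```java
public class MiddleDotFilter {
    // U+30FB KATAKANA MIDDLE DOT, the character discussed in this answer.
    static final char KATAKANA_MIDDLE_DOT = '\u30FB';
    // U+FF65 HALFWIDTH KATAKANA MIDDLE DOT, one of the suggested substitutes.
    static final char HALFWIDTH_MIDDLE_DOT = '\uFF65';

    /** Returns name with every U+30FB replaced by the given substitute. */
    static String replaceMiddleDot(String name, char substitute) {
        return name.replace(KATAKANA_MIDDLE_DOT, substitute);
    }

    public static void main(String[] args) {
        // The second directory name from the question, written with
        // Unicode escapes so the source file stays pure ASCII.
        String name = "\u8981\u54E1\u30FB\u30B3\u30B9\u30C8\u7BA1\u7406\u8AB2";
        System.out.println(replaceMiddleDot(name, '.'));
        System.out.println(replaceMiddleDot(name, HALFWIDTH_MIDDLE_DOT));
    }
}
```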

twitter-text has also addressed KATAKANA MIDDLE DOT (・); see its GitHub repo.

Resource Link: Katakana Middle Dot issue solved in Twitter-Text

However, chrissimpkins, the Hack font developer, stated the following in an Atom issue:

I can confirm that we do not have a Katakana middle dot glyph (U+30FB) in the regular Hack font. There is a middle dot (U+00B7) that will have the appearance that you are after here. I can confirm that the U+00B7 glyph has the same fixed width spacing as the rest of the regular set (and all other variant sets).

Resource Link: https://github.com/atom/atom/issues/9115
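Since U+30FB, U+FF65, and U+00B7 look almost identical on screen, it can help to print the code points of a name before deciding which dot it really contains. The following is a minimal sketch using only the standard library; the class and method names are hypothetical. It also shows that NFKC normalization folds the halfwidth dot into U+30FB while leaving U+00B7 alone.

```java
import java.text.Normalizer;

public class CodePointDump {
    /** Formats each code point of s as U+XXXX, separated by spaces. */
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    /** Applies NFKC, which maps U+FF65 to U+30FB and keeps U+00B7 as is. */
    static String fold(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        String dots = "\u30FB\uFF65\u00B7";
        System.out.println(describe(dots));
        System.out.println(describe(fold(dots)));
    }
}
```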


First, note that the dot or period (.) is an ASCII character, so the dot (.) itself is not the issue. Character encoding and server settings may be the issue.

URLs can only be sent over the Internet using the ASCII character-set. If a URL contains characters outside the ASCII set, the URL has to be converted.
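As an illustration of that conversion, here is a small sketch that percent-encodes a path segment as UTF-8 with the standard library. The class and method names are made up for this example, and whether jCIFS expects path segments in this form is an assumption I have not verified; the jCIFS documentation quoted below only requires encoding for the userinfo part.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class PercentEncodeDemo {
    /** Percent-encodes one path segment as UTF-8. */
    static String encodeSegment(String segment) {
        try {
            // URLEncoder implements HTML form encoding and emits '+' for a
            // space, so swap that for %20 to get a path-style result.
            return URLEncoder.encode(segment, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // U+30FB is the three UTF-8 bytes E3 83 BB.
        System.out.println(encodeSegment("\u30FB")); // prints %E3%83%BB
    }
}
```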

An SMB URL looks like this:

smb://[[[domain;]username[:password]@]server[:port]/[[share/[dir/]file]]][?param=value[param2=value2[...]]]

jCIFS can also address servers, and workgroups.

Important: all SMB URLs that represent workgroups, servers, shares, or directories require a trailing slash '/'.

When using the java.net.URL class with 'smb://' URLs it is necessary to first call the static jcifs.Config.registerSmbURLHandler(); method. This is required to register the SMB protocol handler.

The userinfo component of the SMB URL (domain;user:pass) must be URL encoded if it contains reserved characters. According to RFC 2396, these are non-US-ASCII characters and most meta characters. However, jCIFS will work correctly with anything but '@', which is used to delimit the userinfo component from the server, and '%', which is the URL escape character itself.
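A minimal sketch of that rule, escaping only the two characters the paragraph above singles out. The class, method, and parameter names are hypothetical, and the sketch deliberately does not handle ';' or ':' inside the individual fields.

```java
public class SmbUserInfo {
    /** Escapes '%' first (so it is not double-escaped), then '@'. */
    static String escape(String s) {
        return s.replace("%", "%25").replace("@", "%40");
    }

    /** Builds smb://domain;user:pass@server/share/ with a trailing slash. */
    static String buildUrl(String domain, String user, String password,
                           String server, String share) {
        return "smb://" + escape(domain) + ";" + escape(user) + ":"
                + escape(password) + "@" + server + "/" + share + "/";
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("DOM", "user", "p@ss%1", "host", "share"));
    }
}
```

Passing credentials through NtlmPasswordAuthentication, as in the retrieval example further down, avoids putting them in the URL at all.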


Character Checking and Setting

Next, you need to know which charset you are using. The following code prints the JVM default:

System.out.println(Charset.defaultCharset());

or you can run the following command, which shows Samba's default OEM (DOS) encoding:

$ testparm -v | grep dos

CIFS uses either UTF-16LE or a default codepage. The default codepage used by JCIFS is Cp850 or US_ASCII.
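To see why the codepage matters for this particular character, you can ask each candidate charset whether it can encode U+30FB at all. This is a sketch with hypothetical class and method names; which of the Japanese charsets are installed depends on your JVM, so the helper simply reports false for a charset that is not available.

```java
import java.nio.charset.Charset;

public class CanEncodeCheck {
    /** True if the charset is available and can encode every char of text. */
    static boolean canEncode(String charsetName, String text) {
        if (!Charset.isSupported(charsetName)) {
            return false;
        }
        Charset cs = Charset.forName(charsetName);
        return cs.canEncode() && cs.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String middleDot = "\u30FB";
        String[] names = {"US-ASCII", "Cp850", "Shift_JIS", "windows-31j", "UTF-8"};
        for (String name : names) {
            System.out.println(name + ": " + canEncode(name, middleDot));
        }
    }
}
```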

In jCIFS you can set the jcifs.encoding property to UTF-8 and check:

System.setProperty("jcifs.encoding", "UTF8");

For a Japanese locale, you can try:

System.setProperty("jcifs.encoding", "Shift_JIS");

With the wrong encoding, share names, passwords, and in some cases file and directory names that contain non-ASCII characters may not be handled properly. By default this property is Cp850, which is MS-DOS Latin1.

Note: The Cp850 charset converter is located in jre/lib/charsets.jar which AFAIK is only supported by the internationalized version of Sun's JRE. If Cp850 is not available an exception will occur. To avoid this exception you can set jcifs.encoding to ASCII, but share names and passwords with non-ASCII characters will not be processed correctly. To determine if jCIFS is properly processing these characters, create a share that contains non-ASCII characters (e.g. Grüße) and then try to list that share with the ListFiles.java example program.


Setting Client Properties with Japanese

For Japanese language, you could try setting jcifs.encoding = Shift_JIS

The following tables show the Japanese encoding sets supported by J2SE 5.0. The canonical names used by the new java.nio APIs are in many cases not the same as those used in the java.io and java.lang APIs.

----------------------------------------------------------------------------------------------
|Canonical Name for  | Canonical Name for java.io  |               Description               |
|   java.nio API     |      and java.lang API      |                                         |
----------------------------------------------------------------------------------------------
|      EUC-JP        |           EUC_JP            | JISX 0201, 0208 and 0212, EUC encoding  | 
|                    |                             |               Japanese                  |
----------------------------------------------------------------------------------------------
|    ISO-2022-JP     |         ISO2022JP           | JIS X 0201, 0208, in ISO 2022 form,     | 
|                    |                             |               Japanese                  |
----------------------------------------------------------------------------------------------
|      Shift_JIS     |             SJIS            |            Shift-JIS, Japanese          | 
----------------------------------------------------------------------------------------------
|    windows-31j     |           MS932             |             Windows Japanese            | 
----------------------------------------------------------------------------------------------
|  x-euc-jp-linux    |        EUC_JP_LINUX         | JISX 0201, 0208, EUC encoding Japanese  | 
----------------------------------------------------------------------------------------------
|   x-eucJP-Open     |       EUC_JP_Solaris        | JISX 0201, 0208, 0212, EUC encoding     | 
|                    |                             |               Japanese                  |
----------------------------------------------------------------------------------------------
|     x-IBM33722     |           Cp33722           | IBM-eucJP - Japanese (superset of 5050) | 
----------------------------------------------------------------------------------------------
|      x-IBM930      |            Cp930            | Japanese Katakana-Kanji mixed with 4370 | 
|                    |                             |         UDC, superset of 5026           |
----------------------------------------------------------------------------------------------
|      x-IBM939      |            Cp939            | Japanese Latin Kanji mixed with 4370    | 
|                    |                             |         UDC, superset of 5035           |
----------------------------------------------------------------------------------------------
|      x-IBM942      |            Cp942            |  IBM OS/2 Japanese, superset of Cp932   | 
----------------------------------------------------------------------------------------------
|      x-IBM943      |            Cp943            | IBM OS/2 Japanese, superset of Cp932    | 
|                    |                             |         and Shift-JIS                   |
----------------------------------------------------------------------------------------------
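If you are unsure which of the two naming columns your JVM accepts, you can ask it to resolve a name to its java.nio canonical form. A small sketch with hypothetical class and method names; it returns null when the charset is not installed in the running JVM.

```java
import java.nio.charset.Charset;

public class CharsetNames {
    /** Returns the java.nio canonical name for a name or alias, or null if unavailable. */
    static String canonicalName(String alias) {
        return Charset.isSupported(alias) ? Charset.forName(alias).name() : null;
    }

    public static void main(String[] args) {
        // java.io / java.lang style names taken from the table above.
        String[] aliases = {"SJIS", "MS932", "EUC_JP", "ISO2022JP"};
        for (String alias : aliases) {
            System.out.println(alias + " -> " + canonicalName(alias));
        }
    }
}
```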

Here are some full code examples for JCIFS that you could try:

  1. Copying files over network shared folder using Java
  2. Copying the resources to and from windows network using Java
  3. Java Tutorial – Using JCIFS to copy files to shared network drive using username and password

Here's an example to retrieve a file:

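import jcifs.smb.NtlmPasswordAuthentication;
import jcifs.smb.SmbFile;
import jcifs.smb.SmbFileInputStream;

public class RetrieveFile {
    public static void main(String[] args) throws Exception {
        jcifs.Config.setProperty("jcifs.netbios.wins", "192.168.1.220");
        NtlmPasswordAuthentication auth = new NtlmPasswordAuthentication("domain", "username", "password");
        // Attach the credentials to an SmbFile, then open the stream from it.
        SmbFile file = new SmbFile("smb://host/c/My Documents/人事部/要員・コスト管理課/somefile.txt", auth);
        try (SmbFileInputStream in = new SmbFileInputStream(file)) {
            byte[] b = new byte[8192];
            int n;
            while ((n = in.read(b)) > 0) {
                System.out.write(b, 0, n);
            }
        }
        System.out.flush();
    }
}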

You can also read/write, delete, make directories, rename, list contents of a directory, list the workgroups/ntdomains and servers on the network, list the shares of a server, open named pipes, authenticate web clients, etc.

The SmbFile, SmbFileInputStream, and SmbFileOutputStream classes are analogous to the File, FileInputStream, and FileOutputStream classes.

Using FileOutputStream together with SmbFile.getInputStream(), the code looks like this:

// Fragment from inside a method that returns boolean. getSMBListOfFiles, output,
// OutputHandler, sb, logger, and the path variables come from the original poster's code.
SmbFile[] files = getSMBListOfFiles(sb, logger, domain, userName, password, sourcePath, sourcePattern);

if (files == null)
    return false;
output(sb, logger, "      Source file count: " + files.length);
byte[] buf = new byte[16 * 1024 * 1024];
for (SmbFile smbFile : files) {
    String destFilename = destinationPath + smbFile.getName();
    output(sb, logger, "         copying " + smbFile.getName());
    // try-with-resources closes both streams even when the copy fails.
    try (InputStream fileInputStream = smbFile.getInputStream();
         FileOutputStream fileOutputStream = new FileOutputStream(destFilename)) {
        int len;
        while ((len = fileInputStream.read(buf)) > 0) {
            fileOutputStream.write(buf, 0, len);
        }
    } catch (SmbException e) {
        OutputHandler.output(sb, logger, "Exception during copyNetworkFilesToLocal stream to output, SMB issue: " + e.getMessage(), e);
        e.printStackTrace();
        return false;
    } catch (FileNotFoundException e) {
        OutputHandler.output(sb, logger, "Exception during copyNetworkFilesToLocal stream to output, file not found: " + e.getMessage(), e);
        e.printStackTrace();
        return false;
    } catch (IOException e) {
        OutputHandler.output(sb, logger, "Exception during copyNetworkFilesToLocal stream to output, IO problem: " + e.getMessage(), e);
        e.printStackTrace();
        return false;
    }
}

Credit goes to @man called haney

Resource Link: How to copy file from smb share to local drive using jcifs in Java?
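The copy loop in that fragment can be isolated into a helper that works on any pair of streams, which makes it easy to try without a share. This is a sketch with hypothetical names, using in-memory streams in place of the SMB input and the local file output.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class StreamCopy {
    /** Copies everything from in to out and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int len;
        // read() returns -1 at end of stream.
        while ((len = in.read(buf)) != -1) {
            out.write(buf, 0, len);
            total += len;
        }
        return total;
    }

    /** Round-trips a byte array through copy(); stands in for share-to-file. */
    static byte[] copyBytes(byte[] data) {
        try (InputStream in = new ByteArrayInputStream(data);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            copy(in, out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(copyBytes(new byte[] {1, 2, 3}).length);
    }
}
```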


Precaution-1:

A further caution for HTTPS and Tomcat users:

In most cases URLs running over HTTP work fine, but not when using HTTPS (i.e. over SSL). This usually results in Unicode (non-ASCII) characters in an HTTPS URL appearing incorrectly, and the served page contains numerous errors. This occurs when the useBodyEncodingForURI="true" flag is not defined in the HTTPS connector definition in conf/server.xml of the Apache Tomcat application server running JIRA. This flag is set by default in 'recommended' distribution installations of JIRA.

However, in JIRA WAR setups, this might not be the case. Hence, ensure that the useBodyEncodingForURI="true" flag is included in the following element of the conf/server.xml file of your Apache Tomcat installation running JIRA:

<Connector port="8443" maxHttpHeaderSize="8192"
              maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
              enableLookups="false" disableUploadTimeout="true"
              acceptCount="100" scheme="https" secure="true"
              clientAuth="false" sslProtocol="TLS" useBodyEncodingForURI="true" />

Specify useBodyEncodingForURI="true" in all connector definitions (i.e. both the HTTP and the HTTPS connectors), as described in the 'Modifying Tomcat server.xml' section of the Installing JIRA on Tomcat 6.0 or 7.0 documentation.

Resource Link:

How to Get Unicode 'non-ASCII' Characters in HTTPS URL to Appear Correctly


For non-ASCII characters in URLs, see:

  1. (Please) Stop Using Unsafe Characters in URLs
  2. Can I use non-ASCII characters in URLs?
  3. Is it advisable to have non-ascii characters in the URL?
SkyWalker answered Sep 24 '22 18:09


Take a look at heenenee's comment and walk through your server filesystem to check what the real share name is. I tested access to network shares with the middle dot and Japanese names on a Samba server (UTF-8) with Java source (UTF-8) without problems.

import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;

import org.apache.commons.io.FileUtils;
import org.apache.commons.io.IOUtils;
import org.junit.Test;

import jcifs.smb.SmbFile;
import junit.framework.TestCase;

public class JCifstest extends TestCase {

    @Test
    public void testJCifs() throws IOException {

        System.out.println(Charset.defaultCharset());

        SmbFile smbFile = new SmbFile("smb://myuser:mypass@myserver/basepath/人事部要員・コスト管理課/test.txt");
        File destFile = new File("/tmp/" + smbFile.getName());
        FileUtils.writeByteArrayToFile(destFile, IOUtils.toByteArray(smbFile.getInputStream()));
        assertEquals("content", FileUtils.readFileToString(destFile));
    }
}
vzamanillo answered Sep 26 '22 18:09