package main

import (
	"archive/zip"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

func main() {
	var (
		Path = os.Args[1]
		Name = os.Args[2]
	)
	File, err := os.Create(Name)
	if err != nil {
		fmt.Println(err)
		return
	}
	defer File.Close()
	// Change into the parent directory so entry names are relative to it.
	PS := strings.Split(Path, "\\")
	PathName := strings.Join(PS[:len(PS)-1], "\\")
	os.Chdir(PathName)
	Path = PS[len(PS)-1]
	Zip := zip.NewWriter(File)
	defer Zip.Close()
	walk := func(Path string, info os.FileInfo, err error) error {
		if err != nil {
			fmt.Println(err)
			return err
		}
		// Directories get no entry of their own; only files are added.
		if info.IsDir() {
			return nil
		}
		Src, err := os.Open(Path)
		if err != nil {
			return err
		}
		defer Src.Close()
		fmt.Println(Path)
		FileName, err := Zip.Create(Path)
		if err != nil {
			return err
		}
		if _, err := io.Copy(FileName, Src); err != nil {
			return err
		}
		return Zip.Flush()
	}
	if err := filepath.Walk(Path, walk); err != nil {
		fmt.Println(err)
	}
}
This is my directory layout:
-----root
|---2015-05(dir)
|---中文.go
|---package(dir)
|---你好.go
When I zip this directory with this code, the Chinese file names come out garbled. How can I solve this problem?
The problem is that, by default, the Zip specification only allows zip entry names in the original IBM PC character set (Code Page 437), which does not cover Chinese characters. More specifically (source: APPENDIX D):
APPENDIX D.1 The ZIP format has historically supported only the original IBM PC character encoding set, commonly referred to as IBM Code Page 437. This limits storing file name characters to only those within the original MS-DOS range of values and does not properly support file names in other character encodings, or languages. To address this limitation, this specification will support the following change.
Support for Unicode names was added later. It is signaled by a special bit referred to as general purpose bit 11, also called the Language encoding flag (EFS):
Section 4.4.4 - General purpose bit flag - Bit 11 - Language encoding flag (EFS). If this bit is set, the filename and comment fields for this file MUST be encoded using UTF-8.
APPENDIX D.2 If general purpose bit 11 is unset, the file name and comment should conform to the original ZIP character encoding. If general purpose bit 11 is set, the filename and comment must support The Unicode Standard, Version 4.1.0 or greater using the character encoding form defined by the UTF-8 storage specification. The Unicode Standard is published by the The Unicode Consortium (www.unicode.org). UTF-8 encoded data stored within ZIP files is expected to not include a byte order mark (BOM).
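To make the bit arithmetic concrete, here is a minimal sketch of what "bit 11" means as a mask. The names utf8Flag and hasUTF8Flag are made up for this illustration and are not part of archive/zip.

```go
package main

import "fmt"

// utf8Flag is general purpose bit 11 (the Language encoding flag, EFS).
// The name is hypothetical; it is not an identifier from archive/zip.
const utf8Flag uint16 = 1 << 11 // same value as 0x800

// hasUTF8Flag reports whether bit 11 is set in a general purpose bit flag value.
func hasUTF8Flag(flags uint16) bool {
	return flags&utf8Flag != 0
}

func main() {
	var flags uint16 // all bits clear: names use the original ZIP encoding
	fmt.Println(hasUTF8Flag(flags))

	flags |= utf8Flag // set bit 11: names must be UTF-8
	fmt.Printf("%#x %v\n", flags, hasUTF8Flag(flags))
}
```

Shifting 1 left by 11 places gives 0x800, which is the value used for the Flags field further down.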
The general purpose bit flag is present and supported by Go: it is the Flags field of the FileHeader struct. Unfortunately Go doesn't have methods to set this bit, and by default it is 0.
So the easiest way to add support for Unicode names is to simply set bit 11 to one. Instead of
FileName, _ := Zip.Create(Path)
start your zip entry with:
h := &zip.FileHeader{Name: Path, Method: zip.Deflate, Flags: 0x800}
FileName, _ := Zip.CreateHeader(h)
The first line creates a FileHeader in which the 0x800 (bit 11) value is set for the Flags field, which tells readers that the file name is encoded using UTF-8 (which is what Go does when it writes a string to an io.Writer).
Note:
By doing this, UTF-8 file names will be preserved, but not every zip reader/extractor supports the flag. For example, on Windows, the built-in file handler (Windows Explorer) will not decode the names as UTF-8, whereas a more capable Zip handler (e.g. SecureZip) will recognize the UTF-8 file names and extract them properly (using UTF-8 decoding).