Java Unicode如何转GBK编码？-杰瑞科技汇

Of course. Converting Java strings from Unicode (UTF-16) to GBK is a common task, especially when dealing with legacy systems, files, or network protocols that require this specific Chinese encoding.

（图片来源网络，侵删）

Here’s a complete guide covering the concepts, standard methods, and best practices.

The Core Concept: What is Happening?

In Java, a String object is always stored internally as UTF-16. This is your "Unicode" representation.

When you "convert" a string to GBK, you are not changing the String object itself. Instead, you are encoding the characters from that String into a sequence of bytes that represents those characters according to the GBK character set.

The key class for this is java.nio.charset.CharsetEncoder.

（图片来源网络，侵删）

Method 1: The Standard & Recommended Approach (using `String.getBytes()`)

This is the most common and straightforward way to get a GBK-encoded byte array from a Java string.

You provide the GBK character set name to the getBytes() method.

Code Example

import java.nio.charset.StandardCharsets;
public class UnicodeToGbkExample {
    public static void main(String[] args) {
        // 1. Your original string in Java (stored as UTF-16 internally)
        String unicodeString = "这是一个测试字符串，Hello, World! 123";
        System.out.println("Original String: " + unicodeString);
        System.out.println("Original String (hex bytes): " + bytesToHex(unicodeString.getBytes(StandardCharsets.UTF_16)));
        System.out.println("-------------------------------------------");
        try {
            // 2. Encode the string into a GBK byte array
            // This is the key step.
            byte[] gbkBytes = unicodeString.getBytes("GBK");
            System.out.println("Successfully encoded to GBK.");
            System.out.println("GBK Byte Array length: " + gbkBytes.length);
            System.out.println("GBK Bytes (hex): " + bytesToHex(gbkBytes));
            // --- Verification: Decode the bytes back to a string ---
            String decodedString = new String(gbkBytes, "GBK");
            System.out.println("\nDecoded from GBK bytes: " + decodedString);
            System.out.println("Are original and decoded strings equal? " + unicodeString.equals(decodedString));
        } catch (java.io.UnsupportedEncodingException e) {
            // This exception is thrown if the JVM doesn't support the "GBK" charset.
            // On standard JVMs (like Oracle's or OpenJDK for Windows/Linux), this is rare.
            System.err.println("GBK charset is not supported on this JVM.");
            e.printStackTrace();
        }
    }
    // Helper method to print byte arrays in a readable hex format
    private static String bytesToHex(byte[] bytes) {
        if (bytes == null) {
            return "null";
        }
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString();
    }
}

Explanation

String unicodeString = "...": Creates a String object. Java handles the internal UTF-16 storage automatically.
unicodeString.getBytes("GBK"): This is the crucial line.
- The getBytes(Charset charset) method (or the older getBytes(String charsetName)) iterates through the characters of the String.
- For each character, it looks up its corresponding byte representation in the GBK character set.
- For ASCII characters like 'H', 'e', 'l', 'o', it uses a single byte (e.g., 'H' -> 0x48).
- For Chinese characters like '这', it uses two bytes (e.g., '这' -> 0xD6 0xD0).
- The result is a byte[] array that you can write to a file, send over a network, etc.
new String(gbkBytes, "GBK"): This demonstrates the reverse process (decoding). It takes the GBK byte array and reconstructs the original String object.

Method 2: The More Robust & Flexible Approach (using `Charset` and `CharsetEncoder`)

If you need more control over the encoding process (e.g., handling characters that have no GBK equivalent), using CharsetEncoder is better. It allows you to specify an error-handling strategy.

Code Example

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
public class GbkEncoderExample {
    public static void main(String[] args) {
        String unicodeString = "你好，世界！"; // "Hello, World!" in Chinese
        // Get the GBK Charset object
        Charset gbkCharset = Charset.forName("GBK");
        // Create an encoder with a specific error handling strategy
        // REPLACE: Replace unmappable characters with a '?' (or the charset's default replacement)
        // REPORT: Throw an exception when an unmappable character is found
        // IGNORE: Silently drop unmappable characters
        CharsetEncoder encoder = gbkCharset.newEncoder()
                                            .onMalformedInput(CodingErrorAction.REPLACE)
                                            .onUnmappableCharacter(CodingErrorAction.REPLACE);
        try {
            // Use the encoder to convert the CharBuffer to a ByteBuffer
            CharBuffer charBuffer = CharBuffer.wrap(unicodeString);
            ByteBuffer byteBuffer = encoder.encode(charBuffer);
            // Get the byte array from the ByteBuffer
            byte[] gbkBytes = new byte[byteBuffer.remaining()];
            byteBuffer.get(gbkBytes);
            System.out.println("Original String: " + unicodeString);
            System.out.println("Encoded to GBK (using CharsetEncoder): " + bytesToHex(gbkBytes));
            // Decoding is straightforward
            String decodedString = gbkCharset.newDecoder().decode(byteBuffer).toString();
            System.out.println("Decoded back: " + decodedString);
        } catch (CharacterCodingException e) {
            System.err.println("Encoding failed: " + e.getMessage());
            e.printStackTrace();
        }
    }
    // Helper method to print byte arrays in a readable hex format
    private static String bytesToHex(byte[] bytes) {
        if (bytes == null) {
            return "null";
        }
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString();
    }
}

When to use `CharsetEncoder`?

Custom Error Handling: If you need to know if a character couldn't be encoded instead of having it silently replaced. The CodingErrorAction enum (REPORT, REPLACE, IGNORE) gives you this power.
Streaming: If you are reading a large string from a stream, you can encode it chunk by chunk without loading the entire string into memory at once.
Maximum Control: It's the lower-level API that String.getBytes() uses under the hood.

Important Considerations

Handling Unsupported Characters

What happens if your string contains a character that doesn't exist in the GBK character set, like a Cyrillic 'Ж' or an emoji '😊'?

（图片来源网络，侵删）

With String.getBytes("GBK"): By default, it will throw an CharacterCodingException or replace the character, depending on the JVM's default configuration. This behavior can be unpredictable.
With CharsetEncoder: You have full control. You can configure it to:
- CodingErrorAction.REPLACE: Replace the unmappable character with a default byte sequence (often a single byte). This is often the safest option.
- CodingErrorAction.IGNORE: Simply drop the character.
- CodingErrorAction.REPORT: Throw a CharacterCodingException. This is useful for debugging or when data integrity is critical.

JVM Support for GBK

For the conversion to work, the JVM must have a charset provider for GBK. This is almost always true for standard JVMs on Windows and Linux. However, on minimal or custom JVMs, it might not be available.

You can check if a charset is supported like this:

import java.nio.charset.Charset;
import java.nio.charset.UnsupportedCharsetException;
public class CheckGbkSupport {
    public static void main(String[] args) {
        try {
            Charset.forName("GBK");
            System.out.println("GBK charset is supported.");
        } catch (UnsupportedCharsetException e) {
            System.err.println("GBK charset is NOT supported on this JVM.");
        }
    }
}

Writing to a File

A very common use case is writing a string to a file in GBK encoding. The java.nio.file package makes this easy and efficient.

import java.io.BufferedWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
public class WriteGbkFile {
    public static void main(String[] args) {
        String content = "这是将要写入GBK文件的文本。";
        Path path = Paths.get("output_gbk.txt");
        try (BufferedWriter writer =

Java Unicode如何转GBK编码？

The Core Concept: What is Happening?

Method 1: The Standard & Recommended Approach (using `String.getBytes()`)

Code Example

Explanation

Method 2: The More Robust & Flexible Approach (using `Charset` and `CharsetEncoder`)

Code Example

When to use `CharsetEncoder`?

Important Considerations

Handling Unsupported Characters

JVM Support for GBK

Writing to a File

99ANYc3cd6

Linux下如何执行Python脚本文件？

sourceinsight教程

Python jieba库具体怎么用？

matlab2025b安装教程

Mac下Python运行报错怎么办？

androidkiller安装教程

SQL Developer安装教程，步骤是怎样的？

python lwpcookiejar

Java JDBC连接Oracle数据库，配置步骤有哪些？

Python文档如何高效查找与使用？

如何在Linux用Python运行ICTCLAS分词？

Java如何调用Webservice接口？

java exception 定义

labview2025安装教程

Portraiture Mac教程如何快速人像磨皮？

java 2 platform下载

Java Unicode如何转GBK编码？

The Core Concept: What is Happening?

Method 1: The Standard & Recommended Approach (using String.getBytes())

Code Example

Explanation

Method 2: The More Robust & Flexible Approach (using Charset and CharsetEncoder)

Code Example

When to use CharsetEncoder?

Important Considerations

Handling Unsupported Characters

JVM Support for GBK

Writing to a File

相关推荐

androidkiller安装教程

Method 1: The Standard & Recommended Approach (using `String.getBytes()`)

Method 2: The More Robust & Flexible Approach (using `Charset` and `CharsetEncoder`)

When to use `CharsetEncoder`?