杰瑞科技汇

How Do You Convert Unicode to GBK Encoding in Java?

Converting Java strings from Unicode (UTF-16) to GBK is a common task, especially when dealing with legacy systems, files, or network protocols that require this specific Chinese encoding.


Here’s a complete guide covering the concepts, standard methods, and best practices.

The Core Concept: What is Happening?

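In Java, a String is a sequence of UTF-16 code units. This is your "Unicode" representation. (Since Java 9, compact strings may store Latin-1-only text with one byte per character internally, but every String API still behaves as UTF-16.)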

When you "convert" a string to GBK, you are not changing the String object itself. Instead, you are encoding the characters from that String into a sequence of bytes that represents those characters according to the GBK character set.

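The machinery for this lives in java.nio.charset: a Charset object (here, GBK) supplies a java.nio.charset.CharsetEncoder, which is the class that actually turns chars into bytes.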
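As a small illustration, the same one-character String can be encoded under three charsets; only the resulting byte arrays differ, and the String itself is untouched. This is a sketch: the class name EncodingComparison and the hex() helper are hypothetical, and GBK is looked up by name because StandardCharsets has no constant for it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: one character, three encodings. Class and helper names are illustrative.
class EncodingComparison {
    // GBK has no StandardCharsets constant, so it is looked up by name
    static final Charset GBK = Charset.forName("GBK");

    // Formats bytes as space-separated uppercase hex, e.g. "D6 D0"
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "中"; // U+4E2D
        System.out.println("UTF-16BE: " + hex(s.getBytes(StandardCharsets.UTF_16BE))); // 4E 2D
        System.out.println("UTF-8:    " + hex(s.getBytes(StandardCharsets.UTF_8)));    // E4 B8 AD
        System.out.println("GBK:      " + hex(s.getBytes(GBK)));                       // D6 D0
        // Each getBytes call returns a new byte[]; s is still one char long
        System.out.println("length in chars: " + s.length());                          // 1
    }
}
```

The three lines print different byte sequences for the same character, which is the whole point: "converting to GBK" means choosing which byte sequence to produce, not changing the String.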


Method 1: The Standard & Recommended Approach (using String.getBytes())

This is the most common and straightforward way to get a GBK-encoded byte array from a Java string.

You provide the GBK character set name to the getBytes() method.

Code Example

import java.nio.charset.StandardCharsets;
public class UnicodeToGbkExample {
    public static void main(String[] args) {
        // 1. Your original string in Java (stored as UTF-16 internally)
        String unicodeString = "这是一个测试字符串,Hello, World! 123";
        System.out.println("Original String: " + unicodeString);
        // UTF_16 output starts with a FE FF byte-order mark, followed by big-endian code units
        System.out.println("Original String (UTF-16 hex bytes): " + bytesToHex(unicodeString.getBytes(StandardCharsets.UTF_16)));
        System.out.println("-------------------------------------------");
        try {
            // 2. Encode the string into a GBK byte array
            // This is the key step.
            byte[] gbkBytes = unicodeString.getBytes("GBK");
            System.out.println("Successfully encoded to GBK.");
            System.out.println("GBK Byte Array length: " + gbkBytes.length);
            System.out.println("GBK Bytes (hex): " + bytesToHex(gbkBytes));
            // --- Verification: Decode the bytes back to a string ---
            String decodedString = new String(gbkBytes, "GBK");
            System.out.println("\nDecoded from GBK bytes: " + decodedString);
            System.out.println("Are original and decoded strings equal? " + unicodeString.equals(decodedString));
        } catch (java.io.UnsupportedEncodingException e) {
            // This exception is thrown if the JVM doesn't support the "GBK" charset.
            // On standard JVMs (like Oracle's or OpenJDK for Windows/Linux), this is rare.
            System.err.println("GBK charset is not supported on this JVM.");
            e.printStackTrace();
        }
    }
    // Helper method to print byte arrays in a readable hex format
    private static String bytesToHex(byte[] bytes) {
        if (bytes == null) {
            return "null";
        }
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString();
    }
}

Explanation

  1. String unicodeString = "...": Creates a String object. Java handles the internal UTF-16 storage automatically.
  2. unicodeString.getBytes("GBK"): This is the crucial line.
    • The getBytes(Charset charset) method (or the older getBytes(String charsetName)) iterates through the characters of the String.
    • For each character, it looks up its corresponding byte representation in the GBK character set.
    • For ASCII characters like 'H', 'e', 'l', 'o', it uses a single byte (e.g., 'H' -> 0x48).
    • For Chinese characters, it uses two bytes (e.g., '中' -> 0xD6 0xD0).
    • The result is a byte[] array that you can write to a file, send over a network, etc.
  3. new String(gbkBytes, "GBK"): This demonstrates the reverse process (decoding). It takes the GBK byte array and reconstructs the original String object.
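If you would rather avoid the checked UnsupportedEncodingException, the getBytes(Charset) overload mentioned in step 2 does the same job; a missing charset then surfaces earlier, as an unchecked UnsupportedCharsetException from Charset.forName. The sketch below (class and method names are illustrative) also checks the one-byte/two-byte rule described above.

```java
import java.nio.charset.Charset;

// Sketch: getBytes(Charset) declares no checked exception, unlike getBytes(String).
class GbkByteCount {
    static final Charset GBK = Charset.forName("GBK");

    // Number of bytes the string occupies once encoded as GBK
    static int gbkLength(String s) {
        return s.getBytes(GBK).length;
    }

    public static void main(String[] args) {
        System.out.println(gbkLength("Hello"));     // 5  (1 byte per ASCII character)
        System.out.println(gbkLength("中文"));       // 4  (2 bytes per Chinese character)
        System.out.println(gbkLength("Hello中文"));  // 9
        // Decoding with the same charset restores the original text
        String back = new String("Hello中文".getBytes(GBK), GBK);
        System.out.println(back.equals("Hello中文")); // true
    }
}
```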

Method 2: The More Robust & Flexible Approach (using Charset and CharsetEncoder)

If you need more control over the encoding process (e.g., handling characters that have no GBK equivalent), using CharsetEncoder is better. It allows you to specify an error-handling strategy.

Code Example

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
public class GbkEncoderExample {
    public static void main(String[] args) {
        String unicodeString = "你好,世界!"; // "Hello, World!" in Chinese
        // Get the GBK Charset object
        Charset gbkCharset = Charset.forName("GBK");
        // Create an encoder with a specific error handling strategy
        // REPLACE: Replace unmappable characters with a '?' (or the charset's default replacement)
        // REPORT: Throw an exception when an unmappable character is found
        // IGNORE: Silently drop unmappable characters
        CharsetEncoder encoder = gbkCharset.newEncoder()
                                            .onMalformedInput(CodingErrorAction.REPLACE)
                                            .onUnmappableCharacter(CodingErrorAction.REPLACE);
        try {
            // Use the encoder to convert the CharBuffer to a ByteBuffer
            CharBuffer charBuffer = CharBuffer.wrap(unicodeString);
            ByteBuffer byteBuffer = encoder.encode(charBuffer);
            // Get the byte array from the ByteBuffer
            byte[] gbkBytes = new byte[byteBuffer.remaining()];
            byteBuffer.get(gbkBytes);
            System.out.println("Original String: " + unicodeString);
            System.out.println("Encoded to GBK (using CharsetEncoder): " + bytesToHex(gbkBytes));
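            // Decode from a fresh buffer: get() above moved byteBuffer's position to its limit,
            // so decoding byteBuffer itself would produce an empty string
            String decodedString = gbkCharset.newDecoder().decode(ByteBuffer.wrap(gbkBytes)).toString();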
            System.out.println("Decoded back: " + decodedString);
        } catch (CharacterCodingException e) {
            System.err.println("Encoding failed: " + e.getMessage());
            e.printStackTrace();
        }
    }
    // Helper method to print byte arrays in a readable hex format
    private static String bytesToHex(byte[] bytes) {
        if (bytes == null) {
            return "null";
        }
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString();
    }
}

When to use CharsetEncoder?

  • Custom Error Handling: If you need to know when a character could not be encoded, instead of having it silently replaced. The CodingErrorAction constants (REPORT, REPLACE, IGNORE) give you this control.
  • Streaming: If text arrives in pieces (for example, from a Reader), you can encode it chunk by chunk with encode(CharBuffer, ByteBuffer, boolean) and finish with flush(), without holding the entire text in memory at once.
  • Maximum Control: It is the lower-level API that String.getBytes() builds on; getBytes() performs the same conversion but does not let you choose the error policy.
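The streaming case might look roughly like the following sketch. It assumes the text comes from a java.io.Reader and collects the output in memory; the class name, method name, and the deliberately small buffer sizes are illustrative choices. It follows the encode(in, out, endOfInput) then flush(out) sequence documented for CharsetEncoder.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;

// Sketch of chunked GBK encoding. The Reader stands in for any character source and the
// in-memory sink stands in for a file or socket; names and sizes are illustrative.
class ChunkedGbkEncoder {
    static byte[] encode(Reader reader, int bufferSize) throws IOException {
        // newEncoder() defaults to REPORT for malformed and unmappable input
        CharsetEncoder encoder = Charset.forName("GBK").newEncoder();
        CharBuffer in = CharBuffer.allocate(bufferSize);
        ByteBuffer out = ByteBuffer.allocate(bufferSize * 2);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();

        boolean endOfInput = false;
        while (!endOfInput) {
            endOfInput = reader.read(in) == -1; // append the next chunk after any leftovers
            in.flip();
            CoderResult result;
            do {
                result = encoder.encode(in, out, endOfInput);
                if (result.isError()) {
                    result.throwException(); // e.g. UnmappableCharacterException
                }
                drain(out, sink);
            } while (result.isOverflow()); // output buffer was full: drained, so continue
            in.compact(); // keep unconsumed chars (such as a lone high surrogate) for the next round
        }
        // After the final encode call, flush any internal encoder state
        CoderResult result;
        do {
            result = encoder.flush(out);
            drain(out, sink);
        } while (result.isOverflow());
        return sink.toByteArray();
    }

    // Copies the bytes the encoder produced into the sink, then empties the buffer
    private static void drain(ByteBuffer out, ByteArrayOutputStream sink) {
        out.flip();
        sink.write(out.array(), out.position(), out.remaining());
        out.clear();
    }

    public static void main(String[] args) throws IOException {
        String text = "这是一个测试字符串, Hello, World! 123";
        byte[] chunked = encode(new StringReader(text), 4);
        // Chunked output matches a one-shot getBytes call
        System.out.println(java.util.Arrays.equals(chunked, text.getBytes(Charset.forName("GBK")))); // true
    }
}
```

Leftover characters are kept with compact() so that a surrogate pair split across two reads is still handled as one character.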

Important Considerations

Handling Unsupported Characters

What happens if your string contains a character that doesn't exist in the GBK character set, such as an emoji like '😊'?

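  • With String.getBytes("GBK") or getBytes(Charset): No exception is thrown. The getBytes(Charset) overload is documented to always replace unmappable characters with the charset's default replacement bytes (for GBK, a single '?'). The behavior of getBytes(String) in this case is officially unspecified, though OpenJDK replaces as well. Either way, data is lost silently.
  • With CharsetEncoder: You have full control. You can configure it to:
    • CodingErrorAction.REPLACE: Write the encoder's replacement bytes instead (by default a single '?', 0x3F, which you can change with replaceWith()). Output keeps flowing, but the original character is lost.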
    • CodingErrorAction.IGNORE: Simply drop the character.
    • CodingErrorAction.REPORT: Throw a CharacterCodingException. This is useful for debugging or when data integrity is critical.
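To compare the three actions side by side, here is a sketch that encodes a string containing an emoji under each one. The emoji is a surrogate pair with no GBK mapping, so it takes the unmappable-character path. Class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

// Sketch: the same input encoded under each CodingErrorAction. Names are illustrative.
class GbkErrorActions {
    static final Charset GBK = Charset.forName("GBK");

    // Encodes s as GBK, applying the given action to malformed and unmappable input
    static byte[] encode(String s, CodingErrorAction action) throws CharacterCodingException {
        ByteBuffer bb = GBK.newEncoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action)
                .encode(CharBuffer.wrap(s));
        byte[] bytes = new byte[bb.remaining()];
        bb.get(bytes);
        return bytes;
    }

    public static void main(String[] args) throws CharacterCodingException {
        String text = "A😊B";
        System.out.println(GBK.newEncoder().canEncode(text));                          // false
        System.out.println(new String(encode(text, CodingErrorAction.REPLACE), GBK)); // A?B
        System.out.println(new String(encode(text, CodingErrorAction.IGNORE), GBK));  // AB
        try {
            encode(text, CodingErrorAction.REPORT);
        } catch (CharacterCodingException e) {
            // REPORT surfaces the unmappable emoji as an exception
            System.out.println("REPORT: " + e);
        }
    }
}
```

canEncode() is a cheap way to test text up front when you only need a yes/no answer.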

JVM Support for GBK

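For the conversion to work, the JVM must provide a GBK charset. GBK is not among the six charsets every Java implementation is required to support (US-ASCII, ISO-8859-1, UTF-8, UTF-16BE, UTF-16LE and UTF-16, the ones available as StandardCharsets constants), so its presence depends on the runtime. Standard JDK builds for Windows and Linux include it, but minimal or custom runtimes might not.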

You can check if a charset is supported like this:

import java.nio.charset.Charset;
import java.nio.charset.UnsupportedCharsetException;
public class CheckGbkSupport {
    public static void main(String[] args) {
        try {
            Charset.forName("GBK");
            System.out.println("GBK charset is supported.");
        } catch (UnsupportedCharsetException e) {
            System.err.println("GBK charset is NOT supported on this JVM.");
        }
    }
}
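Alternatively, Charset.isSupported returns a boolean instead of throwing (it still throws IllegalCharsetNameException for a syntactically invalid name). In this sketch the helper name and the made-up charset name X-NO-SUCH-CHARSET are for illustration only.

```java
import java.nio.charset.Charset;

// Sketch: a yes/no availability check without exception handling. Names are illustrative.
class GbkSupportCheck {
    static boolean isAvailable(String charsetName) {
        return Charset.isSupported(charsetName);
    }

    public static void main(String[] args) {
        System.out.println("GBK: " + isAvailable("GBK"));
        // A legal but nonexistent name, used only to show the negative case
        System.out.println("X-NO-SUCH-CHARSET: " + isAvailable("X-NO-SUCH-CHARSET")); // false
        if (isAvailable("gbk")) {
            // Charset lookup is case-insensitive; name() returns the canonical name
            System.out.println(Charset.forName("gbk").name()); // GBK
        }
    }
}
```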

Writing to a File

A very common use case is writing a string to a file in GBK encoding. The java.nio.file package makes this easy and efficient.

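import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
public class WriteGbkFile {
    public static void main(String[] args) {
        // "This is the text that will be written to the GBK file."
        String content = "这是将要写入GBK文件的文本。";
        Path path = Paths.get("output_gbk.txt");
        // GBK has no StandardCharsets constant; the writer throws IOException for text GBK cannot encode
        try (BufferedWriter writer = Files.newBufferedWriter(path, Charset.forName("GBK"))) {
            writer.write(content);
            System.out.println("Wrote GBK file: " + path.toAbsolutePath());
        } catch (IOException e) {
            System.err.println("Failed to write file: " + e.getMessage());
        }
    }
}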
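To confirm a file really is GBK, you can read it back with the same charset. This sketch uses a temporary file rather than a fixed path, and the class and method names are illustrative. Because Files.newBufferedWriter reports unencodable text as an IOException instead of replacing it, a successful write followed by an identical read-back means the round trip was lossless.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: write text as GBK, read it back with GBK, and return the decoded result.
class GbkFileRoundTrip {
    static final Charset GBK = Charset.forName("GBK");

    static String roundTrip(String content) throws IOException {
        Path tmp = Files.createTempFile("gbk-demo", ".txt");
        try {
            try (BufferedWriter writer = Files.newBufferedWriter(tmp, GBK)) {
                writer.write(content);
            }
            // On disk: 2 bytes per Chinese character, 1 per ASCII character
            System.out.println("File size in bytes: " + Files.size(tmp));
            return new String(Files.readAllBytes(tmp), GBK);
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        String text = "这是将要写入GBK文件的文本。";
        System.out.println(roundTrip(text).equals(text)); // true
    }
}
```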