杰瑞科技汇

How do you convert UTF-8 to GB2312 in Java?

Converting text from UTF-8 to GB2312 in Java is a common task, especially when dealing with legacy systems or data that was created on a Chinese version of Windows.

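The core concept is that a Java String has no byte encoding of its own: it is a sequence of UTF-16 code units. An encoding such as UTF-8 or GB2312 only comes into play when you serialize the string into a sequence of bytes (e.g., to write to a file, send over a network, or store in a database) or decode bytes back into a string. "Converting UTF-8 to GB2312" therefore means decoding UTF-8 bytes into a String and then encoding that String as GB2312 bytes.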
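
To make the distinction between a String and its bytes concrete, here is a minimal sketch that encodes the same String with two charsets. The class name EncodingDemo and its encode helper are made up for illustration, and the sketch assumes the running JVM provides the GB2312 charset, which the Java specification does not guarantee.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    // Hypothetical helper: serialize the text with the given charset.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        String text = "你好";
        byte[] utf8 = encode(text, StandardCharsets.UTF_8);
        // Assumption: GB2312 is available on this JVM.
        byte[] gb2312 = encode(text, Charset.forName("GB2312"));
        System.out.println("UTF-16 code units in the String: " + text.length());
        System.out.println("UTF-8 byte count:  " + utf8.length);
        System.out.println("GB2312 byte count: " + gb2312.length);
    }
}
```

The String itself is the same in both calls; only the byte sequence differs. These two Chinese characters take 6 bytes in UTF-8 and 4 bytes in GB2312.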

Here are the most common and effective ways to perform this conversion, from the most modern to the classic approach.


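Method 1: Using StandardCharsets with Charset.forName() (Java 7+) - Recommended

This is the most concise approach on modern Java. The StandardCharsets class (a final class of constants, not an enum) provides only the six charsets that every JVM must support: US_ASCII, ISO_8859_1, UTF_8, UTF_16BE, UTF_16LE, and UTF_16. It has no GB2312 constant, so use StandardCharsets.UTF_8 for the source side and obtain the target charset with Charset.forName("GB2312").

The process is:

  1. Decode the UTF-8 bytes into a String: new String(utf8Bytes, StandardCharsets.UTF_8). If you already have a String, skip this step.
  2. Encode that String as GB2312 bytes: text.getBytes(Charset.forName("GB2312")).

Do not pass the UTF-8 bytes to new String(utf8Bytes, gb2312Charset). That decodes them with the wrong charset and produces garbled text (mojibake) instead of a conversion.

Important Note: String.getBytes(Charset) replaces any character that GB2312 cannot represent with the charset's replacement byte, which is ? (0x3F) for GB2312. No exception is thrown, but the replaced characters are lost, so use Method 3 if you need to detect them.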

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
public class Utf8ToGb2312Converter {
    // StandardCharsets has no GB2312 constant, so look the charset up by name.
    // Throws UnsupportedCharsetException if this JVM does not provide GB2312.
    private static final Charset GB2312 = Charset.forName("GB2312");
    public static void main(String[] args) {
        // This string contains characters that are not in GB2312, like '€' and 'ñ'.
        String originalString = "你好,世界!Hello World! €ñ";
        // Simulate UTF-8 input, such as bytes read from a file or a socket
        byte[] utf8Bytes = originalString.getBytes(StandardCharsets.UTF_8);
        System.out.println("Original UTF-8 String: " + originalString);
        System.out.println("Original UTF-8 Bytes: " + bytesToHex(utf8Bytes));
        // --- Conversion Process ---
        // 1. Decode the UTF-8 bytes into a String
        String text = new String(utf8Bytes, StandardCharsets.UTF_8);
        // 2. Encode the String as GB2312 bytes (unmappable characters become '?')
        byte[] gb2312Bytes = text.getBytes(GB2312);
        System.out.println("Converted GB2312 Bytes: " + bytesToHex(gb2312Bytes));
        // 3. Decode the GB2312 bytes again to inspect the result
        System.out.println("Decoded from GB2312: " + new String(gb2312Bytes, GB2312));
    }
    // Helper method to print byte arrays in a readable hex format
    private static String bytesToHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        sb.append("[");
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        sb.append("]");
        return sb.toString();
    }
}

Output of the program:

Original UTF-8 String: 你好,世界!Hello World! €ñ
Original UTF-8 Bytes: [E4 BD A0 E5 A5 BD 2C E4 B8 96 E7 95 8C 21 48 65 6C 6C 6F 20 57 6F 72 6C 64 21 20 E2 82 AC C3 B1 ]
Converted GB2312 Bytes: [C4 E3 BA C3 2C CA C0 BD E7 21 48 65 6C 6C 6F 20 57 6F 72 6C 64 21 20 3F 3F ]
Decoded from GB2312: 你好,世界!Hello World! ??

Notice how the Euro sign (€) and ñ were each replaced with ? (byte 3F), because GB2312 has no mapping for either character. Each Chinese character shrinks from three bytes in UTF-8 to two bytes in GB2312.
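
In practice the UTF-8 bytes usually come from a file rather than a string literal. The sketch below applies the same decode-then-encode steps to a whole file. The class FileTranscoder and its method names are hypothetical, it assumes the JVM provides GB2312, and it reads the entire file into memory, so it only suits small files.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class FileTranscoder {
    // Assumption: GB2312 is available on this JVM.
    private static final Charset GB2312 = Charset.forName("GB2312");

    // Decode UTF-8 bytes, then encode the text as GB2312.
    // Characters GB2312 cannot represent are replaced with '?'.
    static byte[] utf8ToGb2312(byte[] utf8Bytes) {
        String text = new String(utf8Bytes, StandardCharsets.UTF_8);
        return text.getBytes(GB2312);
    }

    // Reads the whole source file into memory before writing the target.
    static void transcodeFile(Path source, Path target) throws IOException {
        Files.write(target, utf8ToGb2312(Files.readAllBytes(source)));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: java FileTranscoder <utf8-input> <gb2312-output>");
            return;
        }
        transcodeFile(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

For large files, a streaming variant built on a Reader and a Writer with explicit charsets would avoid holding the whole content in memory.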


Method 2: Using Charset.forName() for Both Charsets (Java 6+)

This method looks up both charsets by name. It works on Java 6 and later, where String.getBytes(Charset) and new String(byte[], Charset) were introduced, and it is slightly more verbose. It's functionally identical to Method 1.

Note: Charset.forName() throws an UnsupportedCharsetException if the JVM does not support GB2312. Most JDK distributions include it, but the Java specification does not require it, so it's a possibility to be aware of.

import java.nio.charset.Charset;
import java.nio.charset.UnsupportedCharsetException;
public class Utf8ToGb2312ConverterLegacy {
    public static void main(String[] args) {
        String originalString = "你好,世界!Hello World!";
        System.out.println("Original String: " + originalString);
        try {
            // 1. Look up the charsets by name
            Charset utf8Charset = Charset.forName("UTF-8");
            Charset gb2312Charset = Charset.forName("GB2312");
            // 2. Simulate UTF-8 input, such as bytes read from a file or a socket
            byte[] utf8Bytes = originalString.getBytes(utf8Charset);
            // 3. Decode the UTF-8 bytes, then encode the text as GB2312
            String text = new String(utf8Bytes, utf8Charset);
            byte[] gb2312Bytes = text.getBytes(gb2312Charset);
            System.out.println("UTF-8 length: " + utf8Bytes.length + " bytes");
            System.out.println("GB2312 length: " + gb2312Bytes.length + " bytes");
            // 4. Decode the GB2312 bytes again to confirm the round trip
            System.out.println("Decoded from GB2312: " + new String(gb2312Bytes, gb2312Charset));
        } catch (UnsupportedCharsetException e) {
            System.err.println("Error: GB2312 charset is not supported by this JVM.");
            e.printStackTrace();
        }
    }
}
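
If you would rather test for GB2312 up front than catch UnsupportedCharsetException, Charset.isSupported() can be checked first. The following is only a sketch: the class CharsetLookup and the method lookupOrDefault are hypothetical names, and falling back to UTF-8 is an assumption made for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetLookup {
    // Returns the named charset if this JVM supports it, otherwise the fallback.
    // Note: Charset.isSupported still throws IllegalCharsetNameException
    // when the name itself is syntactically illegal.
    static Charset lookupOrDefault(String name, Charset fallback) {
        return Charset.isSupported(name) ? Charset.forName(name) : fallback;
    }

    public static void main(String[] args) {
        // Assumed fallback for this sketch: UTF-8.
        Charset charset = lookupOrDefault("GB2312", StandardCharsets.UTF_8);
        System.out.println("Using charset: " + charset.name());
    }
}
```

Whether a fallback is acceptable depends on the consumer of the bytes. If a legacy system strictly expects GB2312, failing fast with a clear error is usually the better choice.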

Method 3: Using CharsetEncoder and CharsetDecoder (Advanced)

This is the most powerful and flexible method, giving you fine-grained control over the conversion process, especially for handling unsupported characters.

You can configure the encoder/decoder to:

  • Report errors: Throw an exception when an unmappable character is found.
  • Replace characters: Automatically replace unmappable characters (this is what String.getBytes() and the String constructors do by default).
  • Ignore characters: Silently skip unmappable characters.

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
public class Utf8ToGb2312Advanced {
    public static void main(String[] args) {
        String originalString = "你好,世界!Hello World! €ñ";
        System.out.println("Original String: " + originalString);
        Charset gb2312Charset = Charset.forName("GB2312");
        // Create an encoder that converts characters to GB2312 bytes
        CharsetEncoder encoder = gb2312Charset.newEncoder();
        // Configure the error handling strategy
        encoder.onMalformedInput(CodingErrorAction.REPORT); // Fail on malformed input such as unpaired surrogates
        encoder.onUnmappableCharacter(CodingErrorAction.REPLACE); // Replace characters GB2312 lacks
        // Create a decoder that converts GB2312 bytes back to characters
        CharsetDecoder decoder = gb2312Charset.newDecoder();
        try {
            // 1. Wrap the source string in a CharBuffer
            CharBuffer charBuffer = CharBuffer.wrap(originalString);
            // 2. Encode the CharBuffer to a ByteBuffer (this performs the conversion)
            ByteBuffer byteBuffer = encoder.encode(charBuffer);
            System.out.println("GB2312 byte count: " + byteBuffer.remaining());
            // 3. Decode the ByteBuffer back to a CharBuffer (to see the result)
            CharBuffer resultCharBuffer = decoder.decode(byteBuffer);
            // 4. Convert the CharBuffer back to a String
            String decodedString = resultCharBuffer.toString();
            System.out.println("Decoded from GB2312: " + decodedString);
        } catch (CharacterCodingException e) {
            System.err.println("Character coding error during conversion.");
            e.printStackTrace();
        }
    }
}
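
The REPORT strategy is useful when silent replacement is not acceptable. As a sketch under the same assumption that GB2312 is available, the hypothetical class StrictGb2312 below checks whether a string is fully representable and offers a strict encode that fails instead of substituting ?. Wrapping the checked CharacterCodingException in an IllegalArgumentException is a design choice made for this example only.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

class StrictGb2312 {
    // Assumption: GB2312 is available on this JVM.
    private static final Charset GB2312 = Charset.forName("GB2312");

    // True if every character of the text has a GB2312 mapping.
    static boolean canEncode(String text) {
        return GB2312.newEncoder().canEncode(text);
    }

    // Encodes the text, throwing instead of replacing unmappable characters.
    static byte[] encodeStrict(String text) {
        CharsetEncoder encoder = GB2312.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            ByteBuffer buffer = encoder.encode(CharBuffer.wrap(text));
            byte[] bytes = new byte[buffer.remaining()];
            buffer.get(bytes);
            return bytes;
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("Text cannot be represented in GB2312", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("你好 encodable: " + canEncode("你好"));
        System.out.println("€ encodable: " + canEncode("€"));
        System.out.println("Strict byte count for 你好: " + encodeStrict("你好").length);
    }
}
```

Calling canEncode before converting lets you log or reject problematic input early, while encodeStrict guarantees that no character is silently turned into ?.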

Summary: Which Method to Use?

  • StandardCharsets
      Pros: Modern, concise, type-safe constants. UTF_8 is guaranteed on every JVM. Clean and readable.
      Cons: Requires Java 7 or later. Has no GB2312 constant, so Charset.forName("GB2312") is still needed for the target charset.
      Best for: Most use cases in modern Java applications. This is the recommended default.

  • Charset.forName()
      Pros: Works on Java 6 and later. Standard API.
      Cons: Can throw UnsupportedCharsetException. Slightly more verbose.
      Best for: Legacy codebases or projects that must run on pre-Java 7 environments.

  • CharsetEncoder/Decoder
      Pros: Most powerful and flexible. Granular control over error handling (REPORT, REPLACE, IGNORE).
      Cons: More complex and verbose. Overkill for simple conversions.
      Best for: Advanced scenarios where you need to log errors on unmappable characters or implement custom conversion logic.