Of course. Converting Java strings from Unicode (UTF-16) to GBK is a common task, especially when dealing with legacy systems, files, or network protocols that require this specific Chinese encoding.

Here’s a complete guide covering the concepts, standard methods, and best practices.
The Core Concept: What is Happening?
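In Java, a String always presents itself as a sequence of UTF-16 code units (since Java 9 the JVM may store Latin-1-only strings more compactly, but that is invisible to your code). This is your "Unicode" representation.
When you "convert" a string to GBK, you are not changing the String object itself. Instead, you are encoding the characters from that String into a sequence of bytes that represents those characters according to the GBK character set.
The key classes for this are java.nio.charset.Charset and java.nio.charset.CharsetEncoder; String.getBytes() is the convenient front end to them.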
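To make the "encoding produces bytes, the String stays the same" point concrete, here is a minimal sketch. The class name EncodingIsNotMutation and the helper encode are made up for illustration, and it assumes the JVM ships the GBK charset. The character is written as a Unicode escape so the example does not depend on the source file's own encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingIsNotMutation {
    // Hypothetical helper: encodes text with the named charset.
    // The input String is never modified; a new byte[] is returned.
    static byte[] encode(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String text = "\u4E2D"; // the single character U+4E2D

        byte[] gbk = encode(text, "GBK");
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

        // Same character, different byte sequences depending on the charset.
        System.out.println("GBK byte count:   " + gbk.length);
        System.out.println("UTF-8 byte count: " + utf8.length);

        // The String itself is untouched by either call.
        System.out.println("String unchanged: " + text.equals("\u4E2D"));
    }
}
```

The same String yields two bytes in GBK and three in UTF-8, which is why the byte array, not the String, is the thing that "is in GBK".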

Method 1: The Standard & Recommended Approach (using String.getBytes())
This is the most common and straightforward way to get a GBK-encoded byte array from a Java string.
You provide the GBK character set name to the getBytes() method.
Code Example
import java.nio.charset.StandardCharsets;

public class UnicodeToGbkExample {
    public static void main(String[] args) {
        // 1. Your original string in Java (stored as UTF-16 internally)
        String unicodeString = "这是一个测试字符串,Hello, World! 123";
        System.out.println("Original String: " + unicodeString);
        System.out.println("Original String (hex bytes): " + bytesToHex(unicodeString.getBytes(StandardCharsets.UTF_16)));
        System.out.println("-------------------------------------------");
        try {
            // 2. Encode the string into a GBK byte array
            // This is the key step.
            byte[] gbkBytes = unicodeString.getBytes("GBK");
            System.out.println("Successfully encoded to GBK.");
            System.out.println("GBK Byte Array length: " + gbkBytes.length);
            System.out.println("GBK Bytes (hex): " + bytesToHex(gbkBytes));
            // --- Verification: Decode the bytes back to a string ---
            String decodedString = new String(gbkBytes, "GBK");
            System.out.println("\nDecoded from GBK bytes: " + decodedString);
            System.out.println("Are original and decoded strings equal? " + unicodeString.equals(decodedString));
        } catch (java.io.UnsupportedEncodingException e) {
            // This exception is thrown if the JVM doesn't support the "GBK" charset.
            // On standard JVMs (like Oracle's or OpenJDK for Windows/Linux), this is rare.
            System.err.println("GBK charset is not supported on this JVM.");
            e.printStackTrace();
        }
    }

    // Helper method to print byte arrays in a readable hex format
    private static String bytesToHex(byte[] bytes) {
        if (bytes == null) {
            return "null";
        }
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString();
    }
}
Explanation
- String unicodeString = "...": Creates a String object. Java handles the internal UTF-16 storage automatically.
- unicodeString.getBytes("GBK"): This is the crucial line.
  - The getBytes(String charsetName) method (or the newer getBytes(Charset charset) overload) iterates through the characters of the String.
  - For each character, it looks up its corresponding byte representation in the GBK character set.
  - For ASCII characters like 'H', 'e', 'l', 'o', it uses a single byte (e.g., 'H' -> 0x48).
  - For Chinese characters like '这', it uses two bytes (e.g., '这' -> 0xD5 0xE2).
  - The result is a byte[] array that you can write to a file, send over a network, etc.
- new String(gbkBytes, "GBK"): This demonstrates the reverse process (decoding). It takes the GBK byte array and reconstructs an equal String object.
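The same round trip can be written with the Charset overloads, which avoid the checked UnsupportedEncodingException. This is a minimal sketch rather than a replacement for the example above; the class name GbkRoundTrip and the helpers toGbk and fromGbk are invented for illustration.

```java
import java.nio.charset.Charset;

public class GbkRoundTrip {
    // Looked up once. Charset.forName throws the unchecked
    // UnsupportedCharsetException if this JVM has no GBK provider.
    private static final Charset GBK = Charset.forName("GBK");

    // Hypothetical helper: String -> GBK bytes.
    static byte[] toGbk(String s) {
        return s.getBytes(GBK);
    }

    // Hypothetical helper: GBK bytes -> String.
    static String fromGbk(byte[] bytes) {
        return new String(bytes, GBK);
    }

    public static void main(String[] args) {
        String text = "Hi\u4E2D"; // two ASCII letters followed by U+4E2D
        byte[] bytes = toGbk(text);

        // ASCII letters take one byte each, the Chinese character takes two.
        System.out.println("Byte count: " + bytes.length);
        System.out.println("Round trip equal: " + fromGbk(bytes).equals(text));
    }
}
```

Holding the Charset in a constant also saves a name lookup on every call, which can matter in hot paths.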
Method 2: The More Robust & Flexible Approach (using Charset and CharsetEncoder)
If you need more control over the encoding process (e.g., handling characters that have no GBK equivalent), using CharsetEncoder is better. It allows you to specify an error-handling strategy.
Code Example
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

public class GbkEncoderExample {
    public static void main(String[] args) {
        String unicodeString = "你好,世界!"; // "Hello, World!" in Chinese
        // Get the GBK Charset object
        Charset gbkCharset = Charset.forName("GBK");
        // Create an encoder with a specific error handling strategy
        // REPLACE: Replace unmappable characters with a '?' (or the charset's default replacement)
        // REPORT: Throw an exception when an unmappable character is found
        // IGNORE: Silently drop unmappable characters
        CharsetEncoder encoder = gbkCharset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        try {
            // Use the encoder to convert the CharBuffer to a ByteBuffer
            CharBuffer charBuffer = CharBuffer.wrap(unicodeString);
            ByteBuffer byteBuffer = encoder.encode(charBuffer);
            // Get the byte array from the ByteBuffer
            byte[] gbkBytes = new byte[byteBuffer.remaining()];
            byteBuffer.get(gbkBytes);
            System.out.println("Original String: " + unicodeString);
            System.out.println("Encoded to GBK (using CharsetEncoder): " + bytesToHex(gbkBytes));
            // Decode from a fresh buffer: byteBuffer was fully consumed by get() above,
            // so decoding it directly would yield an empty string
            String decodedString = gbkCharset.newDecoder().decode(ByteBuffer.wrap(gbkBytes)).toString();
            System.out.println("Decoded back: " + decodedString);
        } catch (CharacterCodingException e) {
            System.err.println("Encoding failed: " + e.getMessage());
            e.printStackTrace();
        }
    }

    // Helper method to print byte arrays in a readable hex format
    private static String bytesToHex(byte[] bytes) {
        if (bytes == null) {
            return "null";
        }
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString();
    }
}
When to use CharsetEncoder?
- Custom Error Handling: If you need to know if a character couldn't be encoded instead of having it silently replaced. The CodingErrorAction constants (REPORT, REPLACE, IGNORE) give you this power.
- Streaming: If you are reading a large string from a stream, you can encode it chunk by chunk without loading the entire string into memory at once.
- Maximum Control: It's the lower-level API behind the charset machinery that String.getBytes() relies on.
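The streaming case can be sketched as follows. This is one possible approach, not the only one: it hands a configured CharsetEncoder to OutputStreamWriter and lets the writer do the chunk-by-chunk encoding. The class name GbkStreamingSketch, the method encodeInChunks, and the in-memory sink are assumptions made for the example; in practice the sink would be a file or socket stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.StringReader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

public class GbkStreamingSketch {
    // Hypothetical helper: reads chars in small chunks and encodes them to GBK
    // as they arrive. With REPORT, an unmappable character surfaces as a
    // CharacterCodingException (a subclass of IOException).
    static byte[] encodeInChunks(Reader source, int chunkSize) throws IOException {
        CharsetEncoder encoder = Charset.forName("GBK").newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);

        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(sink, encoder)) {
            char[] chunk = new char[chunkSize];
            int read;
            while ((read = source.read(chunk)) != -1) {
                writer.write(chunk, 0, read);
            }
        } // closing the writer flushes any bytes still buffered
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String text = "\u4F60\u597D, streaming"; // two Chinese characters plus ASCII
        byte[] streamed = encodeInChunks(new StringReader(text), 4);
        System.out.println("Streamed byte count: " + streamed.length);
    }
}
```

Only one chunk of characters is held at a time, so the memory used does not grow with the size of the input.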
Important Considerations
Handling Unsupported Characters
What happens if your string contains a character that doesn't exist in the GBK character set, like an Arabic 'ع' or an emoji '😊'? (Cyrillic letters such as 'Ж' are not a good test case, because GBK inherits them from GB2312.)

- With String.getBytes("GBK"): No exception is thrown for unmappable characters. getBytes(Charset) always substitutes the charset's default replacement bytes (for GBK, a single '?', 0x3F). The Javadoc for getBytes(String charsetName) leaves the behavior unspecified, although in practice it also substitutes. Either way, the data loss is silent.
- With CharsetEncoder: You have full control. You can configure it to:
  - CodingErrorAction.REPLACE: Replace the unmappable character with a default byte sequence (often a single byte). This is often the safest option.
  - CodingErrorAction.IGNORE: Simply drop the character.
  - CodingErrorAction.REPORT: Throw a CharacterCodingException. This is the default for a freshly created encoder and is useful for debugging or when data integrity is critical.
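The difference between the two paths can be sketched like this. The class name GbkUnmappableDemo and the helpers isGbkSafe and encodeStrict are hypothetical; the test character is the Arabic letter U+0639, which GBK does not contain.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

public class GbkUnmappableDemo {
    private static final Charset GBK = Charset.forName("GBK");

    // Hypothetical helper: true if every character of text has a GBK mapping.
    static boolean isGbkSafe(String text) {
        return GBK.newEncoder().canEncode(text);
    }

    // Hypothetical helper: strict encoding that throws instead of losing data.
    static byte[] encodeStrict(String text) throws CharacterCodingException {
        CharsetEncoder encoder = GBK.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer buffer = encoder.encode(CharBuffer.wrap(text));
        byte[] out = new byte[buffer.remaining()];
        buffer.get(out);
        return out;
    }

    public static void main(String[] args) {
        String risky = "A\u0639"; // 'A' plus Arabic letter ain, absent from GBK

        // Lenient path: getBytes(Charset) substitutes the replacement byte.
        byte[] lenient = risky.getBytes(GBK);
        System.out.println("getBytes produced " + lenient.length + " bytes");

        // Pre-flight check without encoding anything.
        System.out.println("canEncode: " + isGbkSafe(risky));

        // Strict path: the problem is reported instead of hidden.
        try {
            encodeStrict(risky);
        } catch (CharacterCodingException e) {
            System.out.println("Strict encoding rejected the input: " + e);
        }
    }
}
```

A pre-flight canEncode check is a cheap way to decide whether to fall back to another encoding before any bytes are written.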
JVM Support for GBK
For the conversion to work, the JVM must have a charset provider for GBK. This is almost always true for standard JVMs on Windows and Linux. However, on minimal or custom JVMs, it might not be available.
You can check if a charset is supported like this:
import java.nio.charset.Charset;
import java.nio.charset.UnsupportedCharsetException;

public class CheckGbkSupport {
    public static void main(String[] args) {
        try {
            Charset.forName("GBK");
            System.out.println("GBK charset is supported.");
        } catch (UnsupportedCharsetException e) {
            System.err.println("GBK charset is NOT supported on this JVM.");
        }
    }
}
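If you prefer to avoid using an exception for control flow, Charset.isSupported gives the same answer as a boolean. A minimal sketch, with the class name GbkSupportCheck and the method gbkAvailable invented for illustration:

```java
import java.nio.charset.Charset;

public class GbkSupportCheck {
    // Hypothetical helper: true if this JVM can provide a GBK charset.
    static boolean gbkAvailable() {
        return Charset.isSupported("GBK");
    }

    public static void main(String[] args) {
        if (gbkAvailable()) {
            System.out.println("GBK charset is supported.");
        } else {
            System.err.println("GBK charset is NOT supported on this JVM.");
        }
    }
}
```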
Writing to a File
A very common use case is writing a string to a file in GBK encoding. The java.nio.file package makes this easy and efficient.
import java.io.BufferedWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
public class WriteGbkFile {
public static void main(String[] args) {
String content = "这是将要写入GBK文件的文本。";
Path path = Paths.get("output_gbk.txt");
try (BufferedWriter writer = 