Java GBK to UTF-8: How Do You Convert the Encoding Correctly? - 杰瑞科技汇

Converting text from GBK to UTF-8 in Java is a common task, especially when handling data from legacy systems or applications developed in mainland China. The process relies on Java's character decoding and encoding mechanisms.

Here’s a comprehensive guide, from the basic concept to complete, runnable code examples.

The Core Concept: Reader & Writer vs. String

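The most important thing to understand is that a Java String does not carry an encoding of its own. Conceptually it is a sequence of UTF-16 code units (since Java 9 the JVM may store strings containing only Latin-1 characters more compactly, but the API still behaves as UTF-16). Encoding only comes into play when you need to: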

Decode a sequence of bytes (e.g., from a file or network) into a String.
Encode a String into a sequence of bytes (e.g., to write to a file or send over a network).

So, when you "convert GBK to UTF-8", you are actually:

Reading bytes that are encoded in GBK and creating a String (UTF-16).
Writing that String to a new location using the UTF-8 encoding.
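To make the two steps concrete, here is a small sketch that decodes the GBK bytes for the two characters U+4E2D U+6587 ("中文") into a String and then re-encodes that String as UTF-8. The class name `GbkBytesDemo` and the method `gbkToUtf8` are hypothetical names chosen for illustration; the byte values are the standard GBK encoding of those characters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class GbkBytesDemo {
    // Hypothetical helper: decode GBK bytes into a String, then encode that String as UTF-8
    static byte[] gbkToUtf8(byte[] gbkBytes) {
        Charset gbk = Charset.forName("GBK");
        String text = new String(gbkBytes, gbk);        // decode step: GBK bytes -> String
        return text.getBytes(StandardCharsets.UTF_8);   // encode step: String -> UTF-8 bytes
    }

    public static void main(String[] args) {
        // GBK encoding of U+4E2D U+6587: two bytes per character
        byte[] gbkBytes = {(byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4};
        byte[] utf8Bytes = gbkToUtf8(gbkBytes);
        // prints: 4 GBK bytes -> 6 UTF-8 bytes
        System.out.println(gbkBytes.length + " GBK bytes -> " + utf8Bytes.length + " UTF-8 bytes");
        System.out.println(Arrays.toString(utf8Bytes));
    }
}
```

The characters themselves never change; only their byte representation does, which is why the same two characters take 4 bytes in GBK and 6 bytes in UTF-8.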

Method 1: The Correct & Recommended Way (Using `InputStreamReader` and `OutputStreamWriter`)

This is the standard, most robust way to handle character encoding when reading from or writing to streams (files, network connections, etc.). It avoids loading the entire file into memory, making it suitable for large files.

Scenario: Converting a File from GBK to UTF-8

Let's say you have a file named input_gbk.txt encoded in GBK.

import java.io.*;
import java.nio.charset.StandardCharsets;
public class GbkToUtf8Converter {
    public static void main(String[] args) {
        // 1. Define source and target file paths
        String sourceFilePath = "path/to/your/input_gbk.txt"; // Your GBK encoded file
        String targetFilePath = "path/to/your/output_utf8.txt"; // The new UTF-8 file to be created
        // 2. Use try-with-resources to automatically close streams
        try (
            // Create an InputStream to read the raw bytes from the source file
            FileInputStream fis = new FileInputStream(sourceFilePath);
            // Wrap it in an InputStreamReader that decodes the bytes using the GBK charset
            InputStreamReader isr = new InputStreamReader(fis, "GBK");
            // Wrap the Reader in a BufferedReader for efficient line-by-line reading
            BufferedReader br = new BufferedReader(isr);
            // Create an OutputStream to write raw bytes to the target file
            FileOutputStream fos = new FileOutputStream(targetFilePath);
            // Wrap it in an OutputStreamWriter that encodes characters to UTF-8 bytes
            OutputStreamWriter osw = new OutputStreamWriter(fos, StandardCharsets.UTF_8);
            // Wrap the Writer in a BufferedWriter for efficient writing
            BufferedWriter bw = new BufferedWriter(osw)
        ) {
            String line;
            // 3. Read line by line from the GBK file
            while ((line = br.readLine()) != null) {
                // 4. Write each line to the UTF-8 file
                bw.write(line);
                // Add the newline character back, as readLine() strips it
                bw.newLine();
            }
            System.out.println("File converted successfully from GBK to UTF-8.");
        } catch (UnsupportedEncodingException e) {
            System.err.println("Error: The GBK encoding is not supported by this JVM.");
            e.printStackTrace();
        } catch (FileNotFoundException e) {
            System.err.println("Error: One of the files was not found.");
            e.printStackTrace();
        } catch (IOException e) {
            System.err.println("An I/O error occurred during the conversion.");
            e.printStackTrace();
        }
    }
}

Explanation of the Code:

FileInputStream / FileOutputStream: These read and write raw bytes.
InputStreamReader(fis, "GBK"): This is the key part for reading. It takes the raw byte stream (fis) and uses the "GBK" character set to correctly interpret those bytes as characters, creating a Reader.
BufferedReader: A wrapper for efficiency, allowing us to read the file line by line with readLine().
OutputStreamWriter(fos, StandardCharsets.UTF_8): This is the key part for writing. It wraps the raw byte stream (fos) together with a character set, encodes every character written to it into UTF-8 bytes, and passes those bytes on to fos.
StandardCharsets.UTF_8: For charsets that every Java platform is required to support, such as UTF-8, it is best practice to use the constants in the StandardCharsets class. They cannot be misspelled and never trigger UnsupportedEncodingException. GBK has no such constant because it is not one of the required charsets, which is why the code passes it by name.
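One side effect of the readLine()/newLine() loop in Method 1 is that every line ending is rewritten as the platform's line separator, and a final newline is added even if the source file did not end with one. If the output must match the input character for character, a sketch like the following copies characters in buffered chunks instead. It assumes Java 10 or later (for `Reader.transferTo`); the class name `GbkToUtf8Copy` and the paths are placeholders.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class GbkToUtf8Copy {
    // Copies every character from a GBK file to a UTF-8 file, leaving line endings untouched
    static void convert(Path source, Path target) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(source, Charset.forName("GBK"));
             // newBufferedWriter creates the target, or truncates it if it already exists
             BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            reader.transferTo(writer); // streams characters in chunks (Java 10+)
        }
    }

    public static void main(String[] args) throws IOException {
        convert(Paths.get("path/to/your/input_gbk.txt"), Paths.get("path/to/your/output_utf8.txt"));
    }
}
```

Memory use stays low because only one buffer's worth of characters is held at a time. Note that, unlike the InputStreamReader(fis, "GBK") constructor, Files.newBufferedReader throws an exception on bytes that are not valid GBK instead of silently replacing them.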

Method 2: The In-Memory Way (Using `String` Constructors)

This method is simpler but uses more memory, because it holds the entire file in memory at once (first as a byte array, then as a String). It is best suited to small files.

import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
public class GbkToUtf8StringConverter {
    public static void main(String[] args) {
        String sourceFilePath = "path/to/your/input_gbk.txt";
        String targetFilePath = "path/to/your/output_utf8_inmemory.txt";
        try {
            // 1. Read all bytes from the GBK file into a byte array
            byte[] gbkBytes = Files.readAllBytes(Paths.get(sourceFilePath));
            // 2. Create a String from the byte array, specifying the source encoding (GBK)
            // This decodes the GBK bytes into a UTF-16 String.
            String content = new String(gbkBytes, "GBK");
            // 3. Get the UTF-8 bytes from the String, specifying the target encoding (UTF-8)
            // This encodes the UTF-16 String into UTF-8 bytes.
            byte[] utf8Bytes = content.getBytes(StandardCharsets.UTF_8);
            // 4. Write the UTF-8 bytes to the target file
            // (Files.write creates the file, or overwrites it if it already exists)
            Files.write(Paths.get(targetFilePath), utf8Bytes);
            System.out.println("File converted successfully from GBK to UTF-8 (in-memory).");
        } catch (UnsupportedEncodingException e) {
            System.err.println("Error: The GBK encoding is not supported by this JVM.");
            e.printStackTrace();
        } catch (IOException e) {
            System.err.println("An I/O error occurred during the conversion.");
            e.printStackTrace();
        }
    }
}

Note: This example uses java.nio.file.Files, which is available in Java 7 and later. It's a very convenient utility for file operations.
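On Java 11 and later, Files also offers readString and writeString, which fold the read/decode and encode/write steps of this in-memory approach into two calls. A minimal sketch, with a hypothetical class name `GbkToUtf8Java11` and placeholder paths:

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class GbkToUtf8Java11 {
    static void convert(Path source, Path target) throws IOException {
        // Decode the whole GBK file into a String; throws MalformedInputException on invalid GBK bytes
        String content = Files.readString(source, Charset.forName("GBK"));
        // Encode the String as UTF-8; creates the target or overwrites an existing one
        Files.writeString(target, content, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        convert(Path.of("path/to/your/input_gbk.txt"), Path.of("path/to/your/output_utf8_inmemory.txt"));
    }
}
```

The memory trade-off is the same as in the longer version: the whole file is held in memory as a String.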

Explanation of the Code:

new String(gbkBytes, "GBK"): This constructor takes a byte array and a charset name. It interprets the bytes using the "GBK" encoding and constructs a String.
content.getBytes(StandardCharsets.UTF_8): This method on the String object takes a charset and returns a byte array representing the string's characters encoded in that charset (in this case, UTF-8).
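Both new String(gbkBytes, "GBK") and the InputStreamReader used in Method 1 silently replace byte sequences that are not valid GBK with the replacement character U+FFFD, so a file that was never really GBK can be "converted" without any error. If you would rather fail loudly, a CharsetDecoder configured with CodingErrorAction.REPORT throws instead. The class `StrictGbkDecode` and the method `decodeStrict` below are hypothetical names for this sketch.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

public class StrictGbkDecode {
    // Decodes GBK bytes, throwing CharacterCodingException on malformed or unmappable input
    static String decodeStrict(byte[] gbkBytes) throws CharacterCodingException {
        CharsetDecoder decoder = Charset.forName("GBK").newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(gbkBytes)).toString();
    }

    public static void main(String[] args) {
        // A valid two-byte character followed by a lone lead byte (the last character is cut off)
        byte[] truncated = {(byte) 0xD6, (byte) 0xD0, (byte) 0xCE};
        try {
            decodeStrict(truncated);
        } catch (CharacterCodingException e) {
            System.out.println("Not valid GBK: " + e);
        }
    }
}
```

The resulting String can then be encoded with getBytes(StandardCharsets.UTF_8) exactly as in Method 2.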

Important Prerequisite: JVM Support for GBK

For the code to work, your Java runtime must include a charset implementation for GBK. The Java platform only guarantees a small set of standard charsets (US-ASCII, ISO-8859-1, UTF-8, UTF-16, UTF-16BE and UTF-16LE). GBK is an extended charset, although standard JDK distributions normally include it.

If you get an UnsupportedEncodingException, it means your JVM doesn't know what "GBK" is. You might need to:

Ensure you are using a full JDK or JRE distribution (for example, an Oracle JDK or an OpenJDK build).
If you ship a custom runtime image built with jlink (Java 9 and later), you might need to add the jdk.charsets module, which holds many of the extended (non-required) charsets.
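Rather than waiting for an exception, you can check up front whether the running JVM knows a charset. The sketch below uses Charset.isSupported and Charset.forName; the class `CheckCharsetSupport` and its `describe` method are hypothetical names.

```java
import java.nio.charset.Charset;

public class CheckCharsetSupport {
    // Returns a short report on whether the named charset is available in this JVM.
    // Charset.isSupported returns false for a legal but unknown name
    // (it throws IllegalCharsetNameException only for syntactically illegal names).
    static String describe(String charsetName) {
        if (!Charset.isSupported(charsetName)) {
            return charsetName + " is NOT supported by this JVM";
        }
        Charset cs = Charset.forName(charsetName);
        return charsetName + " is supported (canonical name " + cs.name() + ", aliases " + cs.aliases() + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe("GBK"));
    }
}
```

Running this once on the target machine, or at application startup, makes a missing GBK charset obvious before any file is touched.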

Summary: Which Method to Use?

| Feature | Method 1: `InputStreamReader` / `OutputStreamWriter` | Method 2: `String` Constructors |
| --- | --- | --- |
| Memory Usage | Low. Processes data in chunks (line by line or buffer by buffer). | High. Loads the entire file into memory. |
| Performance | Good for large files. | Good for small files. Can be faster for tiny files due to less overhead. |
| Complexity | Slightly more verbose due to wrapping streams. | Very concise and easy to read. |
| Recommendation | Highly recommended. The standard, production-ready way. | Use only for small files or quick scripts where memory is not a concern. |
