How to Convert Unicode to UTF-8 in Java? - 杰瑞科技汇

This is a common and important topic in Java, so let's break it down clearly.


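The key thing to understand first is that Java represents strings in UTF-16. A "Java Unicode string" almost always means a java.lang.String object, whose API exposes its contents as a sequence of UTF-16 code units (char values). Since Java 9, compact strings may store Latin-1-only text as one byte per character internally, but that detail is invisible through the API.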

The real question you're asking is: "How do I convert this internal UTF-16 representation into a sequence of UTF-8 bytes?"
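A quick way to see the difference is to compare the UTF-16 code unit Java stores for a character with the bytes UTF-8 produces for it. The sketch below does this for the euro sign (U+20AC); the class name Utf16VsUtf8 and its hex() helper are illustrative, not part of any library. The character is written as a Unicode escape so the example does not depend on the compiler's source-file encoding.

```java
import java.nio.charset.StandardCharsets;

public class Utf16VsUtf8 {
    // Formats a byte array as space-separated uppercase hex, e.g. "E2 82 AC"
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String euro = "\u20AC"; // the euro sign
        // The String API exposes UTF-16 code units: the euro sign is one char, 0x20AC
        System.out.printf("UTF-16 code unit: %04X%n", (int) euro.charAt(0));
        // UTF-16BE bytes mirror that code unit directly
        System.out.println("UTF-16BE bytes:   " + hex(euro.getBytes(StandardCharsets.UTF_16BE)));
        // UTF-8 re-encodes the same code point into three bytes
        System.out.println("UTF-8 bytes:      " + hex(euro.getBytes(StandardCharsets.UTF_8)));
    }
}
```

The UTF-16BE line prints 20 AC and the UTF-8 line prints E2 82 AC: same character, two different byte encodings.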

Here’s a complete guide covering the different scenarios.

The Core Concept: `String` -> `byte[]` (The Standard Way)

The most common task is to get a UTF-8 byte array from a String. You do this using the String.getBytes() method, specifying the StandardCharsets.UTF_8 character set.


Method Signature:

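byte[] getBytes(Charset charset)

(StandardCharsets is the class that holds the predefined Charset constants, so you pass StandardCharsets.UTF_8 as the argument.)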

Example:

This is the standard, recommended approach for converting a String to a UTF-8 byte array.

import java.nio.charset.StandardCharsets;
public class UnicodeToUtf8 {
    public static void main(String[] args) {
        // Our input string with various characters
        // 'A' (ASCII), 'é' (Latin), '€' (Euro symbol), '你' (Chinese)
        String originalString = "A é € 你";
        System.out.println("Original String: " + originalString);
        System.out.println("Original String length (chars): " + originalString.length());
        // Convert the String to a UTF-8 byte array
        byte[] utf8Bytes = originalString.getBytes(StandardCharsets.UTF_8);
        System.out.println("\nUTF-8 Byte Array:");
        // Print the bytes in a readable format
        for (byte b : utf8Bytes) {
            System.out.printf("%02X ", b);
        }
        System.out.println("\nByte Array Length: " + utf8Bytes.length);
        // --- Verification: Convert back to String ---
        String decodedString = new String(utf8Bytes, StandardCharsets.UTF_8);
        System.out.println("\nDecoded String from UTF-8 bytes: " + decodedString);
        System.out.println("Are strings equal? " + originalString.equals(decodedString));
    }
}

Output:

Original String: A é € 你
Original String length (chars): 7

UTF-8 Byte Array:
41 20 C3 A9 20 E2 82 AC 20 E4 BD A0 
Byte Array Length: 12

Decoded String from UTF-8 bytes: A é € 你
Are strings equal? true

Why does the byte array length (12) differ from the string length (7)?

A -> 41 (1 byte)
(space) -> 20 (1 byte)
é -> C3 A9 (2 bytes, because it is outside the ASCII range)
(space) -> 20 (1 byte)
€ -> E2 82 AC (3 bytes)
(space) -> 20 (1 byte)
你 -> E4 BD A0 (3 bytes)

Total: 1 + 1 + 2 + 1 + 3 + 1 + 3 = 12 bytes. String.length() counts UTF-16 code units, not bytes, and UTF-8 uses a variable number of bytes (1 to 4) per character: ASCII takes 1 byte, code points U+0080 to U+07FF take 2, U+0800 to U+FFFF take 3, and supplementary characters above U+FFFF (such as most emoji) take 4.
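To see where those 1-to-4-byte lengths come from, here is a minimal hand-written encoder that applies the UTF-8 bit patterns to each code point and checks the result against getBytes. The class ManualUtf8 is hypothetical and for illustration only; real code should simply call getBytes(StandardCharsets.UTF_8). It assumes well-formed input (no unpaired surrogates), and it adds an emoji (U+1F600) to show a character that takes two Java chars but four UTF-8 bytes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ManualUtf8 {
    // Encodes a String to UTF-8 by hand, one Unicode code point at a time.
    // Illustrative only; assumes the input contains no unpaired surrogates.
    static byte[] encode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        s.codePoints().forEach(cp -> {
            if (cp < 0x80) {                 // 1 byte:  0xxxxxxx
                out.write(cp);
            } else if (cp < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
                out.write(0xC0 | (cp >> 6));
                out.write(0x80 | (cp & 0x3F));
            } else if (cp < 0x10000) {       // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
                out.write(0xE0 | (cp >> 12));
                out.write(0x80 | ((cp >> 6) & 0x3F));
                out.write(0x80 | (cp & 0x3F));
            } else {                         // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
                out.write(0xF0 | (cp >> 18));
                out.write(0x80 | ((cp >> 12) & 0x3F));
                out.write(0x80 | ((cp >> 6) & 0x3F));
                out.write(0x80 | (cp & 0x3F));
            }
        });
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // "A é € 你" plus an emoji, written with escapes so source encoding does not matter;
        // the last two escapes form the surrogate pair for U+1F600
        String s = "A \u00E9 \u20AC \u4F60 \uD83D\uDE00";
        byte[] manual = encode(s);
        byte[] builtIn = s.getBytes(StandardCharsets.UTF_8);
        System.out.println("chars: " + s.length());
        System.out.println("code points: " + s.codePointCount(0, s.length()));
        System.out.println("UTF-8 bytes: " + manual.length);
        System.out.println("matches getBytes: " + Arrays.equals(manual, builtIn));
    }
}
```

For this input the program reports 10 chars, 9 code points and 17 UTF-8 bytes: the 12 bytes from the example above, one more space, and 4 bytes for the emoji.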

The "Old Way" (Not Recommended)

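Before Java 7 added StandardCharsets, you had to look the charset up by name, either with Charset.forName("UTF-8") as below or with getBytes("UTF-8"), which throws the checked UnsupportedEncodingException. Both are more verbose.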


import java.nio.charset.Charset;
public class UnicodeToUtf8OldWay {
    public static void main(String[] args) {
        String originalString = "A é € 你";
        // Pre-Java 7 style: looks the charset up by name at runtime, and is more verbose
        Charset utf8Charset = Charset.forName("UTF-8");
        byte[] utf8Bytes = originalString.getBytes(utf8Charset);
        System.out.println("Byte Array (Old Way): " + java.util.Arrays.toString(utf8Bytes));
    }
}

Why is StandardCharsets.UTF_8 better?

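Type Safety: StandardCharsets.UTF_8 is a static final field checked by the compiler, so a typo fails to compile. Charset.forName("UTF-8") only fails at runtime: a misspelled name throws an UnsupportedCharsetException (or an IllegalCharsetNameException if the name contains illegal characters).
Performance: It avoids a by-name lookup at runtime, and the standard charsets are guaranteed to be available on every Java implementation, so there is no failure case to handle.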
Readability: It's self-documenting and clear.
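The type-safety point can be demonstrated directly. This sketch (class name CharsetLookupErrors is illustrative) passes the deliberately misspelled name "UTF-9" to both name-based APIs; the equivalent mistake with StandardCharsets would be a compile error instead.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.UnsupportedCharsetException;

public class CharsetLookupErrors {
    public static void main(String[] args) {
        String s = "A \u00E9";

        // A misspelled but syntactically legal name fails only at runtime
        try {
            Charset.forName("UTF-9");
        } catch (UnsupportedCharsetException e) {
            System.out.println("forName failed for: " + e.getCharsetName());
        }

        // The String-name overload of getBytes forces you to handle a checked exception
        try {
            s.getBytes("UTF-9");
        } catch (UnsupportedEncodingException e) {
            System.out.println("getBytes failed for: " + e.getMessage());
        }

        // By contrast, a typo such as StandardCharsets.UTF_9 would not compile at all
    }
}
```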

Writing UTF-8 to a File

A very common use case is to write a String to a file in UTF-8 encoding. A simple option is Files.write() from the NIO (New I/O) API, passing it the UTF-8 bytes of the string.

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
public class WriteUtf8ToFile {
    public static void main(String[] args) {
        String content = "This file contains UTF-8 text: é € 你";
        Path path = Paths.get("output.txt");
        try {
            // The StandardCharsets.UTF_8 argument is crucial!
            Files.write(path, content.getBytes(StandardCharsets.UTF_8));
            System.out.println("File 'output.txt' written successfully in UTF-8.");
        } catch (IOException e) {
            System.err.println("Error writing to file: " + e.getMessage());
        }
    }
}

If you were using older FileOutputStream and OutputStreamWriter, it would look like this (the NIO way is preferred):

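// Old, more verbose way. Needs imports: java.io.FileOutputStream, java.io.IOException,
// java.io.OutputStreamWriter, java.nio.charset.StandardCharsets
String content = "This file contains UTF-8 text: é € 你";
// The OutputStreamWriter encodes chars to UTF-8 before handing bytes to the stream
try (FileOutputStream fos = new FileOutputStream("output_old.txt");
     OutputStreamWriter osw = new OutputStreamWriter(fos, StandardCharsets.UTF_8)) {
    osw.write(content);
} catch (IOException e) {
    e.printStackTrace();
}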
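On Java 11 and later, NIO offers two more concise options, sketched below under that version assumption: Files.writeString encodes and writes a whole String in one call, and Files.newBufferedWriter (available since Java 7) gives a Writer that encodes as it goes. The class name WriteUtf8Modern and the output file names are illustrative.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class WriteUtf8Modern {
    public static void main(String[] args) throws IOException {
        String content = "This file contains UTF-8 text: \u00E9 \u20AC \u4F60";

        // Java 11+: encode and write the whole String in one call
        Path whole = Path.of("output_writestring.txt");
        Files.writeString(whole, content, StandardCharsets.UTF_8);

        // Streaming alternative: a Writer that encodes to UTF-8 incrementally
        Path streamed = Path.of("output_buffered.txt");
        try (BufferedWriter writer = Files.newBufferedWriter(streamed, StandardCharsets.UTF_8)) {
            writer.write(content);
            writer.newLine();
        }

        System.out.println("Wrote " + Files.size(whole) + " bytes to " + whole);
    }
}
```

Files.writeString suits text that is already held in one String; Files.newBufferedWriter is the better fit when the text is produced piece by piece.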

Reading UTF-8 from a File

Naturally, you'll also need to read files. Use Files.readAllBytes() and then create a String from the bytes.

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
public class ReadUtf8FromFile {
    public static void main(String[] args) {
        Path path = Paths.get("output.txt"); // The file we created earlier
        try {
            // Read all bytes from the file
            byte[] fileBytes = Files.readAllBytes(path);
            // Convert the UTF-8 byte array back to a String
            String content = new String(fileBytes, StandardCharsets.UTF_8);
            System.out.println("Content read from file:");
            System.out.println(content);
        } catch (IOException e) {
            System.err.println("Error reading from file: " + e.getMessage());
        }
    }
}
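Java 11 also added Files.readString, which reads and decodes a file in one call, and Files.readAllLines (Java 7+) decodes and splits the file into lines. The sketch below assumes Java 11+ and creates its own temporary file so it runs on its own; the class name ReadUtf8Modern is illustrative.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class ReadUtf8Modern {
    public static void main(String[] args) throws IOException {
        // Create a small sample file first so the example is self-contained
        Path path = Files.createTempFile("utf8-demo", ".txt");
        Files.writeString(path, "line 1: \u00E9\nline 2: \u4F60\n", StandardCharsets.UTF_8);

        // Java 11+: read and decode the whole file in one call
        String content = Files.readString(path, StandardCharsets.UTF_8);
        System.out.print(content);

        // Line-oriented alternative: decodes and splits on line terminators
        List<String> lines = Files.readAllLines(path, StandardCharsets.UTF_8);
        System.out.println("Line count: " + lines.size());

        Files.delete(path);
    }
}
```

Note that Files.readString and Files.readAllLines throw a MalformedInputException if the file contains bytes that are not valid UTF-8.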

Advanced: Handling Invalid UTF-8 Sequences

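What happens if you decode a byte array that is not valid UTF-8? It depends on the API. The String constructor new String(bytes, StandardCharsets.UTF_8) never throws; it silently replaces each malformed sequence with the replacement character U+FFFD. A CharsetDecoder obtained from newDecoder() is strict by default and throws a MalformedInputException.

To choose the behavior explicitly, configure a CharsetDecoder with a CodingErrorAction: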

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
public class HandleInvalidUtf8 {
    public static void main(String[] args) {
        // Create a byte array that is NOT valid UTF-8:
        // the byte 0xFF can never appear in a well-formed UTF-8 sequence
        byte[] trulyInvalidBytes = {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF};
        System.out.println("Attempting to decode invalid bytes...");
        // Configure the decoder to replace invalid sequences instead of failing
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE) // Replace bad sequences with the Unicode replacement character (U+FFFD)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        try {
            CharBuffer charBuffer = decoder.decode(ByteBuffer.wrap(trulyInvalidBytes));
            String result = charBuffer.toString();
            System.out.println("Decoded result with REPLACE: " + result);
            System.out.println("Contains replacement char? " + result.contains("\uFFFD"));
        } catch (CharacterCodingException e) {
            System.err.println("Decoding failed even with replacement strategy: " + e.getMessage());
        }
    }
}
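To see the two default behaviors side by side, here is a small sketch that decodes the same malformed bytes leniently with the String constructor and strictly with a fresh decoder. The class name DefaultMalformedHandling is illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;

public class DefaultMalformedHandling {
    public static void main(String[] args) {
        byte[] bad = {'H', 'i', (byte) 0xFF}; // 0xFF never appears in valid UTF-8

        // The String constructor never throws: malformed input becomes U+FFFD
        String lenient = new String(bad, StandardCharsets.UTF_8);
        System.out.println("Lenient result ends with U+FFFD? " + lenient.endsWith("\uFFFD"));

        // A fresh decoder uses CodingErrorAction.REPORT, so decode() throws
        try {
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bad));
            System.out.println("Strict decode succeeded (unexpected)");
        } catch (MalformedInputException e) {
            System.out.println("Strict decode failed, malformed length = " + e.getInputLength());
        } catch (CharacterCodingException e) {
            System.out.println("Strict decode failed: " + e);
        }
    }
}
```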

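Common CodingErrorActions:

REPORT (the default for a decoder created with newDecoder()): throws a CharacterCodingException, such as MalformedInputException.
IGNORE: drops the malformed input and continues.
REPLACE: substitutes the decoder's replacement string, which for UTF-8 is U+FFFD (\uFFFD) by default and can be changed with replaceWith().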
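The three actions can be compared on one input. This sketch (class name CompareErrorActions is illustrative) decodes the bytes 'A', 0xFF, 'B' with each action and prints U+FFFD as a visible marker so the result does not depend on the console encoding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class CompareErrorActions {
    // Decodes UTF-8 bytes, applying the given action to malformed or unmappable input
    static String decode(byte[] bytes, CodingErrorAction action) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) {
        byte[] bytes = {'A', (byte) 0xFF, 'B'}; // 0xFF is malformed in UTF-8
        CodingErrorAction[] actions = {
                CodingErrorAction.IGNORE, CodingErrorAction.REPLACE, CodingErrorAction.REPORT};
        for (CodingErrorAction action : actions) {
            try {
                String result = decode(bytes, action);
                // Make the replacement character visible regardless of console encoding
                System.out.println(action + " -> " + result.replace("\uFFFD", "<U+FFFD>"));
            } catch (CharacterCodingException e) {
                System.out.println(action + " -> threw " + e.getClass().getSimpleName());
            }
        }
    }
}
```

IGNORE yields "AB", REPLACE yields "A", U+FFFD, "B", and REPORT throws a MalformedInputException.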

Summary

Task	Recommended Method	Key Point
String to UTF-8 byte array	`my
