杰瑞科技汇

How to Efficiently Convert Between Java Bytes and Strings

Converting between byte[] and String is a common and important task in Java, especially when dealing with network protocols, file I/O, or cryptography. The core challenge is that a String in Java is a sequence of Unicode characters, while a byte[] (array of bytes) is a sequence of raw 8-bit values.

The conversion between them requires specifying a character encoding, which defines how characters are mapped to byte sequences and vice-versa.

Here’s a complete guide covering the best practices, common pitfalls, and code examples.


The Golden Rule: Always Specify the Encoding

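Never rely on the platform's default encoding (e.g., String.getBytes() or new String(bytes) without a charset). This leads to the "It works on my machine" problem because different operating systems (Windows, Linux, macOS) can have different default encodings (like Cp1252, UTF-8, etc.). Since JDK 18 the default charset is UTF-8 (JEP 400), but your code may still run on an older runtime or with the file.encoding property overridden.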

Always be explicit about the character encoding. The modern, standard, and recommended encoding is UTF-8.
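To make the rule concrete, here is a minimal sketch of what happens when the writer and the reader disagree about the encoding. The class and method names (EncodingMismatchDemo, decodeUtf8BytesAs) are made up for this illustration, and ISO-8859-1 simply stands in for "some other default" a machine might have.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingMismatchDemo {
    // Encodes text as UTF-8, then decodes those bytes with the given charset.
    // If the two charsets differ, non-ASCII characters come back garbled.
    static String decodeUtf8BytesAs(String text, Charset decodingCharset) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, decodingCharset);
    }

    public static void main(String[] args) {
        // "café", written with an escape so the source file encoding does not matter
        String original = "caf\u00E9";
        System.out.println("Same charset:      " + decodeUtf8BytesAs(original, StandardCharsets.UTF_8));
        System.out.println("Different charset: " + decodeUtf8BytesAs(original, StandardCharsets.ISO_8859_1));
    }
}
```

With matching charsets the text survives the round trip. With ISO-8859-1 on the decoding side, the two UTF-8 bytes of "é" (0xC3 0xA9) are read as two separate characters, so the result is five characters long instead of four.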

Converting a String to a byte[]

You need to encode the characters of the string into a sequence of bytes.

Method 1: The Recommended Way (Using StandardCharsets)

This is the most modern and readable approach. StandardCharsets has been available since Java 7.

import java.nio.charset.StandardCharsets;
public class StringToBytes {
    public static void main(String[] args) {
        String originalString = "Hello, 世界!"; // A string with non-ASCII characters
        // --- ENCODING: String to byte[] ---
        // StandardCharsets.UTF_8 is a constant for a charset that every JVM
        // must support, so no exception handling is needed here.
        byte[] utf8Bytes = originalString.getBytes(StandardCharsets.UTF_8);
        System.out.println("Original String: " + originalString);
        System.out.println("Byte array length: " + utf8Bytes.length); // 14 (each CJK character takes 3 bytes)
        System.out.println("Byte array content: " + java.util.Arrays.toString(utf8Bytes));
        // Output: [72, 101, 108, 108, 111, 44, 32, -28, -72, -106, -25, -107, -116, 33]
    }
}

Method 2: The Classic Way (Using Charset.forName)

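This works on Java 6 and later (String.getBytes(Charset) was added in Java 6) and is still widely used. On even older versions, only the getBytes(String charsetName) overload exists, and it throws the checked UnsupportedEncodingException.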

import java.nio.charset.Charset;
public class StringToBytesClassic {
    public static void main(String[] args) {
        String originalString = "Hello, 世界!";
        // --- ENCODING: String to byte[] ---
        byte[] utf8Bytes = originalString.getBytes(Charset.forName("UTF-8"));
        System.out.println("Original String: " + originalString);
        System.out.println("Byte array length: " + utf8Bytes.length);
    }
}
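If you are on the classic path, one small refinement is to look the charset up once and reuse it rather than calling Charset.forName on every conversion. The sketch below assumes a helper class of our own (CachedCharsetDemo and its methods are hypothetical names); only Charset.forName and Charset.isSupported are real JDK calls.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

class CachedCharsetDemo {
    // Looked up once at class initialization; Charset objects are safe to share between threads.
    static final Charset UTF8 = Charset.forName("UTF-8");

    static byte[] encode(String text) {
        return text.getBytes(UTF8);
    }

    static String decode(byte[] bytes) {
        return new String(bytes, UTF8);
    }

    // Returns true only if this JVM knows the named charset.
    // Useful when the name comes from configuration or a protocol header.
    static boolean isKnownCharset(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            return false; // the name contains characters that are not legal in a charset name
        }
    }

    public static void main(String[] args) {
        System.out.println(decode(encode("Hello, \u4E16\u754C!")));
        System.out.println("UTF-8 known?  " + isKnownCharset("UTF-8"));
        System.out.println("Bogus known?  " + isKnownCharset("no-such-charset-xyz"));
    }
}
```

Checking the name first avoids the unchecked UnsupportedCharsetException that Charset.forName throws for an unknown charset.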

What Happens with the Default Encoding? (The Pitfall)

If you call getBytes() without a charset, it uses the platform's default. This can cause data corruption if the bytes are read on a different system.

// DANGEROUS - DO NOT DO THIS IN PRODUCTION
// The encoding is platform-dependent!
byte[] defaultBytes = originalString.getBytes(); 
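If you need to find out what the default actually is on a given machine (for example while debugging a corruption report), a small probe like the following can help. This is only a diagnostic sketch; DefaultCharsetProbe is a hypothetical name, and the printed values depend entirely on the JVM and its settings.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

class DefaultCharsetProbe {
    // The charset that the no-argument getBytes() and new String(bytes) will use.
    static Charset platformDefault() {
        return Charset.defaultCharset();
    }

    // Confirms that getBytes() is just shorthand for getBytes(Charset.defaultCharset()).
    static boolean noArgMatchesDefault(String text) {
        return Arrays.equals(text.getBytes(), text.getBytes(Charset.defaultCharset()));
    }

    public static void main(String[] args) {
        System.out.println("Default charset:        " + platformDefault());
        System.out.println("file.encoding property: " + System.getProperty("file.encoding"));
        System.out.println("getBytes() uses it:     " + noArgMatchesDefault("Hello, \u4E16\u754C!"));
    }
}
```

Running the same class on two machines and comparing the first line of output is often enough to explain why bytes written on one system decode incorrectly on the other.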

Converting a byte[] to a String

You need to decode the sequence of bytes back into characters.

Method 1: The Recommended Way (Using StandardCharsets)

Again, this is the preferred method for modern Java.

import java.nio.charset.StandardCharsets;
public class BytesToString {
    public static void main(String[] args) {
        byte[] utf8Bytes = {
            72, 101, 108, 108, 111, 44, 32,
            (byte) 0xE4, (byte) 0xBD, (byte) 0xA0, // 你 in UTF-8
            (byte) 0xE5, (byte) 0xA5, (byte) 0xBD  // 好 in UTF-8
        };
        // --- DECODING: byte[] to String ---
        String decodedString = new String(utf8Bytes, StandardCharsets.UTF_8);
        System.out.println("Decoded String: " + decodedString); // Output: Decoded String: Hello, 你好
    }
}

Method 2: The Classic Way (Using Charset.forName)

import java.nio.charset.Charset;
public class BytesToStringClassic {
    public static void main(String[] args) {
        // UTF-8 bytes of "Hello, 世界!" (世 = E4 B8 96, 界 = E7 95 8C)
        byte[] utf8Bytes = {72, 101, 108, 108, 111, 44, 32, -28, -72, -106, -25, -107, -116, 33};
        // --- DECODING: byte[] to String ---
        String decodedString = new String(utf8Bytes, Charset.forName("UTF-8"));
        System.out.println("Decoded String: " + decodedString); // Output: Hello, 世界!
    }
}
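When the text occupies only part of a larger buffer, as is common with network frames, the String constructor that takes an offset and a length decodes that slice directly, so no temporary copy of the bytes is needed. The frame layout below (one type byte, one length byte, then the payload) is invented purely for illustration, as are the class and method names.

```java
import java.nio.charset.StandardCharsets;

class SubRangeDecoding {
    // Hypothetical frame layout for illustration: [1 type byte][1 length byte][UTF-8 payload].
    static byte[] buildFrame(String text) {
        byte[] payload = text.getBytes(StandardCharsets.UTF_8);
        if (payload.length > 255) {
            throw new IllegalArgumentException("payload too long for a 1-byte length field");
        }
        byte[] frame = new byte[2 + payload.length];
        frame[0] = 0x01;                   // made-up message type
        frame[1] = (byte) payload.length;  // payload length in bytes, not characters
        System.arraycopy(payload, 0, frame, 2, payload.length);
        return frame;
    }

    // Decodes only the payload region of the frame.
    static String readPayload(byte[] frame) {
        int length = frame[1] & 0xFF;      // undo sign extension of the length byte
        return new String(frame, 2, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] frame = buildFrame("Hello, \u4E16\u754C!");
        System.out.println("Frame size: " + frame.length + " bytes");
        System.out.println("Payload:    " + readPayload(frame));
    }
}
```

Note that the length field counts bytes. Using the character count of the string instead would truncate any payload that contains multi-byte characters.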

What Happens with the Default Encoding? (The Pitfall)

If you create a String from a byte[] without a charset, it will use the platform's default encoding, which can lead to corrupted or garbled characters if the bytes were not encoded with that default.

// DANGEROUS - DO NOT DO THIS IN PRODUCTION
// The encoding is platform-dependent!
String corruptedString = new String(utf8Bytes);

Advanced Handling: CharsetDecoder and CharsetEncoder

For more complex scenarios, like handling invalid bytes or partial data, you can use the java.nio.charset package's low-level encoder and decoder. This gives you fine-grained control over error handling.
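The "partial data" case deserves a sketch of its own. When bytes arrive in chunks, a multi-byte character can be split across two chunks, and decoding each chunk separately with new String(...) corrupts it. One way to handle this, assuming the chunks are small enough to buffer in memory, is to keep a single CharsetDecoder across chunks and carry over the undecoded tail. ChunkedUtf8Decoder and decodeChunks are hypothetical names; the decoder calls are standard java.nio.charset API.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ChunkedUtf8Decoder {
    // Decodes UTF-8 text delivered in several chunks. The last chunk is treated as end of input.
    static String decodeChunks(byte[]... chunks) throws CharacterCodingException {
        if (chunks.length == 0) {
            return "";
        }
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        StringBuilder result = new StringBuilder();
        byte[] leftover = new byte[0]; // bytes of an incomplete character from the previous chunk
        for (int i = 0; i < chunks.length; i++) {
            ByteBuffer in = ByteBuffer.allocate(leftover.length + chunks[i].length);
            in.put(leftover).put(chunks[i]);
            in.flip();
            // UTF-8 never decodes to more chars than it has input bytes.
            CharBuffer out = CharBuffer.allocate(in.remaining());
            boolean endOfInput = (i == chunks.length - 1);
            CoderResult cr = decoder.decode(in, out, endOfInput);
            if (!cr.isUnderflow()) {
                cr.throwException(); // malformed input (or an unexpected overflow)
            }
            // Whatever the decoder did not consume is the start of a split character.
            leftover = new byte[in.remaining()];
            in.get(leftover);
            out.flip();
            result.append(out);
        }
        CharBuffer tail = CharBuffer.allocate(8);
        decoder.flush(tail);
        tail.flip();
        result.append(tail);
        return result.toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        byte[] all = "Hello, \u4E16\u754C!".getBytes(StandardCharsets.UTF_8);
        // Split in the middle of the first 3-byte character.
        byte[] first = Arrays.copyOfRange(all, 0, 8);
        byte[] second = Arrays.copyOfRange(all, 8, all.length);
        System.out.println("Stateful decoder: " + decodeChunks(first, second));
        System.out.println("Chunk by chunk:   "
                + new String(first, StandardCharsets.UTF_8)
                + new String(second, StandardCharsets.UTF_8));
    }
}
```

If the final chunk ends in the middle of a character, the decoder reports it as malformed input, so a truncated stream raises a CharacterCodingException instead of silently losing data.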

Error Handling Strategies:

  • new String(bytes, StandardCharsets.UTF_8): Always replaces malformed sequences with the Unicode replacement character (U+FFFD, �). It never throws.
  • CharsetDecoder: You can specify an error handling mode for malformed input and for unmappable characters.
    • REPORT: Throws a CharacterCodingException on malformed input. This is the default for a decoder obtained from newDecoder().
    • IGNORE: Silently discards malformed input.
    • REPLACE: Replaces malformed input with the replacement character (the behavior of new String(bytes, charset)).

Example: Using CharsetDecoder

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
public class AdvancedDecoding {
    public static void main(String[] args) throws CharacterCodingException {
        // A byte array with a malformed UTF-8 sequence:
        // the byte 0xC0 can never appear in valid UTF-8.
        byte[] badUtf8Bytes = {72, (byte) 0xC0, 108, 108, 111}; // 'H', 0xC0, 'l', 'l', 'o'
        System.out.println("Original byte array: " + java.util.Arrays.toString(badUtf8Bytes));
        // --- DECODING with explicit error handling ---
        // 1. new String(...) always replaces malformed input (REPLACE)
        String defaultString = new String(badUtf8Bytes, StandardCharsets.UTF_8);
        System.out.println("Default (REPLACE):   " + defaultString); // Output: H�llo
        // 2. Using a decoder to REPORT the error
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        decoder.onMalformedInput(CodingErrorAction.REPORT); // Throw an exception on bad data
        decoder.onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            CharBuffer charBuffer = decoder.decode(ByteBuffer.wrap(badUtf8Bytes));
            System.out.println("Decoder (REPORT):    " + charBuffer.toString());
        } catch (CharacterCodingException e) {
            System.err.println("Caught exception with REPORT: " + e.getMessage());
        }
        // 3. Using a decoder to IGNORE the error
        decoder.onMalformedInput(CodingErrorAction.IGNORE);
        CharBuffer ignoredBuffer = decoder.decode(ByteBuffer.wrap(badUtf8Bytes));
        System.out.println("Decoder (IGNORE):    " + ignoredBuffer.toString()); // Output: Hllo
    }
}
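The encoding direction has the same three actions, and the main thing to watch for there is unmappable characters: text that the target charset simply cannot represent. As a sketch (StrictEncoding and encodeLatin1Strict are hypothetical names), the following compares the lenient String.getBytes with a CharsetEncoder configured to REPORT, using ISO-8859-1 as an example of a charset that cannot represent CJK characters.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class StrictEncoding {
    // Encodes text as ISO-8859-1 and throws instead of silently substituting '?'.
    static byte[] encodeLatin1Strict(String text) throws CharacterCodingException {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer encoded = encoder.encode(CharBuffer.wrap(text));
        byte[] bytes = new byte[encoded.remaining()];
        encoded.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        String text = "\u4E16\u754C"; // 世界, not representable in ISO-8859-1
        // Lenient: each unmappable character becomes '?' (byte 63) with no warning.
        System.out.println("getBytes: " + Arrays.toString(text.getBytes(StandardCharsets.ISO_8859_1)));
        try {
            System.out.println("strict:   " + Arrays.toString(encodeLatin1Strict(text)));
        } catch (CharacterCodingException e) {
            System.err.println("strict encoder refused the text: " + e);
        }
    }
}
```

The strict variant is worth considering whenever the target charset is not a Unicode encoding, because the '?' substitution done by getBytes cannot be undone on the reading side.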

Summary: Which Method to Use?

| Scenario | Recommended Method | Why? |
| --- | --- | --- |
| General Purpose | string.getBytes(StandardCharsets.UTF_8) and new String(bytes, StandardCharsets.UTF_8) | Modern, readable, safe, and efficient. Explicitly uses the standard UTF-8 encoding. |
| Legacy Code / Java 6 | string.getBytes(Charset.forName("UTF-8")) and new String(bytes, Charset.forName("UTF-8")) | Works on older Java versions while still being explicit about the encoding. |
| Need Error Handling | CharsetDecoder / CharsetEncoder | Provides fine-grained control over how to handle malformed or unmappable byte sequences. |
| Never use | string.getBytes() and new String(bytes) | Platform-dependent. Your code will break when deployed to environments with different default encodings. |