杰瑞科技汇

How Do You Use the String split() Method Correctly in Java?

In Java, "string split" most commonly refers to the String.split() method, which breaks a string into an array of substrings around matches of a delimiter. The delimiter is given as a regular expression.

Here’s a comprehensive guide covering the basics, common use cases, and important considerations.


The split() Method

The split() method is part of the String class in Java. Its most common signature is:

public String[] split(String regex)
  • regex: A regular expression that acts as the delimiter. This is a crucial point—the delimiter is a regex, not just a simple character.

Simple Example: Splitting by a Comma

Let's split a simple comma-separated string (CSV).

public class SplitExample {
    public static void main(String[] args) {
        String text = "apple,banana,cherry,date";
        String[] fruits = text.split(",");
        // The result is an array of strings
        System.out.println("Original String: " + text);
        System.out.println("Split Result: " + java.util.Arrays.toString(fruits));
        // You can loop through the array
        System.out.println("\nIndividual Fruits:");
        for (String fruit : fruits) {
            System.out.println(fruit);
        }
    }
}

Output:

Original String: apple,banana,cherry,date
Split Result: [apple, banana, cherry, date]

Individual Fruits:
apple
banana
cherry
date

Important Parameters and Variations

The split() method has an overloaded version that is very useful:

public String[] split(String regex, int limit)

The limit parameter controls the number of substrings produced.

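  • limit > 0: The array will contain at most limit substrings. The pattern is applied at most limit - 1 times, and the last element holds the unsplit remainder of the string.
  • limit < 0: The pattern is applied as many times as possible and trailing empty strings are kept in the result array.
  • limit = 0: The pattern is applied as many times as possible, but trailing empty strings are discarded. This is also what split(regex) does when no limit is given.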

Example with the limit Parameter

public class SplitLimitExample {
    public static void main(String[] args) {
        String text = "a,b,c,d,e";
        // Case 1: limit > 0 (e.g., 3)
        // Splits at most 2 times, resulting in 3 parts.
        String[] parts1 = text.split(",", 3);
        System.out.println("Limit 3: " + java.util.Arrays.toString(parts1)); // [a, b, c,d,e]
        // Case 2: limit < 0 (e.g., -1)
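        // No cap on the number of parts, and trailing empty strings are kept.
        // This input has no trailing empty strings, so the result matches text.split(",").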
        String[] parts2 = text.split(",", -1);
        System.out.println("Limit -1: " + java.util.Arrays.toString(parts2)); // [a, b, c, d, e]
        // Case 3: limit = 0
        // Discards trailing empty strings.
        String textWithEmpty = "a,b,,c,";
        String[] parts3 = textWithEmpty.split(",", 0);
        System.out.println("Limit 0: " + java.util.Arrays.toString(parts3)); // [a, b, , c]
        String[] parts4 = textWithEmpty.split(",", -1); // For comparison
        System.out.println("Limit -1 (for comparison): " + java.util.Arrays.toString(parts4)); // [a, b, , c, ]
    }
}

Output:

Limit 3: [a, b, c,d,e]
Limit -1: [a, b, c, d, e]
Limit 0: [a, b, , c]
Limit -1 (for comparison): [a, b, , c, ]
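
A positive limit is useful when only the first delimiter is meaningful. The sketch below assumes a hypothetical key=value input format and a hypothetical helper named parsePair (neither comes from the examples above). It splits on the first '=' only, so the value may itself contain '='.

```java
import java.util.Arrays;

public class SplitKeyValueExample {
    // Hypothetical helper: splits "key=value" on the first '=' only.
    static String[] parsePair(String line) {
        // limit = 2 allows at most one split, so any later '=' stays in the value.
        return line.split("=", 2);
    }

    public static void main(String[] args) {
        String[] pair = parsePair("url=https://example.com/?a=1");
        System.out.println(Arrays.toString(pair)); // [url, https://example.com/?a=1]
    }
}
```

If the line contains no '=' at all, the returned array has a single element, so real code would likely check the array length before reading pair[1].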

Handling Special Characters (Regex Metacharacters)

This is the most common pitfall for beginners. Since split() uses a regular expression, characters like ., $, |, *, +, ^, ?, (, ), [, {, \ have special meanings.

If you want to split by one of these characters, you must escape it with a backslash (\).

In a regular Java string, a backslash is itself an escape character. So, to represent a single backslash \ in the string, you need to write \\. This means to escape a regex metacharacter, you need two backslashes.

Example: Splitting by a Dot (.)

The . in regex means "any character". To split by a literal dot, you must escape it: "\\.".

public class SplitSpecialCharExample {
    public static void main(String[] args) {
        String sentence = "www.example.com";
        // WRONG: "." matches every character, so every character is treated as a
        // delimiter and the result is an empty array.
        // String[] partsWrong = sentence.split(".");
        // CORRECT: Escape the dot with "\\"
        String[] partsCorrect = sentence.split("\\.");
        System.out.println("Original: " + sentence);
        System.out.println("Split by dot: " + java.util.Arrays.toString(partsCorrect));
    }
}

Output:

Original: www.example.com
Split by dot: [www, example, com]
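
Manual escaping works well for a fixed delimiter. When the delimiter comes from a variable, or you would rather not remember which characters are special, one alternative is java.util.regex.Pattern.quote(), which turns any string into a regex that matches it literally. The helper name splitLiteral below is an assumption made for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitQuoteExample {
    // Hypothetical helper: splits on the delimiter taken literally,
    // whatever regex metacharacters it may contain.
    static String[] splitLiteral(String text, String delimiter) {
        return text.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitLiteral("www.example.com", "."))); // [www, example, com]
        System.out.println(Arrays.toString(splitLiteral("a|b|c", "|")));           // [a, b, c]
    }
}
```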

Other Special Characters

Delimiter   Correct split() Argument   Why?
.           "\\."                      . means "any character" in regex.
\           "\\\\"                     \ is the escape character in both Java strings and regex. The Java literal "\\\\" is the regex \\, which matches one literal \.
|           "\\|"                      | means "OR" in regex.
*           "\\*"                      * means "zero or more" of the preceding element.
+           "\\+"                      + means "one or more" of the preceding element.
?           "\\?"                      ? means "zero or one" of the preceding element.
(           "\\("                      ( starts a capturing group.
)           "\\)"                      ) ends a capturing group.
[           "\\["                      [ starts a character class.
{           "\\{"                      { starts a quantifier (e.g., {3,5}).

Advanced: Splitting by Multiple Delimiters

You can use a "regex character class" to split by multiple different characters at once. A character class is defined by characters inside square brackets [].

Example: Splitting by Comma, Semicolon, or Space

Let's split a string that uses different separators.

public class SplitMultipleDelimitersExample {
    public static void main(String[] args) {
        String data = "apple, banana;cherry orange,grape";
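        // The regex "[,; ]+" means: split on a run of one or more commas, semicolons, or spaces.
        // Without the "+", the ", " after "apple" would leave an empty string in the result.
        String[] items = data.split("[,; ]+");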
        System.out.println("Original: " + data);
        System.out.println("Split by comma, semicolon, or space: " + java.util.Arrays.toString(items));
    }
}

Output:

Original: apple, banana;cherry orange,grape
Split by comma, semicolon, or space: [apple, banana, cherry, orange, grape]

Common Pitfalls and Best Practices

Pitfall 1: Leading/Trailing Delimiters

If your string starts with the delimiter, the array starts with an empty string. Trailing empty strings, by contrast, are discarded by default, because split(regex) behaves like split(regex, 0).

String text = ",apple,banana,,"; // Leading and trailing commas
String[] parts = text.split(",");
// Result: [, apple, banana]
// The leading empty string is kept; the trailing ones are dropped.

If you need to keep the trailing empty strings (for example, to preserve column positions), pass a negative limit.

String[] partsAll = text.split(",", -1); // Result: [, apple, banana, , ]
// No limit value removes the leading empty string. If you don't want it,
// filter the array after splitting (for example, with String.isEmpty()).
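
To drop empty strings wherever they occur (leading, trailing, or in the middle), one option is to filter the array after splitting. This is a minimal sketch that assumes Java 8 or later for the stream API; the helper name splitNonEmpty is hypothetical.

```java
import java.util.Arrays;

public class SplitDropEmptyExample {
    // Hypothetical helper: splits on commas and removes every empty element.
    static String[] splitNonEmpty(String text) {
        return Arrays.stream(text.split(","))
                .filter(s -> !s.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitNonEmpty(",apple,banana,,"))); // [apple, banana]
    }
}
```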

Pitfall 2: Consecutive Delimiters

If there are two or more delimiters in a row, you will get empty strings for the gaps between them.

String text = "a,,b"; // Two commas in a row
String[] parts = text.split(",");
// Result: [a, , b]
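
If the empty entries between repeated delimiters are not wanted, one option is to add a + quantifier so that a whole run of delimiters counts as a single separator. The helper name splitCollapsing is an assumption made for this sketch.

```java
import java.util.Arrays;

public class SplitCollapseExample {
    // Hypothetical helper: treats any run of commas as one delimiter.
    static String[] splitCollapsing(String text) {
        // ",+" matches one or more consecutive commas.
        return text.split(",+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitCollapsing("a,,b,,,c"))); // [a, b, c]
    }
}
```

Only use this when an empty field carries no meaning. In CSV-like data, an empty string between two commas usually represents a missing value, and collapsing the delimiters would shift the remaining columns.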

Best Practice: Trimming Whitespace

When splitting user input or data files, it's common to have extra whitespace around the delimiter. A simple way to handle this is to make the optional whitespace part of the delimiter pattern itself, so it is consumed together with the comma and never ends up in the resulting strings.

The regex \\s*,\\s* means:

  • \\s*: Matches zero or more whitespace characters.
  • ,: Matches the literal comma.
  • \\s*: Matches zero or more whitespace characters.

This pattern only removes whitespace next to a delimiter. Whitespace at the very start or end of the input is not adjacent to a comma, so call trim() on the string first.

public class SplitAndTrimExample {
    public static void main(String[] args) {
        String text = " apple , banana , cherry ";
        // trim() removes the outer spaces; the regex absorbs the spaces around each comma
        String[] fruits = text.trim().split("\\s*,\\s*");
        // The result is clean, trimmed strings
        System.out.println("Original: '" + text + "'");
        System.out.println("Split and trimmed: " + java.util.Arrays.toString(fruits));
    }
}

Output:

Original: ' apple , banana , cherry '
Split and trimmed: [apple, banana, cherry]

Alternative: StringTokenizer

Before Java 1.4, StringTokenizer was the primary way to split strings. It's generally considered outdated now for several reasons:

  • It doesn't support regular expressions.
  • It has a less intuitive API.
  • It treats consecutive delimiters as one delimiter by default, which is often not what you want.

You should prefer split() unless you have a very specific reason to use StringTokenizer (e.g., maintaining legacy code).

Example of StringTokenizer (for comparison)

import java.util.StringTokenizer;
public class StringTokenizerExample {
    public static void main(String[] args) {
        String text = "apple,banana,cherry";
        StringTokenizer tokenizer = new StringTokenizer(text, ",");
        // You must manually loop and count
        int count = tokenizer.countTokens();
        String[] fruits = new String[count];
        int i = 0;
        while (tokenizer.hasMoreTokens()) {
            fruits[i++] = tokenizer.nextToken();
        }
        System.out.println("Result with StringTokenizer: " + java.util.Arrays.toString(fruits));
    }
}
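
The difference in how the two approaches treat consecutive delimiters can be seen in a small comparison. The helper names tokenizerCount and splitCount are hypothetical and exist only to make the sketch easy to check.

```java
import java.util.StringTokenizer;

public class TokenizerVsSplitExample {
    // Number of tokens StringTokenizer reports for a comma-delimited string.
    static int tokenizerCount(String text) {
        return new StringTokenizer(text, ",").countTokens();
    }

    // Number of elements split() produces for the same string.
    static int splitCount(String text) {
        return text.split(",").length;
    }

    public static void main(String[] args) {
        String text = "a,,b";
        // StringTokenizer skips the empty field between the two commas.
        System.out.println(tokenizerCount(text)); // 2
        // split() keeps it as an empty string.
        System.out.println(splitCount(text)); // 3
    }
}
```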

Summary: When to Use split()

Use Case                                                     Recommended Method                Why?
Split by a single, simple character (e.g., a comma)          text.split(",")                   Simple, readable, and effective.
Split by a regex metacharacter (e.g., a dot)                 text.split("\\.")                 Escapes the metacharacter so it is matched literally.
Split by several delimiters (e.g., comma, semicolon, space)  text.split("[,; ]")               Uses a regex character class that matches any one of them.
Split and trim whitespace around the delimiter               text.split("\\s*,\\s*")           A concise regex that absorbs the surrounding whitespace.
Split by a complex regex pattern                             text.split("your-complex-regex")  split() is built on regex, making it powerful.
Split into fixed-length pieces (e.g., every 3 characters)    text.split("(?<=\\G.{3})")        A clever but advanced regex trick. Looping is often clearer.
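
For the fixed-length case, a plain loop over substring() is often easier to read than the lookbehind regex. This is a minimal sketch; the helper name chunk and the choice to return a List are assumptions, and size is expected to be positive.

```java
import java.util.ArrayList;
import java.util.List;

public class FixedLengthSplitExample {
    // Hypothetical helper: cuts text into pieces of at most size characters.
    // Assumes size > 0; the last piece may be shorter than the others.
    static List<String> chunk(String text, int size) {
        List<String> pieces = new ArrayList<>();
        for (int start = 0; start < text.length(); start += size) {
            int end = Math.min(start + size, text.length());
            pieces.add(text.substring(start, end));
        }
        return pieces;
    }

    public static void main(String[] args) {
        System.out.println(chunk("abcdefgh", 3)); // [abc, def, gh]
    }
}
```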