杰瑞科技汇

How Do You Correctly Split Strings with Java StringTokenizer?

This article takes a close look at java.util.StringTokenizer. It is a classic Java class, and understanding its role helps explain how Java's string-parsing APIs have evolved.


What is StringTokenizer?

StringTokenizer is a legacy Java class (available since JDK 1.0) that breaks a string into tokens based on a set of delimiter characters. It was one of the earliest ways to parse strings in Java.

Think of it like a pair of scissors: you give it a string (the paper) and a set of delimiter characters (the cut marks), and it cuts the string into smaller pieces for you.

Key Characteristics:

  • Legacy: It's considered an old, legacy class. The official Java documentation states: "The string tokenizer class allows an application to break a string into tokens. ... The StringTokenizer methods do not distinguish among identifiers, numbers, and quoted strings, nor do they recognize and skip comments. ... StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead." By default, the delimiters themselves are not returned as tokens.
  • Simple: It's very straightforward to use for basic tasks.
  • Stateful: It maintains an internal pointer to the current position in the string as it tokenizes.
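To make the "stateful" point concrete, here is a minimal sketch that records countTokens() before each nextToken() call. It uses the single-argument constructor StringTokenizer(String), whose default delimiters are the whitespace characters " \t\n\r\f". The class and method names (TokenizerStateDemo, remainingCounts) are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerStateDemo {
    // Records countTokens() before each call to nextToken(), showing that the
    // remaining count shrinks as the tokenizer's internal position advances.
    static List<Integer> remainingCounts(String text) {
        StringTokenizer tokenizer = new StringTokenizer(text); // default delimiters: whitespace
        List<Integer> counts = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            counts.add(tokenizer.countTokens()); // does not move the position
            tokenizer.nextToken();               // moves past exactly one token
        }
        counts.add(tokenizer.countTokens());     // exhausted tokenizer reports 0
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(remainingCounts("red green blue")); // [3, 2, 1, 0]
    }
}
```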

Basic Usage and Example

Let's look at a simple example. We'll tokenize the string "apple,banana,cherry" using a comma as a delimiter.

import java.util.StringTokenizer;
public class StringTokenizerExample {
    public static void main(String[] args) {
        String str = "apple,banana,cherry";
        String delimiter = ",";
        // Create a StringTokenizer object
        StringTokenizer tokenizer = new StringTokenizer(str, delimiter);
        // Count the tokens left from the current position (this does not advance it)
        System.out.println("Number of tokens: " + tokenizer.countTokens());
        // Loop through the tokens
        while (tokenizer.hasMoreTokens()) {
            // Get the next token
            String token = tokenizer.nextToken();
            System.out.println("Token: " + token);
        }
        // After the loop, the tokenizer is exhausted
        System.out.println("Are there more tokens? " + tokenizer.hasMoreTokens()); // false
    }
}

Output:

Number of tokens: 3
Token: apple
Token: banana
Token: cherry
Are there more tokens? false

Key Methods

Here are the most important methods in the StringTokenizer class:

  • StringTokenizer(String str): Constructor. Uses the default delimiter set " \t\n\r\f" (space, tab, newline, carriage return, form feed).
  • StringTokenizer(String str, String delim): Constructor. Creates a tokenizer for str, treating every character in delim as a delimiter.
  • StringTokenizer(String str, String delim, boolean returnDelims): Constructor. If returnDelims is true, each delimiter character is also returned as a token of length one.
  • boolean hasMoreTokens(): Returns true if at least one more token is available from the current position.
  • String nextToken(): Returns the next token; throws NoSuchElementException if none are left.
  • String nextToken(String newDelim): Replaces the delimiter set with the characters in newDelim, then returns the next token. The new set stays in effect for subsequent calls.
  • int countTokens(): Returns how many more times nextToken() can be called from the current position before it throws. It does not advance the position.
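The sketch below exercises two of the less obvious methods: nextToken(String newDelim), whose new delimiter set persists for later calls, and nextToken() on an exhausted tokenizer, which throws NoSuchElementException. The helper names are hypothetical. One subtlety worth verifying on your JDK: in the OpenJDK implementation, the delimiter that ended the previous token is not consumed until the next call, so if the new delimiter set omits that character, it becomes the first character of the next token. That is why the sketch passes " ," (which still contains the space) rather than just ",".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.StringTokenizer;

public class TokenizerMethodsDemo {
    // After the first token, switches the delimiter set from the default
    // whitespace set to " ," with nextToken(String); the new set stays in effect.
    static List<String> switchDelimiters(String text) {
        StringTokenizer tokenizer = new StringTokenizer(text); // default: " \t\n\r\f"
        List<String> tokens = new ArrayList<>();
        tokens.add(tokenizer.nextToken());        // splits on whitespace only
        tokens.add(tokenizer.nextToken(" ,"));    // now splits on spaces and commas
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());    // still uses " ,"
        }
        return tokens;
    }

    // Returns true if nextToken() throws once every token has been consumed.
    static boolean throwsWhenExhausted(String text) {
        StringTokenizer tokenizer = new StringTokenizer(text);
        while (tokenizer.hasMoreTokens()) {
            tokenizer.nextToken();
        }
        try {
            tokenizer.nextToken();
            return false;
        } catch (NoSuchElementException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(switchDelimiters("a b,c d")); // [a, b, c, d]
        System.out.println(throwsWhenExhausted("a b"));  // true
    }
}
```

Without the switch, the default whitespace delimiters would have returned "b,c" as a single token.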

Advanced Features

Multiple Delimiters

You can pass a string containing several characters; each one acts as a delimiter. For example, " ,;" splits on spaces, commas, and semicolons.

String str = "one,two three;four";
StringTokenizer tokenizer = new StringTokenizer(str, " ,;");
while (tokenizer.hasMoreTokens()) {
    System.out.println(tokenizer.nextToken());
}
// Output:
// one
// two
// three
// four
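One behavior the example above hides: StringTokenizer never produces empty tokens, because consecutive delimiters are skipped as a run. String.split() keeps the empty strings between adjacent delimiters and drops only trailing ones. This matters for formats such as CSV, where an empty field is meaningful. A minimal comparison sketch, with illustrative helper names:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

public class EmptyFieldsDemo {
    // Collects every token StringTokenizer yields for the given delimiter characters.
    static List<String> withTokenizer(String text, String delims) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(text, delims);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    // Wraps String.split (limit 0) so the two results are easy to compare.
    static List<String> withSplit(String text, String regex) {
        return Arrays.asList(text.split(regex));
    }

    public static void main(String[] args) {
        String csv = "a,,b,";
        System.out.println(withTokenizer(csv, ",")); // [a, b]
        System.out.println(withSplit(csv, ","));     // [a, , b]
    }
}
```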

Returning Delimiters as Tokens

The three-argument constructor, with returnDelims set to true, treats the delimiters themselves as tokens. This is useful if you need to preserve the structure of the original string.

String str = "1+2=3";
// We want to keep the '+' and '=' as tokens
StringTokenizer tokenizer = new StringTokenizer(str, "+=", true);
while (tokenizer.hasMoreTokens()) {
    System.out.println("Token: '" + tokenizer.nextToken() + "'");
}
// Output:
// Token: '1'
// Token: '+'
// Token: '2'
// Token: '='
// Token: '3'
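As an illustration of why returning delimiters can be useful, here is a hypothetical evaluate helper that computes integer expressions made only of + and -. It assumes well-formed input (no unary minus, no parentheses); it is a sketch of the returnDelims technique, not a general expression parser.

```java
import java.util.StringTokenizer;

public class AddSubEvaluator {
    // Hypothetical helper: evaluates expressions such as "12+30-2".
    // With returnDelims = true, each operator comes back as its own token.
    static int evaluate(String expr) {
        StringTokenizer tokenizer = new StringTokenizer(expr, "+-", true);
        int result = Integer.parseInt(tokenizer.nextToken().trim());
        while (tokenizer.hasMoreTokens()) {
            String operator = tokenizer.nextToken();                      // "+" or "-"
            int operand = Integer.parseInt(tokenizer.nextToken().trim()); // the next number
            result = operator.equals("+") ? result + operand : result - operand;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(evaluate("12+30-2")); // 40
        System.out.println(evaluate("7 - 10"));  // -3 (spaces are trimmed from operands)
    }
}
```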

The Modern Alternative: String.split()

Since Java 1.4, the recommended way to split strings in Java has been the split() method of the String class itself. It is more flexible and, for most tasks, easier to use.

Why split() is often preferred:

  1. Simplicity: It's a method call on the string itself, not a separate class.
  2. Returns an Array: It gives you a clean String[] array, which is a standard and easy-to-work-with data structure.
  3. Regular Expression Support: The delimiter in split() is a regular expression. This lets you define complex splitting rules, but it also means regex metacharacters such as . and | must be escaped when you want them matched literally.
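The regex behavior cuts both ways. A minimal sketch of the most common pitfall, splitting on a literal dot or pipe, and two ways around it: escaping the character, or quoting the whole delimiter with Pattern.quote from java.util.regex. The splitLiteral helper is a hypothetical name.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitRegexPitfalls {
    // Hypothetical helper: splits on a delimiter taken literally, not as a regex.
    static String[] splitLiteral(String text, String delimiter) {
        return text.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        // "." matches any character, so every character acts as a delimiter;
        // all pieces are empty, and trailing empty strings are dropped.
        System.out.println(Arrays.toString("1.8.0".split(".")));         // []
        // Escaping the dot makes it literal.
        System.out.println(Arrays.toString("1.8.0".split("\\.")));       // [1, 8, 0]
        // Quoting works for any delimiter string, including "|".
        System.out.println(Arrays.toString(splitLiteral("a|b|c", "|"))); // [a, b, c]
    }
}
```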

Example with String.split():

String str = "apple,banana,cherry";
String[] fruits = str.split(",");
for (String fruit : fruits) {
    System.out.println("Fruit: " + fruit);
}
// Output:
// Fruit: apple
// Fruit: banana
// Fruit: cherry
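split() also has a two-argument form, split(String regex, int limit). With the default limit of 0, trailing empty strings are removed; a negative limit keeps them; a positive limit caps the number of pieces. The fields helper below is a hypothetical wrapper used only for illustration.

```java
import java.util.Arrays;

public class SplitLimitDemo {
    // Hypothetical helper: splits a comma-separated row, optionally keeping
    // trailing empty fields (limit -1) instead of discarding them (limit 0).
    static String[] fields(String row, boolean keepTrailingEmpty) {
        return row.split(",", keepTrailingEmpty ? -1 : 0);
    }

    public static void main(String[] args) {
        String row = "apple,banana,,";
        System.out.println(Arrays.toString(fields(row, false))); // [apple, banana]
        System.out.println(Arrays.toString(fields(row, true)));  // [apple, banana, , ]
        // A positive limit stops after that many pieces; the rest stays intact.
        System.out.println(Arrays.toString(row.split(",", 2)));  // [apple, banana,,]
    }
}
```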

Example with a more complex regex: Let's split on one or more commas or spaces.

String str = "apple,, banana  ,cherry";
// The regex "[,\\s]+" matches any run of one or more commas and/or whitespace characters.
String[] items = str.split("[,\\s]+");
for (String item : items) {
    System.out.println("Item: " + item);
}
// Output:
// Item: apple
// Item: banana
// Item: cherry
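One more edge case: when the input starts with a delimiter, split() returns an empty string as the first element, whereas StringTokenizer would simply skip it. A sketch that filters such empties out, using a hypothetical items helper:

```java
import java.util.Arrays;

public class LeadingDelimiterDemo {
    // Hypothetical helper: splits on runs of commas/whitespace and drops the
    // empty string split() produces when the input begins with a delimiter.
    static String[] items(String text) {
        return Arrays.stream(text.split("[,\\s]+"))
                .filter(s -> !s.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String messy = ", apple,, banana  ,cherry";
        System.out.println(Arrays.toString(messy.split("[,\\s]+"))); // [, apple, banana, cherry]
        System.out.println(Arrays.toString(items(messy)));           // [apple, banana, cherry]
    }
}
```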

StringTokenizer vs. String.split() vs. Scanner

Here’s a quick comparison to help you choose the right tool.

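StringTokenizer
  • Primary use: Simple, old-school tokenization.
  • Return type: Hands back tokens one at a time via nextToken().
  • Delimiters: A set of single characters (can be swapped per call with nextToken(String)).
  • State: Stateful (remembers its position).
  • Additional features: Can return delimiters as tokens.
  • Performance: Often fast for simple character delimiters, since no regular expression is involved.
  • Recommendation: Avoid in new code; use it for legacy compatibility.

String.split()
  • Primary use: The modern standard for splitting strings.
  • Return type: A String[] array.
  • Delimiters: A regular expression.
  • State: Stateless (returns a new array on each call).
  • Additional features: An optional limit parameter, split(regex, limit).
  • Performance: In OpenJDK, a single literal character delimiter takes a fast path that skips the regex engine; other patterns are compiled on every call, and the whole result array is built at once.
  • Recommendation: Best choice for simple string splitting.

java.util.Scanner
  • Primary use: Parsing formatted input from sources such as files, streams, or strings.
  • Return type: Hands back tokens one at a time.
  • Delimiters: A regular expression (whitespace by default), changeable at any time with useDelimiter().
  • State: Stateful (remembers its position).
  • Additional features: Parses primitive types directly (nextInt(), nextDouble(), etc.).
  • Performance: Typically the slowest of the three, but the most flexible.
  • Recommendation: Best choice for parsing complex input streams.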
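To illustrate Scanner's ability to parse primitive types directly, here is a minimal sketch that sums comma-separated integers from a string. useDelimiter() takes a regular expression; the sumInts name and the input format are assumptions made for the example.

```java
import java.util.Scanner;

public class ScannerSumDemo {
    // Sums the integers in a string such as "10, 20, 30".
    static int sumInts(String text) {
        int sum = 0;
        try (Scanner scanner = new Scanner(text)) {
            scanner.useDelimiter("\\s*,\\s*"); // a comma with optional surrounding spaces
            while (scanner.hasNextInt()) {
                sum += scanner.nextInt();      // parses each token as an int
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumInts("10, 20, 30")); // 60
    }
}
```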

When (If Ever) Should You Use StringTokenizer?

Given the strong recommendations against it, you should only use StringTokenizer in one of these rare scenarios:

  1. Maintaining Legacy Code: You are working on an old codebase that already uses it, and you need to maintain consistency.
  2. Android Development (Historically): In older versions of Android, StringTokenizer was sometimes preferred for performance-critical loops over String.split() because it avoided creating a temporary array. This is rarely a concern with modern Android runtimes and JIT compilers.
  3. Very Simple, High-Performance Loop: Profiling shows that StringTokenizer's token-at-a-time loop is measurably faster than split() for your specific, simple case. This is an edge case.
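When maintaining legacy code, a common step is replacing a StringTokenizer loop with split() without changing behavior. Because the two differ in regex interpretation and empty-string handling, a direct swap is not always equivalent. Below is a minimal sketch of a hypothetical splitLikeTokenizer helper that quotes each delimiter character, treats runs of delimiters as one, and drops the leading empty string. It assumes returnDelims is false and that no delimiter is a surrogate-pair character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;
import java.util.regex.Pattern;

public class TokenizerMigration {
    // Hypothetical helper: mimics new StringTokenizer(text, delims) using String.split.
    static List<String> splitLikeTokenizer(String text, String delims) {
        if (delims.isEmpty()) {
            // No delimiters: the whole non-empty string is a single token.
            return text.isEmpty() ? List.of() : List.of(text);
        }
        // Build "(?:\Qa\E|\Qb\E...)+" so each character is literal and runs collapse.
        StringBuilder alternatives = new StringBuilder();
        for (char c : delims.toCharArray()) {
            if (alternatives.length() > 0) {
                alternatives.append('|');
            }
            alternatives.append(Pattern.quote(String.valueOf(c)));
        }
        String regex = "(?:" + alternatives + ")+";
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split(regex)) {
            if (!piece.isEmpty()) {
                tokens.add(piece); // only a leading empty string can appear here
            }
        }
        return tokens;
    }

    // The original StringTokenizer loop, kept for comparison.
    static List<String> withTokenizer(String text, String delims) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(text, delims);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = ";a,,b.c;";
        System.out.println(splitLikeTokenizer(input, ",.;")); // [a, b, c]
        System.out.println(withTokenizer(input, ",.;"));      // [a, b, c]
    }
}
```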

Conclusion

StringTokenizer is a piece of Java history. It was useful in its time but has been superseded by more powerful and flexible tools.

  • For splitting a string into an array, use String.split().
  • For parsing complex input (like reading from a file), use Scanner.
  • Only use StringTokenizer when you absolutely have to for compatibility with old code.