杰瑞科技汇

Java String的hashCode为何特殊?

Of course! Let's dive deep into the hashCode() method for Java String objects. It's a fundamental concept in Java, especially for performance and data structures like HashMap.

Java String的hashCode为何特殊?-图1
(图片来源网络,侵删)

The Big Picture: What is a Hash Code?

A hash code is a numerical value (an integer) computed from an object's data. The primary goals are:

  1. Speed: It should be very fast to compute.
  2. Uniformity: For different objects, the hash codes should be well-distributed across the integer range to avoid collisions.

In Java, hashCode() is crucial for hash-based collections like HashMap, HashSet, and Hashtable. When you add an object to a HashMap, it uses the hash code to determine the "bucket" where the object should be stored. This allows for very fast lookups (average O(1) time complexity) instead of slow linear searches (O(n)).


The String.hashCode() Formula

The Java specification defines the exact formula for the String.hashCode() method. It's not arbitrary; it's a carefully chosen algorithm.

For a String s with length n and characters s[0] to s[n-1], the hash code h is calculated as:

Java String的hashCode为何特殊?-图2
(图片来源网络,侵删)
h = s[0] * 31^(n-1) + s[1] * 31^(n-2) + ... + s[n-1] * 31^0

Let's break this down:

  • Each character contributes: The hash code is a sum of the numeric values of each character in the string.
  • The "Magic Number" 31: The character at position i is multiplied by 31 raised to the power of (n - 1 - i). The number 31 is a prime number, which helps in distributing the hash values more evenly and reducing collisions. It's also a Mersenne prime (2⁵ - 1), which allows the compiler to optimize the multiplication 31 * i into a bit-shift and subtraction: (i << 5) - i. This makes the calculation very fast.
  • Initial Value: The calculation starts with an initial value of 0.

Example Calculation

Let's calculate the hash code for the string "ABC".

  1. s[0] = 'A' (ASCII value 65)
  2. s[1] = 'B' (ASCII value 66)
  3. s[2] = 'C' (ASCII value 67)
  4. n = 3

Using the formula: h = 65 * 31^(2) + 66 * 31^(1) + 67 * 31^(0) h = 65 * 961 + 66 * 31 + 67 * 1 h = 62465 + 2046 + 67 h = 64578

If you run System.out.println("ABC".hashCode()); in Java, you will get 64578.

Java String的hashCode为何特殊?-图3
(图片来源网络,侵删)

Implementation in Java Source Code

You can see the actual implementation in the OpenJDK source code for String.java. It looks something like this (simplified for clarity):

public int hashCode() {
    int h = hash; // A cached value to avoid re-computation
    if (h == 0 && value.length > 0) {
        for (int i = 0; i < value.length; i++) {
            h = 31 * h + value[i]; // 'value' is the internal char array
        }
        hash = h; // Cache the result
    }
    return h;
}

Key points from the implementation:

  • Caching: The String class caches the hash code in a private int hash field. This is a performance optimization. Once the hash code is calculated for a string, it's stored and reused for all subsequent calls to hashCode(). This is safe because String objects are immutable. Once created, their content can never change, so their hash code will never change.
  • Lazy Calculation: The hash code is only calculated when it's first needed (h == 0).

Important Properties and Rules

When working with hashCode(), you must remember the general contract defined in the Object class, which String follows:

  1. Consistency: During the execution of an application, whenever hashCode is invoked on the same String object more than once, it must consistently return the same integer, provided no information used in equals comparisons on the object is modified. (This is guaranteed by String's immutability and caching).
  2. Collision Rule: If two objects are equals() according to their equals() method, they must have the same hash code.
    String s1 = new String("hello");
    String s2 = new String("hello");
    System.out.println(s1.equals(s2)); // true
    System.out.println(s1.hashCode() == s2.hashCode()); // MUST be true
  3. Non-Collision Rule (Not Required but Recommended): If two objects are not equals(), they do not need to have different hash codes. However, for good performance in hash-based collections, it's highly desirable that unequal objects have different hash codes. The String hash algorithm is very good at this.

Common Pitfall: Modifying a String's Hash Code

Since String is immutable, you can't change its content after creation. However, if you were to create a mutable object (like a StringBuilder) and then use it as a key in a HashMap, you could run into a serious problem.

// WARNING: This is a BAD PRACTICE example!
StringBuilder sb = new StringBuilder("key");
Map<StringBuilder, String> map = new HashMap<>();
// 1. Put the StringBuilder in the map
map.put(sb, "value1");
int originalHash = sb.hashCode(); // Let's say it's 12345
// 2. Modify the StringBuilder
sb.append("_modified");
// 3. Try to get the value from the map
// The map will look for the object in the bucket for hash 12345.
// But the object's content has changed, and its hash code is now different!
// The map will not find it.
String value = map.get(sb); // Returns NULL!
// The "entry" is now effectively "stuck" in the map and cannot be retrieved!
// This is why using mutable objects as HashMap keys is dangerous.

Summary: String.hashCode() vs Object.hashCode()

Feature Object.hashCode() (Default) String.hashCode() (Overridden)
Purpose A generic identity-based hash. A content-based hash.
Calculation Typically derived from the object's memory address. Calculated from the sequence of characters using the 31 multiplier formula.
Performance Very fast (often a memory address). Fast, but requires a loop through the characters. Caching makes it O(1) after the first call.
Use Case Used when you need a unique identifier for an object instance. Essential for using String objects as keys in hash-based collections like HashMap.

In conclusion, String.hashCode() is a well-designed, content-based hash function that is fundamental to the performance of Java's collection framework. Its immutability allows for safe and efficient caching of the hash value.

分享:
扫描分享到社交APP
上一篇
下一篇