Of course! Let's dive deep into the hashCode() method for Java String objects. It's a fundamental concept in Java, especially for performance and data structures like HashMap.

The Big Picture: What is a Hash Code?
A hash code is a numerical value (an integer) computed from an object's data. The primary goals are:
- Speed: It should be very fast to compute.
- Uniformity: For different objects, the hash codes should be well-distributed across the integer range so that collisions are rare (they cannot be avoided entirely).
In Java, hashCode() is crucial for hash-based collections like HashMap, HashSet, and Hashtable. When you add an object to a HashMap, it uses the hash code to determine the "bucket" where the object should be stored. This allows for very fast lookups (average O(1) time complexity) instead of slow linear searches (O(n)).
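To make the "bucket" idea concrete, here is a minimal sketch of how a hash code can be turned into a bucket index. It assumes a table whose size is a power of two and a bit-mixing step modeled on the one OpenJDK's HashMap uses in Java 8 and later; the class and method names are hypothetical, and the real HashMap internals are an implementation detail that may differ.

```java
class BucketIndexSketch {
    // Assumption: mixes the high 16 bits into the low 16 bits,
    // modeled on OpenJDK's HashMap (Java 8+).
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // tableSize is assumed to be a power of two, so the mask keeps the low bits.
    static int bucketIndex(Object key, int tableSize) {
        return spread(key.hashCode()) & (tableSize - 1);
    }

    public static void main(String[] args) {
        String[] keys = {"ABC", "hello", "world"};
        for (String key : keys) {
            System.out.println(key + " -> bucket " + bucketIndex(key, 16));
        }
    }
}
```

Because the index depends only on the hash code, a lookup can jump straight to one bucket instead of scanning every entry.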
The String.hashCode() Formula
The Java API documentation for String.hashCode() specifies the exact formula. It's not arbitrary; it's a deliberately chosen algorithm.
For a String s with length n and characters s[0] to s[n-1], the hash code h is calculated using 32-bit int arithmetic (overflow simply wraps around) as:

h = s[0] * 31^(n-1) + s[1] * 31^(n-2) + ... + s[n-1] * 31^0
Let's break this down:
- Each character contributes: The hash code is a weighted sum of the numeric values of the characters in the string, so both the characters and their order matter.
- The "Magic Number" 31: The character at position i is multiplied by 31 raised to the power of (n - 1 - i). The number 31 is an odd prime, which helps in distributing the hash values more evenly and reducing collisions. It's also a Mersenne prime (2⁵ - 1), which lets the compiler or JIT replace the multiplication 31 * x with a bit-shift and subtraction: (x << 5) - x. This keeps the calculation cheap.
- Initial Value: The calculation starts with an initial value of 0.
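The points above can be checked with a short sketch. It computes the hash twice: once term by term, exactly as the formula is written, and once with the shift-and-subtract form of the multiplication by 31. The class and method names are made up for illustration; both results should match String.hashCode() because int overflow wraps the same way in each variant.

```java
class PolynomialHashSketch {
    // Direct evaluation of s[0]*31^(n-1) + ... + s[n-1]*31^0 in int arithmetic.
    static int byFormula(String s) {
        int n = s.length();
        int h = 0;
        for (int i = 0; i < n; i++) {
            int power = 1;
            for (int k = 0; k < n - 1 - i; k++) {
                power *= 31; // 31^(n-1-i), wrapping on overflow
            }
            h += s.charAt(i) * power;
        }
        return h;
    }

    // Running form: h = 31*h + c, with 31*h written as (h << 5) - h.
    static int byShift(String s) {
        int h = 0; // initial value
        for (int i = 0; i < s.length(); i++) {
            h = (h << 5) - h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String s = "hello world";
        System.out.println(byFormula(s) == s.hashCode());
        System.out.println(byShift(s) == s.hashCode());
    }
}
```

The running form needs only one multiplication (or shift) per character, which is why implementations use it instead of computing each power separately.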
Example Calculation
Let's calculate the hash code for the string "ABC".
- s[0] = 'A' (ASCII value 65)
- s[1] = 'B' (ASCII value 66)
- s[2] = 'C' (ASCII value 67)
- n = 3
Using the formula:
h = 65 * 31^(2) + 66 * 31^(1) + 67 * 31^(0)
h = 65 * 961 + 66 * 31 + 67 * 1
h = 62465 + 2046 + 67
h = 64578
If you run System.out.println("ABC".hashCode()); in Java, you will get 64578.
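The same result can be reached one character at a time. The following sketch (the class name HornerTrace is hypothetical) records the running value h = 31 * h + c after each character of "ABC", which is algebraically the same as the expanded formula used above.

```java
import java.util.Arrays;

class HornerTrace {
    // Returns the running hash after each character has been folded in.
    static int[] trace(String s) {
        int[] steps = new int[s.length()];
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
            steps[i] = h;
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(trace("ABC"))); // [65, 2081, 64578]
        System.out.println("ABC".hashCode());              // 64578
    }
}
```

The intermediate values are 65, then 65 * 31 + 66 = 2081, then 2081 * 31 + 67 = 64578.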

Implementation in Java Source Code
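You can see the actual implementation in the OpenJDK source code for String.java. The simplified version below follows the older char-array-based implementation (JDK 8 and earlier); newer JDKs store the characters in a byte array but compute the same value: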
public int hashCode() {
    int h = hash; // A cached value to avoid re-computation
    if (h == 0 && value.length > 0) {
        for (int i = 0; i < value.length; i++) {
            h = 31 * h + value[i]; // 'value' is the internal char array
        }
        hash = h; // Cache the result
    }
    return h;
}
Key points from the implementation:
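- Caching: The String class caches the hash code in a private int hash field. This is a performance optimization: once the hash code has been calculated for a string, it is stored and reused for all subsequent calls to hashCode(). This is safe because String objects are immutable; their content can never change after creation, so their hash code never changes either.
- Lazy Calculation: The hash code is only calculated when it's first needed (while the cached field still holds its default value, h == 0). One consequence of using 0 as the "not yet computed" marker is that a non-empty string whose hash really is 0 gets recomputed on every call in this simplified version.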
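The same caching pattern can be applied to your own immutable classes. The sketch below is an illustration only: FullName, its fields, and the computeCount counter are hypothetical names, and the counter exists purely to make the caching observable.

```java
final class FullName {
    private final String first;
    private final String last;
    private int hash;         // 0 means "not computed yet", as in the code above
    private int computeCount; // illustration only: counts real computations

    FullName(String first, String last) {
        this.first = first;
        this.last = last;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof FullName)) return false;
        FullName other = (FullName) o;
        return first.equals(other.first) && last.equals(other.last);
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0) { // lazy: compute on first use only
            computeCount++;
            h = 31 * first.hashCode() + last.hashCode();
            hash = h; // cache for later calls
        }
        return h;
    }

    int computeCount() {
        return computeCount;
    }

    public static void main(String[] args) {
        FullName name = new FullName("Ada", "Lovelace");
        name.hashCode();
        name.hashCode();
        System.out.println(name.computeCount()); // 1
    }
}
```

This only works because the fields are final and the referenced strings are immutable; caching the hash of an object whose content can change would return stale values.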
Important Properties and Rules
When working with hashCode(), you must remember the general contract defined in the Object class, which String follows:
- Consistency: During the execution of an application, whenever hashCode is invoked on the same String object more than once, it must consistently return the same integer, provided no information used in equals comparisons on the object is modified. (This is guaranteed by String's immutability.)
- Equality Rule: If two objects are equal according to their equals() method, they must have the same hash code.
String s1 = new String("hello");
String s2 = new String("hello");
System.out.println(s1.equals(s2)); // true
System.out.println(s1.hashCode() == s2.hashCode()); // MUST be true
- Non-Collision Rule (Not Required but Recommended): If two objects are not equal according to equals(), they do not need to have different hash codes. However, for good performance in hash-based collections, it's highly desirable that unequal objects have different hash codes. The String algorithm spreads typical keys well enough in practice, although colliding strings are easy to construct.
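A small sketch of these rules in action, using hypothetical helper names. It shows that equal strings always share a hash code, that two unequal strings ("Aa" and "BB") happen to share one as well, and that a HashMap still keeps such colliding keys apart because it falls back on equals().

```java
import java.util.HashMap;
import java.util.Map;

class HashContractDemo {
    static boolean sameHash(String a, String b) {
        return a.hashCode() == b.hashCode();
    }

    static Map<String, Integer> mapWithCollidingKeys() {
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2); // same hash code as "Aa", but not equal to it
        return map;
    }

    public static void main(String[] args) {
        System.out.println(sameHash(new String("hello"), new String("hello"))); // true
        System.out.println("Aa".hashCode() + " " + "BB".hashCode());            // 2112 2112
        System.out.println("Aa".equals("BB"));                                  // false
        System.out.println(mapWithCollidingKeys());
    }
}
```

Collisions therefore cost performance (more equals() checks within a bucket), not correctness.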
Common Pitfall: Mutating a Key After Insertion
Since String is immutable, its hash code can never change after creation. A mutable key whose hashCode() depends on its content, such as an ArrayList, has no such guarantee, and using one as a key in a HashMap can cause a serious problem. (StringBuilder does not show this problem: it does not override hashCode() or equals(), so its hash is identity-based and stays the same when its content changes.)
// WARNING: This is a BAD PRACTICE example!
// Needs java.util.ArrayList, java.util.HashMap, java.util.List, java.util.Map (List.of requires Java 9+)
List<String> key = new ArrayList<>(List.of("key"));
Map<List<String>, String> map = new HashMap<>();
// 1. Put the list in the map
map.put(key, "value1");
int originalHash = key.hashCode(); // content-based hash at insertion time
// 2. Modify the key
key.add("_modified");
// 3. Try to get the value from the map
// The map stored the entry under the original hash.
// But the key's content has changed, and its hash code is now different!
// The map looks in the wrong place and will not find it.
String value = map.get(key); // Returns null!
// The entry is now effectively "stuck" in the map and cannot be retrieved by key.
// This is why using mutable objects as HashMap keys is dangerous.
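One way to sidestep this is to key the map on an immutable snapshot of the mutable object. The sketch below contrasts the two approaches; the class and method names are hypothetical, and List.of assumes Java 9 or later. The first method mutates a content-hashed key (an ArrayList) after insertion, the second stores the String produced by StringBuilder.toString().

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MutableKeyDemo {
    // Mutable key with a content-based hashCode(): the lookup fails after mutation.
    static String lookupAfterMutation() {
        List<String> key = new ArrayList<>(List.of("key"));
        Map<List<String>, String> map = new HashMap<>();
        map.put(key, "value1");
        key.add("_modified"); // the key's hash code changes here
        return map.get(key);
    }

    // Immutable snapshot as the key: later changes to the builder do not matter.
    static String lookupWithSnapshot() {
        StringBuilder sb = new StringBuilder("key");
        Map<String, String> map = new HashMap<>();
        map.put(sb.toString(), "value1");
        sb.append("_modified"); // does not affect the String already stored
        return map.get("key");
    }

    public static void main(String[] args) {
        System.out.println(lookupAfterMutation()); // null
        System.out.println(lookupWithSnapshot());  // value1
    }
}
```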
Summary: String.hashCode() vs Object.hashCode()
| Feature | Object.hashCode() (Default) | String.hashCode() (Overridden) |
|---|---|---|
| Purpose | A generic identity-based hash. | A content-based hash. |
| Calculation | Implementation-specific and tied to object identity; not necessarily the object's memory address. | Calculated from the sequence of characters using the 31 multiplier formula. |
| Performance | Very fast; does not depend on the object's content. | O(n) in the string length on the first call (one loop through the characters). Caching makes later calls O(1). |
| Use Case | Appropriate when instances should be distinguished by identity; values are not guaranteed to be unique. | Essential for using String objects as keys in hash-based collections like HashMap. |
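The difference between the two columns can be observed with System.identityHashCode, which returns the hash an object would get from the default Object.hashCode(). This is a small sketch with hypothetical names; the identity values printed at the end vary between runs and JVMs.

```java
class IdentityVsContentDemo {
    // For a class that does not override hashCode(), both values are the same.
    static boolean usesIdentityHash(Object o) {
        return o.hashCode() == System.identityHashCode(o);
    }

    public static void main(String[] args) {
        Object plain = new Object();
        String a = new String("hello");
        String b = new String("hello");

        System.out.println(usesIdentityHash(plain));        // true
        System.out.println(a == b);                         // false: two distinct objects
        System.out.println(a.hashCode() == b.hashCode());   // true: same content
        // Identity hashes of a and b usually differ, but this is not guaranteed.
        System.out.println(System.identityHashCode(a) + " " + System.identityHashCode(b));
    }
}
```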
In conclusion, String.hashCode() is a well-designed, content-based hash function that is fundamental to the performance of Java's collection framework. Because String is immutable, the hash value can be cached safely and efficiently.
