Of course! Let's dive deep into searchsorted, a powerful and highly useful function from the NumPy library.

What is searchsorted?
numpy.searchsorted(a, v, side='left', sorter=None) is a function that finds the insertion points for a set of values v into a sorted array a.
Think of it like this: You have a sorted bookshelf (array a), and you want to insert a new book (value v). searchsorted tells you exactly which position (index) on the shelf the new book should go in to keep the shelf sorted.
The Core Concept: Insertion Point
The key idea is the insertion point. For a given value v, the insertion point is the index in the sorted array a where v should be inserted to maintain the sorted order.
The behavior of searchsorted is controlled by the side parameter, which is crucial to understand.

side='left' (Default)
This finds the insertion point before any existing entries of v.
Rule: The returned index i is the smallest index such that a[i] >= v.
Example:
import numpy as np a = np.array([1, 3, 4, 4, 6, 8])
Value v to Insert |
Sorted Array a |
Insertion Point (Index) | Resulting Array |
|---|---|---|---|
2 |
[1, **3**, 4, 4, 6, 8] |
1 |
[1, 2, 3, ...] |
4 |
[1, 3, **4**, 4, 6, 8] |
2 |
[1, 3, 4, 4, ...] |
9 |
[1, 3, 4, 4, 6, 8, **9**] |
6 |
[1, 3, 4, 4, 6, 8, 9] |
# For v = 2 print(np.searchsorted(a, 2)) # Output: 1 # For v = 4 print(np.searchsorted(a, 4)) # Output: 2 (the first position where a[i] >= 4) # For v = 9 print(np.searchsorted(a, 9)) # Output: 6 (appended at the end)
side='right'
This finds the insertion point after any existing entries of v.

Rule: The returned index i is the largest index such that a[i] <= v.
Example:
Using the same array a = np.array([1, 3, 4, 4, 6, 8]):
Value v to Insert |
Sorted Array a |
Insertion Point (Index) | Resulting Array |
|---|---|---|---|
2 |
[1, **3**, 4, 4, 6, 8] |
1 |
[1, 2, 3, ...] |
4 |
[1, 3, 4, 4, **6**, 8] |
4 |
[1, 3, 4, 4, 4, 6, ...] |
9 |
[1, 3, 4, 4, 6, 8, **9**] |
6 |
[1, 3, 4, 4, 6, 8, 9] |
# For v = 2 print(np.searchsorted(a, 2, side='right')) # Output: 1 (same as left) # For v = 4 print(np.searchsorted(a, 4, side='right')) # Output: 4 (the position after the last 4) # For v = 9 print(np.searchsorted(a, 9, side='right')) # Output: 6 (same as left)
Key Parameters
a: The 1D array that must be sorted in ascending order. Ifais not sorted, the results will be incorrect. (NumPy does not check for sortedness for performance reasons).v: The value(s) to find the insertion points for. This can be a single value or an array of values.side: A string, either'left'or'right', as described above.sorter: Optional. Ifawas sorted using a custom key (e.g.,argsort), you can provide the array of indices that would sorta. This is useful if you don't want to actually sortabut know its sorted order.
Common Use Cases
Binning (Finding Bin Indices)
This is the most common and powerful use case. searchsorted is extremely fast for this, much faster than a manual loop or list comprehension.
Problem: You have data and you want to assign each data point to a bin. For example, converting ages into age groups.
import numpy as np
# Define the edges of our bins (age groups)
age_bins = [0, 18, 35, 50, 65, 100]
# A list of ages to be binned
ages = np.array([5, 17, 22, 35, 50, 60, 70, 99])
# Find the index of the bin for each age
# We use side='right' because the bins are defined as [0, 18), [18, 35), etc.
bin_indices = np.searchsorted(age_bins, ages, side='right')
print("Age Bins Edges:", age_bins)
print("Ages:", ages)
print("Bin Indices:", bin_indices)
# Output:
# Age Bins Edges: [0, 18, 35, 50, 65, 100]
# Ages: [ 5 17 22 35 50 60 70 99]
# Bin Indices: [1 1 2 2 3 4 5 5]
You can then use these indices to create the final age groups:
# A list of bin labels
bin_labels = ['Child', 'Teen', 'Adult', 'Middle Age', 'Senior', 'Elderly']
# Use the bin_indices to get the corresponding label
age_groups = [bin_labels[i] for i in bin_indices]
print("Age Groups:", age_groups)
# Output:
# Age Groups: ['Child', 'Teen', 'Adult', 'Adult', 'Middle Age', 'Senior', 'Elderly', 'Elderly']
Inserting Multiple Values
searchsorted is vectorized, meaning you can find insertion points for many values at once, which is very efficient.
import numpy as np
a = np.array([10, 20, 30, 40, 50])
values_to_insert = np.array([5, 15, 25, 35, 99])
# Find insertion points for all values at once
insertion_points = np.searchsorted(a, values_to_insert)
print("Original array:", a)
print("Values to insert:", values_to_insert)
print("Insertion points:", insertion_points)
# Output:
# Original array: [10 20 30 40 50]
# Values to insert: [ 5 15 25 35 99]
# Insertion points: [0 1 2 3 5]
Finding Duplicates
You can use side='left' and side='right' together to find the range of indices for a specific value, which is useful for counting duplicates.
import numpy as np
a = np.array([1, 3, 4, 4, 4, 6, 8])
v = 4
left_index = np.searchsorted(a, v, side='left')
right_index = np.searchsorted(a, v, side='right')
print(f"Array: {a}")
print(f"Value to find: {v}")
print(f"Left insertion index: {left_index}")
print(f"Right insertion index: {right_index}")
# The number of occurrences is the difference
num_duplicates = right_index - left_index
print(f"Number of occurrences of {v}: {num_duplicates}")
# Output:
# Array: [1 3 4 4 4 6 8]
# Value to find: 4
# Left insertion index: 2
# Right insertion index: 5
# Number of occurrences of 4: 3
Performance and Alternatives
-
Performance:
searchsorteduses a binary search algorithm, which has a time complexity of O(log N) for a single value and O(M log N) for M values. This is extremely fast and much better than a brute-force O(M*N) approach. -
Alternatives:
bisectmodule: Python's standard library has abisectmodule that does the exact same thing. It's a great option if you don't want to depend on NumPy for this single task.bisect.bisect_left(a, v)is equivalent tonp.searchsorted(a, v, side='left')bisect.bisect_right(a, v)is equivalent tonp.searchsorted(a, v, side='right')
pandas: If you are working with pandas DataFrames or Series, thepandas.Series.searchsorted()method is available and works similarly, often integrating more seamlessly with pandas objects.
Summary
| Feature | Description |
|---|---|
| What it does | Finds insertion points for values into a sorted array. |
| Core Concept | Binary search for fast index lookup. |
side='left' |
Finds the first position where a[i] >= v. |
side='right' |
Finds the last position where a[i] <= v. |
| Primary Use Case | Binning / Histograms - extremely fast for assigning data to categories. |
| Performance | Very fast (O(log N) per value). |
| Precondition | The input array a must be sorted. |
searchsorted is an essential tool in any data scientist's or numerical programmer's toolkit, especially for tasks involving categorization and finding positions in sorted data.
