| 일 | 월 | 화 | 수 | 목 | 금 | 토 |
|---|---|---|---|---|---|---|
| 1 | 2 | 3 | 4 | 5 | 6 | |
| 7 | 8 | 9 | 10 | 11 | 12 | 13 |
| 14 | 15 | 16 | 17 | 18 | 19 | 20 |
| 21 | 22 | 23 | 24 | 25 | 26 | 27 |
| 28 | 29 | 30 | 31 |
- string
- baekjoon
- const
- 문자열
- pass by reference
- function
- assignment operator
- predictive analysis
- Data Science
- 백준
- 반복문
- 배열
- Object Oriented Programming
- 알고리즘
- pointer
- Python
- 파이썬
- Pre-processing
- 함수
- programming
- OOP
- raw data
- 포인터
- Class
- Deep Learning
- 오블완
- array
- 티스토리챌린지
- C++
- vscode
- Today
- Total
Channi Studies
[Data Structure] Hash Table 5 - Common Hash Functions 본문
[Data Structure] Hash Table 5 - Common Hash Functions
Chan Lee 2025. 4. 10. 04:01A hash function is fast if the hash function can minimize collisions.
A perfect hash fucntion maps items to buckets with no collisions.
It can be created if the number of items and all possible item keys are known beforehand.
The runtime for insert, serach and remove is O(1) with it.
A good hash function should uniformly distribute items into buckets.
With chaining, a good hash function results in short bucket lists and thus fast operations.
With lienar probing, a good hash function will avoid hashing multiple items to consecutive buckets and thus minimize the average linear probing length to achieve fast opeartions.
On average, a good hash function will achieve O(1) inserts, seraches, and removes, but in the worst-case may require O(N).
A hash function's performance depends on the hash table size and knowledge of the expected keys.
Ex. The hash function key % 10 will perform poorly if the expected keys are all multiples of 10, becuase they will all collide at bucket 0.
Let's take a look into some of the common hash functions used.
Modulo Hash Function
A modulo hash uses the raminder from division of the key by hash table size N.
HashRemainder(int Key) {
return key % N
}
Mid-square Hash Function
A mid-square hash squares the key, extracts R digits from the result's middle, and returns the remainder of the middle digits divided by hash table size N.
Ex. For a hash table with 100 entries and a key of 453, the decimal (base 10) mid-square hash function computes 453 * 453 = 205209, and returns the middle two digits 52.
For N buckets, R must be greater than equal to [log₁₀ N] to index all buckets.
The process of squaring and extracting middle digits reduces the likelihood of keys mapping to just a few buckets.
⇒ 110 * 110 = 12100 → Taking 3 middle digits = 210 → 210 % 200 = 10
⇒ 112 * 112 = 12544 → Taking 3 middle digits = 254
Mid-square hash function base 2 implementation
However, the mid-square hash function is typically implemented using binary (base 2), and not decimal (base 10).
A decimal implementation requries converting the square of the key to a string, extracting a substring for the middle digits, and converting that substring to an integer. Meanwhile a binary implementation only requires a few shift and bitwise AND oeprations.
A binary mid-square hash function extracts the middle R bits, and returns the remainder of the middle bits divided by hash table size N, where R is greater than equal to [log₂ N].
Multiplicative String hash function
A multiplicative string hash repeatedly multiples the hash value and adds the ASCII (or Unicode) value of each character in the string.
A multiplicative hash function for string strats with a large initial value.
For each caracter, the hash function multiplies the current hash value by a multiplier (often prime) and adds the character's value.
Finally, the function returns the remainder of the sum divided by the hash table size N.
HashMultiplicative(string key) {
stringHash = InitialValue
for (each character strChar in key) {
stringHash = (stringHash * HashMultiplier) + strChar
}
return stringHash % N
}
Daniel J. Bernstein creted a popular version of a multiplicative string hash function that uses an initial value of 5381 and a multiplier of 33. Bernstein's hash function performs well for hashing short English strings.
1) Initial Value = 0, Hash Multiplier = 1. String = BAT, Size = 1000
⇒ A hash multiplier of 1 will simply add the characters' ASCCI values
→ stringHash = initialValue + B + A + T = 0 + 66 + 65 + 84 = 215
2) Initial Value = 0, Hash Multiplier = 1. String = TAB, Size = 1000
→ stringHash = initialValue + T + A + B = 0 + 84 + 65 + 66 = 215
3) Initial Value = 17, Hash Multiplier = 3. String = WE, Size = 1000
→ stringHash = initialValue = 17
→ stringHash = 17 * 3 + W = 51 + 87 = 138
→ stringHash = 138 * 3 +69 = 483
Direct Hash Function
A direct hash function uses the item's key as the bucket index. (Ex. If the key is 937, the index is also 937.)
A hash table with a direct hash function is called a direct access table.
Given a key, a direct access table serach algorithm returns the item at index key if the bucket is not empty, and returns null if empty.

Limitations of Direct Hashing
A direct access table has the advantage of no collisions: Each key is unique (by definition of a key), and each gets a unique bucket, so no collisions can occur. However, a direct access table has two main limitations:
1. All keys must be non-negative integers, but for some applications keys may be negative.
2. The hash table's size equals the largest key value plus 1, which may be very large.
'Data Science > Data Structure & Algorithm' 카테고리의 다른 글
| [Data Structures] Binary Trees - 1 (0) | 2025.04.22 |
|---|---|
| [Data Structure] Hash Table - 6 | Python Implementation (0) | 2025.04.10 |
| [Data Structure] Hash Table - 4 | Hash Table Resizing (0) | 2025.04.10 |
| [Data Structure] Hash Table - 3 | Double Hashing (0) | 2025.04.09 |
| [Data Structure] Hash Table - 2 | Linear Probing & Quadratic Probing (0) | 2025.04.09 |