Hash Maps & Sets

1. What is a Hash Map? 2. Time Complexity 3. Hash Maps in Python and JS 4. Hash Sets 5. Common Hash Map Patterns 6. When to Use Hash Maps 7. LeetCode Problems 8. Implementing a Hash Map from Scratch 9. Practice Quiz

1. What is a Hash Map?

A hash map (also called a hash table) is a data structure that stores key-value pairs. It allows you to associate a value with a unique key and retrieve that value later in near-constant time.

Different names, same idea

Python — dict (dictionary)
JavaScript — plain Object or Map
Java — HashMap
C++ — unordered_map
General CS — hash table, associative array

How Hashing Works

Under the hood a hash map is backed by an array. When you insert a key, a hash function converts the key into an integer, and that integer is mapped to an index in the array (usually via modulo). The value is then stored at that index.

index = hash(key) % array_length

Step-by-step example

Suppose our internal array has 8 slots and we want to store "alice" -> 90.

Compute hash: hash("alice") returns, say, 7572839183
Find index: 7572839183 % 8 = 7
Store the pair ("alice", 90) at slot 7

Later, to look up "alice", we repeat the same hash and go directly to slot 7 — O(1).

Visual Representation

/* Hash Table with 8 buckets */

  Key        hash(key) % 8    Bucket
  -------    --------------   ----------------------------
  "alice"    7                [0] ->
  "bob"      3                [1] ->
  "charlie"  1                [2] ->
  "diana"    3  (collision!)  [3] -> ("bob",85) -> ("diana",92)
  "eve"      5                [4] ->
                              [5] -> ("eve",78)
                              [6] ->
                              [7] -> ("alice",90)

  Bucket 1:  ("charlie", 88)
  Bucket 3:  ("bob", 85) -> ("diana", 92)   // chained

Collision Handling

A collision occurs when two different keys hash to the same index. There are two main strategies for resolving collisions:

1. Chaining (Separate Chaining)

Each bucket holds a linked list (or another collection). When a collision happens, the new key-value pair is appended to the list at that bucket.

Pros: Simple to implement. Never "runs out" of space.
Cons: Uses extra memory for pointers. If many keys hash to the same bucket, lookups degrade to O(n).

# Chaining: bucket 3 has two entries
bucket[3] -> [("bob", 85)] -> [("diana", 92)] -> None

2. Open Addressing (Linear Probing)

All entries live directly in the array. When a collision occurs, the algorithm probes forward to the next empty slot.

Pros: Better cache performance (everything in one array). No extra memory for pointers.
Cons: Clustering can degrade performance. The table needs to be resized before it gets too full (load factor).

# Open Addressing: "diana" finds slot 3 taken, probes to slot 4
[0]         [1] charlie  [2]         [3] bob     [4] diana   [5] eve   [6]   [7] alice

Load Factor

The load factor = (number of entries) / (number of buckets). When it exceeds a threshold (often 0.75), the hash map resizes (typically doubles) and rehashes all entries. This keeps operations near O(1) on average.

2. Time Complexity

Operation	Average	Worst Case
Insert	O(1)	O(n)
Lookup	O(1)	O(n)
Delete	O(1)	O(n)
Space	O(n)	O(n)

Why Worst Case is O(n)

If every single key hashes to the same bucket, the hash map degrades into a linked list. Every lookup would need to traverse all n elements — hence O(n).

Worst-case scenario

# All keys hash to bucket 0:
bucket[0] -> ("a", 1) -> ("b", 2) -> ("c", 3) -> ... -> ("z", 26)

# Looking up "z" requires traversing all 26 entries

Why We Usually Say O(1)

In practice, a good hash function distributes keys uniformly across buckets. With a reasonable load factor, the expected number of items per bucket is a small constant. This is called amortized O(1) — on average, each operation takes constant time.

Average time per operation = O(1 + load_factor) = O(1) when load_factor is bounded

Interview tip

For coding interviews, always state hash map operations as O(1) average, O(n) worst case. Most interviewers accept O(1) unless they specifically ask about worst case.

3. Hash Maps in Python and JS

Python: `dict`

Python
# --- Creating ---
d = {}                            # empty dict
d = {"name": "Alice", "age": 25}  # with initial values
d = dict(name="Alice", age=25)     # using dict() constructor

# --- Accessing ---
d["name"]                         # "Alice"  (KeyError if missing)
d.get("name")                      # "Alice"  (None if missing)
d.get("gpa", 0.0)                  # 0.0      (default if missing)

# --- Inserting / Updating ---
d["grade"] = "A"                   # add new key
d["age"] = 26                      # update existing key

# --- Deleting ---
del d["grade"]                     # remove key (KeyError if missing)
d.pop("grade", None)               # remove key (no error if missing)

# --- Checking membership ---
"name" in d                        # True
"gpa" not in d                    # True

# --- Iterating ---
for key in d:
    print(key)                    # "name", "age"

for key, value in d.items():
    print(key, value)             # "name" "Alice", "age" 26

for val in d.values():
    print(val)                    # "Alice", 26

for key in d.keys():
    print(key)                    # "name", "age"

# --- Useful methods ---
len(d)                             # 2
d.update({"gpa": 3.9})             # merge another dict

Python: `defaultdict` and `Counter`

Python
from collections import defaultdict, Counter

# --- defaultdict: auto-creates default values for missing keys ---
freq = defaultdict(int)           # default value is 0
for char in "hello":
    freq[char] += 1
# freq = {'h': 1, 'e': 1, 'l': 2, 'o': 1}

groups = defaultdict(list)         # default value is []
groups["fruits"].append("apple")   # no need to check if key exists
groups["fruits"].append("banana")

# --- Counter: count elements in one line ---
count = Counter("abracadabra")
# Counter({'a': 5, 'b': 2, 'r': 2, 'c': 1, 'd': 1})

count.most_common(2)              # [('a', 5), ('b', 2)]

# Counter from a list
nums = [1, 2, 2, 3, 3, 3]
count = Counter(nums)
# Counter({3: 3, 2: 2, 1: 1})

JavaScript: Object vs Map

When to use Map vs Object

Use Map when keys are not strings (numbers, objects, etc.), when you need to preserve insertion order, or when you frequently add/remove keys.
Use plain Object for simple string-keyed config, JSON-compatible data, or when you need object destructuring.

JavaScript: Object

JavaScript
// --- Creating ---
const obj = {};
const obj2 = { "name": "Alice", "age": 25 };

// --- Accessing ---
obj2["name"];                      // "Alice"
obj2.name;                          // "Alice"  (dot notation)

// --- Inserting / Updating ---
obj2["grade"] = "A";
obj2.grade = "A";

// --- Deleting ---
delete obj2.grade;

// --- Checking ---
"name" in obj2;                     // true
obj2.hasOwnProperty("name");        // true

// --- Iterating ---
for (const key in obj2) {
    console.log(key, obj2[key]);
}
Object.keys(obj2);                  // ["name", "age"]
Object.values(obj2);                // ["Alice", 25]
Object.entries(obj2);               // [["name","Alice"], ["age",25]]

JavaScript: Map

JavaScript
// --- Creating ---
const map = new Map();
const map2 = new Map([
    ["name", "Alice"],
    ["age", 25]
]);

// --- Operations ---
map2.set("grade", "A");             // insert / update
map2.get("name");                    // "Alice"
map2.has("name");                    // true
map2.delete("grade");               // remove
map2.size;                           // 2

// --- Iterating ---
map2.forEach((value, key) => {
    console.log(key, value);
});

for (const [key, value] of map2) {
    console.log(key, value);
}

// --- Keys can be ANY type ---
const objKey = { id: 1 };
map.set(objKey, "object as key");    // works!
map.set(42, "number as key");        // works!

// --- Convert to/from array ---
const arr = [...map2];               // [["name","Alice"], ["age",25]]
const fromArr = new Map(arr);       // back to Map

4. Hash Sets

A hash set is like a hash map but it stores only keys — no associated values. It answers one question fast: "Is this element in the collection?"

Python: `set`

Python
# --- Creating ---
s = set()                          # empty set (NOT {} which is a dict)
s = {1, 2, 3}                        # set with values
s = set([1, 2, 2, 3])                # from list (duplicates removed)

# --- Operations ---
s.add(4)                             # add element
s.remove(4)                          # remove (KeyError if missing)
s.discard(4)                         # remove (no error if missing)
3 in s                              # True  (O(1) lookup)
len(s)                              # 3

# --- Set Operations ---
a = {1, 2, 3}
b = {2, 3, 4}

a | b                               # Union:        {1, 2, 3, 4}
a.union(b)                          # Union:        {1, 2, 3, 4}

a & b                               # Intersection: {2, 3}
a.intersection(b)                   # Intersection: {2, 3}

a - b                               # Difference:   {1}
a.difference(b)                     # Difference:   {1}

a ^ b                               # Symmetric difference: {1, 4}
a.symmetric_difference(b)           # Symmetric difference: {1, 4}

a.issubset(b)                       # False
a.issuperset(b)                     # False

JavaScript: `Set`

JavaScript
// --- Creating ---
const s = new Set();
const s2 = new Set([1, 2, 2, 3]);   // {1, 2, 3}  (duplicates removed)

// --- Operations ---
s2.add(4);                           // add element
s2.delete(4);                        // remove element (returns boolean)
s2.has(3);                            // true  (O(1) lookup)
s2.size;                             // 3

// --- Iterating ---
s2.forEach(val => console.log(val));
for (const val of s2) { console.log(val); }

// --- Set Operations (manual in JS) ---
const a = new Set([1, 2, 3]);
const b = new Set([2, 3, 4]);

// Union
const union = new Set([...a, ...b]);              // {1, 2, 3, 4}

// Intersection
const inter = new Set([...a].filter(x => b.has(x))); // {2, 3}

// Difference (a - b)
const diff = new Set([...a].filter(x => !b.has(x))); // {1}

// Symmetric Difference
const symDiff = new Set(
    [...a].filter(x => !b.has(x)).concat([...b].filter(x => !a.has(x)))
); // {1, 4}

// Convert to array
const arr = [...s2];                  // [1, 2, 3]
const arr2 = Array.from(s2);          // [1, 2, 3]

5. Common Hash Map Patterns

Pattern 1: Frequency Counter

Count how often each element appears. This is the single most common hash map pattern in interviews.

Python
def frequency_count(nums):
    freq = {}
    for num in nums:
        freq[num] = freq.get(num, 0) + 1
    return freq

# Or with Counter:
from collections import Counter
freq = Counter(nums)

# Example
print(frequency_count([1, 2, 2, 3, 3, 3]))
# {1: 1, 2: 2, 3: 3}

JavaScript
function frequencyCount(nums) {
    const freq = new Map();
    for (const num of nums) {
        freq.set(num, (freq.get(num) || 0) + 1);
    }
    return freq;
}

// Example
console.log(frequencyCount([1, 2, 2, 3, 3, 3]));
// Map { 1 => 1, 2 => 2, 3 => 3 }

Pattern 2: Two Sum (The Classic)

Given an array of integers and a target, return the indices of the two numbers that add up to the target.

Key insight: For each number num, check if target - num already exists in the hash map.

Why hash map beats brute force

Brute force: Check every pair — O(n^2).
Hash map: One pass, store complements — O(n).

Python
def two_sum(nums, target):
    # Map: number -> its index
    seen = {}

    for i, num in enumerate(nums):
        complement = target - num       # what we need to find

        if complement in seen:           # have we seen it before?
            return [seen[complement], i]  # return both indices

        seen[num] = i                   # store current number and index

    return []                           # no solution found

# --- Walk-through ---
# nums = [2, 7, 11, 15], target = 9
#
# i=0: num=2,  comp=7,  seen={}        -> not found, store {2: 0}
# i=1: num=7,  comp=2,  seen={2: 0}    -> FOUND! return [0, 1]

print(two_sum([2, 7, 11, 15], 9))    # [0, 1]

JavaScript
function twoSum(nums, target) {
    const seen = new Map();            // number -> index

    for (let i = 0; i < nums.length; i++) {
        const complement = target - nums[i]; // what we need

        if (seen.has(complement)) {      // have we seen it?
            return [seen.get(complement), i]; // return both indices
        }

        seen.set(nums[i], i);           // store current number
    }

    return [];                          // no solution
}

// Walk-through:
// nums = [2, 7, 11, 15], target = 9
//
// i=0: num=2,  comp=7,  seen=Map{}         -> not found, store {2->0}
// i=1: num=7,  comp=2,  seen=Map{2->0}     -> FOUND! return [0, 1]

console.log(twoSum([2, 7, 11, 15], 9)); // [0, 1]

Pattern 3: Group Anagrams

Given an array of strings, group anagrams together. Two strings are anagrams if they contain the same characters in any order.

Key insight: Sort each string to get a canonical key. Anagrams share the same sorted form.
"eat" -> "aet", "tea" -> "aet", "ate" -> "aet" -- all map to "aet"

Python
from collections import defaultdict

def group_anagrams(strs):
    groups = defaultdict(list)

    for s in strs:
        # Sort the string to create a canonical key
        key = tuple(sorted(s))        # tuple because lists can't be dict keys
        groups[key].append(s)

    return list(groups.values())

# --- Walk-through ---
# strs = ["eat", "tea", "tan", "ate", "nat", "bat"]
#
# "eat" -> sorted = ('a','e','t') -> groups[('a','e','t')] = ["eat"]
# "tea" -> sorted = ('a','e','t') -> groups[('a','e','t')] = ["eat","tea"]
# "tan" -> sorted = ('a','n','t') -> groups[('a','n','t')] = ["tan"]
# "ate" -> sorted = ('a','e','t') -> groups[('a','e','t')] = ["eat","tea","ate"]
# "nat" -> sorted = ('a','n','t') -> groups[('a','n','t')] = ["tan","nat"]
# "bat" -> sorted = ('a','b','t') -> groups[('a','b','t')] = ["bat"]
#
# Result: [["eat","tea","ate"], ["tan","nat"], ["bat"]]

print(group_anagrams(["eat","tea","tan","ate","nat","bat"]))

JavaScript
function groupAnagrams(strs) {
    const groups = new Map();

    for (const s of strs) {
        // Sort the string to create a canonical key
        const key = s.split("").sort().join("");

        if (!groups.has(key)) {
            groups.set(key, []);
        }
        groups.get(key).push(s);
    }

    return [...groups.values()];
}

// Walk-through:
// "eat" -> sorted = "aet" -> groups{"aet": ["eat"]}
// "tea" -> sorted = "aet" -> groups{"aet": ["eat","tea"]}
// "tan" -> sorted = "ant" -> groups{"ant": ["tan"]}
// "ate" -> sorted = "aet" -> groups{"aet": ["eat","tea","ate"]}
// "nat" -> sorted = "ant" -> groups{"ant": ["tan","nat"]}
// "bat" -> sorted = "abt" -> groups{"abt": ["bat"]}
//
// Result: [["eat","tea","ate"], ["tan","nat"], ["bat"]]

console.log(groupAnagrams(["eat","tea","tan","ate","nat","bat"]));

Pattern 4: Checking for Duplicates

Python
def contains_duplicate(nums):
    seen = set()
    for num in nums:
        if num in seen:
            return True
        seen.add(num)
    return False

# One-liner alternative:
def contains_duplicate(nums):
    return len(nums) != len(set(nums))

JavaScript
function containsDuplicate(nums) {
    const seen = new Set();
    for (const num of nums) {
        if (seen.has(num)) return true;
        seen.add(num);
    }
    return false;
}

// One-liner alternative:
const containsDuplicate = nums => new Set(nums).size !== nums.length;

6. When to Use Hash Maps

Reach for a hash map (or hash set) when you see these patterns in a problem:

Pattern	Use	Example
O(1) lookups	Hash map / set	"Have we seen this element before?"
Counting frequencies	Hash map	"How many times does each char appear?"
Caching / memoization	Hash map	"Store computed results to avoid recalculation"
Mapping relationships	Hash map	"Map each node to its parent"
Removing duplicates	Hash set	"Return unique elements only"
Complement problems	Hash map	"Find two numbers that sum to target"
Grouping	Hash map of lists	"Group anagrams", "group by category"
Prefix sums	Hash map	"Subarray sum equals K"

Interview heuristic

If the brute force is O(n^2) because of nested "find" operations, a hash map can almost always bring it down to O(n).

Hash Map vs Sorted Map -- Decision Rule:

• Need O(1) average lookup, no ordering → Hash Map (dict, HashMap, unordered_map)
• Need sorted iteration or range queries → Sorted Map (TreeMap, std::map) -- O(log n)
• Only need membership testing (no values) → Hash Set (set, HashSet)

Load factor rule: Expected chain length = n/m (items/buckets). O(1) assumes load factor is bounded (Python: 2/3, Java: 0.75).

7. LeetCode Problems

Essential hash map problems ordered from easy to medium. Practice these to build pattern recognition.

Problem	Difficulty	Pattern	Key Idea
Two Sum	Easy	Complement lookup	Store num->index, check if target-num exists
Valid Anagram	Easy	Frequency counter	Compare character frequencies of both strings
Contains Duplicate	Easy	Set membership	Add to set; if already present, duplicate found
Group Anagrams	Medium	Sorted key grouping	Sort each word as key, group values in a list
Top K Frequent Elements	Medium	Freq map + bucket sort	Count frequencies, then use bucket sort or heap for top K
Longest Consecutive Sequence	Medium	Set + sequence start	Put all in set; only start counting from sequence beginnings
Subarray Sum Equals K	Medium	Prefix sum + hash map	Store prefix_sum->count; check if prefix_sum - k exists
LRU Cache	Medium	Hash map + doubly linked list	O(1) access via map, O(1) eviction via linked list

Full Solution: Two Sum

Problem statement

Given an array of integers nums and an integer target, return indices of the two numbers such that they add up to target. You may assume each input has exactly one solution.

Python
def twoSum(nums, target):
    seen = {}                           # value -> index

    for i, num in enumerate(nums):
        complement = target - num       # the number we need

        if complement in seen:           # O(1) lookup
            return [seen[complement], i]

        seen[num] = i                   # store for future lookups

    return []

# Time:  O(n) - single pass through array
# Space: O(n) - hash map stores up to n entries

JavaScript
var twoSum = function(nums, target) {
    const seen = new Map();           // value -> index

    for (let i = 0; i < nums.length; i++) {
        const complement = target - nums[i]; // the number we need

        if (seen.has(complement)) {      // O(1) lookup
            return [seen.get(complement), i];
        }

        seen.set(nums[i], i);           // store for future lookups
    }

    return [];
};

// Time:  O(n) - single pass through array
// Space: O(n) - Map stores up to n entries

Full Solution: Group Anagrams

Problem statement

Given an array of strings strs, group the anagrams together. You can return the answer in any order.

Example: ["eat","tea","tan","ate","nat","bat"] returns [["eat","tea","ate"],["tan","nat"],["bat"]]

Python
from collections import defaultdict

def groupAnagrams(strs):
    groups = defaultdict(list)       # sorted_key -> [original strings]

    for s in strs:
        key = tuple(sorted(s))        # O(k log k) where k = len(s)
        groups[key].append(s)

    return list(groups.values())

# Time:  O(n * k log k) where n = number of strings, k = max string length
# Space: O(n * k) for storing all strings in groups

# --- Alternative: character count key (avoids sorting) ---
def groupAnagrams(strs):
    groups = defaultdict(list)

    for s in strs:
        # Count frequency of each letter as key
        count = [0] * 26
        for c in s:
            count[ord(c) - ord('a')] += 1
        groups[tuple(count)].append(s)

    return list(groups.values())

# Time:  O(n * k) - no sorting needed
# Space: O(n * k)

JavaScript
var groupAnagrams = function(strs) {
    const groups = new Map();        // sorted_key -> [original strings]

    for (const s of strs) {
        const key = s.split("").sort().join(""); // O(k log k)

        if (!groups.has(key)) {
            groups.set(key, []);
        }
        groups.get(key).push(s);
    }

    return [...groups.values()];
};

// Time:  O(n * k log k)
// Space: O(n * k)

// --- Alternative: character count key ---
var groupAnagrams = function(strs) {
    const groups = new Map();

    for (const s of strs) {
        const count = new Array(26).fill(0);
        for (const c of s) {
            count[c.charCodeAt(0) - 97]++;
        }
        const key = count.join(",");    // "1,0,0,...,1,0" as string key

        if (!groups.has(key)) {
            groups.set(key, []);
        }
        groups.get(key).push(s);
    }

    return [...groups.values()];
};

// Time:  O(n * k)
// Space: O(n * k)

8. Implementing a Hash Map from Scratch

Building your own hash map helps you understand the internals. Here is a simplified but educational implementation using chaining (array of linked lists).

Python
class HashMap:
    def __init__(self, capacity=16):
        self.capacity = capacity
        self.size = 0
        self.buckets = [[] for _ in range(capacity)]

    def _hash(self, key):
        """Convert key to bucket index."""
        return hash(key) % self.capacity

    def put(self, key, value):
        """Insert or update a key-value pair."""
        index = self._hash(key)
        bucket = self.buckets[index]

        # Check if key already exists in this bucket
        for i, (k, v) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)   # update existing
                return

        # Key not found, add new entry
        bucket.append((key, value))
        self.size += 1

        # Resize if load factor exceeds 0.75
        if self.size / self.capacity > 0.75:
            self._resize()

    def get(self, key, default=None):
        """Retrieve value by key. Returns default if not found."""
        index = self._hash(key)
        bucket = self.buckets[index]

        for k, v in bucket:
            if k == key:
                return v

        return default

    def remove(self, key):
        """Remove a key-value pair. Returns True if removed."""
        index = self._hash(key)
        bucket = self.buckets[index]

        for i, (k, v) in enumerate(bucket):
            if k == key:
                bucket.pop(i)
                self.size -= 1
                return True

        return False

    def contains(self, key):
        """Check if key exists."""
        return self.get(key) is not None

    def _resize(self):
        """Double capacity and rehash all entries."""
        old_buckets = self.buckets
        self.capacity *= 2
        self.size = 0
        self.buckets = [[] for _ in range(self.capacity)]

        for bucket in old_buckets:
            for key, value in bucket:
                self.put(key, value)

    def __repr__(self):
        items = []
        for bucket in self.buckets:
            for k, v in bucket:
                items.append(f"{k}: {v}")
        return "HashMap({" + ", ".join(items) + "})"


# --- Test it out ---
hm = HashMap()
hm.put("alice", 90)
hm.put("bob", 85)
hm.put("charlie", 88)

print(hm.get("alice"))              # 90
print(hm.get("bob"))                # 85
print(hm.get("unknown", -1))       # -1
print(hm.contains("charlie"))       # True

hm.put("alice", 95)                  # update Alice's score
print(hm.get("alice"))              # 95

hm.remove("bob")
print(hm.contains("bob"))          # False
print(hm)                           # HashMap({alice: 95, charlie: 88})

What this implementation shows

Hashing: hash(key) % capacity maps any key to a bucket index
Chaining: Each bucket is a list of (key, value) pairs
Collision handling: Multiple entries can share a bucket
Resizing: When load factor > 0.75, we double capacity and rehash everything
O(1) average: With good distribution, each bucket has ~1 entry

Production vs interview

This implementation is for learning. Real hash maps (like Python's dict) use open addressing, better hash functions, and many optimizations. But the core ideas are the same, and this is the level of detail interviewers expect.

9. Practice Quiz

Q1: What is the average time complexity of looking up a key in a hash map?

Hash maps provide O(1) average-case lookup by computing a hash of the key and going directly to the corresponding bucket. Worst case is O(n) if all keys collide, but with a good hash function this is extremely rare.

Q2: You need to find two numbers in an array that add up to a target. What data structure reduces this from O(n^2) to O(n)?

A hash map stores each number as you iterate. For each new number, you check if its complement (target - num) already exists in the map in O(1), giving an overall O(n) solution.

Q3: What happens when two different keys hash to the same index?

When two keys hash to the same index, it is called a collision. The hash map resolves it using a strategy like chaining (storing both entries in a linked list at that bucket) or open addressing (probing for the next empty slot).

Q4: In Python, what is the correct way to safely get a value from a dict with a default?

d.get("key", default) returns the value if the key exists, or default if it does not. Option A would raise a KeyError if the key is missing (the or only helps if the value is falsy, not if the key is absent). Options C and D are not valid dict methods.

Q5: When grouping anagrams, what is the key insight for the hash map approach?

Sorting each string produces a canonical representation: "eat", "tea", and "ate" all become "aet". Using this sorted string as a hash map key groups all anagrams together. Using string length (C) would group non-anagrams, and ASCII sums (D) would cause false collisions (e.g., "ad" and "bc" have the same sum).

Table of Contents