HashMap & HashSet — Deep Dive
"The HashMap is not a trick — it is the most powerful O(1) tool in your interview toolkit."
The HashMap is the single most frequently used data structure in coding interviews. It powers Two Sum, Group Anagrams, Top K Frequent Elements, counting, caching, and dozens of other patterns; expect to reach for one in a large share of medium-difficulty LeetCode problems. Understanding why it's O(1) — and when it degrades to O(n) — is what separates good candidates from great ones.
📚 Why HashMaps Beat Every Other Structure
Before HashMaps, solving "Does this element exist?" required O(n) linear scan or O(log n) binary search in a sorted structure. HashMaps provide O(1) average-case lookup, insert, and delete — an extraordinary advantage.
| Operation | Array | Sorted Array | BST | HashMap |
|---|---|---|---|---|
| Search | O(n) | O(log n) | O(log n) | O(1) avg |
| Insert | O(1) amortised | O(n) | O(log n) | O(1) avg |
| Delete | O(n) | O(n) | O(log n) | O(1) avg |
| Ordered? | No | Yes | Yes | No (use TreeMap) |
If you need keys in sorted order, use a TreeMap (O(log n) per operation). If keys are small integers, use an int[] frequency array instead: it is often faster in practice, because HashMaps carry high constant factors from hashing overhead.
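As a sketch of that small-integer-key alternative (`letterCounts` is our own illustrative helper, not a library method): counting lowercase letters with a plain int[26] skips hashing and boxing entirely.

```java
// Count lowercase letters with a plain array — no hashing, no boxing.
int[] letterCounts(String s) {
    int[] freq = new int[26];            // one slot per letter 'a'..'z'
    for (char c : s.toCharArray())
        freq[c - 'a']++;                 // the character itself is the index
    return freq;
}
```
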
⚙ How Hashing Works — The Internals
A HashMap stores entries in an internal array of buckets. To find where a key belongs:
1. Compute the hash: hash = key.hashCode() (Java additionally mixes the high bits into the low bits).
2. Map the hash to a bucket index: index = hash & (capacity - 1). This works as a fast modulo because capacity is always a power of two.
3. Store the (key, value) pair in that bucket.
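A minimal sketch of steps 1–2 (`spread` and `bucketIndex` are illustrative names of our own; the bit-mixing mirrors what OpenJDK's HashMap does internally):

```java
// Mix high bits into the low bits, as HashMap does before masking.
int spread(int h) {
    return h ^ (h >>> 16);
}

// Bucket selection for a power-of-two capacity: masking == fast modulo.
int bucketIndex(Object key, int capacity) {
    return spread(key.hashCode()) & (capacity - 1);
}
```
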
Collision Resolution — Separate Chaining vs Open Addressing
When two keys hash to the same bucket, we have a collision. Java's HashMap uses Separate Chaining — each bucket holds a linked list, converted to a red-black tree once the chain reaches TREEIFY_THRESHOLD (8) nodes.
| Strategy | How It Works | Java Uses? | Worst Case |
|---|---|---|---|
| Separate Chaining | Each bucket = list/tree of entries | ✅ Yes (HashMap) | O(n) — all into one bucket |
| Open Addressing | Linear/quadratic probing for next empty slot | No | O(n) with bad hash |
| Robin Hood | Steal from "rich" entries, give to "poor" | No | O(log n) expected |
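To make the contrast with chaining concrete, here is a toy open-addressing table with linear probing (illustrative only: fixed capacity, int keys, no deletion or resizing, and it assumes the table never fills):

```java
final int CAP = 16;                      // power of two, fixed for the sketch
Integer[] slots = new Integer[CAP];

// On collision, step forward (wrapping) until an empty or matching slot.
void probeInsert(int key) {
    int i = key & (CAP - 1);
    while (slots[i] != null && slots[i] != key)
        i = (i + 1) & (CAP - 1);
    slots[i] = key;
}

// Probe from the home slot; an empty slot means the key was never inserted.
boolean probeContains(int key) {
    int i = key & (CAP - 1);
    while (slots[i] != null) {
        if (slots[i] == key) return true;
        i = (i + 1) & (CAP - 1);
    }
    return false;
}
```
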
When a bucket's chain reaches TREEIFY_THRESHOLD = 8 nodes, the linked list is converted to a red-black tree. This makes worst-case per-bucket lookup O(log n) instead of O(n). This is the classic "HashMap treeification" interview question at Google.
Load Factor & Resizing
Java's HashMap has a default load factor of 0.75. When size / capacity > 0.75, the map doubles its capacity and rehashes all entries — an O(n) operation. This is why HashMap insertions are O(1) amortised, not O(1) worst-case.
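One practical consequence: if you know the entry count up front, pre-size the map so no resize ever happens. A sketch (`presized` is our own helper; the formula follows directly from the 0.75 load factor):

```java
import java.util.*;

// Capacity must satisfy n / capacity <= 0.75, i.e. capacity >= n / 0.75.
// HashMap then rounds the requested capacity up to a power of two.
<K, V> Map<K, V> presized(int expectedEntries) {
    int initialCapacity = (int) (expectedEntries / 0.75f) + 1;
    return new HashMap<>(initialCapacity);
}
```
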
```java
Map<String, Integer> map = new HashMap<>();

// Core operations — all O(1) average
map.put("apple", 3);               // insert / update
map.get("apple");                  // → 3
map.getOrDefault("banana", 0);     // → 0 (safe default)
map.containsKey("apple");          // → true
map.remove("apple");               // delete key

// Frequency counting idiom (interview staple)
map.put("apple", map.getOrDefault("apple", 0) + 1);
// OR: Java 8+
map.merge("apple", 1, Integer::sum);

// Iterating
for (Map.Entry<String, Integer> e : map.entrySet())
    System.out.println(e.getKey() + " → " + e.getValue());

// HashSet — same internals, no values
Set<Integer> seen = new HashSet<>();
seen.add(5);
seen.contains(5);   // O(1)
seen.remove(5);
```
🎯 The 5 Core HashMap Interview Patterns
Every HashMap problem you will encounter in FAANG interviews is a variation of one of these five patterns. Learn to recognise the pattern first — the code will follow naturally.
| # | Pattern | Trigger phrase | Canonical Problem |
|---|---|---|---|
| 1 | Frequency Map | "count occurrences", "most frequent" | Top K Frequent (LC 347) |
| 2 | Complement Lookup | "find two that add to target" | Two Sum (LC 1) |
| 3 | Prefix Sum + Map | "subarray sum equals K" | Subarray Sum = K (LC 560) |
| 4 | Group-by-Key | "group", "anagram", "same signature" | Group Anagrams (LC 49) |
| 5 | Set Membership | "duplicate", "seen before", "longest sequence" | Longest Consecutive (LC 128) |
Pattern 1: Frequency Map
Count how many times each element appears. The result is a Map<T, Integer> where the value is the count.
```java
// Count frequency of each number
Map<Integer, Integer> freq = new HashMap<>();
for (int x : nums)
    freq.merge(x, 1, Integer::sum);   // Java 8 idiom
    // OR: freq.put(x, freq.getOrDefault(x, 0) + 1);
```
Pattern 2: Complement Lookup
For "find two elements summing to target": as you scan, store what you've seen. For each new element, check if its complement = target - element is already stored.
```java
Map<Integer, Integer> seen = new HashMap<>();
for (int i = 0; i < nums.length; i++) {
    int complement = target - nums[i];
    if (seen.containsKey(complement))
        return new int[]{seen.get(complement), i};
    seen.put(nums[i], i);   // add AFTER checking
}
```
Pattern 3: Prefix Sum + Map
For "subarray sum equals K": maintain a running prefix sum. At each index, check if prefixSum - K has been seen before — that gap is a valid subarray.
```java
Map<Integer, Integer> prefixCount = new HashMap<>();
prefixCount.put(0, 1);   // empty prefix sum seen once
int sum = 0, count = 0;
for (int x : nums) {
    sum += x;
    count += prefixCount.getOrDefault(sum - k, 0);
    prefixCount.merge(sum, 1, Integer::sum);
}
```
Pattern 4: Group-by-Key
Transform each element into a canonical key (sorted string, character frequency array, etc.) and group elements sharing the same key.
```java
Map<String, List<String>> groups = new HashMap<>();
for (String s : strs) {
    char[] c = s.toCharArray();
    Arrays.sort(c);
    String key = new String(c);   // sorted = canonical form
    groups.computeIfAbsent(key, k -> new ArrayList<>()).add(s);
}
```
Pattern 5: Set Membership
Add all elements to a HashSet first. Then traverse, using the set for O(1) existence checks. Classic trick for "longest consecutive sequence": only start counting from sequence beginnings (numbers where num - 1 is NOT in the set).
```java
Set<Integer> set = new HashSet<>();
for (int x : nums) set.add(x);

int longest = 0;
for (int x : set) {
    if (!set.contains(x - 1)) {   // x is a sequence start
        int len = 1;
        while (set.contains(x + len)) len++;
        longest = Math.max(longest, len);
    }
}
```
💪 In-Lecture Practice Problems
Work through the problems in order, and try solving each one before revealing the solution.
Given an array and a target, return indices of the two numbers that add to target. Exactly one solution exists.
Brute force: check all pairs → O(n²). Too slow.

Key insight: for each element x, the complement we need is target - x. If we store every element we've seen so far in a HashMap (value → index), we can check if the complement already exists in O(1).

Critical detail: put into the map after checking — prevents using the same element twice.
▶Solution with full dry run
```java
int[] twoSum(int[] nums, int target) {
    Map<Integer, Integer> seen = new HashMap<>();   // value → index
    for (int i = 0; i < nums.length; i++) {
        int complement = target - nums[i];
        if (seen.containsKey(complement))
            return new int[]{seen.get(complement), i};
        seen.put(nums[i], i);   // add AFTER checking
    }
    return new int[]{};
}
// Dry run: nums=[2,7,11,15], target=9
// i=0: complement=7, seen={} → miss → seen={2:0}
// i=1: complement=2, seen={2:0} → HIT → return [0,1] ✓
```
Convert a Roman numeral string to an integer. Roman symbols: I=1, V=5, X=10, L=50, C=100, D=500, M=1000. Subtraction rule: if a smaller value precedes a larger one, subtract it (e.g. IV=4, IX=9).
Build a HashMap from Roman characters to integer values. Scan right to left — if current value is less than the next value to the right, subtract it; otherwise add it.
▶Solution with dry run
```java
int romanToInt(String s) {
    Map<Character, Integer> val = Map.of(
        'I', 1, 'V', 5, 'X', 10, 'L', 50,
        'C', 100, 'D', 500, 'M', 1000);
    int result = 0, prev = 0;
    for (int i = s.length() - 1; i >= 0; i--) {
        int curr = val.get(s.charAt(i));
        result += (curr < prev) ? -curr : curr;
        prev = curr;
    }
    return result;
}
// Dry run: "MCMXCIV" (right to left)
// i=6: V=5,   prev=0   → 5>=0   → add 5.   result=5,   prev=5
// i=5: I=1,   prev=5   → 1<5    → sub 1.   result=4,   prev=1
// i=4: C=100, prev=1   → 100>=1 → add 100. result=104, prev=100
// i=3: X=10,  prev=100 → 10<100 → sub 10.  result=94,  prev=10
// i=2: M=1000 ... → result=1994 ✓
```
Given a list of strings, group the anagrams together. An anagram is a word formed by rearranging another word's letters.
Key insight: All anagrams of a word share the same sorted character sequence. "eat", "tea", "ate" all sort to "aet". Use the sorted string as the HashMap key, and group original words under it.
▶Solution with dry run
```java
List<List<String>> groupAnagrams(String[] strs) {
    Map<String, List<String>> map = new HashMap<>();
    for (String s : strs) {
        char[] c = s.toCharArray();
        Arrays.sort(c);
        String key = new String(c);   // canonical form
        map.computeIfAbsent(key, k -> new ArrayList<>()).add(s);
    }
    return new ArrayList<>(map.values());
}
// Dry run: ["eat","tea","tan"]
// "eat" → sort → "aet" → map={"aet":["eat"]}
// "tea" → sort → "aet" → map={"aet":["eat","tea"]}
// "tan" → sort → "ant" → map={"aet":[...], "ant":["tan"]}
// Return [[eat,tea],[tan]] ✓
```
Alternative key: instead of sorting, use a 26-entry character-count signature such as [1,0,0,...,1]. Sorting is O(k log k); the frequency key is O(k). Both work — mention the tradeoff in interviews.
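A sketch of that O(k) frequency-signature key (`countKey` is an illustrative helper name of our own): anagrams produce identical count strings, so they group under the same map key.

```java
// Canonical key via letter counts: anagrams yield identical keys.
String countKey(String s) {
    int[] freq = new int[26];
    for (char c : s.toCharArray()) freq[c - 'a']++;
    StringBuilder sb = new StringBuilder();
    for (int f : freq)
        sb.append(f).append(',');   // delimiter avoids ambiguity ("1,11" vs "11,1")
    return sb.toString();
}
```
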
Given an integer array and k, return the k most frequent elements.
Step 1: build a frequency map (element → count). Step 2: find the top-K most frequent. Options:
a) Sort by frequency descending — O(n log n).
b) Use a min-heap of size K — O(n log k). Better when k << n.
c) Bucket sort on frequency — O(n). Best theoretical complexity.
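Option (c) can be sketched as follows (`topKBucket` is our own name): an element can occur at most n times, so its frequency indexes directly into an array of buckets, and scanning buckets from high frequency to low yields the answer in O(n).

```java
import java.util.*;

// O(n) Top-K via frequency buckets: bucket f holds all values occurring f times.
List<Integer> topKBucket(int[] nums, int k) {
    Map<Integer, Integer> freq = new HashMap<>();
    for (int x : nums) freq.merge(x, 1, Integer::sum);

    List<List<Integer>> bucket = new ArrayList<>();
    for (int i = 0; i <= nums.length; i++) bucket.add(new ArrayList<>());
    for (Map.Entry<Integer, Integer> e : freq.entrySet())
        bucket.get(e.getValue()).add(e.getKey());   // index = frequency

    List<Integer> res = new ArrayList<>();
    for (int f = nums.length; f >= 1 && res.size() < k; f--)
        for (int v : bucket.get(f))
            if (res.size() < k) res.add(v);
    return res;
}
```
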
▶Solution — Min-Heap approach O(n log k)
```java
int[] topKFrequent(int[] nums, int k) {
    // Step 1: frequency map
    Map<Integer, Integer> freq = new HashMap<>();
    for (int x : nums) freq.merge(x, 1, Integer::sum);

    // Step 2: min-heap of size k (keeps top-k frequent)
    PriorityQueue<Integer> pq =
        new PriorityQueue<>((a, b) -> freq.get(a) - freq.get(b));
    for (int key : freq.keySet()) {
        pq.offer(key);
        if (pq.size() > k) pq.poll();   // evict least frequent
    }

    // Step 3: extract answers
    int[] res = new int[k];
    for (int i = k - 1; i >= 0; i--) res[i] = pq.poll();
    return res;
}
// nums=[1,1,1,2,2,3], k=2
// freq={1:3, 2:2, 3:1}
// heap after all: [2(freq2), 1(freq3)] → [1,2] ✓
```
Given an integer array and k, return the number of contiguous subarrays that sum to k. Array may contain negative numbers.
If prefix[j] - prefix[i] = k, then the subarray [i+1..j] sums to k. Rearranging: prefix[i] = prefix[j] - k. As we scan, for each new prefix sum we ask: how many times has (prefixSum - k) appeared before? Each such occurrence is a valid subarray ending here.
▶Solution with dry run
```java
int subarraySum(int[] nums, int k) {
    Map<Integer, Integer> prefixCount = new HashMap<>();
    prefixCount.put(0, 1);   // empty prefix sum seen once
    int sum = 0, count = 0;
    for (int x : nums) {
        sum += x;
        count += prefixCount.getOrDefault(sum - k, 0);
        prefixCount.merge(sum, 1, Integer::sum);
    }
    return count;
}
// Dry run: nums=[1,1,1], k=2
// Start: map={0:1}, sum=0, count=0
// x=1: sum=1, need 1-2=-1 → miss   → count=0 → map={0:1,1:1}
// x=1: sum=2, need 2-2=0  → hit(1) → count=1 → map={0:1,1:1,2:1}
// x=1: sum=3, need 3-2=1  → hit(1) → count=2 → map={...,3:1}
// Return 2 ✓
```
Given an unsorted array, return the length of the longest consecutive elements sequence. Must run in O(n).
Key insight: add all numbers to a HashSet. Only start building a sequence from a "sequence start" — a number where num - 1 is NOT in the set. From there, count upwards. This ensures each number is visited at most twice → O(n).
▶Solution with dry run
```java
int longestConsecutive(int[] nums) {
    Set<Integer> set = new HashSet<>();
    for (int x : nums) set.add(x);

    int longest = 0;
    for (int x : set) {
        if (!set.contains(x - 1)) {   // x is a sequence start
            int len = 1;
            while (set.contains(x + len)) len++;
            longest = Math.max(longest, len);
        }
    }
    return longest;
}
// [100,4,200,1,3,2] → set={100,4,200,1,3,2}
// x=100: 99 in set? No → start. 101? No → len=1
// x=4:   3 in set? Yes → NOT a start → skip
// x=1:   0 in set? No → start. 2? Yes, 3? Yes, 4? Yes, 5? No → len=4
// longest = 4 ✓
```
Design a data structure that implements an LRU (Least Recently Used) cache with a capacity constraint. Both get and put must run in O(1).
O(1) get requires a HashMap. O(1) ordering (move to front, remove last) requires a doubly linked list. Together: HashMap (key → DLL node) + DLL (ordered from most-recent to least-recent). Java shortcut: LinkedHashMap with access-order mode does this automatically — great to mention in interviews, but implement from scratch if asked.
▶Solution — LinkedHashMap (concise) + DLL (full)
```java
class LRUCache extends LinkedHashMap<Integer, Integer> {
    private int capacity;

    LRUCache(int capacity) {
        // true = access-order (most recently used first)
        super(capacity, 0.75f, true);
        this.capacity = capacity;
    }

    public int get(int key) { return super.getOrDefault(key, -1); }

    public void put(int key, int value) { super.put(key, value); }

    // Called automatically by LinkedHashMap after each put
    @Override
    protected boolean removeEldestEntry(Map.Entry<Integer, Integer> e) {
        return size() > capacity;   // evict when over capacity
    }
}
```
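The card title also promises the from-scratch version, so here is a minimal HashMap + doubly linked list sketch (`LRUCacheDLL` is our own name to avoid clashing with the class above; sentinel head/tail nodes keep the pointer surgery free of edge cases):

```java
import java.util.*;

class LRUCacheDLL {
    // DLL node: head side = most recent, tail side = least recent.
    private static class Node {
        int key, value;
        Node prev, next;
        Node(int k, int v) { key = k; value = v; }
    }

    private final int capacity;
    private final Map<Integer, Node> map = new HashMap<>();
    private final Node head = new Node(0, 0);   // sentinel: most-recent end
    private final Node tail = new Node(0, 0);   // sentinel: least-recent end

    LRUCacheDLL(int capacity) {
        this.capacity = capacity;
        head.next = tail;
        tail.prev = head;
    }

    private void remove(Node n) {
        n.prev.next = n.next;
        n.next.prev = n.prev;
    }

    private void addFront(Node n) {
        n.next = head.next;
        n.prev = head;
        head.next.prev = n;
        head.next = n;
    }

    public int get(int key) {
        Node n = map.get(key);
        if (n == null) return -1;
        remove(n);          // move to front = mark most recently used
        addFront(n);
        return n.value;
    }

    public void put(int key, int value) {
        Node n = map.get(key);
        if (n != null) {    // update existing entry, refresh recency
            n.value = value;
            remove(n);
            addFront(n);
            return;
        }
        if (map.size() == capacity) {   // evict least recently used
            Node lru = tail.prev;
            remove(lru);
            map.remove(lru.key);
        }
        Node fresh = new Node(key, value);
        map.put(key, fresh);
        addFront(fresh);
    }
}
```
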
📝 Assignment
Complete the assignment before moving to Matrix Problems. Covers frequency counting, Two-Sum family variants, subarray sum patterns, isomorphic strings, and cache design problems.
✅ Lecture Completion Checklist
Check each item before advancing to Lecture 14.
Matrix problems require combining array traversal with directional logic (spiral, BFS over grid, DFS flood fill). The HashMap patterns you just mastered appear inside those solutions too — especially for island counting and multi-source BFS state tracking.