# codetutor/backend/data/questions/design-twitter.yaml

title: Design Twitter
slug: design-twitter
difficulty: medium
leetcode_id: 355
leetcode_url: https://leetcode.com/problems/design-twitter/
categories:
  - hash-tables
  - heap
patterns:
  - slug: heap
    is_optimal: true
function_signature: "class Twitter:\n    def __init__(self): ...\n    def postTweet(self, userId: int, tweetId: int) -> None: ...\n    def getNewsFeed(self, userId: int) -> list[int]: ...\n    def follow(self, followerId: int, followeeId: int) -> None: ...\n    def unfollow(self, followerId: int, followeeId: int) -> None: ..."
test_cases:
  visible:
    - input: { operations: ["Twitter", "postTweet", "getNewsFeed", "follow", "postTweet", "getNewsFeed", "unfollow", "getNewsFeed"], arguments: [[], [1, 5], [1], [1, 2], [2, 6], [1], [1, 2], [1]] }
      expected: [null, null, [5], null, null, [6, 5], null, [5]]
    - input: { operations: ["Twitter", "postTweet", "postTweet", "getNewsFeed"], arguments: [[], [1, 1], [1, 2], [1]] }
      expected: [null, null, null, [2, 1]]
  hidden:
    - input: { operations: ["Twitter", "getNewsFeed"], arguments: [[], [1]] }
      expected: [null, []]
    - input: { operations: ["Twitter", "follow", "getNewsFeed"], arguments: [[], [1, 2], [1]] }
      expected: [null, null, []]
    - input: { operations: ["Twitter", "postTweet", "postTweet", "postTweet", "postTweet", "postTweet", "postTweet", "postTweet", "postTweet", "postTweet", "postTweet", "postTweet", "getNewsFeed"], arguments: [[], [1, 1], [1, 2], [1, 3], [1, 4], [1, 5], [1, 6], [1, 7], [1, 8], [1, 9], [1, 10], [1, 11], [1]] }
      expected: [null, null, null, null, null, null, null, null, null, null, null, null, [11, 10, 9, 8, 7, 6, 5, 4, 3, 2]]
    - input: { operations: ["Twitter", "postTweet", "follow", "follow", "unfollow", "getNewsFeed"], arguments: [[], [2, 5], [1, 2], [1, 2], [1, 2], [1]] }
      expected: [null, null, null, null, null, []]
    - input: { operations: ["Twitter", "postTweet", "postTweet", "follow", "getNewsFeed"], arguments: [[], [1, 1], [2, 2], [1, 2], [1]] }
      expected: [null, null, null, null, [2, 1]]
description: |
  Design a simplified version of Twitter where users can post tweets, follow or unfollow other users, and see the `10` most recent tweets in their news feed.

  Implement the `Twitter` class:

  - `Twitter()` Initializes your Twitter object.
  - `void postTweet(int userId, int tweetId)` Composes a new tweet with ID `tweetId` by the user `userId`. Each call to this function will be made with a unique `tweetId`.
  - `List<Integer> getNewsFeed(int userId)` Retrieves the `10` most recent tweet IDs in the user's news feed. Each item in the news feed must be posted by a user whom the user follows or by the user themself. Tweets must be **ordered from most recent to least recent**.
  - `void follow(int followerId, int followeeId)` The user with ID `followerId` starts following the user with ID `followeeId`.
  - `void unfollow(int followerId, int followeeId)` The user with ID `followerId` stops following the user with ID `followeeId`.
constraints: |
  - `1 <= userId, followerId, followeeId <= 500`
  - `0 <= tweetId <= 10^4`
  - All the tweets have **unique** IDs.
  - At most `3 * 10^4` calls in total will be made to `postTweet`, `getNewsFeed`, `follow`, and `unfollow`.
  - A user cannot follow themself.
examples:
  - input: |
      ["Twitter", "postTweet", "getNewsFeed", "follow", "postTweet", "getNewsFeed", "unfollow", "getNewsFeed"]
      [[], [1, 5], [1], [1, 2], [2, 6], [1], [1, 2], [1]]
    output: "[null, null, [5], null, null, [6, 5], null, [5]]"
    explanation: |
      Twitter twitter = new Twitter();
      twitter.postTweet(1, 5); // User 1 posts a new tweet (id = 5).
      twitter.getNewsFeed(1);  // User 1's news feed should return [5].
      twitter.follow(1, 2);    // User 1 follows user 2.
      twitter.postTweet(2, 6); // User 2 posts a new tweet (id = 6).
      twitter.getNewsFeed(1);  // Returns [6, 5]. Tweet 6 precedes tweet 5 because it was posted later.
      twitter.unfollow(1, 2);  // User 1 unfollows user 2.
      twitter.getNewsFeed(1);  // Returns [5], since user 1 no longer follows user 2.
explanation:
  intuition: |
    Think of this problem as building two interconnected systems: a **social graph** (who follows whom) and a **timeline aggregator** (combining tweets from multiple sources in chronological order).

    The social graph is straightforward: we need to track follow relationships efficiently. A hash map where each user maps to the set of users they follow works perfectly.

    The interesting challenge is the news feed. When a user requests their feed, we need to find the 10 most recent tweets from potentially many users. Imagine each user has their own stream of tweets, already sorted by time, and we need to **merge these streams** while keeping only the 10 most recent.

    This is exactly what a **max-heap** excels at. Think of it like merging k sorted lists: we keep one "candidate" tweet per stream in a heap keyed by timestamp, repeatedly pop the most recent candidate, and replace it with the next-older tweet from the same stream.

    The key insight is that we don't need to look at every tweet ever posted. We only need to consider the most recent tweets from each followed user, and a heap lets us do this efficiently.
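
    Python's standard library already offers a lazy k-way merge in `heapq.merge`. The sketch below shows the stream-merging idea on its own, separate from the class-based solutions in this file. The helper name `top_k_recent` is an illustrative assumption, as is its input format: one list of `(timestamp, tweet_id)` pairs per user, oldest first.

    ```python
    import heapq
    from itertools import islice


    def top_k_recent(streams, k=10):
        """Return the k most recent tweet IDs across several per-user streams.

        Each stream is a list of (timestamp, tweet_id) pairs in posting order
        (oldest first), so reversing it yields newest-first order.
        """
        newest_first = [reversed(stream) for stream in streams]
        # heapq.merge keeps one candidate per stream on an internal heap and
        # yields items lazily; reverse=True expects each input in descending order.
        merged = heapq.merge(*newest_first, reverse=True)
        # islice stops after k items, so older tweets are never pulled.
        return [tweet_id for _, tweet_id in islice(merged, k)]


    # Two users' streams with globally increasing timestamps.
    alice = [(0, 5), (2, 7)]
    bob = [(1, 6), (3, 8)]
    print(top_k_recent([alice, bob], k=3))  # [8, 7, 6]
    ```

    `heapq.merge` hides the index bookkeeping that the explicit heap in the solutions does by hand. Both keep one candidate per stream on a heap and pull only as many tweets as are consumed.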
  approach: |
    We use **Hash Maps + Max-Heap** to solve this problem:

    **Step 1: Design the data structures**
    - `user_tweets`: A hash map where each user ID maps to a list of `(timestamp, tweet_id)` tuples, stored in chronological order (newest at the end)
    - `user_follows`: A hash map where each user ID maps to a set of user IDs they follow
    - `timestamp`: A global counter that increments with each tweet, used to determine recency

    &nbsp;

    **Step 2: Implement postTweet**
    - Append `(timestamp, tweet_id)` to the user's tweet list, using the current value of the global counter
    - Increment the global timestamp
    - Time: O(1)

    &nbsp;

    **Step 3: Implement follow/unfollow**
    - `follow`: Add the followee to the follower's set of followed users
    - `unfollow`: Remove the followee from that set (if present)
    - Time: O(1) on average for both operations (hash-set insert and delete)

    &nbsp;

    **Step 4: Implement getNewsFeed (the core algorithm)**
    - Collect all users whose tweets should appear: the user themself plus everyone they follow
    - For each of these users who has tweets, add their most recent tweet to a max-heap
    - Store `(timestamp, tweet_id, user_id, index)` in the heap, where `index` is the tweet's position in that user's list
    - Pop the most recent tweet from the heap and add it to the result
    - Push the *next* (older) tweet from that same user, if any, onto the heap
    - Repeat until we have 10 tweets or the heap is empty
    - Time: O(n + k log n), where k = 10 and n is the number of users in the feed (the user plus their followees). Building the initial heap with `heapify` is O(n), or O(n log n) if the n candidates are pushed one at a time, and each of the at most k pop/push rounds costs O(log n)
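
    Step 4 can be sketched as a standalone function over plain dictionaries that follow the Step 1 layout. The function name `merge_feed` and its parameters are hypothetical. The sketch builds the initial heap with `heapq.heapify`. It negates timestamps because `heapq` is a min-heap.

    ```python
    import heapq


    def merge_feed(user_tweets, user_follows, user_id, k=10):
        """Return up to k tweet IDs, newest first, for user_id's news feed.

        user_tweets: dict mapping user -> list of (timestamp, tweet_id), oldest first
        user_follows: dict mapping user -> set of followed users
        """
        users = user_follows.get(user_id, set()) | {user_id}
        # One candidate per user: their newest tweet, keyed by negated timestamp.
        heap = []
        for uid in users:
            tweets = user_tweets.get(uid, [])
            if tweets:
                idx = len(tweets) - 1
                ts, tid = tweets[idx]
                heap.append((-ts, tid, uid, idx))
        heapq.heapify(heap)  # O(n) for n candidates

        feed = []
        while heap and len(feed) < k:
            _, tid, uid, idx = heapq.heappop(heap)
            feed.append(tid)
            if idx > 0:  # replace it with the same user's next-older tweet
                ts, next_tid = user_tweets[uid][idx - 1]
                heapq.heappush(heap, (-ts, next_tid, uid, idx - 1))
        return feed


    # State after the example's first five calls.
    tweets = {1: [(0, 5)], 2: [(1, 6)]}
    follows = {1: {2}}
    print(merge_feed(tweets, follows, 1))  # [6, 5]
    ```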

    &nbsp;

    This approach merges the tweet streams lazily, without sorting all tweets together.
  common_pitfalls:
    - title: Sorting All Tweets
      description: |
        A naive approach collects all tweets from the user and their followees, sorts them by timestamp, and returns the top 10.

        While correct, this is inefficient. If a user follows many active posters, you might be sorting thousands of tweets just to return 10. With up to `3 * 10^4` calls, that cost adds up quickly.

        The heap-based approach examines at most `n + 10` tweets per feed request, where n is the number of users in the feed: one initial candidate per user, plus at most one replacement per tweet returned. That is much cheaper when users have many tweets.
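
        None of the solutions in this file take this route, but there is a middle ground if you do collect every candidate tweet: the standard library's `heapq.nlargest`. It keeps only the best k items seen so far instead of sorting the whole list, so the cost is roughly O(T log k) rather than O(T log T). The sketch below is illustrative; the name `top_10_by_time` and its input format are assumptions.

        ```python
        import heapq


        def top_10_by_time(all_tweets):
            """all_tweets: iterable of (timestamp, tweet_id) pairs in any order."""
            # nlargest maintains a size-10 heap internally instead of a full sort,
            # and returns the winners ordered from largest to smallest timestamp.
            newest = heapq.nlargest(10, all_tweets, key=lambda pair: pair[0])
            return [tweet_id for _, tweet_id in newest]


        tweets = [(t, 100 + t) for t in range(25)]  # 25 tweets, timestamps 0..24
        print(top_10_by_time(tweets)[:3])  # [124, 123, 122]
        ```

        This still visits every tweet, so the k-way merge remains preferable when followees have long histories.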
      wrong_approach: "Collect all tweets, sort, take top 10"
      correct_approach: "Use a heap to merge streams, pulling only what's needed"
    - title: Forgetting Self-Tweets
      description: |
        The news feed must include the user's own tweets, not just tweets from people they follow.

        Make sure that when building the list of "users to check", you include the requesting user themselves, regardless of their follow list.
      wrong_approach: "Only check followed users' tweets"
      correct_approach: "Include self in the list of users to aggregate"
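    - title: Missing Edge Cases
      description: |
        Several edge cases need handling:
        - User has no tweets and follows no one: return an empty list
        - User unfollows someone: that user's tweets should no longer appear
        - User posts more than 10 tweets: only show the 10 most recent
        - User follows the same person twice, or unfollows someone they don't follow: neither call should fail or change the result
        - User tries to follow themself (not allowed per constraints, but good to handle gracefully)

        Using a set for each follow list and a `defaultdict` for the tweet lists handles most of these automatically. Repeated follows are idempotent, `set.discard` ignores missing entries, and users with no tweets simply have empty lists.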
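
        Here is a quick sketch of the `set` and `defaultdict` behavior described above. It uses only standard-library semantics, and the variable names are illustrative. This behavior is also why following twice and then unfollowing once leaves the follow set empty.

        ```python
        from collections import defaultdict

        user_follows = defaultdict(set)   # user -> set of followees
        user_tweets = defaultdict(list)   # user -> list of (timestamp, tweet_id)

        user_follows[1].add(2)
        user_follows[1].add(2)            # duplicate follow: the set is unchanged
        print(user_follows[1])            # {2}

        user_follows[1].discard(2)        # unfollow
        user_follows[1].discard(2)        # unfollowing again raises no KeyError
        print(user_follows[1])            # set()

        print(user_tweets[3])             # unseen user: [] (default created on access)
        ```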
  key_takeaways:
    - "**Heap for top-k from multiple streams**: When merging sorted streams and only needing the top k elements, a heap is a natural fit"
    - "**Separate concerns**: The social graph (follows) and content storage (tweets) are independent, so design them separately"
    - "**Lazy evaluation**: Don't process all data upfront. The heap approach only examines tweets as needed"
    - "**Foundation for real systems**: This pattern (fan-out on read with a heap merge) is a simplified version of one common way to assemble news feeds in real social media systems"
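  time_complexity: "O(1) for `postTweet`, `follow`, and `unfollow`. O(n + k log n) for `getNewsFeed` when the initial heap is built with `heapify` (O((n + k) log n) with repeated pushes), where k = 10 and n is the number of users in the feed (the user plus their followees)."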
  space_complexity: "O(U + T + F) where U is the number of users, T is the total number of tweets, and F is the total number of follow relationships."
solutions:
  - approach_name: Hash Map + Max-Heap
    is_optimal: true
    code: |
      import heapq
      from collections import defaultdict


      class Twitter:
          def __init__(self):
              # Global timestamp to track tweet order
              self.timestamp = 0
              # user_id -> list of (timestamp, tweet_id), oldest first
              self.user_tweets = defaultdict(list)
              # user_id -> set of followed user_ids
              self.user_follows = defaultdict(set)

          def postTweet(self, userId: int, tweetId: int) -> None:
              # Store tweet with current timestamp, then increment
              self.user_tweets[userId].append((self.timestamp, tweetId))
              self.timestamp += 1

          def getNewsFeed(self, userId: int) -> list[int]:
              # Users whose tweets we care about: self + followed users
              users_to_check = self.user_follows[userId] | {userId}

              # Max-heap of (-timestamp, tweet_id, user_id, index)
              # Negative timestamp because heapq is a min-heap
              max_heap = []
              for uid in users_to_check:
                  tweets = self.user_tweets.get(uid)
                  if tweets:
                      # Start with most recent tweet (last in list)
                      idx = len(tweets) - 1
                      ts, tid = tweets[idx]
                      max_heap.append((-ts, tid, uid, idx))
              # Build the heap in O(n) instead of n separate pushes
              heapq.heapify(max_heap)

              result = []
              while max_heap and len(result) < 10:
                  _, tid, uid, idx = heapq.heappop(max_heap)
                  result.append(tid)
                  # If this user has older tweets, push the next one
                  if idx > 0:
                      idx -= 1
                      ts, tid = self.user_tweets[uid][idx]
                      heapq.heappush(max_heap, (-ts, tid, uid, idx))
              return result

          def follow(self, followerId: int, followeeId: int) -> None:
              # Prevent self-follow (though constraints say it won't happen)
              if followerId != followeeId:
                  self.user_follows[followerId].add(followeeId)

          def unfollow(self, followerId: int, followeeId: int) -> None:
              # Remove from set if present (discard won't raise an error)
              self.user_follows[followerId].discard(followeeId)
    explanation: |
      **Time Complexity:**
      - `postTweet`: O(1), an append to a list
      - `follow`/`unfollow`: O(1) on average, using set operations
      - `getNewsFeed`: O(n + k log n), where k = 10 tweets and n is the number of users in the feed (the user plus their followees). Building the initial heap with `heapify` takes O(n), and we then do at most k pop/push rounds, each O(log n).

      **Space Complexity:** O(U + T + F) for storing users, tweets, and follow relationships, plus O(n) for the heap during `getNewsFeed`.

      The heap merges multiple sorted tweet streams. By storing the index into each user's tweet list, we can pull the "next" tweet from any user in O(log n) when needed.
  - approach_name: Collect and Sort
    is_optimal: false
    code: |
      from collections import defaultdict


      class Twitter:
          def __init__(self):
              self.timestamp = 0
              self.user_tweets = defaultdict(list)
              self.user_follows = defaultdict(set)

          def postTweet(self, userId: int, tweetId: int) -> None:
              self.user_tweets[userId].append((self.timestamp, tweetId))
              self.timestamp += 1

          def getNewsFeed(self, userId: int) -> list[int]:
              # Collect ALL tweets from self and followed users
              all_tweets = []
              users_to_check = self.user_follows[userId] | {userId}
              for uid in users_to_check:
                  all_tweets.extend(self.user_tweets[uid])
              # Sort all tweets by timestamp (descending)
              all_tweets.sort(key=lambda x: x[0], reverse=True)
              # Return top 10 tweet IDs
              return [tid for _, tid in all_tweets[:10]]

          def follow(self, followerId: int, followeeId: int) -> None:
              if followerId != followeeId:
                  self.user_follows[followerId].add(followeeId)

          def unfollow(self, followerId: int, followeeId: int) -> None:
              self.user_follows[followerId].discard(followeeId)
    explanation: |
      **Time Complexity:**
      - `postTweet`: O(1)
      - `follow`/`unfollow`: O(1) on average
      - `getNewsFeed`: O(T log T), where T is the total number of tweets posted by the user and their followees

      **Space Complexity:** O(T) extra space for collecting all tweets during `getNewsFeed`.

      This approach is simpler but less efficient. For users who follow many active posters, every feed request collects and sorts all of their tweets. The heap approach touches only about n + 10 tweets per request.