# codetutor/backend/data/questions/design-twitter.yaml

title: Design Twitter
slug: design-twitter
difficulty: medium
leetcode_id: 355
leetcode_url: https://leetcode.com/problems/design-twitter/
categories:
  - hash-tables
  - heap
patterns:
  - slug: heap
    is_optimal: true

function_signature: "class Twitter:\n    def __init__(self): ...\n    def postTweet(self, userId: int, tweetId: int) -> None: ...\n    def getNewsFeed(self, userId: int) -> list[int]: ...\n    def follow(self, followerId: int, followeeId: int) -> None: ...\n    def unfollow(self, followerId: int, followeeId: int) -> None: ..."

test_cases:
  visible:
    - input: { operations: ["Twitter", "postTweet", "getNewsFeed", "follow", "postTweet", "getNewsFeed", "unfollow", "getNewsFeed"], arguments: [[], [1, 5], [1], [1, 2], [2, 6], [1], [1, 2], [1]] }
      expected: [null, null, [5], null, null, [6, 5], null, [5]]
    - input: { operations: ["Twitter", "postTweet", "postTweet", "getNewsFeed"], arguments: [[], [1, 1], [1, 2], [1]] }
      expected: [null, null, null, [2, 1]]
  hidden:
    - input: { operations: ["Twitter", "getNewsFeed"], arguments: [[], [1]] }
      expected: [null, []]
    - input: { operations: ["Twitter", "follow", "getNewsFeed"], arguments: [[], [1, 2], [1]] }
      expected: [null, null, []]
    - input: { operations: ["Twitter", "postTweet", "postTweet", "postTweet", "postTweet", "postTweet", "postTweet", "postTweet", "postTweet", "postTweet", "postTweet", "postTweet", "getNewsFeed"], arguments: [[], [1, 1], [1, 2], [1, 3], [1, 4], [1, 5], [1, 6], [1, 7], [1, 8], [1, 9], [1, 10], [1, 11], [1]] }
      expected: [null, null, null, null, null, null, null, null, null, null, null, null, [11, 10, 9, 8, 7, 6, 5, 4, 3, 2]]
    - input: { operations: ["Twitter", "postTweet", "follow", "follow", "unfollow", "getNewsFeed"], arguments: [[], [2, 5], [1, 2], [1, 2], [1, 2], [1]] }
      expected: [null, null, null, null, null, []]
    - input: { operations: ["Twitter", "postTweet", "postTweet", "follow", "getNewsFeed"], arguments: [[], [1, 1], [2, 2], [1, 2], [1]] }
      expected: [null, null, null, null, [2, 1]]

description: |
  Design a simplified version of Twitter where users can post tweets, follow/unfollow other users, and see the `10` most recent tweets in their news feed.

  Implement the `Twitter` class:

  - `Twitter()` Initialises your twitter object.
  - `void postTweet(int userId, int tweetId)` Composes a new tweet with ID `tweetId` by the user `userId`. Each call to this function will be made with a unique `tweetId`.
  - `List<Integer> getNewsFeed(int userId)` Retrieves the `10` most recent tweet IDs in the user's news feed. Each item in the news feed must be posted by users who the user followed or by the user themself. Tweets must be **ordered from most recent to least recent**.
  - `void follow(int followerId, int followeeId)` The user with ID `followerId` started following the user with ID `followeeId`.
  - `void unfollow(int followerId, int followeeId)` The user with ID `followerId` stopped following the user with ID `followeeId`.

constraints: |
  - `1 <= userId, followerId, followeeId <= 500`
  - `0 <= tweetId <= 10^4`
  - All the tweets have **unique** IDs.
  - At most `3 * 10^4` calls will be made to `postTweet`, `getNewsFeed`, `follow`, and `unfollow`.
  - A user cannot follow himself.

examples:
  - input: |
      ["Twitter", "postTweet", "getNewsFeed", "follow", "postTweet", "getNewsFeed", "unfollow", "getNewsFeed"]
      [[], [1, 5], [1], [1, 2], [2, 6], [1], [1, 2], [1]]
    output: "[null, null, [5], null, null, [6, 5], null, [5]]"
    explanation: |
      Twitter twitter = new Twitter();
      twitter.postTweet(1, 5); // User 1 posts a new tweet (id = 5).
      twitter.getNewsFeed(1);  // User 1's news feed should return [5].
      twitter.follow(1, 2);    // User 1 follows user 2.
      twitter.postTweet(2, 6); // User 2 posts a new tweet (id = 6).
      twitter.getNewsFeed(1);  // Returns [6, 5]. Tweet 6 precedes tweet 5 because it was posted later.
      twitter.unfollow(1, 2);  // User 1 unfollows user 2.
      twitter.getNewsFeed(1);  // Returns [5], since user 1 no longer follows user 2.

explanation:
  intuition: |
    Think of this problem as building two interconnected systems: a **social graph** (who follows whom) and a **timeline aggregator** (combining tweets from multiple sources in chronological order).

    The social graph is straightforward — we need to track follow relationships efficiently. A hash map where each user maps to a set of people they follow works perfectly.

    The interesting challenge is the news feed. When a user requests their feed, we need to find the 10 most recent tweets from potentially many users. Imagine each user has their own stream of tweets, and we need to **merge these streams** while keeping only the top 10 most recent.

    This is exactly what a **max-heap** excels at: it always hands back the largest (most recent) element among the current candidates. Think of it like merging k sorted lists: we keep one "candidate" tweet per user in the heap, pull the most recent one, and replace it with that user's next-older tweet. (Python's `heapq` is a min-heap, so the solution stores negated timestamps to get max-heap behavior.)

    The key insight is that we don't need to look at every tweet ever posted. We only need to consider the most recent tweets from each followed user, and a heap lets us do this efficiently.
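
    As a quick illustration of the stream-merging idea, Python's standard library already provides a lazy k-way merge, `heapq.merge`. This is a sketch, not the approach built step by step below; it assumes each user's tweets are stored oldest-first as `(timestamp, tweet_id)` pairs, and the sample data is made up.

    ```python
    import heapq
    from itertools import islice

    # Made-up sample data: user_id -> tweets, oldest first, as (timestamp, tweet_id)
    streams = {
        1: [(0, 101), (3, 104), (5, 106)],
        2: [(1, 102), (4, 105)],
        3: [(2, 103)],
    }

    # Reverse each stream so it runs newest-first, then merge lazily in descending order.
    newest_first = heapq.merge(*(reversed(s) for s in streams.values()), reverse=True)

    # Take at most 10 items; the merge only advances each stream as far as needed.
    feed = [tweet_id for _, tweet_id in islice(newest_first, 10)]
    print(feed)  # [106, 105, 104, 103, 102, 101]
    ```

    Because `islice` stops after 10 items, the merge reads about one tweet per stream plus one per returned tweet, never a user's full history.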

  approach: |
    We use **Hash Maps + Max-Heap** to solve this problem:

    **Step 1: Design the data structures**

    - `user_tweets`: A hash map where each user ID maps to a list of `(timestamp, tweet_id)` tuples, stored in chronological order (newest at the end)
    - `user_follows`: A hash map where each user ID maps to a set of user IDs they follow
    - `timestamp`: A global counter that increments with each tweet, used to determine recency

    &nbsp;

    **Step 2: Implement postTweet**

    - Append `(timestamp, tweet_id)` to the user's tweet list
    - Increment the global timestamp so the next tweet is ordered after this one
    - Time: O(1)

    &nbsp;

    **Step 3: Implement follow/unfollow**

    - `follow`: Add followee to the follower's set of followed users
    - `unfollow`: Remove followee from the set (if present)
    - Time: O(1) for both operations

    &nbsp;

    **Step 4: Implement getNewsFeed (the core algorithm)**

    - Collect all users whose tweets should appear: the user themself + everyone they follow
    - For each of these users, if they have tweets, add their most recent tweet to a max-heap
    - Store `(timestamp, tweet_id, user_id, index)` in the heap, where index points to the tweet's position in that user's list
    - Pop the most recent tweet from the heap, add it to the result
    - Push the *next* tweet from that same user (if any) to the heap
    - Repeat until we have 10 tweets or the heap is empty
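    - Time: O(n log n + k log n), where k = 10 and n is the number of users in the feed (the user plus everyone they follow): n pushes seed the heap, then at most k pops and k pushes follow, each O(log n). Seeding with `heapq.heapify` instead would cut the first term to O(n).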
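
    Step 4 can be sketched as a standalone function. This is an illustrative variant, not the reference solution: the name `merge_feed` and its parameters are hypothetical, and it seeds the heap with `heapq.heapify` (O(n)) rather than n separate pushes.

    ```python
    import heapq


    def merge_feed(user_tweets: dict[int, list[tuple[int, int]]],
                   users: set[int], k: int = 10) -> list[int]:
        """Return up to k tweet IDs, newest first, posted by the given users.

        user_tweets maps user_id -> list of (timestamp, tweet_id), oldest first.
        """
        # Seed the heap with each user's newest tweet. Timestamps are negated
        # because heapq is a min-heap; idx is the tweet's position in that user's list.
        heap = []
        for uid in users:
            tweets = user_tweets.get(uid, [])
            if tweets:
                idx = len(tweets) - 1
                ts, tid = tweets[idx]
                heap.append((-ts, tid, uid, idx))
        heapq.heapify(heap)  # O(n) instead of n pushes at O(log n) each

        result = []
        while heap and len(result) < k:
            _, tid, uid, idx = heapq.heappop(heap)
            result.append(tid)
            if idx > 0:
                # Replace the popped tweet with the same user's next-older tweet.
                ts, next_tid = user_tweets[uid][idx - 1]
                heapq.heappush(heap, (-ts, next_tid, uid, idx - 1))
        return result


    tweets = {1: [(0, 5)], 2: [(1, 6)]}
    print(merge_feed(tweets, {1, 2}))  # [6, 5]
    print(merge_feed(tweets, {1}))     # [5]
    ```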

    &nbsp;

    This approach efficiently merges multiple tweet streams without sorting all tweets together.

  common_pitfalls:
    - title: Sorting All Tweets
      description: |
        A naive approach collects all tweets from the user and their followees, sorts them by timestamp, and returns the top 10.

        While correct, this is inefficient. If a user follows many active posters, you might sort thousands of tweets just to return 10. With up to `3 * 10^4` total calls, many of which can be `getNewsFeed`, this adds up quickly.

        The heap-based approach touches at most `n + 10` tweets per request (one starting tweet for each of the n users in the feed, plus at most one replacement per returned tweet), which is much cheaper when users have long tweet histories.
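
        To make the difference concrete, the sketch below counts how many tweets each approach touches for one feed request. The data is synthetic (a hypothetical 50 users with 200 tweets each), and the counting helpers are illustrative only.

        ```python
        import heapq

        # Synthetic data: 50 users x 200 tweets, interleaved timestamps (tweet_id = timestamp).
        NUM_USERS, TWEETS_EACH = 50, 200
        user_tweets = {
            uid: [(t * NUM_USERS + uid, t * NUM_USERS + uid) for t in range(TWEETS_EACH)]
            for uid in range(NUM_USERS)
        }


        def tweets_touched_by_sort(users):
            # Sorting needs every tweet from every user in the feed.
            return sum(len(user_tweets[u]) for u in users)


        def tweets_touched_by_heap(users, k=10):
            # Seed with each user's newest tweet, stored as (-timestamp, user, index).
            heap = [(-user_tweets[u][-1][0], u, len(user_tweets[u]) - 1)
                    for u in users if user_tweets[u]]
            heapq.heapify(heap)
            touched, popped = len(heap), 0
            while heap and popped < k:
                _, u, idx = heapq.heappop(heap)
                popped += 1
                if idx > 0:
                    # Each pop brings in at most one older tweet from the same user.
                    heapq.heappush(heap, (-user_tweets[u][idx - 1][0], u, idx - 1))
                    touched += 1
            return touched


        users = set(range(NUM_USERS))
        print(tweets_touched_by_sort(users))  # 10000
        print(tweets_touched_by_heap(users))  # 60
        ```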
      wrong_approach: "Collect all tweets, sort, take top 10"
      correct_approach: "Use a heap to merge streams, pulling only what's needed"

    - title: Forgetting Self-Tweets
      description: |
        The news feed must include the user's own tweets, not just tweets from people they follow.

        Make sure when building the list of "users to check", you include the requesting user themselves, regardless of their follow list.
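
        A minimal illustration with plain dicts and sets (the variable names are hypothetical). Note that `|` builds a new set, so the requester is added to the lookup without being written into their own follow list:

        ```python
        follows = {1: {2}}                        # user 1 follows user 2
        user_tweets = {1: [(0, 5)], 2: [(1, 6)]}  # user_id -> [(timestamp, tweet_id)]

        wrong_users = follows.get(1, set())        # {2}: user 1's own tweets are missed
        right_users = follows.get(1, set()) | {1}  # {1, 2}: includes the requester

        wrong_ids = [tid for u in wrong_users for _, tid in user_tweets.get(u, [])]
        right_ids = [tid for u in right_users for _, tid in user_tweets.get(u, [])]
        print(sorted(wrong_ids))  # [6]
        print(sorted(right_ids))  # [5, 6]
        ```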
      wrong_approach: "Only check followed users' tweets"
      correct_approach: "Include self in the list of users to aggregate"

    - title: Missing Edge Cases
      description: |
        Several edge cases need handling:
        - User has no tweets and follows no one: return empty list
        - User unfollows someone: their tweets should no longer appear
        - User posts more than 10 tweets: only show the 10 most recent
        - User follows themself (not allowed per constraints, but good to handle gracefully)

        Using sets for the follow relationship and defaultdict for tweets handles most of these automatically.
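
        The standard-library behaviors this relies on can be checked directly. This small sketch uses the same container types as the solutions; the user IDs are arbitrary:

        ```python
        from collections import defaultdict

        user_tweets = defaultdict(list)
        user_follows = defaultdict(set)

        # Unknown users read as empty containers instead of raising KeyError
        # (note: the lookup also inserts an empty entry for user 42).
        print(user_tweets[42])   # []
        print(user_follows[42])  # set()

        # Following twice stores a single entry, so one unfollow fully removes it.
        user_follows[1].add(2)
        user_follows[1].add(2)
        user_follows[1].discard(2)
        print(user_follows[1])   # set()

        # discard() is a no-op for an absent element (remove() would raise KeyError).
        user_follows[1].discard(99)
        ```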

  key_takeaways:
    - "**Heap for top-k from multiple streams**: When merging sorted streams and only needing the top k elements, a heap is the ideal data structure"
    - "**Separate concerns**: The social graph (follows) and content storage (tweets) are independent — design them separately"
    - "**Lazy evaluation**: Don't process all data upfront. The heap approach only examines tweets as needed"
    - "**Foundation for real systems**: This pattern (fan-out on read with a heap merge) is one common way to assemble feeds in real social media systems"

  time_complexity: "O(1) for `postTweet`, `follow`, and `unfollow`. O(n log n + k log n) for `getNewsFeed`, where k = 10 and n is the number of users in the feed (the user plus everyone they follow)."
  space_complexity: "O(U + T + F) where U is the number of users, T is the total number of tweets, and F is the total number of follow relationships."

solutions:
  - approach_name: Hash Map + Max-Heap
    is_optimal: true
    code: |
      import heapq
      from collections import defaultdict


      class Twitter:
          def __init__(self):
              # Global timestamp to track tweet order
              self.timestamp = 0
              # user_id -> list of (timestamp, tweet_id)
              self.user_tweets = defaultdict(list)
              # user_id -> set of followed user_ids
              self.user_follows = defaultdict(set)

          def postTweet(self, userId: int, tweetId: int) -> None:
              # Store tweet with current timestamp, then increment
              self.user_tweets[userId].append((self.timestamp, tweetId))
              self.timestamp += 1

          def getNewsFeed(self, userId: int) -> list[int]:
              # Users whose tweets we care about: self + followed users
              users_to_check = self.user_follows[userId] | {userId}

              # Max-heap: store (-timestamp, tweet_id, user_id, index)
              # Negative timestamp because heapq is a min-heap
              max_heap = []

              for uid in users_to_check:
                  tweets = self.user_tweets[uid]
                  if tweets:
                      # Start with most recent tweet (last in list)
                      idx = len(tweets) - 1
                      ts, tid = tweets[idx]
                      # Push negative timestamp for max-heap behavior
                      heapq.heappush(max_heap, (-ts, tid, uid, idx))

              result = []
              while max_heap and len(result) < 10:
                  _, tid, uid, idx = heapq.heappop(max_heap)
                  result.append(tid)

                  # If this user has more tweets, add the next one
                  if idx > 0:
                      idx -= 1
                      ts, tid = self.user_tweets[uid][idx]
                      heapq.heappush(max_heap, (-ts, tid, uid, idx))

              return result

          def follow(self, followerId: int, followeeId: int) -> None:
              # Prevent self-follow (though constraints say it won't happen)
              if followerId != followeeId:
                  self.user_follows[followerId].add(followeeId)

          def unfollow(self, followerId: int, followeeId: int) -> None:
              # Remove from set if present (discard won't raise error)
              self.user_follows[followerId].discard(followeeId)
    explanation: |
      **Time Complexity:**
      - `postTweet`: O(1) — append to list
      - `follow`/`unfollow`: O(1) — set operations
      - `getNewsFeed`: O(n log n + k log n), where k = 10 and n is the number of users in the feed (the user plus everyone they follow). Seeding the heap takes n pushes, then we do at most k pops and k pushes, each O(log n).

      **Space Complexity:** O(U + T + F) — storing users, tweets, and follow relationships.

      The heap elegantly merges multiple sorted tweet streams. By storing the index into each user's tweet list, we can efficiently pull the "next" tweet from any user when needed.

  - approach_name: Collect and Sort
    is_optimal: false
    code: |
      from collections import defaultdict


      class Twitter:
          def __init__(self):
              self.timestamp = 0
              self.user_tweets = defaultdict(list)
              self.user_follows = defaultdict(set)

          def postTweet(self, userId: int, tweetId: int) -> None:
              self.user_tweets[userId].append((self.timestamp, tweetId))
              self.timestamp += 1

          def getNewsFeed(self, userId: int) -> list[int]:
              # Collect ALL tweets from self and followed users
              all_tweets = []
              users_to_check = self.user_follows[userId] | {userId}

              for uid in users_to_check:
                  all_tweets.extend(self.user_tweets[uid])

              # Sort all tweets by timestamp (descending)
              all_tweets.sort(key=lambda x: x[0], reverse=True)

              # Return top 10 tweet IDs
              return [tid for _, tid in all_tweets[:10]]

          def follow(self, followerId: int, followeeId: int) -> None:
              if followerId != followeeId:
                  self.user_follows[followerId].add(followeeId)

          def unfollow(self, followerId: int, followeeId: int) -> None:
              self.user_follows[followerId].discard(followeeId)
    explanation: |
      **Time Complexity:**
      - `postTweet`: O(1)
      - `follow`/`unfollow`: O(1)
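      - `getNewsFeed`: O(T log T), where T is the total number of tweets posted by the user and everyone they follow

      **Space Complexity:** O(T) extra space for collecting those tweets during `getNewsFeed`, on top of the storage for users, tweets, and follow relationships.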

      This approach is simpler but less efficient. For users who follow many active posters, collecting and sorting all their tweets becomes expensive. The heap approach is preferred for production systems.
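
      If you want to keep the collect-everything structure but avoid the full sort, `heapq.nlargest` selects the 10 newest tweets in O(T log 10) time. This is a swapped-in variant, not one of the listed solutions, and `news_feed_nlargest` is a hypothetical helper name. It still scans all T tweets, so the merge-based approach remains preferable when users have long histories.

      ```python
      import heapq
      from itertools import chain


      def news_feed_nlargest(user_tweets: dict[int, list[tuple[int, int]]],
                             users: set[int], k: int = 10) -> list[int]:
          # Scan every (timestamp, tweet_id) pair but keep only a size-k heap internally.
          all_tweets = chain.from_iterable(user_tweets.get(u, []) for u in users)
          newest = heapq.nlargest(k, all_tweets, key=lambda t: t[0])
          return [tid for _, tid in newest]  # nlargest returns newest first


      print(news_feed_nlargest({1: [(0, 5), (2, 7)], 2: [(1, 6)]}, {1, 2}))  # [7, 6, 5]
      ```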