codetutor/backend/data/questions/roman-to-integer.yaml

title: Roman to Integer
slug: roman-to-integer
difficulty: easy
leetcode_id: 13
leetcode_url: https://leetcode.com/problems/roman-to-integer/
categories:
  - strings
  - hash-tables
  - math
patterns:
  - greedy

function_signature: "def roman_to_int(s: str) -> int:"

test_cases:
  visible:
    - input: { s: "III" }
      expected: 3
    - input: { s: "LVIII" }
      expected: 58
    - input: { s: "MCMXCIV" }
      expected: 1994
  hidden:
    - input: { s: "IV" }
      expected: 4
    - input: { s: "IX" }
      expected: 9
    - input: { s: "XL" }
      expected: 40
    - input: { s: "XC" }
      expected: 90
    - input: { s: "CD" }
      expected: 400
    - input: { s: "CM" }
      expected: 900

description: |
  Roman numerals are represented by seven different symbols: `I`, `V`, `X`, `L`, `C`, `D` and `M`.

  | Symbol | Value |
  |--------|-------|
  | I      | 1     |
  | V      | 5     |
  | X      | 10    |
  | L      | 50    |
  | C      | 100   |
  | D      | 500   |
  | M      | 1000  |

  For example, `2` is written as `II` in Roman numerals, just two ones added together. `12` is written as `XII`, which is simply `X + II`. The number `27` is written as `XXVII`, which is `XX + V + II`.

  Roman numerals are usually written largest to smallest from left to right. However, the numeral for four is not `IIII`. Instead, the number four is written as `IV`. Because the one comes before the five, we subtract it, making four. The same principle applies to the number nine, which is written as `IX`. There are six instances where subtraction is used:

  - `I` can be placed before `V` (5) and `X` (10) to make 4 and 9
  - `X` can be placed before `L` (50) and `C` (100) to make 40 and 90
  - `C` can be placed before `D` (500) and `M` (1000) to make 400 and 900

  Given a Roman numeral, convert it to an integer.

constraints: |
  - `1 <= s.length <= 15`
  - `s` contains only the characters `('I', 'V', 'X', 'L', 'C', 'D', 'M')`
  - It is **guaranteed** that `s` is a valid Roman numeral in the range `[1, 3999]`

examples:
  - input: 's = "III"'
    output: "3"
    explanation: "III = 3."
  - input: 's = "LVIII"'
    output: "58"
    explanation: "L = 50, V = 5, III = 3."
  - input: 's = "MCMXCIV"'
    output: "1994"
    explanation: "M = 1000, CM = 900, XC = 90 and IV = 4."

explanation:
  intuition: |
    Imagine reading a Roman numeral from left to right like reading a sentence. Each symbol has a value, and normally you'd just add them all up. But there's a twist: sometimes a smaller symbol appears *before* a larger one, signalling subtraction instead of addition.

    Think of it like this: when you see `IV`, the `I` (1) comes before `V` (5). This is the Roman way of saying "one less than five" = 4. The same logic applies to `IX` (9), `XL` (40), `XC` (90), `CD` (400), and `CM` (900).

    The **core insight** is simple: as you scan from left to right, if a symbol is smaller than the one that follows it, you *subtract* its value instead of adding it. Otherwise, you add it normally.

    This works because a valid Roman numeral never places two subtractive symbols in a row, so a smaller symbol directly before a larger one always signals subtraction.
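
    To make the rule concrete, the sketch below prints the signed contribution of each symbol in `MCMXCIV`. The helper name `signed_contributions` is illustrative only and is not part of the expected solution.

    ```python
    VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


    def signed_contributions(s: str) -> list[int]:
        """Return each symbol's value, negated when a larger symbol follows it."""
        result = []
        for i, symbol in enumerate(s):
            value = VALUES[symbol]
            # A smaller symbol directly before a larger one is subtracted.
            if i + 1 < len(s) and value < VALUES[s[i + 1]]:
                value = -value
            result.append(value)
        return result


    parts = signed_contributions("MCMXCIV")
    print(parts)       # [1000, -100, 1000, -10, 100, -1, 5]
    print(sum(parts))  # 1994
    ```

    Summing the signed values gives the answer directly, which is all the final algorithm does.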

  approach: |
    We solve this using a **Single Pass with Lookahead** approach:

    **Step 1: Create a symbol-to-value mapping**

    - Use a hash map to store the value of each Roman symbol
    - This gives O(1) lookup time for each character

    &nbsp;

    **Step 2: Initialise the result**

    - `total`: Set to `0` to accumulate our answer

    &nbsp;

    **Step 3: Iterate through the string**

    - For each position `i`, get the current symbol's value
    - **Lookahead check**: Compare with the next symbol's value (if it exists)
    - If current value < next value: this is a subtraction case, so *subtract* current value from total
    - Otherwise: *add* current value to total

    &nbsp;

    **Step 4: Return the result**

    - After processing all characters, `total` contains the final integer value

    &nbsp;

    This greedy approach works because each add/subtract decision depends only on local information: the current symbol and the one that follows it.

  common_pitfalls:
    - title: Overcomplicating with Special Cases
      description: |
        A common mistake is trying to handle all six subtraction cases (`IV`, `IX`, `XL`, `XC`, `CD`, `CM`) as special two-character patterns with separate logic.

        This leads to complex code with many conditionals. The elegant solution recognises that **all subtraction cases share one property**: the first symbol is smaller than the second.

        Instead of checking for specific pairs, simply compare adjacent values.
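
        As a quick check of that shared property, the snippet below confirms that each of the six pairs has a smaller first symbol, and that the pair's value is simply the difference of the two. The names `VALUES` and `SUBTRACTIVE_PAIRS` are illustrative.

        ```python
        VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}
        SUBTRACTIVE_PAIRS = ["IV", "IX", "XL", "XC", "CD", "CM"]

        pair_values = {}
        for first, second in SUBTRACTIVE_PAIRS:
            # Every pair follows the same rule: smaller symbol before larger one.
            assert VALUES[first] < VALUES[second]
            pair_values[first + second] = VALUES[second] - VALUES[first]

        print(pair_values)
        # {'IV': 4, 'IX': 9, 'XL': 40, 'XC': 90, 'CD': 400, 'CM': 900}
        ```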
      wrong_approach: "if-else chains for IV, IX, XL, XC, CD, CM"
      correct_approach: "Compare current value with next value"

    - title: Off-by-One Errors in Lookahead
      description: |
        When comparing the current symbol with the next one, be careful at the end of the string. Accessing `s[i+1]` when `i` is the last index causes an index out of bounds error.

        Always check that `i + 1 < len(s)` before accessing the next character, or loop only over indices `0` to `len(s) - 2` and then add the last symbol's value, since nothing follows it.
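
        One way to avoid the bounds check entirely, sketched here under the assumption that `s` is non-empty (as the constraints guarantee), is to walk adjacent pairs with `zip` and add the final symbol separately. The function name `roman_to_int_pairs` is hypothetical.

        ```python
        VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


        def roman_to_int_pairs(s: str) -> int:
            total = 0
            # zip stops at the shorter sequence, so `nxt` never runs past the end.
            for cur, nxt in zip(s, s[1:]):
                if VALUES[cur] < VALUES[nxt]:
                    total -= VALUES[cur]
                else:
                    total += VALUES[cur]
            # The last symbol has nothing after it, so it is always added.
            return total + VALUES[s[-1]]


        print(roman_to_int_pairs("MCMXCIV"))  # 1994
        print(roman_to_int_pairs("I"))        # 1
        ```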
      wrong_approach: "Accessing s[i+1] without bounds checking"
      correct_approach: "Check i + 1 < len(s) before lookahead"

    - title: Processing from Right to Left Confusion
      description: |
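        Some solutions iterate right-to-left, which also works but is easy to get backwards. The comparison is now against the previously processed symbol (the one to the right): if the current value is smaller than it, subtract; otherwise add. This is mathematically equivalent to the lookahead version but less intuitive.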

        Left-to-right with lookahead matches how we naturally read Roman numerals.

  key_takeaways:
    - "**Pattern recognition over enumeration**: Instead of listing all special cases, find the underlying rule (smaller before larger = subtract)"
    - "**Hash maps for symbol lookups**: O(1) character-to-value mapping is cleaner than switch statements"
    - "**Lookahead technique**: Comparing current element with the next one is a common string/array pattern"
    - "**Foundation for Integer to Roman**: Understanding this conversion helps with the reverse problem (LeetCode 12)"

  time_complexity: "O(n). We traverse the string exactly once, where `n` is the length of the input string."
  space_complexity: "O(1). The hash map has a fixed size of 7 entries regardless of input size."

solutions:
  - approach_name: Single Pass with Lookahead
    is_optimal: true
    code: |
      def roman_to_int(s: str) -> int:
          # Map each Roman symbol to its integer value
          values = {
              'I': 1,
              'V': 5,
              'X': 10,
              'L': 50,
              'C': 100,
              'D': 500,
              'M': 1000
          }

          total = 0
          n = len(s)

          for i in range(n):
              # Get current symbol's value
              current = values[s[i]]

              # If there's a next symbol and it's larger, subtract current
              if i + 1 < n and current < values[s[i + 1]]:
                  total -= current
              else:
                  # Otherwise, add current value
                  total += current

          return total
    explanation: |
      **Time Complexity:** O(n) — Single pass through the string.

      **Space Complexity:** O(1) — Fixed-size hash map with 7 entries.

      We iterate through each character, using lookahead to determine whether to add or subtract. The subtraction rule naturally handles all six special cases without explicit enumeration.

  - approach_name: Right-to-Left Traversal
    is_optimal: true
    code: |
      def roman_to_int(s: str) -> int:
          values = {
              'I': 1, 'V': 5, 'X': 10, 'L': 50,
              'C': 100, 'D': 500, 'M': 1000
          }

          total = 0
          prev = 0  # Track previous value (to our right)

          # Process from right to left
          for char in reversed(s):
              current = values[char]

              # If current is smaller than what's to the right, subtract
              if current < prev:
                  total -= current
              else:
                  total += current

              prev = current  # Update previous for next iteration

          return total
    explanation: |
      **Time Complexity:** O(n) — Single pass through the string.

      **Space Complexity:** O(1) — Fixed-size hash map with 7 entries.

      Processing right-to-left with a `prev` variable is mathematically equivalent. If the current value is smaller than `prev` (the value of the symbol immediately to its right), we subtract; otherwise we add. This avoids the bounds check needed for lookahead.

  - approach_name: Replace Subtraction Pairs
    is_optimal: false
    code: |
      def roman_to_int(s: str) -> int:
          # Replace subtraction pairs with additive equivalents
          replacements = [
              ('IV', 'IIII'),   # 4 = 1+1+1+1
              ('IX', 'VIIII'),  # 9 = 5+1+1+1+1
              ('XL', 'XXXX'),   # 40 = 10+10+10+10
              ('XC', 'LXXXX'),  # 90 = 50+10+10+10+10
              ('CD', 'CCCC'),   # 400 = 100+100+100+100
              ('CM', 'DCCCC')   # 900 = 500+100+100+100+100
          ]

          for old, new in replacements:
              s = s.replace(old, new)

          # Now simply sum all values
          values = {'I': 1, 'V': 5, 'X': 10, 'L': 50, 'C': 100, 'D': 500, 'M': 1000}
          return sum(values[c] for c in s)
    explanation: |
      **Time Complexity:** O(n) — String replacement and summation are both linear.

      **Space Complexity:** O(n) — Creates new strings during replacement.

      This approach transforms the string to remove subtraction cases, then sums all values. While correct, it uses extra space and is less elegant than the lookahead solution. Included to show an alternative way of thinking about the problem.
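
      To see the transformation at work, the sketch below applies the same replacements to `MCMXCIV` and shows the purely additive string that results. The helper name `expand_subtractive_pairs` is illustrative.

      ```python
      REPLACEMENTS = [
          ("IV", "IIII"), ("IX", "VIIII"),
          ("XL", "XXXX"), ("XC", "LXXXX"),
          ("CD", "CCCC"), ("CM", "DCCCC"),
      ]
      VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


      def expand_subtractive_pairs(s: str) -> str:
          """Rewrite each subtractive pair as its additive equivalent."""
          for old, new in REPLACEMENTS:
              s = s.replace(old, new)
          return s


      expanded = expand_subtractive_pairs("MCMXCIV")
      print(expanded)                          # MDCCCCLXXXXIIII
      print(sum(VALUES[c] for c in expanded))  # 1994
      ```

      The expanded string is longer than the input, which is where the O(n) extra space comes from.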