Parse path string

Question

Suppose path strings are encoded in this way:

2T3T5: means move 2 steps, turn right, move 3 steps, turn right, and move 5 steps
32T: which better read as 3(2T), and it means do the following operation 3 times => (move 2 steps, turn right), if written in a complete word, it means move 2 steps, turn right, move 2 steps, turn right, move 2 steps, turn right
3(4T2T)5T, which means do 3 time operation of (move 4 steps, turn right, move 2 steps, turn right), then move 5 steps and turn right

The method I want to review is parse_path, and I put some other mock methods for debug purposes only. Any advice to make the parser more reliable, more elegant/shorter and any code style advice is highly appreciated.

class Movement:
    def __init__(self,x,y,delta_x,delta_y):
        self.x = x
        self.y = y
        self.delta_x = delta_x
        self.delta_y = delta_y
    def move(self, steps):
        print 'move ', steps
    def turn_right(self):
        print 'turn right'
    def parse_path(self, source):
        #2T3T5
        #32T
        #3(4T2T)5T
        start = 0
        index = 0
        while index < len(source):
            if source[index] == 'T':
                steps = int(source[start:index])
                if steps >= 10:
                    times = steps / 10
                    steps = steps % 10
                else:
                    times = 1
                for t in range(times):
                    self.move(steps)
                    self.turn_right()
                start = index + 1
                index += 1
            elif index == len(source) - 1:
                steps = int(source[start:index+1])
                if steps >= 10:
                    times = steps / 10
                    steps = steps % 10
                else:
                    times = 1
                for t in range(times):
                    self.move(steps)
                start = index + 1
                index += 1
            elif source[index] == '(':
                right = index
                while source[right] != ')':
                    right += 1
                times = int(source[start:index])
                for t in range(times):
                    self.parse_path(source[index+1:right])
                index = right + 1
                start = index
            else:
                index += 1

if __name__ == "__main__":
    m = Movement(0,0,0,0)
    # 2T3T5
    # 32T
    # 3(4T2T)5T
    #m.parse_path("2T3T5")
    #m.parse_path("32T")
    m.parse_path("3(4T2T)5T")

How should 999T be parsed? 99(9T), 9(99T) or 9(9(9T))? — 301_Moved_Permanently
– 301_Moved_Permanently, Commented Apr 21, 2017 at 8:17
It is parsed as 99(9T). But is it a good design choice to have the times/steps like this? In my opinion, no! To get 13 steps and a turn, you'll have to write 131T... — holroy
– holroy, Commented Apr 21, 2017 at 15:22
I hope to get around to answering this tomorrow, or the day after, but I've been kind of busy lately. — holroy
– holroy, Commented Apr 21, 2017 at 21:12
@holroy No, worse, 13 steps and a turn is (121)1T or (8)5T since 131T is 13 steps and 13 turns (13(1T)). — 301_Moved_Permanently
– 301_Moved_Permanently, Commented Apr 21, 2017 at 21:56
@MathiasEttinger, I was indeed wrong, but it turns out you are wrong as well. Neither (121)1T nor (8)5T works, but 1(121)1T or 1(8)5T does work... :-D — holroy
– holroy, Commented Apr 21, 2017 at 22:25

Reinderien · Accepted Answer · 2025-11-29 16:09:37Z

Movement conflates at least three different things: tokenising/scanning (part of lexing), evaluating, and application of the path to start coordinates. In other words, a Movement class could preserve x, y; but should not have any parsing at all.

I don't think it's a very good idea to infer times, steps by modulus. A lexer would separate these by character and lexical context instead.

You have some "test case" strings but no true tests. Write unit tests.

This is one way to represent the lexing frontend and its output as a stream of lexemes (tokens):

import re
import typing
from dataclasses import dataclass


class Token: pass

@dataclass(frozen=True, slots=True)
class OpenParen(Token): pass

@dataclass(frozen=True, slots=True)
class CloseParen(Token): pass

@dataclass(frozen=True, slots=True)
class Move(Token):
    distance: int

@dataclass(frozen=True, slots=True)
class Repeat(Token):
    count: int

@dataclass(frozen=True, slots=True)
class Turn(Token): pass


LEX_PAT = re.compile(r'''(?x)
      \(   # open paren
    | \)   # close paren
    | T    # right turn command
    | (?P<move>   # named capture
        \d        # move count, single digit
    )
    (?=           # positive lookahead
        T|\)|$    # move must appear prior to one of: turn, close paren, end
    )
    | (?P<repeat> # named capture
        \d+       # repeat count, one or more digits, greedy
    )
    (?=           # positive lookahead
        \d|\(     # repeat must appear prior to one of: move digit, open paren
    )
''')


def lex(content: str) -> typing.Iterator[Token]:
    start = 0
    while start < len(content):
        groups  = LEX_PAT.match(string=content, pos=start)
        if groups is None:
            raise ValueError(f'Invalid syntax at position {start} "{content[start:]}"')
        if (move := groups['move']) is not None:
            yield Move(distance=int(move))
        elif (repeat := groups['repeat']) is not None:
            yield Repeat(count=int(repeat))
        else:
            match groups.group():
                case '(':
                    yield OpenParen()
                case ')':
                    yield CloseParen()
                case 'T':
                    yield Turn()
                case _:
                    raise ValueError(f'Unexpected regex state at position {start} "{content[start:]}"')

        start = groups.end()


def op_test() -> None:
    actual = tuple(lex('2T3T5'))
    expected = (Move(2), Turn(), Move(3), Turn(), Move(5))
    assert actual == expected, 'OP case 1 with final move'

    actual = tuple(lex('32T'))
    expected = (Repeat(3), Move(2), Turn())
    assert actual == expected, 'OP case 2a with implicit repeat'

    actual = tuple(lex('3(2T)'))
    expected = (Repeat(3), OpenParen(), Move(2), Turn(), CloseParen())
    assert actual == expected, 'OP case 2b with explicit repeat'

    actual = tuple(lex('3(4T2T)5T'))
    expected = (
        Repeat(3), OpenParen(),
        Move(4), Turn(), Move(2), Turn(),
        CloseParen(), Move(5), Turn(),
    )
    assert actual == expected, 'OP case 3 with paren group'


def comments_test() -> None:
    actual = tuple(lex('999T'))
    expected = (Repeat(99), Move(9), Turn())
    assert actual == expected, '301_Moved_Permanently (1)'

    actual = tuple(lex('(121)1T'))
    expected = (OpenParen(), Repeat(12), Move(1), CloseParen(), Move(1), Turn())
    assert actual == expected, '301_Moved_Permanently (2a)'

    actual = tuple(lex('(8)5T'))
    expected = (OpenParen(), Move(8), CloseParen(), Move(5), Turn())
    assert actual == expected, '301_Moved_Permanently (2b)'

    actual = tuple(lex('131T'))
    expected = (Repeat(13), Move(1), Turn())
    assert actual == expected, '301_Moved_Permanently (2c)'

    actual = tuple(lex('1(121)1T'))
    expected = (Repeat(1), OpenParen(), Repeat(12), Move(1), CloseParen(), Move(1), Turn())
    assert actual == expected, 'holroy (1)'


def positive_test() -> None:
    actual = tuple(lex('TTT'))
    expected = (Turn(), Turn(), Turn())
    assert actual == expected, 'Turns only are fine'

    actual = tuple(lex(')'))
    expected = (CloseParen(),)
    assert actual == expected, 'Unbalanced parens are not validated during lex'

    actual = tuple(lex('T4'))
    expected = (Turn(), Move(4))
    assert actual == expected, 'OK to turn then move'

    actual = tuple(lex(''))
    expected = ()
    assert actual == expected, 'Empty is OK'


def negative_test() -> None:
    it = lex('42h')
    assert next(it) == Repeat(4), 'First token should work'
    try:
        next(it)
        raise AssertionError('Should have failed parsing')
    except ValueError as e:
        assert e.args == ('Invalid syntax at position 1 "2h"',), '2 precedes an invalid char'



if __name__ == '__main__':
    op_test()
    comments_test()
    positive_test()
    negative_test()

toolic · Accepted Answer · 2025-11-29 12:31:25Z

Portability

I realize this question was posted many years ago when Python version 2.x was prevalent, but now that it is deprecated, consider porting to 3.x. To do so, it is necessary to add parentheses for print calls. Change:

print 'move ', steps

to:

print('move ', steps)

You can also use an f-string:

print(f'move {steps}')

Simpler

These lines:

start = index + 1
index += 1

can be simplified as:

index += 1
start = index

This saves one addition. This is done in 2 places in the code.

In this line:

for t in range(times):

the variable t is commonly replaced with the _ placeholder because t is not used otherwise:

for _ in range(times):

DRY

This expression is used a couple times:

len(source)

You can set it to a variable before the while loop. That should also be more efficient since you will only execute len once before the loop instead of each time through the loop.

Documentation

The PEP 8 style guide recommends adding docstrings for classes and functions.

class Movement:
    """
    What does this class do...
    Describe the inputs x, y, etc., their types, their ranges, etc.
    """

For the parse_path function, replace the comments with a docstring. Describe how the function works, mention that it is called recursively and describe the input type.

Also consider using type hints to describe input and return types to make the code more self-documenting.

Layout

The code uses some inconsistent whitespace. The black program can be used to automatically reformat the code with consistency.

Stack Exchange Network

Parse path string

2 Answers 2

Portability

Simpler

DRY

Documentation

Layout

You must log in to answer this question.

Hot Network Questions

2 Answers 2

Portability

Simpler

Documentation

Layout

You must log in to answer this question.

Related