
I have a string containing several pieces of information that I want to save in a dictionary:

s1 = "10:12:01    R1 3    E44"
s2 = "11:11:01    R100    E400"

pattern = r"\d{2}:\d{2}:\d{2}(\,\d+)?" + \
          r" +" + \
          r"[0-9A-Za-z _]{2}([0-9A-Za-z _]{1})?([0-9A-Za-z _]{1})?" + \
          r" +" + \
          r"[0-9A-Za-z _]{2}([0-9A-Za-z _]{1})?([0-9A-Za-z _]{1})?$"

# --> 

d1 = {"time" : "10:12:01",
      "id1" : "R1 3", 
      "id2" : "E44"}

d2 = {"time" : "11:11:01",
      "id1" : "R100", 
      "id2" : "E400"}

Is there a way to do this directly with Python's re module?

Note: I'm aware that there is a similar question here: regex expression string dictionary python. However, its wording does not point precisely to what I expect as an answer.

  • If you want regex to directly return a dict, no. Commented May 21, 2019 at 14:19
  • Named groups will make it easier, but I'm not aware of a way to directly return a map from a regex result. Commented May 21, 2019 at 14:21
  • Your d1 and d2 don't match s1 and s2, and you can split the string on whitespace; check my answer below @OliverWilken Commented May 21, 2019 at 14:26
  • @Devesh Kumar Singh: Thanks, I corrected it Commented May 21, 2019 at 14:34

2 Answers

>>> import re
>>> pattern = r"(?P<time>\d{2}:\d{2}:\d{2}(\,\d+)?) +(?P<id1>[0-9A-Za-z_]{2}([0-9A-Za-z_]{1})?([0-9A-Za-z_]{1})?) +(?P<id2>[0-9A-Za-z_]{2}([0-9A-Za-z_]{1})?([0-9A-Za-z_]{1})?$)"
>>>
>>> s1 = "10:12:01    R123    E44"
>>> print(re.match(pattern, s1).groupdict())
{'time': '10:12:01', 'id1': 'R123', 'id2': 'E44'}

1 Comment

If you change the pattern to pattern = r"(?P<time>\d{2}:\d{2}:\d{2}(\,\d+)?) +(?P<id1>[0-9A-Za-z _]{2}([0-9A-Za-z _]{1})?([0-9A-Za-z _]{1})?) +(?P<id2>[0-9A-Za-z _]{2}([0-9A-Za-z _]{1})?([0-9A-Za-z _]{1})?$)" it works.
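
A minimal sketch of how the pattern from the comment above might be applied to both strings in the question. The helper name `parse_line` is hypothetical, and the `{2,4}` quantifier is an assumed shorthand for "two characters plus up to two optional ones"; it is not taken from the original posts.

```python
import re

# Pattern from the comment above, written as raw strings so that "\d" is
# passed to the regex engine unchanged. The character classes include a
# space, so an id such as "R1 3" is accepted.
PATTERN = re.compile(
    r"(?P<time>\d{2}:\d{2}:\d{2}(,\d+)?) +"
    r"(?P<id1>[0-9A-Za-z _]{2,4}) +"
    r"(?P<id2>[0-9A-Za-z _]{2,4})$"
)


def parse_line(line):
    """Return a dict of the named groups, or None if the line does not match.

    parse_line is a hypothetical helper, not part of the re module.
    """
    match = PATTERN.match(line)
    if match is None:
        return None
    # Ids shorter than four characters can absorb padding spaces, so strip them.
    return {key: value.strip() for key, value in match.groupdict().items()}


print(parse_line("10:12:01    R1 3    E44"))
print(parse_line("11:11:01    R100    E400"))
```

Only named groups appear in `groupdict()`, so the unnamed optional group for the fractional seconds does not add a key. Returning `None` for a non-matching line is a design choice of this sketch; raising an exception would work equally well.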

If the fields are cleanly separated by whitespace, why not split the string on that whitespace and build the list of dictionaries from the result?
If the fields are separated by several whitespace characters, re.split can treat each run of two or more as a single separator, which leaves single spaces inside a field intact.

import re

# List of strings
li = ["10:12:01    R1 3    E44", "11:11:01    R100    E400"]

# List of keys
keys = ['time', 'id1', 'id2']

# Build each dictionary from the keys list and the values obtained by splitting the string on 2 or more whitespace characters
result = [{keys[idx]: re.split(r'\s{2,}', s)[idx] for idx in range(len(keys))} for s in li]

print(result)

The output will be

[
{'time': '10:12:01', 'id1': 'R1 3', 'id2': 'E44'}, 
{'time': '11:11:01', 'id1': 'R100', 'id2': 'E400'}
]
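
The same splitting idea can also be sketched with `dict(zip(...))`, which splits each string only once instead of once per key. The helper name `split_line` and the field-count check are assumptions added for illustration; they are not part of the answer above.

```python
import re

KEYS = ("time", "id1", "id2")


def split_line(line, keys=KEYS):
    """Split on runs of two or more whitespace characters and pair with keys.

    split_line is a hypothetical helper; it raises ValueError when the
    number of fields differs from the number of keys.
    """
    fields = re.split(r"\s{2,}", line.strip())
    if len(fields) != len(keys):
        raise ValueError(f"expected {len(keys)} fields, got {len(fields)}: {line!r}")
    return dict(zip(keys, fields))


lines = ["10:12:01    R1 3    E44", "11:11:01    R100    E400"]
result = [split_line(s) for s in lines]
print(result)
```

The explicit length check matters because `zip` silently truncates to the shorter input, so a line with a missing field would otherwise produce a dictionary with fewer keys rather than an error.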

3 Comments

There might be whitespace in the id as well. I will edit my question to make that clear.
How many whitespace characters? Please include all possible variations of the string.
I have updated my example to account for your updated example, please check again @OliverWilken :)
