2

I have the following string:

DATE: 12242010Key Type: Nod32 Anti-Vir (30d trial) Key: a5B2s-sH12B-hgtY3-io87N-srg98-KLMNO

I need to turn it into a dictionary like this:

{
    "DATE": "12242010",
    "Key Type": "Nod32 Anti-Vir (30d trial)",
    "Key": "a5B2s-sH12B-hgtY3-io87N-srg98-KLMNO"
}

The problem is that the string is not cleanly formatted:

DATE: 12242010Key Type: Nod32 Anti-Vir (30d trial) 
  • there is no space between the date value and Key Type
  • it would also be nice to validate the Key, e.g. that each block has 5 characters and that the number of blocks is correct

I am a beginner in Python, and even more so in regular expressions. Thanks a lot.


Here is my code. I get the string from XPath. Why can't I use it with the regex?

import re
import lxml.html as my_lxml_hmtl
tree = my_lxml_hmtl.parse("test.html")
text = tree.xpath("string(//*[contains(text(),'DATE')])")
# this works
print re.match('DATE:\s+([0-9]{8})\s*Key Type:\s+(.+)\s+Key:\s+((?:[^-]{5}(?:-[^-]{5})*))', 'DATE: 12242010Key Type: Nod32 Anti-Vir (30d trial) Key: a5B2s-sH12B-hgtY3-io87N-srg98-KLMNO').groups()

# and this doesn't work, why?
ss = str(text)
# print ss shows the same string that worked with re above
print re.match('DATE:\s+([0-9]{8})\s*Key Type:\s+(.+)\s+Key:\s+((?:[^-]{5}(?:-[^-]{5})*))', ss).groups()

When I pass text or str(text) instead of the literal 'DATE: 12242010Key Type: Nod32 Anti-Vir (30d trial) Key: a5B2s-sH12B-hgtY3-io87N-srg98-KLMNO', I get the error AttributeError: 'NoneType' object has no attribute 'groups'

What is wrong here?

3
  • Do you always have a string with DATE, Key Type and Key there or are there sometimes differences? Commented Dec 24, 2010 at 10:22
  • possible duplicate of Python: Split list in array Commented Dec 24, 2010 at 10:28
  • DATE, Key Type and Key are always present Commented Dec 24, 2010 at 10:32

3 Answers

1
>>> import re
>>> regex = re.compile(r"DATE: (\d+)Key Type: (.*?) Key: ((?:\w{5}-){5}\w{5})")
>>> match = regex.match("DATE: 12242010Key Type: Nod32 Anti-Vir (30d trial) Key: a5B2s-sH12B-hgtY3-io87N-srg98-KLMNO")
>>> mydict = {"DATE": match.group(1),
...           "Key Type": match.group(2),
...           "Key": match.group(3)}
>>> mydict
{'DATE': '12242010', 'Key': 'a5B2s-sH12B-hgtY3-io87N-srg98-KLMNO', 'Key Type': 'Nod32 Anti-Vir (30d trial)'}
>>>

The regex DATE: (\d+)Key Type: (.*?) Key: ((?:\w{5}-){5}\w{5}) matches the date (digits only) and key type (any characters); then it matches a key if it consists of six groups of five alphanumeric characters each, separated by dashes.
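The pattern above expects the string to start with DATE and to have exactly one space after each colon. If the text comes from an XPath call, as in the question, it may carry leading whitespace or line breaks; that is only a guess, since the HTML is not shown, but it would make match() return None and produce the AttributeError. Below is a minimal sketch of a more tolerant variant. The helper name parse_license and the named groups are hypothetical, and the eight-digit date is assumed from the sample.

```python
import re

# Hypothetical helper; name and whitespace handling are assumptions.
LICENSE_RX = re.compile(
    r"DATE:\s*(?P<date>\d{8})\s*"
    r"Key Type:\s*(?P<key_type>.*?)\s*"
    r"Key:\s*(?P<key>\w{5}(?:-\w{5}){5})(?!\w|-)"
)

def parse_license(text):
    """Return a dict with DATE, Key Type and Key, or None if text does not match."""
    # search() rather than match(), so leading whitespace or other text is skipped
    match = LICENSE_RX.search(text)
    if match is None:
        return None
    return {
        "DATE": match.group("date"),
        "Key Type": match.group("key_type"),
        "Key": match.group("key"),
    }
```

Checking the return value for None before using it avoids the AttributeError. Printing repr(text) is a quick way to see whether the extracted string really is identical to the literal.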


1

If you can rely on the headers being the same then you've lucked out.

>>> re.match('DATE:\s+([0-9]{8})\s*Key Type:\s+(.+)\s+Key:\s+((?:[^-]{5}(?:-[^-]{5})*))', 'DATE: 12242010Key Type: Nod32 Anti-Vir (30d trial) Key: a5B2s-sH12B-hgtY3-io87N-srg98-KLMNO').groups()
('12242010', 'Nod32 Anti-Vir (30d trial)', 'a5B2s-sH12B-hgtY3-io87N-srg98-KLMNO')

Note that this pattern accepts any number of 5-character groups in the key, so if the number of groups matters (or you expect it to change), you will have to check the count in post-processing.
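One way to do that post-processing check is to split the captured key on dashes and count the pieces. This is only a sketch: the function name key_has_shape and its defaults (six groups of five characters, taken from the sample key) are assumptions, and it requires alphanumeric characters, which is stricter than the [^-] class used in the pattern.

```python
def key_has_shape(key, groups=6, group_len=5):
    """Check that key is `groups` dash-separated alphanumeric blocks of `group_len` chars."""
    parts = key.split("-")
    return len(parts) == groups and all(
        len(part) == group_len and part.isalnum() for part in parts
    )
```

Keeping the expected count as a parameter means the regex can stay unchanged if the key format grows or shrinks later.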


1
import re

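def strToDict(inStr, keyList, sep=''):
    # Build one lazy capture group per header,
    # e.g. 'DATE: (.*?)Key Type: (.*?)Key: (.*?)$'
    rxPieces = [pc + sep + '(.*?)' for pc in keyList]
    rx = re.compile(''.join(rxPieces) + '$')
    match = rx.match(inStr)
    if match is None:
        raise ValueError('input does not contain the expected headers in order')
    return dict(zip(keyList, match.groups()))

def isKey(inStr):
    # Six dash-separated groups of five word characters; '$' rejects trailing text
    rx = re.compile(r'(\w{5}-\w{5}-\w{5}-\w{5}-\w{5}-\w{5})$')
    return (rx.match(inStr) is not None)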

s = "DATE: 12242010Key Type: Nod32 Anti-Vir (30d trial) Key: a5B2s-sH12B-hgtY3-io87N-srg98-KLMNO"
res = strToDict(s, ['DATE','Key Type','Key'], ': ')

returns

{
    'DATE': '12242010',
    'Key': 'a5B2s-sH12B-hgtY3-io87N-srg98-KLMNO',
    'Key Type': 'Nod32 Anti-Vir (30d trial) '
}

and

if isKey(res['Key']):
    print 'Found valid key'

prints Found valid key, because isKey returns True.
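Note that the 'Key Type' value in the result above keeps its trailing space, because the lazy group runs right up to the next header. A small sketch of trimming the values afterwards follows; the helper name strip_values is hypothetical, and the dictionary is copied from the output shown above.

```python
def strip_values(fields):
    """Return a copy of fields with whitespace trimmed from every string value."""
    return {name: value.strip() for name, value in fields.items()}

# Result copied from the output above, including the trailing space
res = {
    'DATE': '12242010',
    'Key': 'a5B2s-sH12B-hgtY3-io87N-srg98-KLMNO',
    'Key Type': 'Nod32 Anti-Vir (30d trial) ',
}
clean = strip_values(res)
```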
