I have the following string:
DATE: 12242010Key Type: Nod32 Anti-Vir (30d trial) Key: a5B2s-sH12B-hgtY3-io87N-srg98-KLMNO
I need to build a dictionary from it, like this:
{
"DATE": "12242010",
"Key Type": "Nod32 Anti-Vir (30d trial)",
"Key": "a5B2s-sH12B-hgtY3-io87N-srg98-KLMNO"
}
The problem is that the string is not cleanly formatted:
DATE: 12242010Key Type: Nod32 Anti-Vir (30d trial)
- there is no space between the date value and "Key Type"
- it would also be nice to validate the key, e.g. check that each group ("box") of the key has 5 characters and that the number of groups is right
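For illustration, the desired parsing and validation could be sketched roughly as follows. This is only a sketch under assumptions: the names `LINE_RE`, `KEY_RE` and `parse_line` are made up for the example, and the key layout (6 groups of 5 alphanumeric characters separated by hyphens) is inferred from the single sample string above, not from any specification.

```python
import re

# Hypothetical pattern built from the sample string. \s* between the
# fields tolerates the missing space before "Key Type".
LINE_RE = re.compile(
    r"DATE:\s*(?P<date>\d{8})\s*"
    r"Key Type:\s*(?P<key_type>.+?)\s*"
    r"Key:\s*(?P<key>\S+)\s*$"
)

# Assumed key layout: 6 hyphen-separated groups of 5 alphanumeric characters.
KEY_RE = re.compile(r"[A-Za-z0-9]{5}(?:-[A-Za-z0-9]{5}){5}$")


def parse_line(line):
    """Return the three fields as a dict, or raise ValueError."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError("unrecognised line: %r" % line)
    key = m.group("key")
    if not KEY_RE.match(key):
        raise ValueError("malformed key: %r" % key)
    return {
        "DATE": m.group("date"),
        "Key Type": m.group("key_type"),
        "Key": key,
    }


sample = ("DATE: 12242010Key Type: Nod32 Anti-Vir (30d trial) "
          "Key: a5B2s-sH12B-hgtY3-io87N-srg98-KLMNO")
print(parse_line(sample))
```

Keeping the key check in a separate pattern means a badly formed key is reported as such, instead of the whole line silently failing to match.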
I am a beginner in Python, and even more so in regular expressions. Thanks a lot.
Here is my code. I am getting the string from XPath. Why can't I use it in the regex?
import re
import lxml.html as my_lxml_hmtl
tree = my_lxml_hmtl.parse("test.html")
text = tree.xpath("string(//*[contains(text(),'DATE')])")
# this works (pattern written as a raw string so the backslashes reach the regex engine untouched)
print re.match(r'DATE:\s+([0-9]{8})\s*Key Type:\s+(.+)\s+Key:\s+((?:[^-]{5}(?:-[^-]{5})*))', 'DATE: 12242010Key Type: Nod32 Anti-Vir (30d trial) Key: a5B2s-sH12B-hgtY3-io87N-srg98-KLMNO').groups()
# and this doesn't work, why?
ss = str(text)
# print ss shows what looks like the same string that worked in the re.match call above
print re.match(r'DATE:\s+([0-9]{8})\s*Key Type:\s+(.+)\s+Key:\s+((?:[^-]{5}(?:-[^-]{5})*))', ss).groups()
When I use text or str(text) instead of the literal 'DATE: 12242010Key Type: Nod32 Anti-Vir (30d trial) Key: a5B2s-sH12B-hgtY3-io87N-srg98-KLMNO', I get this error: AttributeError: 'NoneType' object has no attribute 'groups'
What is wrong here?
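One thing that may be worth checking (this is a guess, since the post does not show repr(text)): the XPath result could contain leading whitespace, newlines, or non-breaking spaces that print hides. re.match only matches at the very start of the string, so any leading whitespace is enough to make it return None. The sketch below uses a made-up `raw` value as a stand-in for the XPath result and a hypothetical helper `normalize`. It is written for Python 3, although the single-argument print calls also run on Python 2.

```python
import re

PATTERN = (r'DATE:\s+([0-9]{8})\s*Key Type:\s+(.+)\s+Key:\s+'
           r'((?:[^-]{5}(?:-[^-]{5})*))')


def normalize(raw):
    # Replace non-breaking spaces, then collapse every run of whitespace
    # (newlines and tabs included) to one space and trim both ends.
    return " ".join(raw.replace(u"\u00a0", " ").split())


# Assumed stand-in for the XPath result: same text, extra whitespace.
raw = (u"\n   DATE: 12242010Key Type: Nod32 Anti-Vir (30d trial)\n"
       u"   Key: a5B2s-sH12B-hgtY3-io87N-srg98-KLMNO\n")

print(repr(raw))                 # repr shows whitespace that print hides
print(re.match(PATTERN, raw))    # None: the string does not start with "DATE:"

clean = normalize(raw)
print(re.match(PATTERN, clean).groups())
```

If repr(text) shows no such characters, the cause is something else and this sketch does not apply.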