
I'm trying to transform a text file which looks like the following:

14/10/2019 13:00:19 | www.google.com | {"type":"click", "user":"root", "ip":"0.0.0.0"}
14/10/2019 13:02:19 | www.google.com | {"type":"click", "user":"root", "ip":"0.0.0.0"}
14/10/2019 13:05:19 | www.google.com | {"type":"click", "user":"root", "ip":"0.0.0.0"}

The file has many more log rows. I need to convert each row into a single JSON object like the following:

{"date_time": "2019-10-14 13:00:19", "url": "www.google.com","type":"click", "user":"root", "ip":"0.0.0.0"}

But I cannot work out an obvious way to do this in Python. Any help is appreciated.

  • Welcome to Stack Overflow! Why not add a header row with your field names, load the file into a pandas DataFrame, and convert it to JSON as described here: stackoverflow.com/questions/50384883/… (commented Oct 29, 2019 at 18:00)

3 Answers


You could use the datetime and json modules. Open the file and iterate over its lines; you may need to adapt some parts of the code to your data.

See the "strptime behavior" section of the datetime documentation for the format codes.

Working example:

import datetime
import json

in_text = """14/10/2019 13:00:19 | www.google.com | {"type":"click", "user":"root", "ip":"0.0.0.0"}
14/10/2019 13:02:19 | www.google.com | {"type":"click", "user":"root", "ip":"0.0.0.0"}
14/10/2019 13:05:19 | www.google.com | {"type":"click", "user":"root", "ip":"0.0.0.0"}"""

item_list = []
for line in in_text.split("\n"):
    # Split on the first two pipes only, in case the JSON part contains "|".
    date, url, json_part = line.split("|", 2)
    parsed = datetime.datetime.strptime(date.strip(), "%d/%m/%Y %H:%M:%S")
    item = {
        # Store the date as a string; datetime objects are not JSON serializable.
        "date_time": parsed.strftime("%Y-%m-%d %H:%M:%S"),
        "url": url.strip(),
    }
    item.update(json.loads(json_part))
    item_list.append(item)

# Print one JSON object per row.
for item in item_list:
    print(json.dumps(item))

To read lines from a file:

with open("your/file/path.txt") as fh:
    for line in fh:
        # Copy the code from the above example.
        ...
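
Putting the two snippets above together, a minimal end-to-end sketch might look like the following. The names `parse_log_line` and `convert_log` and the file paths are placeholders of my own, not anything from the question. It assumes every non-blank line has exactly the `date | url | json` shape shown, and it writes one JSON object per line.

```python
import datetime
import json


def parse_log_line(line):
    """Turn one 'date | url | {json}' log line into a flat dict."""
    # Split on the first two pipes only, so a "|" inside the JSON part survives.
    date, url, json_part = line.split("|", 2)
    parsed = datetime.datetime.strptime(date.strip(), "%d/%m/%Y %H:%M:%S")
    item = {
        # Format as a string: datetime objects are not JSON serializable.
        "date_time": parsed.strftime("%Y-%m-%d %H:%M:%S"),
        "url": url.strip(),
    }
    item.update(json.loads(json_part))
    return item


def convert_log(in_path, out_path):
    """Read a log file and write one JSON object per line (hypothetical helper)."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.strip():  # skip blank lines
                dst.write(json.dumps(parse_log_line(line)) + "\n")
```

You would then call something like `convert_log("your/file/path.txt", "out.jsonl")`. If you need a single JSON array instead, collect the dicts in a list and call `json.dump` once.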

3 Comments

Thanks! I got this to work with the text supplied as a string, but I haven't been able to work out how to parse the file through it yet.
@devnotdev I updated my answer to also cover reading from a file
Thank you very much
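import json
from datetime import datetime


def transform_to_json(row):
    # row is [date, url, json_part] from splitting one line on "|".
    # json.loads rather than ast.literal_eval: the third field is JSON.
    d = json.loads(row[2])
    # Convert the day-first date to the requested "YYYY-MM-DD HH:MM:SS" form.
    parsed = datetime.strptime(row[0].strip(), "%d/%m/%Y %H:%M:%S")
    d["date_time"] = parsed.strftime("%Y-%m-%d %H:%M:%S")
    d["url"] = row[1].strip()

    return d


with open('example.txt', 'r') as file:
    # Iterate over the file directly and skip blank lines.
    json_objs = [transform_to_json(row.split('|', 2)) for row in file if row.strip()]

single_json_result = json.dumps(json_objs)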
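
On the choice of parser for the third field: `ast.literal_eval` parses Python literals rather than JSON. The sample rows contain only strings, so either parser handles them, but a payload containing `true`, `false` or `null` would only work with `json.loads`. A small sketch of the difference, using a made-up payload that is not from the question's log:

```python
import json
from ast import literal_eval

# Made-up example payload containing JSON-only literals.
payload = '{"type": "click", "active": true, "referrer": null}'

# json.loads maps JSON true/null to Python True/None.
parsed = json.loads(payload)

# literal_eval rejects the same text, because true and null
# are not Python literals.
try:
    literal_eval(payload)
    literal_eval_ok = True
except (ValueError, SyntaxError):
    literal_eval_ok = False
```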


Use pandas:

  • This assumes your data is in a .txt file, as described.
  • .to_json has various parameters to customize the final look of the JSON file.
  • Having the data in a dataframe has the advantage of allowing additional analysis.
  • The data has a number of issues that can easily be fixed:
    • No column names
    • Day-first datetime format
    • Whitespace around the date and the URL
import json

import pandas as pd

# read data; parse column 2 with json.loads instead of eval,
# which would execute whatever code the file contains
df = pd.read_csv('test.txt', sep='|', header=None, converters={2: json.loads})

# convert column 0 to a datetime; the dates are day-first, so state the format
# explicitly, and strip the whitespace left next to the separator
df[0] = pd.to_datetime(df[0].str.strip(), format='%d/%m/%Y %H:%M:%S')

# your data has whitespace around the url; remove it
df[1] = df[1].str.strip()

# make column 2 a separate dataframe
df2 = pd.DataFrame.from_dict(df[2].to_list())

# merge the two dataframes on the index
df3 = df.merge(df2, left_index=True, right_index=True, how='outer')

# drop old column 2
df3.drop(columns=[2], inplace=True)

# name column 0 and 1
df3.rename(columns={0: 'date_time', 1: 'url'}, inplace=True)

# dataframe view
          date_time               url   type  user       ip
2019-10-14 13:00:19   www.google.com   click  root  0.0.0.0
2019-10-14 13:02:19   www.google.com   click  root  0.0.0.0
2019-10-14 13:05:19   www.google.com   click  root  0.0.0.0

# save to a JSON file
df3.to_json('test3.json', orient='records', date_format='iso')

JSON file

[{
        "date_time": "2019-10-14T13:00:19.000Z",
        "url": "www.google.com",
        "type": "click",
        "user": "root",
        "ip": "0.0.0.0"
    }, {
        "date_time": "2019-10-14T13:02:19.000Z",
        "url": "www.google.com",
        "type": "click",
        "user": "root",
        "ip": "0.0.0.0"
    }, {
        "date_time": "2019-10-14T13:05:19.000Z",
        "url": "www.google.com",
        "type": "click",
        "user": "root",
        "ip": "0.0.0.0"
    }
]
