I am writing a script to gather results from an output file of a programme. The file contains headers, captions, and data in scientific format. I only want the data and I need a script that can do this repeatedly for different output files with the same results format.

This is the data:

GROUP              1             2             3             4             5             6             7             8
VELOCITY (m/s)    59.4604E+06   55.5297E+06   52.4463E+06   49.3329E+06   45.4639E+06   41.6928E+06   37.7252E+06   34.9447E+06

GROUP              9            10            11            12            13            14            15            16
VELOCITY (m/s)    33.2405E+06   30.8868E+06   27.9475E+06   25.2880E+06   22.8815E+06   21.1951E+06   20.1614E+06   18.7338E+06

GROUP             17            18            19            20            21            22            23            24
VELOCITY (m/s)    16.9510E+06   15.7017E+06   14.9359E+06   14.2075E+06   13.5146E+06   12.8555E+06   11.6805E+06   10.5252E+06

This is my code at the moment. I want it to open the file, search for the keyword 'INPUT:BETA' which indicates the start of the results I want to extract. It then takes the information between this input keyword and the end identifier that signals the end of the data I want. I don't think this section needs changing but I have included it just in case.

I have then tried to use regex to specify the lines that start with VELOCITY (m/s) as these contain the data I need. This works and extracts each line, whitespace and all, into an array. However, I want each numerical value to be a single element, so the next line is supposed to strip the whitespace out and split the lines into individual array elements.

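import re

with open(file_name) as f:
    t = f.read()
    # Keep only the text from the 'INPUT:BETA' keyword onwards.
    t = t[t.find('INPUT:BETA'):]
    # Narrow this down to the block between the start and end identifiers.
    t = t[t.find(start_identifier):t.find(end_identifier)]
    # Capture everything after the "VELOCITY (m/s)" label on each matching line.
    regex = r"VELOCITY \(m\/s\)\s(.*)"
    res = re.findall(regex, t)
    # Split each captured line on whitespace: one list of strings per line.
    res = [s.split() for s in res]
    print(res)
    print(len(res))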

This isn't working, here is the output:

[['33.2405E+06', '30.8868E+06', '27.9475E+06', '25.2880E+06', '22.8815E+06', '21.1951E+06', '20.1614E+06', '18.7338E+06'], ['16.9510E+06', '15.7017E+06', '14.9359E+06', '14.2075E+06', '13.5146E+06', '12.8555E+06', '11.6805E+06', '10.5252E+06']]
2

It's taking out the whitespace but not putting the values into separate elements, which I need for the next stage of the processing.

My question is therefore: How can I extract each value into a separate array element, leaving the rest of the data behind, in a way that will work with different output files with different data?

  • What do you mean by 'separate elements'? Commented Dec 22, 2020 at 15:12
  • I mean I need each value (e.g. 33.2405E+06) in its own element of the array. So the total size should be 16 (or 24 if the first line is included), not 2 as it is currently. Commented Dec 22, 2020 at 15:16
  • So you want to flatten a list of lists. Commented Dec 22, 2020 at 15:22
  • Yes, that did it thank you! Commented Dec 22, 2020 at 15:45

1 Answer

Here is how you can flatten your list, which is your point 1.

import re

text = """
 GROUP              1             2             3             4             5             6             7             8
VELOCITY (m/s)    59.4604E+06   55.5297E+06   52.4463E+06   49.3329E+06   45.4639E+06   41.6928E+06   37.7252E+06   34.9447E+06

GROUP              9            10            11            12            13            14            15            16
VELOCITY (m/s)    33.2405E+06   30.8868E+06   27.9475E+06   25.2880E+06   22.8815E+06   21.1951E+06   20.1614E+06   18.7338E+06

GROUP             17            18            19            20            21            22            23            24
VELOCITY (m/s)    16.9510E+06   15.7017E+06   14.9359E+06   14.2075E+06   13.5146E+06   12.8555E+06   11.6805E+06   10.5252E+06
"""

regex = r"VELOCITY \(m\/s\)\s(.*)"
res = re.findall(regex, text)
res = [s.split() for s in res]
res = [value for lst in res for value in lst]
print(res)
print(len(res))
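
If the next stage of processing needs numbers rather than strings, the same flattening can be combined with a conversion to float. This is only a minimal sketch: `extract_velocities` is a hypothetical helper name, the sample text is shortened, and it assumes every output file uses the same "VELOCITY (m/s)" label as above.

```python
import re
from itertools import chain

# Same pattern as above: the capture group holds everything after the label.
VELOCITY_LINE = re.compile(r"VELOCITY \(m/s\)\s(.*)")


def extract_velocities(text):
    """Return every value on the VELOCITY lines as one flat list of floats.

    Hypothetical helper for illustration; not part of the original question.
    """
    # One list of strings per matching line.
    rows = (line.split() for line in VELOCITY_LINE.findall(text))
    # chain.from_iterable flattens the rows; float() parses the E+06 notation.
    return [float(value) for value in chain.from_iterable(rows)]


# Shortened sample in the same layout as the data in the question.
sample = """
GROUP              1             2
VELOCITY (m/s)    59.4604E+06   55.5297E+06

GROUP              3             4
VELOCITY (m/s)    52.4463E+06   49.3329E+06
"""

velocities = extract_velocities(sample)
print(velocities)
print(len(velocities))
```

Because the function takes the text as an argument, it can be called on the extracted section of each output file in turn.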

Your regex isn't skipping your first line though. There must be an error in the rest of your code.
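
The question doesn't show which step dropped the first line, so this is a guess at one thing worth ruling out rather than a diagnosis. `str.find` returns -1 when the marker is missing, and a slice that starts at -1 silently keeps only the last character instead of raising an error. A sketch of a stricter version of the slicing step, where `slice_between` and the "START"/"END" markers are hypothetical names for illustration:

```python
def slice_between(text, start_marker, end_marker):
    """Return the text from start_marker up to (not including) end_marker.

    Hypothetical helper: raises ValueError instead of slicing with -1
    when either marker is missing.
    """
    start = text.find(start_marker)
    if start == -1:
        raise ValueError(f"start marker {start_marker!r} not found")
    # Search for the end marker only after the start marker.
    end = text.find(end_marker, start)
    if end == -1:
        raise ValueError(f"end marker {end_marker!r} not found")
    return text[start:end]


# "START" and "END" stand in for the real identifiers, which the question does not show.
sample = "header\nINPUT:BETA\nSTART\nVELOCITY (m/s)    1.0000E+00\nEND\nfooter"
section = slice_between(sample, "START", "END")
print(section)
```

Printing the extracted section before running the regex makes it easy to check whether the first VELOCITY line is still present at that point.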

1 Comment

This is great, thank you! And yes, I just found the error in the rest of the code - will update the question now.
