How To Read And Organize Text Files Divided By Keywords

January 20, 2024

I'm working on some Python code that reads a text file. The text file contains the information needed to construct a certain geometry, and it is divided into sections by keywords.

Solution 1:

You can read the file once and store the contents in a dictionary. Since you have conveniently labeled the "command" lines with a *, you can use all lines beginning with a * as the dictionary key and all following lines as the values for that key. You can do this with a for loop:

with open('geometry.txt') as f:
    x = {}
    key = None  # store the most recent "command" here
    for y in f.readlines():
        if y[0] == '*':
            key = y[1:].strip()  # your "command", without the trailing newline
            x[key] = []
        else:
            x[key].append(y.split())  # add subsequent lines to the most recent key

Or you can take advantage of Python's list and dictionary comprehensions to do the same thing in one line:

with open('test.txt') as f:
    x = {y.split('\n')[0]: [z.split() for z in y.strip().split('\n')[1:]] for y in f.read().split('*')[1:]}

This is admittedly not very nice looking, but it gets the job done: it splits the entire file into chunks at each '*' character, then uses newlines and spaces as delimiters to break each chunk into a dictionary key and a list of lists (the dictionary value).
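
To make the comprehension easier to follow, here is a sketch of the same steps written out as a loop. The sample text is a hypothetical stand-in for the file contents, assumed to use '*' only at the start of section headers.

```python
# Hypothetical file contents, held in a string instead of read from disk.
text = "*VERTICES\n1 0 0 0\n2 10 0 0\n*EDGES\n1 1 2\n2 1 4\n"

# Step 1: split on '*'. The first piece is whatever precedes the first
# header (an empty string here), so it is dropped with [1:].
chunks = text.split('*')[1:]

x = {}
for chunk in chunks:
    # Step 2: the first line of each chunk is the section name.
    lines = chunk.strip().split('\n')
    name = lines[0]
    # Step 3: every remaining line is split on whitespace into a row.
    x[name] = [line.split() for line in lines[1:]]

print(x['EDGES'])  # [['1', '1', '2'], ['2', '1', '4']]
```

Note that this approach would misbehave if a '*' ever appeared inside a data row, since the split does not check where in the line the character occurs.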

Details about splitting, stripping, and slicing strings can be found in the Python documentation on string methods.

Solution 2:

Because the sections are unordered, the data lends itself well to parsing into a dictionary, from which you can access values later. I wrote a function that you may find useful for this task:

features = ['POINTS','EDGES']

def parseFile(dictionary, f, features):
    """
    Creates a format where you can access a shape feature like:
        dictionary[shapeID][feature] = [[1, 1, 1], [1, 1, 1], ...]

    Assumes the features of each shape, in any order, appear grouped as:
        shape1
            *feature1
                .
                .
                .
            *featuren
    Assumes all possible features are in the list features.

    f is the input file handle.
    """
    shapeID = 0
    found = set()
    feature = None
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines

        if line[0] == '*':
            if found == set(features):
                # all features of the current shape were seen: start a new shape
                found = set()
                shapeID += 1
            feature = line[1:]  # current feature, e.g. POINTS
            found.add(feature)

        else:
            dictionary.setdefault(shapeID, {}).setdefault(feature, []).append(
                [int(i) for i in line.split()]
                )

    return dictionary

#to access the shape features you can get vertices like:
for vertex in dictionary[shapeID]['POINTS']:
    print(vertex)

#to access edges
for edge in dictionary[shapeID]['EDGES']:
    print(edge)

Solution 3:

You should just create a dictionary of the sections. You could use a generator to read the file and yield each section in whatever order they arrive and build a dictionary from the results. Here's some incomplete code that might help you along:

def load(f):
    with open(f) as file:
        section = next(file).strip()  # Assumes first line is always a section
        data = []
        for line in file:
            if line[0] == '*':        # Any appropriate test for a new section
                yield section, data
                section = line.strip()
                data = []
            else:
                data.append(list(map(int, line.strip().split())))
        yield section, data

Assuming the data above is in a file called data.txt:

>>> data = dict(load('data.txt'))
>>> data
{'*EDGES': [[1, 1, 2], [2, 1, 4], [3, 2, 3], [4, 3, 4]],
 '*VERTICES': [[1, 0, 0, 0], [2, 10, 0, 0], [3, 10, 10, 0], [4, 0, 10, 0]]}

Then you can reference each section, e.g.:

for edge in data['*EDGES']:
    ...
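
As one possible use of the parsed sections, the sketch below resolves each edge to the coordinates of its endpoints. It assumes (this is not stated in the post) that a vertex row is `[id, x, y, z]` and an edge row is `[id, start_vertex_id, end_vertex_id]`; adjust the indices if your file uses a different layout.

```python
# Parsed data in the shape shown above (copied from the example output).
data = {
    '*EDGES': [[1, 1, 2], [2, 1, 4], [3, 2, 3], [4, 3, 4]],
    '*VERTICES': [[1, 0, 0, 0], [2, 10, 0, 0], [3, 10, 10, 0], [4, 0, 10, 0]],
}

# Assumed layout: [id, x, y, z]. Index vertices by id for quick lookup.
vertices = {row[0]: tuple(row[1:]) for row in data['*VERTICES']}

# Assumed layout: [id, start_vertex_id, end_vertex_id].
segments = {
    edge_id: (vertices[start], vertices[end])
    for edge_id, start, end in data['*EDGES']
}

print(segments[1])  # ((0, 0, 0), (10, 0, 0))
```

Indexing the vertices by id first means each edge lookup is a plain dictionary access, regardless of the order in which the sections appeared in the file.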

Solution 4:

Assuming your file is named 'data.txt':

from collections import defaultdict

def get_data():
    d = defaultdict(list)
    with open('data.txt') as f:
        key = None
        for line in f:
            if line.startswith('*'):
                key = line.rstrip()
                continue
            d[key].append(line.rstrip())
    return d

The returned defaultdict looks like this:

defaultdict(list,
            {'*EDGES': ['1 1 2', '2 1 4', '3 2 3', '4 3 4'],
             '*VERTICES': ['1 0 0 0', '2 10 0 0', '3 10 10 0', '4 0 10 0']})

You access the data just like a normal dictionary:

>>> d = get_data()
>>> d['*EDGES']
['1 1 2', '2 1 4', '3 2 3', '4 3 4']
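
Note that this version stores each row as a raw string. If numeric values are needed, one option is to convert the rows after loading. The sketch below assumes every field is an integer, as in the sample output; `to_int_rows` is a hypothetical helper name, not part of the answer.

```python
from collections import defaultdict

def to_int_rows(sections):
    """Return a new dict with each 'a b c' row converted to [a, b, c] ints."""
    return {
        key: [[int(field) for field in row.split()] for row in rows]
        for key, rows in sections.items()
    }

# Data in the shape returned by get_data() above.
d = defaultdict(list, {
    '*EDGES': ['1 1 2', '2 1 4', '3 2 3', '4 3 4'],
    '*VERTICES': ['1 0 0 0', '2 10 0 0', '3 10 10 0', '4 0 10 0'],
})

numeric = to_int_rows(d)
print(numeric['*EDGES'])  # [[1, 1, 2], [2, 1, 4], [3, 2, 3], [4, 3, 4]]
```

If the file may contain decimal coordinates, `float` could be used in place of `int`.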

Solution 5:

A common strategy with this type of parsing is to build a function that can yield the data a section at a time. Then your top-level calling code can be fairly simple because it doesn't have to worry about the section logic at all. Here's an example with your data:

import sys

def main(file_path):
    # An example usage.
    for section_name, rows in sections(file_path):
        print('===============')
        print(section_name)
        for row in rows:
            print(row)

def sections(file_path):
    # Setup.
    section_name = None
    rows = []

    # Process the file.
    with open(file_path) as fh:
        for line in fh:
            # Section start: yield any rows we have so far,
            # and then update the section name.
            if line.startswith('*'):
                if rows:
                    yield (section_name, rows)
                    rows = []
                section_name = line[1:].strip()
            # Otherwise, just add another row.
            else:
                row = line.split()
                rows.append(row)

    # Don't forget the last batch of rows.
    if rows:
        yield (section_name, rows)

main(sys.argv[1])
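
Because the generator yields `(section_name, rows)` pairs, its output can also be passed straight to `dict`. The sketch below is a variant that takes an iterable of lines rather than a path, so it can be tried on an in-memory sample; `iter_sections` and the sample text are hypothetical and only illustrate the idea.

```python
import io

def iter_sections(lines):
    """Yield (section_name, rows) pairs from an iterable of text lines."""
    section_name = None
    rows = []
    for line in lines:
        if line.startswith('*'):
            # New header: emit the rows collected so far, if any.
            if rows:
                yield section_name, rows
                rows = []
            section_name = line[1:].strip()
        elif line.strip():  # skip blank lines
            rows.append(line.split())
    # Emit the final section.
    if rows:
        yield section_name, rows

sample = io.StringIO("*VERTICES\n1 0 0 0\n2 10 0 0\n*EDGES\n1 1 2\n")
data = dict(iter_sections(sample))
print(data['VERTICES'])  # [['1', '0', '0', '0'], ['2', '10', '0', '0']]
```

Two behaviours to keep in mind: a section with no rows is never yielded, and if the same section name appears twice, `dict` keeps only the last occurrence.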
