Parsing Graph Data File With Python
I have one relatively small issue, but I can't seem to wrap my head around it. I have a text file which has information about a graph, and the structure is as follows: first line
Solution 1:
Is this what you want?
{1: {'downs': [], 'ups': [2, 3], 'node_type': 1},
2: {'downs': [1, 3], 'ups': [], 'node_type': 1},
3: {'downs': [2], 'ups': [1], 'node_type': 2}}
Then here's the code:
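def parse_chunk(chunk):
    """Turn one node's lines into (node_id, {node_type, ups, downs})."""
    node_id = int(chunk[0])
    node_type = int(chunk[1])
    nb_up = int(chunk[2])
    if nb_up:
        # list() is needed on Python 3, where map() returns a lazy iterator
        ups = list(map(int, chunk[3].split()))
        next_pos = 4
    else:
        ups = []
        next_pos = 3  # no up-edge line, so the down count comes right after
    nb_down = int(chunk[next_pos])
    if nb_down:
        downs = list(map(int, chunk[next_pos + 1].split()))
    else:
        downs = []
    return node_id, dict(
        node_type=node_type,
        ups=ups,
        downs=downs
    )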
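def collect_chunks(lines):
    """Group consecutive non-blank lines into chunks, one per node."""
    chunk = []
    for line in lines:
        line = line.strip()
        if line:
            chunk.append(line)
        elif chunk:  # skip repeated blank lines instead of yielding empty chunks
            yield chunk
            chunk = []
    if chunk:
        yield chunk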
def parse(stream):
    nb_nodes = int(next(stream).strip())  # first line: number of nodes
    if not nb_nodes:
        return {}
    next(stream)  # skip the line after the node count
    return dict(parse_chunk(chunk) for chunk in collect_chunks(stream))
def main(*args):
    with open(args[0], "r") as f:
        print(parse(f))
if __name__ == "__main__":
    import sys
    main(*sys.argv[1:])
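The question's description of the file format is cut off, so the layout in the sample below is only inferred from what parse() reads: a node count on the first line, a line that gets skipped (assumed blank), then one blank-line-separated chunk per node holding the node id, the node type, the number of up-edges, the up-edge ids (only if that number is non-zero), the number of down-edges and the down-edge ids (only if non-zero). A minimal sketch, assuming that layout, that loads the whole text and splits it on blank lines instead of reading line by line; the sample data and the name parse_graph_text are hypothetical.

```python
# Hypothetical sample input in the layout parse() appears to expect.
SAMPLE = """3

1
1
2
2 3
0

2
1
0
2
1 3

3
2
1
1
1
2
"""


def parse_graph_text(text):
    """Parse the whole file contents at once (fine for small files)."""
    lines = text.splitlines()  # also handles \r\n line endings
    nb_nodes = int(lines[0])
    graph = {}
    if not nb_nodes:
        return graph
    # Everything after the first line, split into blank-line-separated chunks.
    body = "\n".join(lines[1:])
    for block in body.split("\n\n"):
        chunk = [line.strip() for line in block.splitlines() if line.strip()]
        if not chunk:
            continue  # extra blank lines produce empty blocks
        node_id, node_type, nb_up = int(chunk[0]), int(chunk[1]), int(chunk[2])
        pos = 3
        ups = []
        if nb_up:
            ups = [int(x) for x in chunk[pos].split()]
            pos += 1  # the up-edge line was present
        nb_down = int(chunk[pos])
        downs = [int(x) for x in chunk[pos + 1].split()] if nb_down else []
        graph[node_id] = {"node_type": node_type, "ups": ups, "downs": downs}
    return graph


print(parse_graph_text(SAMPLE))
```

With this sample it produces the same dictionary as the one at the top of this answer. The trade-off is that the whole file is held in memory, which is fine for small graphs; for very large files the line-by-line parse() is gentler on memory.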
Solution 2:
I would do it as presented below. I would add a try/except around the file reading, and open your files with the with-statement.
def parse_nodes(node_file):
    nodes = {}
    try:
        with open(node_file, 'r', encoding='utf-8') as file:
            file.readline()  # skip first line, not a node
            for line in file:  # iterate over lines; file.readline() here would yield characters
                if not line.strip():  # a blank line starts a new node
                    line = file.readline()  # read next line
                    counter = line.strip()  # node id (also works for multi-digit ids)
                    nodes[counter] = {'up': [], 'down': []}  # create a nested dict per node
                    line = file.readline()
                    nodes[counter]['type'] = line.strip()  # add node type
                    line = file.readline()  # number of up-edges
                    if line.strip() != '0':
                        line = file.readline()  # there are many ways
                        nodes[counter]['up'] = line.split()  # you can store edges, here a list
                    line = file.readline()  # number of down-edges
                    if line.strip() != '0':
                        line = file.readline()
                        nodes[counter]['down'] = line.split()  # store down-edges as a list
                    # end of chunk/node-set, let the for-loop read the next line
                else:
                    print("this should never happen! line:", repr(line))
    except OSError as exc:  # e.g. missing file or no read permission
        print("could not read", node_file, "-", exc)
    return nodes
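This reads the file line by line. I'm not sure how large your data files are, but this is easier on memory than loading the whole file at once. It may be somewhat slower than a single bulk read, although Python's file objects are buffered, so it does not hit the disk once per line (and an SSD helps a lot anyway).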
Haven't tested the code, but the concept is clear :)
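On the memory point above: iterating over a file object reads it lazily, so only the current node's lines need to be held at once. This is the same idea as collect_chunks in Solution 1. A minimal sketch of it written with itertools.groupby, grouping consecutive non-blank lines into chunks; the name iter_chunks is hypothetical, and io.StringIO stands in for an open file.

```python
import io
from itertools import groupby


def iter_chunks(lines):
    """Yield each run of consecutive non-blank lines as a list of stripped strings.

    Works on any iterable of lines, including an open file, which is read
    lazily, so only one chunk is built in memory at a time.
    """
    for is_content, group in groupby(lines, key=lambda line: bool(line.strip())):
        if is_content:
            yield [line.strip() for line in group]


# io.StringIO stands in for open("graph.txt", encoding="utf-8") here.
stream = io.StringIO("2\n\n1\n1\n2\n2 3\n0\n\n2\n1\n0\n2\n1 3\n")
nb_nodes = int(next(stream))  # first line: number of nodes
for chunk in iter_chunks(stream):
    print(chunk)
```

Because groupby treats any run of blank lines as a single group, repeated blank lines between nodes are skipped without extra checks. Each chunk can then be handed to a per-node parser such as parse_chunk.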