
Python Script To Concatenate All The Files In The Directory Into One File

I have written the following script to concatenate all the files in a directory into one single file. Can this be optimized, in terms of idiomatic Python and runtime? Here is the snippet:

Solution 1:

Use shutil.copyfileobj to copy data:

import glob
import shutil

with open(outfilename, 'wb') as outfile:
    for filename in glob.glob('*.txt'):
        if filename == outfilename:
            # don't want to copy the output into the output
            continue
        with open(filename, 'rb') as readfile:
            shutil.copyfileobj(readfile, outfile)

shutil reads from the readfile object in chunks, writing them to the outfile file object directly. Do not use readline() or an iteration buffer, since you do not need the overhead of finding line endings.

Use the same mode for both reading and writing; this is especially important when using Python 3; I've used binary mode for both here.
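To see the chunked copy in action without touching the disk, here is a small sketch using in-memory io.BytesIO buffers (purely illustrative; the buffer contents and the 8-byte chunk size are arbitrary choices):

```python
import io
import shutil

# In-memory stand-ins for the real files: copyfileobj reads fixed-size
# chunks from the source and writes them to the destination, so the
# whole payload is never held in memory at once.
src = io.BytesIO(b"line one\nline two\nno trailing newline")
dst = io.BytesIO()

shutil.copyfileobj(src, dst, length=8)  # copy in 8-byte chunks
```

Both buffers are opened in the same (binary) mode, matching the advice above.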

Solution 2:

You can iterate over the lines of a file object directly, without reading the whole thing into memory:

with open(fname, 'r') as readfile:
    for line in readfile:
        outfile.write(line)

Solution 3:

No need to use that many variables.

with open(outfilename, 'w') as outfile:
    for fname in filenames:
        with open(fname, 'r') as readfile:
            outfile.write(readfile.read() + "\n\n")

Solution 4:

Using Python 2.7, I did some "benchmark" testing of

outfile.write(infile.read())

vs

shutil.copyfileobj(readfile, outfile)

I iterated over 20 .txt files ranging in size from 63 MB to 313 MB, with a combined size of ~2.6 GB. For both methods, normal read mode performed better than binary read mode, and shutil.copyfileobj was generally faster than outfile.write.

When comparing the worst combination (outfile.write, binary mode) with the best combination (shutil.copyfileobj, normal read mode), the difference was quite significant:

outfile.write, binary mode: 43 seconds, on average.

shutil.copyfileobj, normal mode: 27 seconds, on average.

The outfile had a final size of 2620 MB in normal read mode vs 2578 MB in binary read mode.

Solution 5:

The fileinput module provides a natural way to iterate over multiple files:

import fileinput
import glob

for line in fileinput.input(glob.glob("*.txt")):
    outfile.write(line)
