
Limitation To Python's Glob?

I'm using glob to feed file names to a loop like so:

inputcsvfiles = glob.iglob('NCCCSM*.csv')

for x in inputcsvfiles:
    csvfilename = x
    # do stuff here

The toy example works fine on a small number of files, but fails once it has to handle 10,000+ CSV files.

Solution 1:

Try running ls * in a shell against those 10,000 entries and the shell would fail too. How about walking the directory and yielding those files one by one for your purpose?

# credit - @dabeaz - generators tutorial
import os
import fnmatch

def gen_find(filepat, top):
    # Walk the tree rooted at 'top' and lazily yield files matching 'filepat'
    for path, dirlist, filelist in os.walk(top):
        for name in fnmatch.filter(filelist, filepat):
            yield os.path.join(path, name)

# Example use
if __name__ == '__main__':
    lognames = gen_find("NCCCSM*.csv", ".")
    for name in lognames:
        print(name)

Solution 2:

One issue that arose was not with Python per se, but rather with how ArcPy and/or MS handle CSV files (more the latter, I think). As the loop iterates, it creates a schema.ini file to which information about each CSV file processed in the loop gets added and stored. Over time, the schema.ini grows rather large, and I believe that is when the performance issues arise.

My solution, although perhaps inelegant, was to delete the schema.ini file during each loop iteration to avoid the issue. Doing so allowed me to process the 10k+ CSV files, although rather slowly. Truth be told, we wound up using GRASS and BASH scripting in the end.
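A minimal sketch of that workaround, assuming schema.ini is written into the same directory as the CSVs and with a hypothetical process_csv() standing in for the ArcPy work:

import os
import glob

def process_csv(csvfilename):
    # hypothetical placeholder for the ArcPy/join work done on each file
    pass

for csvfilename in glob.iglob('NCCCSM*.csv'):
    process_csv(csvfilename)
    # delete the schema.ini that gets (re)created each pass so it never
    # accumulates entries across the 10k+ iterations
    schema = os.path.join(os.path.dirname(csvfilename) or '.', 'schema.ini')
    if os.path.exists(schema):
        os.remove(schema)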

Solution 3:

If it works for 100 files but fails for 10,000, then check that arcpy.AddJoin_management closes the CSV file after it is done with it.

There is a limit on the number of open files that a process may have at any one time (which you can check by running ulimit -n).
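You can also read that limit from inside Python on Unix-like systems; a small sketch (the resource module is POSIX-only):

import resource

# current soft and hard limits on open file descriptors for this process
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print('soft limit:', soft, 'hard limit:', hard)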
