
Read File Headers And Delimiters Using Python

I am reading all the files in a given folder (it contains directories, subdirectories, and files of type .csv, .txt ..). I need to get the following information into an output file in the following

Solution 1:

The following approach should work for you. It uses Python's csv.Sniffer class to try to determine the correct dialect for reading each file. The detected dialect also carries the delimiter that is used.

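import csv
import os

header_output = ['FolderLocation', 'FileName', 'Delimiter', 'Columns']
path = r'D:\UnZipFiles'

# Python 3: open CSV files in text mode with newline='' so the csv module
# handles line endings itself.
with open(r'D:\OutputFile\Columns_Info.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(header_output)

    # Walk the folder tree, visiting every file in every subdirectory.
    for root, folders, files in os.walk(path):
        for file in files:
            full_file_path = os.path.join(root, file)

            with open(full_file_path, newline='') as f_input:
                try:
                    # Guess the dialect from the first 1024 characters, then rewind.
                    dialect = csv.Sniffer().sniff(f_input.read(1024))
                    f_input.seek(0)
                    csv_input = csv.reader(f_input, dialect)
                    header_input = next(csv_input)
                    csv_output.writerow([root, file, dialect.delimiter] + header_input)
                except (csv.Error, UnicodeDecodeError):
                    # Sniffing failed, or the file is not text in the default encoding.
                    print("{} - could not determine the delimiter".format(file))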
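
To see what the sniffer returns without touching the file system, here is a minimal sketch that runs it on an in-memory string. The sample text is made up for illustration, and passing the optional delimiters argument to restrict the candidates is just one choice; it is not part of the script above.

```python
import csv
import io

# Hypothetical sample standing in for the first characters of a real file.
sample = "name;age;city\nAlice;30;Paris\nBob;25;Berlin\n"

# Only consider the delimiters we expect to encounter.
dialect = csv.Sniffer().sniff(sample, delimiters=",;|\t")

# Read the header row using the detected dialect.
header = next(csv.reader(io.StringIO(sample), dialect))

print(repr(dialect.delimiter), header)
```

Restricting the candidates can help on files where some ordinary letter happens to occur equally often on every line, which might otherwise be mistaken for a delimiter.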

As an alternative to csv.Sniffer, you could write your own detection function, although the one in the standard library is much more powerful than this simple version:

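import csv

def get_delimiter(file_name):
    cols_found = []

    # Count the columns in the first row for each candidate delimiter.
    for delim in [',', ';', '|', '\t']:
        with open(file_name, newline='') as f_in:
            first_row = next(csv.reader(f_in, delimiter=delim), [])
            cols_found.append([len(first_row), delim])

    # The candidate producing the most columns sorts last.
    best_count, best_delim = sorted(cols_found)[-1]

    if best_count > 1:
        return best_delim
    else:
        return None

print(get_delimiter('my.csv'))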

This returns a possible delimiter by counting which delimiter results in the most columns in the first row. If only one column is found, it returns None to indicate no matching delimiter was found. It could instead raise an exception.
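
If you would rather raise an exception, a variant could look like the sketch below. The names DelimiterNotFound and get_delimiter_strict, and the candidates parameter, are made up for this illustration. Note that when two candidates tie, this version keeps the one listed first, which differs slightly from the sorted-based version above.

```python
import csv


class DelimiterNotFound(ValueError):
    """Hypothetical exception: no candidate split the first row into columns."""


def get_delimiter_strict(file_name, candidates=(',', ';', '|', '\t')):
    best_count, best_delim = 0, None

    for delim in candidates:
        with open(file_name, newline='') as f_in:
            # An empty file yields an empty first row instead of StopIteration.
            first_row = next(csv.reader(f_in, delimiter=delim), [])

        # Keep the candidate that produces the most columns so far.
        if len(first_row) > best_count:
            best_count, best_delim = len(first_row), delim

    # A single column (or none) means no candidate actually split the row.
    if best_count <= 1:
        raise DelimiterNotFound("no delimiter found in {}".format(file_name))

    return best_delim
```

A caller would wrap get_delimiter_strict('my.csv') in try/except DelimiterNotFound, which makes it harder to ignore an undetected delimiter by accident than a None return value.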
