
Import Kaggle CSV From Download URL To Pandas DataFrame

I've been trying different methods to import the SpaceX missions CSV file on Kaggle directly into a pandas DataFrame, without any success. I'd need to send requests to log in. This…

Solution 1:

You are creating a stream and passing it directly to pandas. I think you need to pass a file-like object to pandas instead. Take a look at this answer for a possible solution (it uses a POST rather than a GET request, though).
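
A minimal sketch of the file-like-object idea, assuming the download URL returns the raw CSV bytes once you are authenticated (the helper name `response_to_dataframe` is hypothetical): read the whole response body, wrap it in `io.BytesIO`, and hand that object to `pandas.read_csv`.

```python
import io

import pandas as pd


def response_to_dataframe(response):
    """Turn an HTTP response whose body is CSV data into a DataFrame.

    Hypothetical helper: `response` is assumed to be a requests.Response,
    or anything with a `.content` attribute holding the body as bytes.
    """
    # Wrap the raw bytes in an in-memory, file-like object that pandas can read.
    buffer = io.BytesIO(response.content)
    return pd.read_csv(buffer)
```

If the server answers with an HTML login page instead of the CSV, `read_csv` will either raise a parsing error or produce meaningless columns, so it is worth checking `response.status_code` and `response.headers` before parsing.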

Also, I think the login URL with a redirect that you are using does not work as-is. I know I suggested it here, but I ended up not using it because (I suspect) the POST request did not handle the redirect.
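
One way to keep a login alive across the redirect and the later download is a `requests.Session`, which stores the cookies set at login and sends them with every following request. This is a sketch, not the code the answer used: the login URL, the form field names and the helper name `download_with_session` are all assumptions, and Kaggle's actual login flow may differ.

```python
import requests


def download_with_session(login_url, credentials, data_url, session=None):
    """Log in once, then download `data_url` reusing the same cookies.

    Hypothetical helper: login_url, the keys in `credentials` and data_url
    depend on the site and are assumptions, not a documented Kaggle API.
    """
    if session is None:
        # A Session keeps cookies from the login response (and its redirects)
        # and sends them automatically on later requests.
        session = requests.Session()
    login = session.post(login_url, data=credentials)
    login.raise_for_status()
    response = session.get(data_url)
    response.raise_for_status()
    return response.content
```

By default requests follows redirects after a POST, but for 301, 302 and 303 responses it re-issues the follow-up request as a GET without the form data, which may be what went wrong with the redirect-based login URL.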

The code I ended up using in my project was this:

import requests
from os import path

import config  # the project's own settings module (credentials, data paths)


def from_kaggle(data_sets, competition):
    """Fetches data from Kaggle

    Parameters
    ----------
    data_sets : (array)
        list of dataset filenames on Kaggle (e.g. train.csv.zip)

    competition : (string)
        name of the Kaggle competition as it appears in the URL
        (e.g. 'rossmann-store-sales')

    """
    kaggle_dataset_url = "https://www.kaggle.com/c/{}/download/".format(competition)

    KAGGLE_INFO = {'UserName': config.kaggle_username,
                   'Password': config.kaggle_password}

    for data_set in data_sets:
        # URLs always use "/"; os.path.join would insert "\" on Windows.
        data_url = kaggle_dataset_url + data_set
        data_output = path.join(config.raw_data_dir, data_set)
        # Attempts to download the CSV file. Gets redirected to the login
        # page because we are not logged in; r.url holds that final URL.
        r = requests.get(data_url)
        # Log in to Kaggle and retrieve the data.
        r = requests.post(r.url, data=KAGGLE_INFO, stream=True)
        # Writes the data to a local file one chunk at a time.
        with open(data_output, 'wb') as f:
            # Reads 512 KB at a time into memory.
            for chunk in r.iter_content(chunk_size=(512 * 1024)):
                if chunk:  # filter out keep-alive new chunks
                    f.write(chunk)

Example use:

sets = ['train.csv.zip',
        'test.csv.zip',
        'store.csv.zip',
        'sample_submission.csv.zip',]
from_kaggle(sets, 'rossmann-store-sales')

You might need to unzip the files.

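import zipfile

import config  # the project's own settings module


def _unzip_folder(destination):
    """Extract every file in a zip archive into config.raw_data_dir.

    Paths stored inside the archive are kept as they are.

    Parameters
    ----------
    destination : (str)
        Local path and filename of the zip file to extract.
    """
    with zipfile.ZipFile(destination, "r") as z:
        z.extractall(config.raw_data_dir)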
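
If you only need the CSV inside each archive, you can also skip extracting to disk and read it straight from memory. A sketch, assuming you already have the zip file's bytes (for example from a download) and that `read_zipped_csv` is just an illustrative name: `zipfile.ZipFile` accepts a file-like object, and `ZipFile.open` returns a binary file object that `pandas.read_csv` can consume.

```python
import io
import zipfile

import pandas as pd


def read_zipped_csv(zip_bytes, member=None):
    """Read one CSV file from a zip archive held in memory.

    Hypothetical helper: `member` is the file name inside the archive; if it
    is omitted, the archive is assumed to contain exactly one file.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        if member is None:
            names = archive.namelist()
            if len(names) != 1:
                raise ValueError(f"expected one file in the archive, found {names}")
            member = names[0]
        # ZipFile.open returns a binary file-like object pandas can read.
        with archive.open(member) as f:
            return pd.read_csv(f)
```

For a single-file archive that is already on disk, `pd.read_csv("train.csv.zip")` works too, because the default `compression="infer"` recognizes the `.zip` extension.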

So I never actually loaded the data directly into a DataFrame; I stored it on disk first. You could, however, modify the code to use a temporary directory and delete the files after reading them.
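
A sketch of the temporary-directory variant, assuming you already have the downloaded bytes for one file (the helper name `read_via_temp_dir` is hypothetical): `tempfile.TemporaryDirectory` deletes the directory and everything in it when the `with` block exits, so nothing is left behind after the DataFrame is loaded.

```python
import os
import tempfile

import pandas as pd


def read_via_temp_dir(file_name, data):
    """Write downloaded bytes to a temporary directory, load them, clean up.

    Hypothetical helper: `file_name` (e.g. "train.csv.zip") and `data`
    (the downloaded bytes) are illustrative inputs.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_path = os.path.join(tmp_dir, file_name)
        with open(local_path, "wb") as f:
            f.write(data)
        # compression="infer" (the default) also unzips single-file .zip archives.
        df = pd.read_csv(local_path)
    # The temporary directory and its contents have been deleted at this point.
    return df
```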
