
Gzipfile Not Supported By S3?

I am trying to iterate through some file paths so that I can gzip each file individually. Each item in testList is a path string like /tmp/File. After gzipping them,

Solution 1:

Assuming each file fits into memory, you can simply compress the data in memory and wrap it in a BytesIO for the S3 API to read:

import boto3
import gzip
import io


s3_resource = boto3.resource("s3")
bucket = s3_resource.Bucket("testunzipping")
for i in testList:
    fileName = i.replace("/tmp/DataPump_10000838/", "")
    # Read the file and compress its contents entirely in memory
    with open(i, "rb") as f_in:
        gzipped_content = gzip.compress(f_in.read())
    # Upload the compressed bytes, marking the object as gzip-encoded
    bucket.upload_fileobj(
        io.BytesIO(gzipped_content),
        fileName,
        ExtraArgs={"ContentType": "text/plain", "ContentEncoding": "gzip"},
    )
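
To check the result, you can read the object back and decompress it. Here is a minimal sketch, assuming the bucket name "testunzipping" from above and a hypothetical example key (substitute any fileName produced by the loop):

import boto3
import gzip

s3 = boto3.client("s3")
# "example.txt" is a placeholder key; use one of the keys uploaded above
obj = s3.get_object(Bucket="testunzipping", Key="example.txt")
original_bytes = gzip.decompress(obj["Body"].read())
print(original_bytes[:100])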

If that's not the case, you can use a tempfile to compress the data to disk first:

import boto3
import gzip
import shutil
import tempfile


s3_resource = boto3.resource("s3")
bucket = s3_resource.Bucket("testunzipping")
for i in testList:
    fileName = i.replace("/tmp/DataPump_10000838/", "")
    with tempfile.TemporaryFile() as tmpf:
        # Stream the source file through gzip into the temporary file
        with open(i, "rb") as f_in, gzip.GzipFile(mode="wb", fileobj=tmpf) as gzf:
            shutil.copyfileobj(f_in, gzf)
        # Rewind so upload_fileobj reads the compressed data from the start
        tmpf.seek(0)
        bucket.upload_fileobj(
            tmpf,
            fileName,
            ExtraArgs={"ContentType": "text/plain", "ContentEncoding": "gzip"},
        )
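
Either way, the uploaded object carries the ContentType and ContentEncoding metadata you set, which you can confirm afterwards. A minimal sketch, again assuming the "testunzipping" bucket and a hypothetical example key:

import boto3

s3 = boto3.client("s3")
# "example.txt" is a placeholder key; use one of the keys uploaded above
head = s3.head_object(Bucket="testunzipping", Key="example.txt")
print(head["ContentType"], head["ContentEncoding"])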
