
How To Use AWS Lambda To Convert PDF Files To .txt With Python

I need to automate the conversion of many PDF files to text files using AWS Lambda with Python 3.7. I've successfully converted PDF files using poppler/pdftotext, tika, and PyPDF2 on my own

Solution 1:

AWS Lambda only lets your function write to the /tmp directory, so download the file there rather than to a relative path.
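A minimal sketch of that idea follows. The helper names `tmp_path_for` and `download_to_tmp` are made up for illustration, and `s3_client` is assumed to be a boto3 S3 client (for example `boto3.client("s3")`), whose `download_file(Bucket, Key, Filename)` method writes the object to a local path.

```python
import os


def tmp_path_for(key, tmp_dir="/tmp"):
    """Map an S3 key to a file path inside Lambda's writable directory."""
    # Keep only the last path component so nested keys need no subfolders.
    return os.path.join(tmp_dir, os.path.basename(key))


def download_to_tmp(s3_client, bucket, key):
    """Download s3://bucket/key into /tmp and return the local path."""
    local_path = tmp_path_for(key)
    # Hypothetical wiring: s3_client is assumed to be boto3.client("s3").
    s3_client.download_file(bucket, key, local_path)
    return local_path
```

The returned path can then be handed to whichever converter you already use locally, with the output .txt file also written under /tmp.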

Solution 2:

As the error states, you are trying to write to a read-only filesystem. The download_file method tries to save the file as 'test.pdf' in the current working directory, which fails because that directory is not writable in Lambda. Try using download_fileobj with an in-memory binary buffer (e.g. io.BytesIO) instead, then feed that stream to PyPDF2.

Example:

import io
[...]

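# PDFs are binary, so use a bytes buffer (io.StringIO only accepts text).
pdf_stream = io.BytesIO()
object.download_fileobj(pdf_stream)
pdf_stream.seek(0)  # rewind before reading
pdf_obj = PdfFileReader(pdf_stream)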

[...]
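To make the in-memory approach concrete, here is a hedged, self-contained sketch. The function names `fetch_pdf_bytes` and `pdf_to_text` are illustrative, `s3_object` is assumed to be a boto3 S3 Object resource, and the text extraction assumes the PyPDF2 1.x API (`PdfFileReader`, `getNumPages`, `getPage`, `extractText`), which later PyPDF2 releases renamed.

```python
import io


def fetch_pdf_bytes(s3_object):
    """Download an S3 object into memory and return a rewound BytesIO."""
    pdf_stream = io.BytesIO()  # binary buffer; nothing touches the filesystem
    s3_object.download_fileobj(pdf_stream)
    pdf_stream.seek(0)  # rewind so the next reader starts at byte 0
    return pdf_stream


def pdf_to_text(pdf_stream):
    """Extract text from every page (assumes the PyPDF2 1.x API)."""
    # Imported here so the download helper works without PyPDF2 installed;
    # the library must be bundled with the function's deployment package.
    from PyPDF2 import PdfFileReader

    reader = PdfFileReader(pdf_stream)
    pages = [reader.getPage(i).extractText() for i in range(reader.getNumPages())]
    return "\n".join(pages)
```

Inside a handler, this might be called as `pdf_to_text(fetch_pdf_bytes(boto3.resource("s3").Object(bucket, key)))`, where `bucket` and `key` come from the triggering event.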
