How to Use AWS Lambda to Convert PDF Files to .txt with Python
I need to automate the conversion of many PDF files to text files using AWS Lambda in Python 3.7. I've successfully converted PDF files using poppler/pdftotext, tika, and PyPDF2 on my own
Solution 1:
AWS Lambda only allows you to write to the /tmp directory, so download the file there and read it from that path.
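A minimal sketch of that idea, assuming you pass in a boto3 S3 client (for example the result of boto3.client("s3")). The helper name download_to_tmp and its tmp_dir parameter are hypothetical, chosen only for illustration; download_file is the same boto3 method mentioned in the question.

```python
import os


def download_to_tmp(s3_client, bucket, key, tmp_dir="/tmp"):
    """Download an S3 object into a writable directory and return the local path.

    s3_client is assumed to be a boto3 S3 client. On Lambda, /tmp is the
    directory the function is allowed to write to.
    """
    # Keep only the file name so keys with prefixes ("docs/test.pdf") still work.
    local_path = os.path.join(tmp_dir, os.path.basename(key))
    s3_client.download_file(bucket, key, local_path)
    return local_path
```

The returned path can then be handed to whichever converter you already use locally, as long as any output file is also written under /tmp.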
Solution 2:
As the error states, you are trying to write to a read-only filesystem. You are using the download_file method, which tries to save the file to 'test.pdf' and fails. Try using download_fileobj together with an in-memory buffer (e.g. io.BytesIO) instead. Then feed that stream to PyPDF2.
Example:
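import io
[...]
# PDF data is binary, so the buffer must be BytesIO, not StringIO.
pdf_stream = io.BytesIO()
object.download_fileobj(pdf_stream)
pdf_stream.seek(0)  # rewind so the reader starts at the beginning
pdf_obj = PdfFileReader(pdf_stream)
[...]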
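For a fuller picture, here is a hedged sketch of the in-memory approach from download to text output. It assumes a boto3 S3 client rather than the resource-style object used above, and the older PyPDF2 interface (PdfFileReader, getPage, extractText) that matches this answer; PyPDF2 3.0 and later renamed these. The function names and the choice to write the result back to the same bucket with a .txt key are illustrative assumptions, not part of the original answer.

```python
import io
import posixpath


def fetch_pdf_stream(s3_client, bucket, key):
    """Download an S3 object into memory and return a rewound BytesIO buffer."""
    pdf_stream = io.BytesIO()
    s3_client.download_fileobj(bucket, key, pdf_stream)
    pdf_stream.seek(0)  # rewind so readers start at the beginning
    return pdf_stream


def pdf_stream_to_text(pdf_stream):
    """Extract text from every page of a PDF held in a binary stream."""
    # Imported here so the module still loads where PyPDF2 is not installed.
    from PyPDF2 import PdfFileReader  # assumption: PyPDF2 older than 3.0

    reader = PdfFileReader(pdf_stream)
    pages = [reader.getPage(i).extractText() for i in range(reader.numPages)]
    return "\n".join(pages)


def convert_pdf_object(s3_client, bucket, key):
    """Convert one PDF object to text and store it next to the original."""
    text = pdf_stream_to_text(fetch_pdf_stream(s3_client, bucket, key))
    # Hypothetical naming scheme: same key with the extension swapped to .txt.
    txt_key = posixpath.splitext(key)[0] + ".txt"
    s3_client.put_object(Bucket=bucket, Key=txt_key, Body=text.encode("utf-8"))
    return txt_key
```

Nothing here touches the local filesystem, so the read-only restriction never comes into play. PyPDF2 would still need to be bundled with the deployment package or supplied as a layer.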