Python-Parse Email Body And Truncate MIME Headers
I have an email body which looks somewhat like . Now I want to remove all the headers from it and keep just the conversation text. How can I do it in Python? I tried email.par
Solution 1:
import email
import imaplib

hst = "your.host.adresse.com"
usr = "login"
pwd = "password"

imap = imaplib.IMAP4(hst)
try:
    imap.login(usr, pwd)
except Exception as e:
    raise IOError(e)

try:
    imap.select("Inbox")  # Tell IMAP which mailbox to use
    result, data = imap.uid('search', None, "ALL")
    latest = data[0].split()[-1]  # UID of the most recent message
    result, data = imap.uid('fetch', latest, '(RFC822)')
    a = data[0][1]  # This contains the raw mail data (bytes in Python 3)
except Exception as e:
    raise IOError(e)

# Parse the raw bytes; in Python 3, message_from_string() would reject them
msg = email.message_from_bytes(a)
if msg.is_multipart():
    for part in msg.get_payload():
        b = part.get_payload()  # b ends up holding the last part's payload
else:
    b = msg.get_payload()
print(b)
This removes everything from the mail that you don't want in the final text. I've tested this with your code. You didn't show how you load the mail (your a), so I guess that's where the decoding problem comes from.
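If the decoding problem comes from the transfer encoding (base64 or quoted-printable) rather than from how the mail is loaded, the newer EmailMessage API in Python 3 can undo it for you. The following is only a minimal sketch, not the code from the question: the function name extract_body_text and the sample message are made up for illustration, and it assumes the raw message is available as bytes, as with a above.

```python
from email import message_from_bytes, policy


def extract_body_text(raw_bytes):
    """Return the decoded body text of a raw RFC 822 message, without headers."""
    # policy.default makes the parser return EmailMessage objects,
    # which provide get_body() and get_content().
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    # Prefer the text/plain part; fall back to text/html if there is none.
    part = msg.get_body(preferencelist=("plain", "html"))
    if part is None:
        return ""
    # get_content() undoes the transfer encoding (base64, quoted-printable)
    # and decodes the bytes using the part's declared charset.
    return part.get_content()


# Hypothetical sample message, used only for illustration.
sample = (
    b"From: alice@example.com\r\n"
    b"To: bob@example.com\r\n"
    b"Subject: Hi\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"Caf=C3=A9 at noon?\r\n"
)
print(extract_body_text(sample))
```

With the IMAP code above, this would be called as extract_body_text(a). If the function falls back to the HTML part, the result still contains tags and needs to be stripped separately.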
If you have any trouble with HTML mails:
from bs4 import BeautifulSoup

soup = BeautifulSoup(b, 'html.parser')
text = soup.get_text()  # plain text with the tags stripped
print(text)
That should do the job for now, but I'd advise you to switch from Python's built-in html.parser to lxml or html5lib.
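One way to follow that advice without breaking on machines where lxml or html5lib is missing is to try the parsers in order. This is a rough sketch under assumptions: the helper name html_to_text and the parser order are my own choices, and it relies on BeautifulSoup raising FeatureNotFound when the requested parser library is not installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def html_to_text(html, parsers=("lxml", "html5lib", "html.parser")):
    """Strip tags from html, trying each parser name in order."""
    for parser in parsers:
        try:
            soup = BeautifulSoup(html, parser)
        except FeatureNotFound:
            # This parser library is not installed; try the next one.
            continue
        # Join the text fragments with newlines and trim surrounding whitespace.
        return soup.get_text(separator="\n", strip=True)
    raise RuntimeError("no usable HTML parser found")


print(html_to_text("<p>Hello</p><p>World</p>"))
```

Since html.parser ships with Python, the last fallback should always be available as long as bs4 itself is installed.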