Pandas Parse Csv With Left And Right Quote Chars
I am trying to read a file in pandas which is structured as follows:
<first>$$><$$<second>$$><$$<first>$$>
<foo>$$><$$<bar>$$><$$<baz>$$>
Solution 1:
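You need to escape each $ with a backslash, because the separator is read as a regular expression, in which $ means end of string.
(Separators longer than 1 character and different from '\s+' are interpreted as regular expressions and require the Python parsing engine.)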
import pandas as pd
from io import StringIO  # pandas.compat.StringIO is not available in current pandas

temp = u"""<first>$$><$$<second>$$><$$<first>$$>
<foo>$$><$$<bar>$$><$$<baz>$$>"""
# after testing, replace 'StringIO(temp)' with 'filename.csv'
df = pd.read_csv(StringIO(temp),
                 encoding='utf8',
                 sep=r'\$\$><\$\$',   # raw string: each $ escaped for the regex
                 decimal=',',
                 header=None,
                 engine='python')     # regex separators need the Python engine
print(df)
         0         1           2
0  <first>  <second>  <first>$$>
1    <foo>     <bar>    <baz>$$>
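The escaping can also be generated instead of typed by hand. The following is a minimal sketch using only the standard-library re module; the names DELIM and line are illustrative and not part of the original answer. It also shows why the unescaped delimiter never splits anything: the leading $ only matches at the end of the string, so nothing can follow it.

```python
import re

# Literal delimiter as it appears in the file (illustrative name).
DELIM = "$$><$$"
line = "<first>$$><$$<second>$$><$$<first>$$>"

# Unescaped: '$' is the end-of-string anchor, so the pattern never matches
# and the line comes back as a single field.
unescaped = re.split(DELIM, line)

# Escaped: re.escape backslash-escapes the regex metacharacters for us.
sep = re.escape(DELIM)
fields = re.split(sep, line)

print(unescaped)
print(fields)
```

The string produced by re.escape should be usable as the sep argument of read_csv together with engine='python'.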
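Then remove the trailing $$> from the last column with str.replace (the final $ in the pattern anchors it to the end of the string):
# regex=True is needed in pandas 2.0 and later, where str.replace matches literally by default
df.iloc[:, -1] = df.iloc[:, -1].str.replace(r'\$\$>$', '', regex=True)
print(df)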
         0         1        2
0  <first>  <second>  <first>
1    <foo>     <bar>    <baz>
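The effect of the end-of-string anchor is easiest to see on a value that contains $$> more than once. The sketch below uses the standard-library re module and a made-up value (not taken from the question's data) to compare the anchored and unanchored patterns.

```python
import re

# Hypothetical field that contains '$$>' in the middle as well as at the end.
value = "<x$$>y>$$>"

# Anchored with a final '$': only the trailing '$$>' is removed.
anchored = re.sub(r"\$\$>$", "", value)

# Unanchored: every occurrence of '$$>' is removed.
unanchored = re.sub(r"\$\$>", "", value)

print(anchored)
print(unanchored)
```

With the sample data from the question both patterns give the same result; the anchor only matters if $$> can also occur inside a field.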
To remove the surrounding < and > quote characters:
df = df.replace([r'^<', r'>$'], ['', ''], regex=True)
print(df)
       0       1      2
0  first  second  first
1    foo     bar    baz
Both replacements can be combined into a single call, applied to the DataFrame as returned by read_csv:
df = df.replace([r'^<', r'>$', r'>\$\$'], ['', '', ''], regex=True)
print(df)
       0       1      2
0  first  second  first
1    foo     bar    baz
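For checking the patterns without pandas, the same steps can be sketched in plain Python. This is only an illustration under the assumption that the three substitutions are applied one after another in the listed order; the helper names clean and parse_line are hypothetical and not part of the original answer.

```python
import re

# Escaped field separator and the three cleanup patterns from the answer.
SEP = re.escape("$$><$$")
CLEANUP = [r"^<", r">$", r">\$\$"]

def clean(field):
    """Apply the cleanup patterns to one field, in order."""
    for pattern in CLEANUP:
        field = re.sub(pattern, "", field)
    return field

def parse_line(line):
    """Split one line on the separator and strip the quote characters."""
    return [clean(field) for field in re.split(SEP, line)]

text = "<first>$$><$$<second>$$><$$<first>$$>\n<foo>$$><$$<bar>$$><$$<baz>$$>"
rows = [parse_line(line) for line in text.splitlines()]
print(rows)
```

For the last field of the first row the order works out as follows: '^<' turns '<first>$$>' into 'first>$$>', '>$' then leaves 'first>$$', and '>\$\$' removes the rest.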