Regex To Remove New Lines Up To A Specific Character
Solution 1:
As noted in the comments, your best bet is to use an existing FASTA parser. Why not?
Here's how I would join lines based on the leading greater-than:
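def joinup(f):
    buf = []
    for line in f:
        if line.startswith('>'):
            # A new header: flush the lines collected for the previous record.
            if buf:
                yield " ".join(buf)
            yield line.rstrip()
            buf = []
        else:
            buf.append(line.rstrip())
    # Flush the last record, if there is one.
    if buf:
        yield " ".join(buf)

for joined_line in joinup(open("...")):
    # blah blah...
    pass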
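To try the generator without a file on disk, here is a minimal sketch that feeds it an in-memory sample through io.StringIO. The sample text and the names sample and result are made up for illustration; the function is repeated so the block runs on its own.

```python
import io

def joinup(f):
    """Yield each '>' header line, then its following lines joined by spaces."""
    buf = []
    for line in f:
        if line.startswith('>'):
            if buf:
                yield " ".join(buf)
            yield line.rstrip()
            buf = []
        else:
            buf.append(line.rstrip())
    if buf:
        yield " ".join(buf)

# Hypothetical sample input, standing in for the real file.
sample = io.StringIO(
    ">HEADER_Text1\n"
    "Information here\n"
    "Some more information here\n"
    ">HEADER_Text2\n"
    "Information here\n"
)

result = list(joinup(sample))
for joined_line in result:
    print(joined_line)
```

Each header comes out on its own line, followed by one line holding that record's text joined with single spaces.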
Solution 2:
You don't have to use a regex:
[ x.startswith('>') and x or x.replace('\n','') for x in f.readlines()]
should work. Header lines keep their newline; every other line has its newline stripped:
In [43]: f=open('test.txt')
In [44]: contents=[ x.startswith('>') and x or x.replace('\n','') for x in f.readlines()]
In [45]: contents
Out[45]:
['>HEADER_Text1\n',
'Information here, yada yada yada',
'Some more information here, yada yada yada',
'Even some more information here, yada yada yada',
'>HEADER_Text2\n',
'Information here, yada yada yada',
'Some more information here, yada yada yada',
'Even some more information here, yada yada yada',
'>HEADER_Text3\n',
'Information here, yada yada yada',
'Some more information here, yada yada yada',
'Even some more information here, yada yada yada']
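The same idea can be sketched with a conditional expression instead of the and/or idiom. This is only an illustration: io.StringIO stands in for open('test.txt'), and the sample contents are made up.

```python
import io

# Hypothetical stand-in for open('test.txt'); the contents are made up.
f = io.StringIO(
    ">HEADER_Text1\n"
    "Information here\n"
    "Some more information here\n"
    ">HEADER_Text2\n"
    "Information here\n"
)

# Keep header lines intact (newline included); strip the newline elsewhere.
contents = [x if x.startswith('>') else x.replace('\n', '') for x in f]

# Concatenating the pieces shows what the list amounts to as plain text.
joined = "".join(contents)
print(joined)
```

Note that the newline before each later header is stripped as well, and no space is inserted between the joined lines, so the concatenated text runs the pieces together. If that matters, the list needs further handling before it is joined.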
Solution 3:
This should also work:
import re

sampleText = """>HEADER_Text1
Information here, yada yada yada
Some more information here, yada yada yada
Even some more information here, yada yada yada
>HEADER_Text2
Information here, yada yada yada
Some more information here, yada yada yada
Even some more information here, yada yada yada
>HEADER_Text3
Information here, yada yada yada
Some more information here, yada yada yada
Even some more information here, yada yada yada"""
# Remove every newline that is not immediately followed by '>'.
cleartext = re.sub("\n(?!>)", "", sampleText)
print(cleartext)
Results:
>HEADER_Text1Information here, yada yada yadaSome more information here, yada yada yadaEven some more information here, yada yada yada
>HEADER_Text2Information here, yada yada yadaSome more information here, yada yada yadaEven some more information here, yada yada yada
>HEADER_Text3Information here, yada yada yadaSome more information here, yada yada yadaEven some more information here, yada yada yada
Solution 4:
Given that the > is always expected to be the first character on a new line, replace
"\n([^>])" with " \1"
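In Python, that substitution could be sketched with re.sub as follows. The sample text and the names text and joined are assumptions for illustration only.

```python
import re

# Hypothetical sample input; the real text would come from the file.
text = (
    ">HEADER_Text1\n"
    "Information here\n"
    "Some more information here\n"
    ">HEADER_Text2\n"
    "Information here"
)

# Replace a newline plus the following character (when that character is
# not '>') with a space plus the same character.
joined = re.sub(r"\n([^>])", r" \1", text)
print(joined)
```

Because the newline after each header is also followed by a non-'>' character, the header ends up on the same line as its text, separated by a space.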
Solution 5:
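You really don't want a regex. And for this job, Python and Biopython are superfluous. If that's actually FASTQ format, just use sed. This assumes every header is followed by exactly three lines, and the 2g flag combination is a GNU sed extension:
sed '/^>/ { N; N; N; s/\n/ /2g }' file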
Results:
>HEADER_Text1
Information here, yada yada yada Some more information here, yada yada yada Even some more information here, yada yada yada
>HEADER_Text2
Information here, yada yada yada Some more information here, yada yada yada Even some more information here, yada yada yada
>HEADER_Text3
Information here, yada yada yada Some more information here, yada yada yada Even some more information here, yada yada yada