
How To Extract Data From A Dataset Using Regex In Python?

I have a dataset and I would like to extract the appositive feature from it.

Solution 1:

I reduced your dataset file to:

A
<coref coref_coref_class="set_0" coref_mentiontype="ne" markable_scheme="coref" coref_coreftype="ident">
B
</coref><coref coref_coref_class="set_0" coref_mentiontype="np" markable_scheme="coref" coref_coreftype="atr">
C
</coref>
D
<coref coref_coreftype="ident" coref_coref_class="empty" coref_mentiontype="ne" markable_scheme="coref">
E
</coref>
F
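Because this reduced sample is almost well-formed XML (it only lacks a single root element), a parser can also read the attributes by name, so their order inside the tag stops mattering. This is a minimal sketch, not the original answer's method. It assumes the full file is well-formed once wrapped in a hypothetical `<root>` element and that `<coref>` mentions are not nested. `SAMPLE` and `extract` are illustrative names.

```python
import xml.etree.ElementTree as ET

# The reduced sample from above, with the spaces between attributes restored.
SAMPLE = """A
<coref coref_coref_class="set_0" coref_mentiontype="ne" markable_scheme="coref" coref_coreftype="ident">
B
</coref><coref coref_coref_class="set_0" coref_mentiontype="np" markable_scheme="coref" coref_coreftype="atr">
C
</coref>
D
<coref coref_coreftype="ident" coref_coref_class="empty" coref_mentiontype="ne" markable_scheme="coref">
E
</coref>
F
"""


def extract(text):
    """Return (ident, atr) lists of mention texts, newlines replaced by spaces."""
    # Assumption: the file has no root element, so wrap it in a hypothetical <root>.
    root = ET.fromstring("<root>" + text + "</root>")
    ident, atr = [], []
    for el in root.iter("coref"):
        # el.text is the text directly inside the tag (assumes no nested tags).
        mention = (el.text or "").replace("\n", " ")
        # Attributes are looked up by name, so their order in the tag is irrelevant.
        if (el.get("coref_coreftype") == "ident"
                and el.get("coref_mentiontype") == "ne"
                and el.get("coref_coref_class", "").startswith("set_")):
            ident.append(mention)
        elif el.get("coref_coreftype") == "atr":
            atr.append(mention)
    return ident, atr


ident, atr = extract(SAMPLE)
print(ident)  # [' B ']
print(atr)    # [' C ']
```

The third tag is skipped because its class is `empty`, even though its attributes come in a different order.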

Then I tried this code, which is almost the same as the code you provided:

import re

# Read the whole file into one string
with open("test_dataset.log", "r") as myfile:
    read_dataset = myfile.read()

i_ident = []
j_atr = []

# Mentions tagged with a set_* class, mention type "ne" and coref type "ident"
# (re.S lets "." match newlines, so a mention may span several lines)
find_ident = re.findall(r'<coref.*?coref_coref_class="set_.*?coref_mentiontype="ne".*?coref_coreftype="ident".*?>(.*?)</coref>', read_dataset, re.S)
ident_list = list(map(lambda x: x.replace('\n', ' '), find_ident))
for i in range(len(ident_list)):
    i_ident.append(str(ident_list[i]))

# Mentions tagged with coref type "atr"
find_atr = re.findall(r'<coref.*?coref_coreftype="atr".*?>(.*?)</coref>', read_dataset, re.S)
atr_list = list(map(lambda x: x.replace('\n', ' '), find_atr))
for i in range(len(atr_list)):
    j_atr.append(str(atr_list[i]))

print(i_ident)
print()
print(j_atr)

And got this output, which seems right to me:

[' B ']

[' C ']
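One caveat about the patterns above: with `re.S`, `.*?` can run past `>`, so the match for `find_atr` actually starts at the first `<coref` (the `ident` tag) and only ends at the `atr` tag. The captured group is still correct here, but the patterns also assume a fixed attribute order, while the third sample tag lists `coref_coreftype` first. The following is a minimal sketch of a tighter variant. It assumes each mention is a single, non-nested `<coref>` element, and `IDENT_RE`, `ATR_RE` and `mentions` are hypothetical names. Each attribute check is a lookahead confined to the opening tag by `[^>]*`.

```python
import re

# The reduced sample from above, with the spaces between attributes restored.
SAMPLE = """A
<coref coref_coref_class="set_0" coref_mentiontype="ne" markable_scheme="coref" coref_coreftype="ident">
B
</coref><coref coref_coref_class="set_0" coref_mentiontype="np" markable_scheme="coref" coref_coreftype="atr">
C
</coref>
D
<coref coref_coreftype="ident" coref_coref_class="empty" coref_mentiontype="ne" markable_scheme="coref">
E
</coref>
F
"""

# Each (?=[^>]*...) lookahead checks one attribute inside the opening tag only:
# [^>]* cannot cross a '>', so attribute order does not matter and a match
# cannot start at one <coref> tag and finish in a later one.
IDENT_RE = re.compile(
    r'<coref\b'
    r'(?=[^>]*\bcoref_coref_class="set_)'  # class value starts with set_
    r'(?=[^>]*\bcoref_mentiontype="ne")'
    r'(?=[^>]*\bcoref_coreftype="ident")'
    r'[^>]*>(.*?)</coref>',
    re.S,
)
ATR_RE = re.compile(r'<coref\b(?=[^>]*\bcoref_coreftype="atr")[^>]*>(.*?)</coref>', re.S)


def mentions(pattern, text):
    """Return every captured mention with newlines replaced by spaces."""
    return [m.replace("\n", " ") for m in pattern.findall(text)]


print(mentions(IDENT_RE, SAMPLE))  # [' B ']
print(mentions(ATR_RE, SAMPLE))    # [' C ']
```

On this sample the output is the same as above. The difference shows up when the attributes of a matching tag appear in another order, or when a non-matching tag sits before a matching one.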
