
Removing Stop Words Without Using NLTK Corpus

I am trying to remove stop words from a text file without using nltk. I have three text files: f1, f2, and f3. f1 contains the text line by line, f2 contains the stop word list, and f3 is an empty file. I want to remove the stop words from f1 using the list in f2 and write the result to f3.

Solution 1:

You can use the Linux sed utility to remove the stop words:

sed -f <(sed 's/.*/s|\\\<&\\\>||g/' stopwords.txt) all_lo.txt > all_remove1.txt
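Here the inner sed turns every line of stopwords.txt into a substitution command of the form s|\<word\>||g, and the outer sed -f applies that generated script to all_lo.txt, deleting each stop word as a whole word. If you would rather stay in Python, a rough equivalent using word-boundary regexes could look like this (just a sketch, assuming the same file names):

import re

# A rough Python counterpart of the sed pipeline above (file names assumed
# to match: stopwords.txt, all_lo.txt, all_remove1.txt).
with open('stopwords.txt') as stops, open('all_lo.txt') as src:
    stop_words = [line.strip() for line in stops if line.strip()]
    text = src.read()

# \b is a word boundary, playing the same role as \< and \> in the sed script.
pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, stop_words)) + r')\b')

with open('all_remove1.txt', 'w') as out:
    # Like the sed command, this only deletes the words themselves; any
    # leftover double spaces are not cleaned up.
    out.write(pattern.sub('', text))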

Solution 2:

What I would personally do is loop through the list of stop words (f2) and append each word to a list in your script. Ex:

stoplist = []
file1 = open('f1.txt', 'r')
file2 = open('f2.txt', 'r')
file3 = open('f3.txt', 'a')  # append mode

# build the stop-word list from f2
for line in file2:
    for word in line.split():
        stoplist.append(word)

# copy every word from f1 that is not a stop word into f3
for line in file1:
    for word in line.split():
        if word in stoplist:
            continue
        file3.write(word + ' ')
    file3.write('\n')  # keep the original line breaks

file1.close()
file2.close()
file3.close()
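If the stop-word list is long, a set makes the membership test much faster than a list, and with statements close the files for you. A possible refinement of the same idea (same file names assumed):

with open('f2.txt') as stops:
    stoplist = set(stops.read().split())

with open('f1.txt') as src, open('f3.txt', 'a') as out:
    for line in src:
        # keep only the words that are not stop words, preserving line breaks
        kept = [word for word in line.split() if word not in stoplist]
        out.write(' '.join(kept) + '\n')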

Solution 3:

Your first for loop is wrong: after "for word in words: t = word", t holds only the last word, not all of them; words is already a list, so work with the list itself. Also, if your files contain multiple lines, your list won't contain all of the words. Do it like this instead; it works correctly:

f1 = open("a.txt", "r")
f2 = open("b.txt", "r")
f3 = open("c.txt", "w")

first_words = []
second_words = []

for line in f1:
    for w in line.split():
        first_words.append(w)

for line in f2:
    for w in line.split():
        second_words.append(w)

# Build a new list instead of calling remove() while looping over
# first_words, which would skip elements and only drop one occurrence.
first_words = [word for word in first_words if word not in second_words]

for word in first_words:
    f3.write(word)
    f3.write(' ')

f1.close()
f2.close()
f3.close()
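Note that this writes all of the surviving words on a single space-separated line, and the check "word not in second_words" scans the whole stop-word list for every word. Converting second_words to a set and filtering line by line, as in Solution 2, keeps the original line breaks and scales much better for long stop-word lists.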
