
Lambda Functions In Python

In the NLTK toolkit, I am trying to use a lambda function to filter the results. I have a test_file and a terms_file. What I'm doing is using the likelihood_ratio in NLTK to rank the m

Solution 1:

(This is sort of a wild guess, but I'm pretty confident that this is the cause of your problem.)

Judging from your pseudo-code, the lem function operates on a file handle, reading some information from that file. You need to understand that a file handle is an iterator, and it is exhausted after being iterated over once. That is, the first call to lem works as expected, but after that the file has been read to the end and further calls yield no results.
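A minimal illustration of the exhaustion, assuming nothing about your real files: io.StringIO stands in for a file opened with open(), and the terms in it are made up.

```python
import io

# io.StringIO behaves like a file opened with open(): iterating over it
# consumes its lines, and the position stays at end-of-file afterwards.
terms_file = io.StringIO("new york\nhot dog\n")

first_pass = [line.strip() for line in terms_file]
second_pass = [line.strip() for line in terms_file]

print(first_pass)   # ['new york', 'hot dog']
print(second_pass)  # []: the handle was already read to the end
```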

Thus, I suggest storing the result of lem in a list. This should also be much faster than reading the file again and again. Try something like this:

all_lemma = lem(terms_file) # temporary variable holding the result of `lem`
finder.apply_ngram_filter(lambda *w: w not in all_lemma)

Your line finder.apply_ngram_filter(lambda *w: w not in [x for x in lem(terms_file)]) does not work: although it builds a list from the result of lem, it rebuilds that list every time the lambda is executed, so every call after the first reads from an already exhausted file and you end up with the same problem.
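To make the difference concrete, here is a sketch that does not need NLTK. The lem function is a hypothetical stand-in that yields one (word1, word2) tuple per line, and keep_ngrams is a simplified stand-in for apply_ngram_filter that drops every ngram for which fn(*ngram) is True. Neither is your actual code.

```python
import io

def lem(fh):
    """Hypothetical stand-in for lem(): yields one (word1, word2) tuple
    per line of the file handle."""
    for line in fh:
        yield tuple(line.split())

def keep_ngrams(ngrams, fn):
    """Simplified stand-in for apply_ngram_filter: drops every ngram
    for which fn(*ngram) is True."""
    return [ng for ng in ngrams if not fn(*ng)]

candidates = [("new", "york"), ("hot", "dog"), ("the", "of")]

# Broken: lem() re-reads the same handle on every call of the lambda,
# so only the first candidate sees the real term list.
terms_file = io.StringIO("new york\nhot dog\n")
broken = keep_ngrams(candidates, lambda *w: w not in [x for x in lem(terms_file)])

# Fixed: read the file once and reuse the stored result.
terms_file = io.StringIO("new york\nhot dog\n")
all_lemma = list(lem(terms_file))
fixed = keep_ngrams(candidates, lambda *w: w not in all_lemma)

print(broken)  # [('new', 'york')]
print(fixed)   # [('new', 'york'), ('hot', 'dog')]
```

The broken version keeps ("new", "york") only because it happens to be checked first; every later candidate is compared against an empty list and dropped.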

(Not sure what apply_ngram_filter does, so there might be more problems after that.)


Update: Judging from your other question, it seems that lem itself is a generator function. In that case, you have to explicitly convert its results to a list; otherwise you will run into the same problem once the generator is exhausted.

all_lemma = list(lem(terms_file))
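The same thing happens with a generator, even without any file involved. In this sketch, lem is a hypothetical generator version that strips each input line. Note that even a membership test with "in" consumes the generator.

```python
def lem(lines):
    """Hypothetical generator version of lem(): lazily yields one
    stripped term per input line."""
    for line in lines:
        yield line.strip()

terms = ["cat\n", "dog\n"]

gen = lem(terms)
first_check = "dog" in gen    # True, but consumes the generator up to "dog"
second_check = "dog" in gen   # False: nothing is left to iterate over

all_lemma = list(lem(terms))  # materialise the results once
repeat_checks = ["dog" in all_lemma for _ in range(3)]

print(first_check, second_check)  # True False
print(repeat_checks)              # [True, True, True]
```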

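If the elements yielded by lem are hashable, you can also create a set instead of a list, i.e. all_lemma = set(lem(terms_file)); this makes the lookup in the filter much faster, because a membership test on a set takes constant time on average, while a list has to be scanned element by element.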
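Whether lem yields tuples, lists, or strings is not clear from the question, so the following sketch just shows what "hashable" means in practice: tuples of strings can go into a set directly, lists cannot, and converting each list to a tuple first fixes that. The data is made up.

```python
# Tuples of strings are hashable, so they can go straight into a set.
tuple_lemmas = [("new", "york"), ("hot", "dog")]
lemma_set = set(tuple_lemmas)
found = ("new", "york") in lemma_set  # True

# Lists are not hashable: building a set from them raises TypeError.
list_lemmas = [["new", "york"], ["hot", "dog"]]
try:
    set(list_lemmas)
    raised = False
except TypeError:
    raised = True

# Converting each element to a tuple first makes it hashable.
converted = set(tuple(x) for x in list_lemmas)

print(found, raised, converted == lemma_set)  # True True True
```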

Solution 2:

If I understand what you are saying, lem(terms_file) returns a list of lemmas. But what do "lemmas" look like? apply_ngram_filter() will only work if each "lemma" is a tuple of exactly two words. If that is indeed the case, then your code should work after you've fixed the file input as suggested by @tobias_k.
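Why the exact shape matters: apply_ngram_filter calls the filter function with the words of each candidate as separate positional arguments, and lambda *w packs them back into a tuple. That tuple only matches lemmas stored as tuples of the same words. The sketch below uses made-up data and a hypothetical helper named pack to show the mismatch with string lemmas.

```python
def pack(*w):
    # Collects however many positional arguments it receives into one
    # tuple, exactly like "lambda *w: ..." does.
    return w

packed = pack("new", "york")  # ('new', 'york')

# A lemma stored as a two-word tuple matches the packed arguments...
tuple_lemmas = {("new", "york")}
tuple_match = packed in tuple_lemmas    # True

# ...but the same lemma stored as a single string does not.
string_lemmas = {"new york"}
string_match = packed in string_lemmas  # False

# Splitting each string into a tuple of words fixes the mismatch.
fixed_lemmas = {tuple(s.split()) for s in string_lemmas}
fixed_match = packed in fixed_lemmas    # True
```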

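Even if your code works, the output of lem() should be stored as a set, not a list. Otherwise your code will be abysmally slow, because each "in" test scans the list element by element and the filter runs that test once for every candidate ngram.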

all_lemmas = set(lem(terms_file))
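A rough way to see the difference on your own machine, using synthetic data (5,000 made-up two-word lemmas and 1,000 candidate bigrams) rather than a real NLTK finder. The keep function mirrors the filter's logic of dropping every bigram not found in the lemmas; the timings printed will vary, but the kept results are identical for the list and the set.

```python
import timeit

# Synthetic, hypothetical data: 5,000 two-word lemmas, 1,000 candidates,
# half of which appear in the lemma collection.
lemma_list = [(f"w{i}", f"x{i}") for i in range(5000)]
lemma_set = set(lemma_list)
candidates = [(f"w{i}", f"x{i}") for i in range(0, 10000, 10)]

def keep(lemmas):
    # Same effect as filtering with "lambda *w: w not in lemmas":
    # only the candidates found in lemmas survive.
    return [ng for ng in candidates if ng in lemmas]

list_time = timeit.timeit(lambda: keep(lemma_list), number=1)
set_time = timeit.timeit(lambda: keep(lemma_set), number=1)

kept_list = keep(lemma_list)
kept_set = keep(lemma_set)
print(f"list: {list_time:.4f}s  set: {set_time:.4f}s")
```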

But I'm not too sure the above assumptions are right. Why would all lemmas be exactly two words long? I'm guessing that your "lemmas" are one word long, and that you intended to discard any ngram containing a word that is not in your list. If that's true, you need apply_word_filter(), not apply_ngram_filter(). Note that it expects one argument (a word), so it should be written like this:

finder.apply_word_filter(lambda w: w not in all_lemmas)
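The two filters differ in what the function receives: apply_word_filter calls it once per word and drops the ngram if any call returns True, while apply_ngram_filter calls it once with all the words of the ngram. The stand-in functions below only mimic that behaviour without NLTK, and the single-word lemma set is hypothetical. With one-word lemmas, the ngram-style lambda compares a tuple against a set of strings, never finds a match, and so drops everything.

```python
def word_filter_sketch(ngrams, fn):
    """Stand-in for apply_word_filter: drop an ngram if fn(word) is True
    for ANY of its words."""
    return [ng for ng in ngrams if not any(fn(w) for w in ng)]

def ngram_filter_sketch(ngrams, fn):
    """Stand-in for apply_ngram_filter: drop an ngram if fn(*ngram) is True."""
    return [ng for ng in ngrams if not fn(*ng)]

all_lemmas = {"new", "york", "hot", "dog"}  # hypothetical single-word lemmas
candidates = [("new", "york"), ("hot", "dog"), ("new", "dog"), ("the", "dog")]

by_word = word_filter_sketch(candidates, lambda w: w not in all_lemmas)
by_ngram = ngram_filter_sketch(candidates, lambda *w: w not in all_lemmas)

print(by_word)   # [('new', 'york'), ('hot', 'dog'), ('new', 'dog')]
print(by_ngram)  # []
```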
