
Counting The Number Of Unique Words In A List

Using the following code from https://stackoverflow.com/a/11899925, I am able to find if a word is unique or not (by comparing if it was used once or greater than once): helloStrin

Solution 1:

The best way to solve this is to use the set collection type. A set is a collection in which all elements are unique. Therefore:

unique = set([ 'one', 'two', 'two']) 
len(unique) # is 2

You can use a set from the outset, adding words to it as you go:

unique.add('three')

This discards any duplicates as they are added. Alternatively, you can collect all the elements in a list and pass the list to the set() function, which removes the duplicates at that point. The following example combines both patterns:

unique = set([ 'one', 'two', 'two'])
unique.add('three')

# unique now contains {'one', 'two', 'three'}
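As a rough sketch of the "add as you go" approach described above, the following builds the set one word at a time. The names `words`, `seen`, and `unique_count` are made up for illustration and do not come from the question.

```python
# Hypothetical input list; any iterable of hashable items would work.
words = ['one', 'two', 'two', 'three', 'one']

seen = set()
for word in words:
    # add() does nothing when the word is already in the set
    seen.add(word)

unique_count = len(seen)
print(unique_count)  # 3
```

This gives the same result as len(set(words)), so the loop form is mainly useful when the words arrive one at a time.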

Read more about sets in Python.


Solution 2:

You have several options here. I recommend a set, but you can also use a Counter, which counts how many times each word appears, or you can look at the number of keys in the dictionary you built.


Set

You can convert the list to a set, in which every element is unique. Duplicate elements are discarded:

helloString = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']
helloSet = set(helloString) #=> {'doing', 'how', 'are', 'world', 'you', 'hello', 'today'} (order may vary)
uniqueWordCount = len(set(helloString)) #=> 7

Here's a link to further reading on sets

Counter

You can also use a Counter, which additionally tells you how often each word was used, if you still need that information.

from collections import Counter

helloString = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']
counter = Counter(helloString)
len(counter) #=> 7
counter["world"] #=> 2
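Since the question distinguishes words used once from words used more than once, the counts can be split along that line. This is only a sketch; the names `used_once` and `repeated` are assumptions for illustration.

```python
from collections import Counter

helloString = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']
counter = Counter(helloString)

# Words seen exactly once versus words seen more than once
used_once = [word for word, n in counter.items() if n == 1]
repeated = [word for word, n in counter.items() if n > 1]

print(len(counter))    # 7 distinct words in total
print(len(used_once))  # 6
print(repeated)        # ['world']
```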

Loop

At the end of your loop, you can check the len of count. Also, you mistyped helloString as words:

uniqueWordCount = 0
helloString = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']
count = {}
for word in helloString:
    if word in count:
        count[word] += 1
    else:
        count[word] = 1
uniqueWordCount = len(count) #=> 7
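The if/else in the loop above can also be collapsed with dict.get, which returns a default when the key is missing. This is an alternative sketch, not the code from the question.

```python
helloString = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']

count = {}
for word in helloString:
    # get() returns 0 for a word not seen before, so no if/else is needed
    count[word] = count.get(word, 0) + 1

uniqueWordCount = len(count)
print(uniqueWordCount)  # 7
```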

Solution 3:

You can use collections.Counter:

helloString = ['hello', 'world', 'world']

from collections import Counter

c = Counter(helloString)

print("There are {} unique words".format(len(c)))
print('They are')

for k, v in c.items():
    print(k)

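I know the question doesn't specifically ask for this, but to maintain the order in which the words first appear, you can combine Counter with OrderedDict. (On Python 3.7 and later, a plain Counter already preserves insertion order.)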

helloString = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']

from collections import Counter, OrderedDict

class OrderedCounter(Counter, OrderedDict):
    pass

c = OrderedCounter(helloString)

print("There are {} unique words".format(len(c)))
print('They are')

for k, v in c.items():
    print(k)
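If only the ordered list of distinct words is needed and not the counts, a plain dict may be enough. This sketch assumes Python 3.7 or later, where dicts preserve insertion order; the name `ordered_unique` is made up for illustration.

```python
helloString = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']

# dict.fromkeys keeps one key per distinct word, in first-seen order (Python 3.7+)
ordered_unique = list(dict.fromkeys(helloString))

print(len(ordered_unique))  # 7
print(ordered_unique)       # ['hello', 'world', 'how', 'are', 'you', 'doing', 'today']
```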

Solution 4:

In your current code, you can either increment uniqueWordCount in the else branch, where you first set count[word], or simply look up the number of keys in the dictionary: len(count).

If you only want to know the number of unique elements, take the length of the set: len(set(helloString))
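The first option might look like the following, assuming the original loop has the same shape as the counting loop shown in Solution 2 (the question's own code is not fully reproduced here).

```python
helloString = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']

uniqueWordCount = 0
count = {}
for word in helloString:
    if word in count:
        count[word] += 1
    else:
        count[word] = 1
        uniqueWordCount += 1  # first time this word is seen

print(uniqueWordCount)  # 7
```

All three approaches agree here: uniqueWordCount, len(count), and len(set(helloString)) give the same number.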


Solution 5:

I would do this using a set.

def stuff(helloString):
    hello_set = set(helloString)
    return len(hello_set)
