
Lowercase First Element Of Tuple In List Of Tuples

I have a list of documents, labeled with their appropriate categories: documents = [(list(corpus.words(fileid)), category) for category in corpus.categories()

Solution 1:

Your data structure is [([str], str)]: a list of tuples, where each tuple is (list of strings, string). It's important to understand what that means before you try to pull data out of it.

That means that for item in documents iterates over the outer list, with item bound to each tuple in turn.

That means that item[0] is the list in each tuple.

That means that for item in documents: for s in item[0]: will iterate through each string inside that list. Let's try that!

[s.lower() for item in documents for s in item[0]]

This should give, from your example data:

[u'a', u'p', u'i', u'o', u'a', u'm', ...]

If you're trying to keep the tuple format, you could do:

[([s.lower() for s in item[0]], item[1]) for item in documents]

# or perhaps more readably
[([s.lower() for s in lst], val) for lst, val in documents]

Both these statements give:

[([u'a', u'p', u'i', u'o', u'a', u'm', ...], 'cancer'), ... ]
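To make the difference between the two forms concrete, here is a minimal sketch on made-up data. The words and category names below are hypothetical stand-ins with the same [([str], str)] shape as in the question, not the asker's corpus.

```python
# Hypothetical sample data in the same shape as the question: [([str], str)]
documents = [
    (["A", "Patient", "Was", "Observed"], "cancer"),
    (["The", "Market", "Rose"], "finance"),
]

# Flattened: every word from every document, lowercased.
# The categories are dropped and document boundaries are lost.
all_words = [s.lower() for item in documents for s in item[0]]

# Structure-preserving: same (words, category) tuples, words lowercased.
lowered = [([s.lower() for s in lst], val) for lst, val in documents]

print(all_words)
print(lowered)
```

The first form is handy for things like a global word frequency count; the second is the one to use if each document still needs its label.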

Solution 2:

You are close. You are looking for a construction like this:

[([s.lower() for s in ls], cat) for ls, cat in documents]

This essentially combines these two expressions:

[[x.lower() for x in element] for element in documents],
[(x.lower(), y) for x,y in documents]
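As a rough check of why neither of those two expressions works on its own, the sketch below runs each against a one-item list of made-up data. The helper raises_attribute_error is hypothetical and exists only for this illustration. In both expressions, lower() ends up being called on the inner list rather than on a string.

```python
# Hypothetical one-document sample in the [([str], str)] shape
documents = [(["Hello", "World"], "greeting")]

def raises_attribute_error(build):
    """Return True if calling build() raises AttributeError."""
    try:
        build()
    except AttributeError:
        return True
    return False

# Here `element` is the whole tuple, so the first `x` is the list of words.
first_fails = raises_attribute_error(
    lambda: [[x.lower() for x in element] for element in documents]
)

# Here `x` is unpacked as the list of words, and lists have no lower().
second_fails = raises_attribute_error(
    lambda: [(x.lower(), y) for x, y in documents]
)

# Unpack the tuple first, then loop over the strings inside the list.
combined = [([s.lower() for s in ls], cat) for ls, cat in documents]

print(first_fails, second_fails, combined)
```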

Solution 3:

Try this:

documents = [([word.lower() for word in corpus.words(fileid)], category)
              for category in corpus.categories()
              for fileid in corpus.fileids(category)]
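To try this without a real corpus installed, here is a sketch that uses a small stand-in object. TinyCorpus and its contents are hypothetical; it only mimics the three methods the comprehension calls (categories(), fileids(category), and words(fileid)).

```python
class TinyCorpus:
    """Hypothetical stand-in exposing only the methods used below."""

    _files = {
        "cancer": {"c1.txt": ["A", "Patient"]},
        "finance": {"f1.txt": ["The", "Market"]},
    }

    def categories(self):
        return list(self._files)

    def fileids(self, category):
        return list(self._files[category])

    def words(self, fileid):
        for files in self._files.values():
            if fileid in files:
                return files[fileid]
        raise KeyError(fileid)


corpus = TinyCorpus()

# Lowercase each word while the list is being built, so no second pass is needed.
documents = [([word.lower() for word in corpus.words(fileid)], category)
             for category in corpus.categories()
             for fileid in corpus.fileids(category)]

print(documents)
```

Lowercasing at construction time avoids building the original-case lists and then copying them.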

Solution 4:

Tuples are immutable. However, since the first element of each tuple is a list, and lists are mutable, you can modify that list's contents in place; the tuple still holds the same list object:

documents = [(...what you originally posted...) ... etc. ...]

for d in documents:
    # To lowercase all strings in the list:
    # the slice assignment '[:]' replaces the list's contents in place.
    # A plain 'd[0] = ...' would fail, because tuple items cannot be reassigned.
    d[0][:] = [w.lower() for w in d[0]]

    # Or, to lowercase only the first element of the list (which is what you asked for):
    d[0][0] = d[0][0].lower()

Calling lower() on a string does not update it, because strings are immutable and lower() returns a new string. To store the lowercased version, you have to assign it back. That would not be possible if the string were itself a tuple member, but since the string you are modifying is in a list inside the tuple, you can change the list's contents while the tuple keeps referring to the same list.
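The difference between mutating the list and reassigning the tuple slot can be checked directly. This is a small sketch on made-up data; the variable names are chosen only for the illustration.

```python
# Hypothetical one-document sample in the [([str], str)] shape
documents = [(["Hello", "World"], "greeting")]
d = documents[0]
original_list = d[0]

# Slice assignment changes the contents of the existing list object.
d[0][:] = [w.lower() for w in d[0]]
same_object = d[0] is original_list

# Reassigning the tuple's slot is rejected, because tuples are immutable.
try:
    d[0] = ["something", "else"]
    tuple_assignment_failed = False
except TypeError:
    tuple_assignment_failed = True

print(documents, same_object, tuple_assignment_failed)
```

Because the list is changed in place, any other reference to the same list sees the lowercased words too, which may or may not be what you want.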

