
Slow Code In "inner Joins" Lists In Python

I have seen several posts about lists in Python here, but I haven't found a correct answer to my question, because it is about optimizing code. I have Python code that compares two lists.

Solution 1:

I'd suggest transforming/keeping your data structures as dicts. That way you won't need to iterate over both lists with nested for loops (an O(n × m) operation) searching for where the lists' code numbers align before updating the score value.

You simply update the score wherever the corresponding key in the license dict maps to the search string:

dct_score = dict(itemswithscore)
dct_license = dict(itemswithlicense)
for k in dct_score:
    if dct_license.get(k) == 'THIS': # use dict.get in case key does not exist
        dct_score[k] += 50
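A runnable version of the above, using assumed sample data (borrowed from the list-of-lists shape used elsewhere in this thread, where each entry pairs a code number with a value):

```python
# Sample data (assumed): [code, score] and [code, license] pairs
itemswithscore = [[5675, 0], [6676, 0], [9898, 0], [4545, 0]]
itemswithlicense = [[9999, 'ATR'], [9191, 'OPOP'], [9898, 'THIS'], [2222, 'PLPL']]

dct_score = dict(itemswithscore)      # {code: score}
dct_license = dict(itemswithlicense)  # {code: license}

for k in dct_score:
    if dct_license.get(k) == 'THIS':  # dict.get avoids KeyError for missing codes
        dct_score[k] += 50

print(dct_score)  # {5675: 0, 6676: 0, 9898: 50, 4545: 0}
```

Each lookup in `dct_license` is O(1) on average, so the whole pass is O(n) instead of O(n × m).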

Solution 2:

I'm pretty sure the slowness is mostly due to the looping itself, which is not very fast in Python. You can speed up the code somewhat by caching variables, like so:

for sublist1 in itemswithscore:
    a = sublist1[0]  # Save to variable to avoid repeated list-lookup
    for sublist2 in itemswithlicense:
        if a == sublist2[0]:
            if sublist2[1] == 'THIS':
                sublist1[1] += 50

Also, if you happen to know that 'THIS' does not occur in itemswithlicense more than once, you should insert a break after you update sublist1[1].
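With the break added, the inner loop stops as soon as the code is found; this is a sketch assuming sample data in the same [code, value] shape, and assuming each code appears at most once in itemswithlicense:

```python
# Sample data (assumed)
itemswithscore = [[5675, 0], [9898, 0]]
itemswithlicense = [[9898, 'THIS'], [2222, 'PLPL']]

for sublist1 in itemswithscore:
    a = sublist1[0]  # cache the code to avoid repeated indexing
    for sublist2 in itemswithlicense:
        if a == sublist2[0]:
            if sublist2[1] == 'THIS':
                sublist1[1] += 50
            break  # code found; no need to scan the rest of the list
```

On average this halves the inner-loop work, but it is still O(n × m) in the worst case; the dict approach in Solution 1 avoids the inner loop entirely.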

Let me know how much of a difference this makes.

Solution 3:

It would be very efficient if you can use pandas.

You can make two DataFrames and merge them on the shared column, something like this:

import pandas as pd

itemswithscore = [[5675, 0], [6676, 0], [9898, 0], [4545, 0]]
itemswithlicense = [[9999, 'ATR'], [9191, 'OPOP'], [9898, 'THIS'], [2222, 'PLPL']]

df1 = pd.DataFrame(itemswithscore, columns=['code', 'points'])
df2 = pd.DataFrame(itemswithlicense, columns=['code', 'license'])

df3 = pd.merge(df1, df2, on='code', how='inner')
df3 = df3.drop('points', axis=1)
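To actually add the 50 points back into the score table, the merged result can drive a vectorised update on df1; this is a sketch using the same sample data, where `this_codes` is a hypothetical helper name, not part of the original answer:

```python
import pandas as pd

# Sample data (assumed): [code, points] and [code, license] pairs
itemswithscore = [[5675, 0], [6676, 0], [9898, 0], [4545, 0]]
itemswithlicense = [[9999, 'ATR'], [9191, 'OPOP'], [9898, 'THIS'], [2222, 'PLPL']]

df1 = pd.DataFrame(itemswithscore, columns=['code', 'points'])
df2 = pd.DataFrame(itemswithlicense, columns=['code', 'license'])

merged = pd.merge(df1, df2, on='code', how='inner')           # codes present in both lists
this_codes = merged.loc[merged['license'] == 'THIS', 'code']  # codes licensed as 'THIS'
df1.loc[df1['code'].isin(this_codes), 'points'] += 50         # vectorised, no Python loop
```

The merge and the boolean-mask update both run in pandas' compiled internals, which is where the speedup over nested Python loops comes from.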

Hope this helps, accept if correct

Cheers!
