Slow Code In "inner Joins" Lists In Python
Solution 1:
I suggest converting your data into dicts (or keeping it as dicts if it already is). That way, you won't need to iterate over both lists with nested for loops, an O(n x m) operation, searching for where the lists' code numbers align before updating the score value.
You simply update the score wherever the same key in the license dict maps to the search string:
dct_score = dict(itemswithscore)
dct_license = dict(itemswithlicense)
for k in dct_score:
    if dct_license.get(k) == 'THIS':  # use dict.get in case key does not exist
        dct_score[k] += 50
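If you need to keep the list-of-lists shape, a minimal sketch of the same idea is to turn only the license list into a dict and update the score list in place. The sample values and the helper name add_license_bonus are made up for illustration, and the sketch assumes each code appears at most once in itemswithlicense (with duplicates, dict() keeps the last entry):

```python
# Sample data in the same shape as the question (values are illustrative).
itemswithscore = [[5675, 0], [6676, 0], [9898, 0], [4545, 0]]
itemswithlicense = [[9999, 'ATR'], [9191, 'OPOP'], [9898, 'THIS'], [2222, 'PLPL']]


def add_license_bonus(scores, licenses, wanted='THIS', bonus=50):
    """Add `bonus` to every score whose code has the license `wanted`."""
    license_by_code = dict(licenses)  # one pass to build the lookup
    for row in scores:                # one pass over the scores
        if license_by_code.get(row[0]) == wanted:
            row[1] += bonus
    return scores


result = add_license_bonus(itemswithscore, itemswithlicense)
print(result)  # [[5675, 0], [6676, 0], [9898, 50], [4545, 0]]
```

Building the dict costs one pass over the license list and each lookup is a hash lookup on average, so the total work is roughly O(n + m) instead of O(n x m).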
Solution 2:
I'm pretty sure the slowness is mostly due to the looping itself, which is not very fast in Python. You can speed up the code somewhat by caching variables, like so:
for sublist1 in itemswithscore:
    a = sublist1[0]  # Save to variable to avoid repeated list-lookup
    for sublist2 in itemswithlicense:
        if a == sublist2[0]:
            if sublist2[1] == 'THIS':
                sublist1[1] += 50
Also, if you happen to know that 'THIS' does not occur in itemswithlicense more than once, you should insert a break after you update sublist1[1].
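For illustration, the loop with the early exit could look like the sketch below. The sample values are made up, and the break is only safe under the assumption stated above, namely that no second matching 'THIS' entry can exist for the same code:

```python
# Sample data in the same shape as the question (values are illustrative).
itemswithscore = [[5675, 0], [6676, 0], [9898, 0], [4545, 0]]
itemswithlicense = [[9999, 'ATR'], [9191, 'OPOP'], [9898, 'THIS'], [2222, 'PLPL']]

for sublist1 in itemswithscore:
    a = sublist1[0]  # cache the code to avoid repeated list lookups
    for sublist2 in itemswithlicense:
        if a == sublist2[0] and sublist2[1] == 'THIS':
            sublist1[1] += 50
            break  # assumption: no further match is possible, so stop scanning

print(itemswithscore)  # [[5675, 0], [6676, 0], [9898, 50], [4545, 0]]
```

This is still O(n x m) in the worst case; the break only shortens the inner loop for rows that do find a match.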
Let me know how much of a difference this makes.
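One way to check the difference yourself is the standard timeit module. The sketch below is only a rough benchmark on synthetic data: the list sizes, the random licenses, and the function names plain_loop, cached_loop and fresh_scores are all invented for this example, and the plain loop is a guess at what the uncached version looks like. Codes are generated unique so that the break is valid.

```python
import random
import timeit

# Synthetic data invented for this sketch: 1,000 unique codes in each list.
random.seed(0)
codes = random.sample(range(100_000), 1_000)
itemswithlicense = [[c, random.choice(['THIS', 'ATR', 'OPOP'])] for c in codes]


def fresh_scores():
    # Build a new score list for every run, because both loops mutate it.
    return [[c, 0] for c in codes]


def plain_loop(scores, licenses):
    # Nested loops with repeated sublist1[0] lookups and no early exit.
    for sublist1 in scores:
        for sublist2 in licenses:
            if sublist1[0] == sublist2[0]:
                if sublist2[1] == 'THIS':
                    sublist1[1] += 50
    return scores


def cached_loop(scores, licenses):
    # Same logic with the code cached in a local variable and a break.
    for sublist1 in scores:
        a = sublist1[0]
        for sublist2 in licenses:
            if a == sublist2[0]:
                if sublist2[1] == 'THIS':
                    sublist1[1] += 50
                    break  # valid here only because the codes are unique
    return scores


t_plain = timeit.timeit(lambda: plain_loop(fresh_scores(), itemswithlicense), number=3)
t_cached = timeit.timeit(lambda: cached_loop(fresh_scores(), itemswithlicense), number=3)
print(f"plain: {t_plain:.3f}s  cached + break: {t_cached:.3f}s")
```

The absolute numbers depend on your machine and data, so treat them as a relative comparison only.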
Solution 3:
If you can use pandas, this becomes very efficient.
You can build two DataFrames and merge them on a shared column.
Something like this:
import pandas as pd
itemswithscore = [5675, 0], [6676, 0], [9898, 0], [4545, 0]
itemswithlicense = [9999, 'ATR'], [9191, 'OPOP'], [9898, 'THIS'], [2222, 'PLPL']
df1 = pd.DataFrame(list(itemswithscore), columns=['code', 'points'])
df2 = pd.DataFrame(list(itemswithlicense), columns=['code', 'license'])
# The inner join keeps only the codes that appear in both DataFrames
df3 = pd.merge(df1, df2, on='code', how='inner')
df3 = df3.drop('points', axis=1)
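The snippet above stops at the join itself. If the goal is to add 50 points to the items licensed as 'THIS' (as in the other solutions), a possible follow-up is sketched below. It swaps in a left merge instead of the inner one so that every scored item is kept, and it assumes each code appears at most once in itemswithlicense (duplicates would duplicate rows in the merge). The variable names are illustrative.

```python
import pandas as pd

# Sample data in the same shape as the question (values are illustrative).
itemswithscore = [[5675, 0], [6676, 0], [9898, 0], [4545, 0]]
itemswithlicense = [[9999, 'ATR'], [9191, 'OPOP'], [9898, 'THIS'], [2222, 'PLPL']]

df_score = pd.DataFrame(itemswithscore, columns=['code', 'points'])
df_license = pd.DataFrame(itemswithlicense, columns=['code', 'license'])

# A left merge keeps every scored item; codes without a license get NaN.
merged = df_score.merge(df_license, on='code', how='left')

# Add 50 points only on the rows whose license is 'THIS' (NaN compares False).
merged.loc[merged['license'] == 'THIS', 'points'] += 50

# Convert back to the original list-of-lists shape.
result = merged[['code', 'points']].values.tolist()
print(result)  # [[5675, 0], [6676, 0], [9898, 50], [4545, 0]]
```

For a handful of rows the dict approach is likely faster because of the DataFrame construction overhead; pandas tends to pay off on larger tables.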
Hope this helps, accept if correct
Cheers!