How Can I Match All the Key-Value Pairs in Python When It Runs Too Long?
User-item affinity and recommendations: I am creating a table that suggests items in the style of a 'customers who bought this item also bought' algorithm. Input dataset productId userId Prod1
Solution 1:
Yes, the algorithm can be improved. You are recalculating the user list for each item multiple times inside the inner loop. Instead, build a dictionary that maps each item to its users once, outside the loops.
import itertools
import pandas as pd

# `main` is the purchase DataFrame from the question (columns productId, userId)
# get unique items and the number of distinct users
items = set(main.productId)
n_users = len(set(main.userId))

# make a dictionary of item and users who bought that item
item_users = main.groupby('productId')['userId'].apply(set).to_dict()

# iterate over combinations of item1 and item2 and store scores
result = []
for item1, item2 in itertools.combinations(items, 2):
    # share of all users who bought both items
    score = len(item_users[item1] & item_users[item2]) / n_users
    result.append((item1, item2, score))
    result.append((item2, item1, score))  # store score for reverse order as well

# convert results to a dataframe
result = pd.DataFrame(result, columns=["item1", "item2", "score"])
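To sanity-check the approach above, here is a minimal self-contained sketch on a made-up purchase table. The data and the `co_purchase_scores` helper name are assumptions for illustration only; they are not the question's dataset or code.

```python
import itertools

import pandas as pd

# Hypothetical purchase log (not the question's data): one row per purchase.
main = pd.DataFrame({
    "productId": ["A", "A", "A", "B", "B", "C"],
    "userId":    ["u1", "u2", "u3", "u1", "u2", "u3"],
})


def co_purchase_scores(purchases):
    """Return (item1, item2, score) rows for every ordered pair of distinct items."""
    n_users = purchases["userId"].nunique()
    # Build the item -> set-of-users lookup once, outside the pair loop.
    item_users = purchases.groupby("productId")["userId"].apply(set).to_dict()
    rows = []
    for item1, item2 in itertools.combinations(sorted(item_users), 2):
        # Share of all users who bought both items.
        score = len(item_users[item1] & item_users[item2]) / n_users
        rows.append((item1, item2, score))
        rows.append((item2, item1, score))  # same score for the reverse order
    return pd.DataFrame(rows, columns=["item1", "item2", "score"])


scores = co_purchase_scores(main)
print(scores)
```

With three items the result has six rows, one per ordered pair. Pairs with no common buyers are kept with a score of 0.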
Timing differences:
Original implementation from question
# 3 loops, best of 3: 41.8 ms per loop
Mark's Method 2
# 3 loops, best of 3: 19.9 ms per loop
Implementation in this answer
# 3 loops, best of 3: 3.01 ms per loop
Solution 2:
The key here is to create a cartesian product of productId. See the code below.
Method 1 (works with smaller datasets)
result = (main.drop_duplicates(['productId','userId'])
          .assign(cartesian_key=1)
          .pipe(lambda x: x.merge(x, on='cartesian_key'))   # cartesian product of all rows
          .drop('cartesian_key', axis=1)
          # keep pairs of different products bought by the same user
          .loc[lambda x: (x.productId_x != x.productId_y) & (x.userId_x == x.userId_y)]
          .groupby(['productId_x','productId_y']).size()
          .div(main['userId'].nunique()))   # was data['userId'], which is undefined here
result
Prod1 prod2 0.75
Prod1 prod3 0.75
Prod1 prod4 0.75
Prod1 prod5 0.5
prod2 Prod1 0.75
prod2 prod3 0.5
prod2 prod4 0.5
prod2 prod5 0.25
prod3 Prod1 0.75
prod3 prod2 0.5
prod3 prod4 0.5
prod3 prod5 0.5
prod4 Prod1 0.75
prod4 prod2 0.5
prod4 prod3 0.5
prod4 prod5 0.5
prod5 Prod1 0.5
prod5 prod2 0.25
prod5 prod3 0.5
prod5 prod4 0.5
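Method 1 keeps only the rows where both sides share a user, so the same counts can likely be obtained by joining on `userId` directly, which avoids materializing the full cartesian product. The sketch below is a variation on Method 1 rather than the answer's own code, and the toy data is made up for illustration.

```python
import pandas as pd

# Hypothetical purchase log (not the question's data).
main = pd.DataFrame({
    "productId": ["A", "A", "A", "B", "B", "C"],
    "userId":    ["u1", "u2", "u3", "u1", "u2", "u3"],
})

pairs = main.drop_duplicates(["productId", "userId"])
n_users = pairs["userId"].nunique()

scores = (
    pairs.merge(pairs, on="userId")  # yields productId_x, productId_y per shared user
    .loc[lambda x: x["productId_x"] != x["productId_y"]]
    .groupby(["productId_x", "productId_y"])
    .size()
    .div(n_users)  # share of all users who bought both products
)
print(scores)
```

As in Method 1, product pairs with no common buyer do not appear in the result at all.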
Method 2
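result = (main.groupby(['productId','userId']).size()
          .clip(upper=1)      # 1 if the user bought the item at least once
          .unstack()          # item x user matrix, NaN where there was no purchase
          .assign(key=1)
          .reset_index()
          .pipe(lambda x: x.merge(x, on='key'))   # cartesian product of items
          .drop('key', axis=1)
          .loc[lambda x: (x.productId_x != x.productId_y)]
          .set_index(['productId_x','productId_y'])
          # split '<user>_x' / '<user>_y' column names into a two-level column index
          # (assumes user ids contain no underscore; the inplace argument is omitted
          # because recent pandas versions no longer accept it)
          .pipe(lambda x: x.set_axis(x.columns.str.split('_', expand=True), axis=1))
          .swaplevel(axis=1)
          .pipe(lambda x: (x['x'] + x['y']))      # 2 where both items were bought, else NaN
          .fillna(0)
          .div(2)
          .mean(axis=1))      # share of users who bought both items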
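Method 2 effectively compares rows of an item-by-user indicator matrix. One way to see what it computes is to take the product of that matrix with its own transpose. This is an alternative sketch under that reading, not the answer's code, and the toy data is hypothetical.

```python
import pandas as pd

# Hypothetical purchase log (not the question's data).
main = pd.DataFrame({
    "productId": ["A", "A", "A", "B", "B", "C"],
    "userId":    ["u1", "u2", "u3", "u1", "u2", "u3"],
})

# Item x user indicator matrix: 1 if the user bought the item, else 0.
bought = pd.crosstab(main["productId"], main["userId"]).clip(upper=1)
n_users = bought.shape[1]

# Entry (i, j) counts users who bought both i and j; dividing gives the share.
co_scores = bought.dot(bought.T) / n_users
print(co_scores)
```

The diagonal holds the share of users who bought each item on its own, so it should be ignored or masked when only item pairs are of interest.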