Optimality of the support-confidence border
You return to the founder with the scatterplot produced in the previous exercise and ask whether she would like you to use pruning to recover the support-confidence border. You tell her about the Bayardo-Agrawal result, but she seems skeptical and asks whether you can demonstrate this in an example.
Recalling that scatterplots can scale the size of dots according to a third metric, you decide to use this feature to demonstrate the optimality of the support-confidence border. You will scale the dot size by lift, one of the metrics to which the Bayardo-Agrawal result applies. The one-hot encoded data has been imported for you and is available as onehot. Additionally, apriori() and association_rules() have been imported and pandas is available as pd.
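To make the three plotted quantities concrete, here is a minimal sketch that computes support, confidence, and lift by hand for single-item rules. The toy transactions and the helper names support() and rule_metrics() are assumptions made for illustration; they are not the course's onehot data or part of any library.

```python
from itertools import permutations

# Hypothetical toy transactions (NOT the course's onehot data).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def support(items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def rule_metrics(antecedent, consequent):
    """Support, confidence, and lift for the rule antecedent -> consequent."""
    joint = support({antecedent, consequent})
    confidence = joint / support({antecedent})
    lift = confidence / support({consequent})
    return {
        "antecedent": antecedent,
        "consequent": consequent,
        "support": joint,
        "confidence": confidence,
        "lift": lift,
    }


# Enumerate every ordered pair of items that co-occurs at least once.
all_items = sorted(set().union(*transactions))
rules = [
    rule_metrics(a, c)
    for a, c in permutations(all_items, 2)
    if support({a, c}) > 0
]

for rule in rules:
    print(rule)
```

Each dictionary corresponds to one dot in the scatterplot: support on the x-axis, confidence on the y-axis, and lift available as the value that sets the dot size.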
This exercise is part of the course
Market Basket Analysis in Python
Exercise instructions
- Apply the Apriori algorithm to the DataFrame onehot.
- Compute the association rules using the support metric and a minimum threshold of 0.0.
- Complete the expression for the scatterplot such that the dot size is scaled by lift.
Interactive exercise
Complete the sample code to successfully finish this exercise.
# Import seaborn under its standard alias
import seaborn as sns
# Import pyplot so that plt.show() is defined outside the course environment
import matplotlib.pyplot as plt

# Apply the Apriori algorithm with a support value of 0.0075
frequent_itemsets = ____(____, min_support = 0.0075,
                         use_colnames = True, max_len = 2)

# Generate association rules without performing additional pruning
rules = ____(frequent_itemsets, metric = "support",
             min_threshold = ____)

# Generate scatterplot using support and confidence
sns.scatterplot(x = "support", y = "confidence",
                size = "____", data = rules)
plt.show()
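The visual argument can also be checked numerically. The sketch below assumes a handful of hypothetical rules that share a single consequent (the setting in which the Bayardo-Agrawal result is usually stated), marks the rules that no other rule beats on both support and confidence, and then looks at where the highest-lift rule falls. All numbers and the helper on_border() are made up for illustration, and only the upper border is considered.

```python
# Hypothetical rules that all share one consequent; values are illustrative only.
consequent_support = 0.25
rules = [
    {"antecedent": "A", "support": 0.20, "confidence": 0.30},
    {"antecedent": "B", "support": 0.10, "confidence": 0.50},
    {"antecedent": "C", "support": 0.05, "confidence": 0.40},
    {"antecedent": "D", "support": 0.15, "confidence": 0.25},
    {"antecedent": "E", "support": 0.02, "confidence": 0.80},
]

# With a fixed consequent, lift is confidence divided by the consequent's support.
for rule in rules:
    rule["lift"] = rule["confidence"] / consequent_support


def on_border(rule, candidates):
    """True if no candidate is at least as good on both axes and strictly better on one."""
    return not any(
        other["support"] >= rule["support"]
        and other["confidence"] >= rule["confidence"]
        and (
            other["support"] > rule["support"]
            or other["confidence"] > rule["confidence"]
        )
        for other in candidates
    )


border = [r["antecedent"] for r in rules if on_border(r, rules)]
best_by_lift = max(rules, key=lambda r: r["lift"])

print("Rules on the upper support-confidence border:", border)
print("Highest-lift rule:", best_by_lift["antecedent"])
```

In this toy set the highest-lift rule is one of the border rules, which is the pattern the scatterplot is meant to reveal: the largest dots should sit along the edge of the point cloud rather than in its interior.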