Recommendation Engine with the DataRobot MultiLabel
This notebook shows how to use historical purchase data to build a recommendation model that predicts which products, out of a fixed pool of items, a customer is likely to purchase at a given point in time.
At DataRobot, multilabel modeling is a kind of classification task that, while similar to multiclass modeling, provides more flexibility. In multilabel modeling, each row in a dataset is associated with one or several labels. Extending this framework with the unlimited-label mode and pairing it with Feature Discovery allows the user to frame a model that can serve recommendations. Depending on the use case, this recommendation model can provide rank-ordered suggestions of content, products, or services that a specific customer might like.
As an example, given the historic purchases of a sample of customers, we can look at common spending habits across demographics and shopping baskets, identify new features, and rank-order anticipated items at the customer level. Automatically generated features might include the most common item category for a specific geography, or the degree of a customer's proclivity to try new things.
In [1]:
import json
import numpy as np
import pandas as pd
Load in relevant datasets
In [2]:
orders = pd.read_csv(
    "https://s3.amazonaws.com/datarobot_public_datasets/ai_accelerators/instacart/orders.csv",
    parse_dates=["order_time"],
    infer_datetime_format=True,
)
order_products = pd.read_csv(
    "https://s3.amazonaws.com/datarobot_public_datasets/ai_accelerators/instacart/order_products.csv"
)
test = pd.read_csv(
    "https://s3.amazonaws.com/datarobot_public_datasets/ai_accelerators/instacart/test.csv"
)
In [3]:
orders.head()
Out [3]:
order_id | user_id | order_time | |
---|---|---|---|
0 | 1913204 | 79 | 2015-01-01 12:00:00 |
1 | 3189274 | 79 | 2015-01-09 2:00:00 |
2 | 2667939 | 79 | 2015-01-17 19:00:00 |
3 | 2246974 | 79 | 2015-01-31 11:00:00 |
4 | 1390523 | 79 | 2015-02-21 13:00:00 |
Products
This dataset contains the products, including the product details, that made up all the orders. The dataset covers 22064 orders in total. As an example, below are the products from two of those orders:
In [4]:
order_products.order_id = order_products.order_id.astype(int)
print("total_number of orders: {}".format(order_products.order_id.nunique()))
order_products[order_products.order_id.isin([518, 4418.0])].sort_values("order_id")
total_number of orders: 22064
Out[4]:
order_id | reordered | order_product_id | product_name | aisle | department | |
---|---|---|---|---|---|---|
94991 | 518 | 1 | 518_1 | Organic Avocado | fresh fruits | produce |
84002 | 518 | 1 | 518_5 | Bag of Organic Bananas | fresh fruits | produce |
137477 | 518 | 1 | 518_3 | Smoked Fresh Turkey Kielbasa | hot dogs bacon sausage | meat seafood |
43512 | 518 | 0 | 518_7 | Carrots | fresh vegetables | produce |
203134 | 518 | 1 | 518_2 | Organic Extra Firm Tofu | tofu meat alternatives | deli |
59407 | 518 | 1 | 518_4 | Organic Zucchini | fresh vegetables | produce |
136005 | 518 | 0 | 518_6 | Beef Short Ribs | meat counter | meat seafood |
57576 | 518 | 1 | 518_8 | Russet Potato | fresh vegetables | produce |
141401 | 4418 | 1 | 4418_12 | Gluten Free Millet-Chia Bread | bread | bakery |
145245 | 4418 | 0 | 4418_16 | Gluten free Sesame Bagels | breakfast bakery | bakery |
1002 | 4418 | 1 | 4418_11 | Organic Grade A Free Range Large Brown Eggs | eggs | dairy eggs |
196537 | 4418 | 1 | 4418_13 | Direct Trade Organic El Gallo Breakfast Blend … | coffee | beverages |
225315 | 4418 | 1 | 4418_6 | Light and Lean Quinoa Black Beans with Buttern… | frozen meals | frozen |
226139 | 4418 | 0 | 4418_17 | Organic Gluten Free Non-Dairy Beans & Rice Bur… | frozen meals | frozen |
227244 | 4418 | 0 | 4418_18 | Vegetable Lasagna | frozen meals | frozen |
166939 | 4418 | 0 | 4418_22 | Dark Chocolate Sea Salt Caramels | candy chocolate | snacks |
146465 | 4418 | 1 | 4418_15 | Blueberry Cinnamon Bread | bakery desserts | bakery |
87980 | 4418 | 1 | 4418_2 | Organic Hass Avocado | fresh fruits | produce |
231478 | 4418 | 0 | 4418_20 | Gluten Free Apple Cinnamon Waffles | frozen breakfast | frozen |
80888 | 4418 | 0 | 4418_14 | Organic Spring Mix | packaged vegetables fruits | produce |
73518 | 4418 | 1 | 4418_7 | Watermelon Chunks | packaged vegetables fruits | produce |
72267 | 4418 | 0 | 4418_1 | Organic Raspberries | packaged vegetables fruits | produce |
55016 | 4418 | 1 | 4418_10 | Red Peppers | fresh vegetables | produce |
54610 | 4418 | 0 | 4418_19 | Organic Roma Tomato | fresh vegetables | produce |
48179 | 4418 | 1 | 4418_9 | Green Bell Pepper | fresh vegetables | produce |
46515 | 4418 | 1 | 4418_3 | Asparagus | fresh vegetables | produce |
38967 | 4418 | 0 | 4418_23 | Whipped Cream Cheese | other creams cheeses | dairy eggs |
12282 | 4418 | 1 | 4418_8 | Strawberry Rhubarb Yoghurt | yogurt | dairy eggs |
7733 | 4418 | 1 | 4418_4 | Raspberry Yoghurt | yogurt | dairy eggs |
89873 | 4418 | 0 | 4418_21 | Banana | fresh fruits | produce |
231487 | 4418 | 1 | 4418_5 | Gluten Free Tofu Scramble Breakfast Wrap | frozen breakfast | frozen |
Orders
This dataset contains the orders, along with the order time and the user_id of whoever placed each order. Below is a sample of orders from March 1st in this example dataset.
In [5]:
orders.loc[orders.order_time.dt.strftime("%Y-%m-%d") == "2015-03-01"].head(10)
Out[5]:
order_id | user_id | order_time | |
---|---|---|---|
53 | 3000001 | 180 | 2015-03-01 07:00:00 |
178 | 1672206 | 311 | 2015-03-01 22:00:00 |
260 | 1785222 | 606 | 2015-03-01 04:00:00 |
435 | 822880 | 800 | 2015-03-01 13:00:00 |
591 | 359937 | 1169 | 2015-03-01 00:00:00 |
619 | 444002 | 1199 | 2015-03-01 02:00:00 |
949 | 2396674 | 1866 | 2015-03-01 02:00:00 |
1692 | 228107 | 3190 | 2015-03-01 18:00:00 |
1796 | 1036072 | 3369 | 2015-03-01 11:00:00 |
1963 | 1428014 | 3659 | 2015-03-01 19:00:00 |
Primary Training Data
Most common products
Next, we create the train dataset. The target for each row of the train dataset lists which of the most common items overall this customer purchased on a specific date. This list of the most common items is the pool of items to recommend. The rationale is that rarely purchased items are difficult to recommend, so we omit them from modeling. The calculation below finds the most commonly purchased items in the order_products dataset.
In [6]:
order_counts = (
    pd.DataFrame(order_products.groupby("product_name").size(), columns=["count"])
    .reset_index()
    .sort_values("count", ascending=False)
    .reset_index()
)
order_counts.head(10)
Out[6]:
index | product_name | count | |
---|---|---|---|
0 | 1201 | Banana | 5124 |
1 | 1118 | Bag of Organic Bananas | 3975 |
2 | 11241 | Organic Strawberries | 2202 |
3 | 9553 | Organic Baby Spinach | 1957 |
4 | 10348 | Organic Hass Avocado | 1842 |
5 | 9535 | Organic Avocado | 1529 |
6 | 15039 | Strawberries | 1130 |
7 | 11596 | Organic Yellow Onion | 1088 |
8 | 11612 | Organic Zucchini | 1082 |
9 | 7408 | Large Lemon | 1066 |
For this example, we set the maximum number of items to 500. In a multilabel framework, the maximum number of distinct classes DataRobot can track is 1000.
In [7]:
max_number_of_items = 500
frequent_orders = order_counts.product_name.head(max_number_of_items).to_list()
print(frequent_orders[0:9])
['Banana', 'Bag of Organic Bananas', 'Organic Strawberries', 'Organic Baby Spinach', 'Organic Hass Avocado', 'Organic Avocado', 'Strawberries', 'Organic Yellow Onion', 'Organic Zucchini']
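One way to sanity-check the choice of pool size is to measure how much of the purchase volume the top items cover. The sketch below is illustrative only: `top_n_coverage` is a hypothetical helper, not part of the notebook, and the toy rows stand in for the real `order_products` data.

```python
import pandas as pd


def top_n_coverage(order_products: pd.DataFrame, n: int) -> float:
    """Fraction of purchased line items that belong to the n most common products."""
    counts = order_products["product_name"].value_counts()
    return counts.head(n).sum() / counts.sum()


# Toy stand-in for order_products (hypothetical rows, not the Instacart data).
toy = pd.DataFrame(
    {"product_name": ["Banana", "Banana", "Banana", "Strawberries", "Garbage Bags"]}
)
coverage = top_n_coverage(toy, n=1)
print(f"top-1 coverage: {coverage:.2f}")
```

On the real data, calling the helper with `n=max_number_of_items` would show what share of purchased items the recommendation pool can ever represent.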
DataRobot MultiLabel
Multilabel modeling is a kind of classification task that, while similar to multiclass modeling, provides more flexibility. In multilabel modeling, each row in a dataset is associated with one, several, or zero labels. One common multilabel classification problem is text categorization (e.g., a movie description can be labeled "Crime", "Drama", and "Black and White" at the same time).
More information: https://docs.datarobot.com/en/docs/modeling/special-workflows/multilabel.html#multilabel-modeling
In this case, we use this framework so that the target for each order is the subset of the most common items that appears in the customer's basket. So if a person bought Bananas, Strawberries, and garbage bags, the target variable for that order would be ['Banana', 'Strawberries'].
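The basket example above can be written out as a minimal sketch. The pool here is a hypothetical three-item set standing in for `frequent_orders`, and `basket_to_target` is an illustrative helper name, not something defined in the notebook.

```python
import json

# Hypothetical pool of frequent items; in the notebook this role is played by `frequent_orders`.
frequent_items = {"Banana", "Strawberries", "Organic Avocado"}


def basket_to_target(basket, pool):
    """Keep only the basket items that are in the pool, encoded as a JSON list."""
    return json.dumps([item for item in basket if item in pool])


target = basket_to_target(["Banana", "Strawberries", "Garbage Bags"], frequent_items)
empty = basket_to_target(["Garbage Bags"], frequent_items)
print(target)
print(empty)
```

A basket with no frequent items yields an empty list, which corresponds to the "zero labels" case of multilabel modeling.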
In [8]:
# Start from one row per order; "items" will hold the JSON-encoded target labels.
multi_label_product_training = orders.copy()
multi_label_product_training["items"] = "[]"
for row in multi_label_product_training.itertuples():
    order_id = row.order_id
    order_data = order_products.loc[order_products.order_id == order_id]
    # Keep only the products that belong to the pool of frequent items.
    items = order_data.loc[order_data.product_name.isin(frequent_orders)][
        "product_name"
    ].to_list()
    multi_label_product_training.loc[row.Index, "items"] = json.dumps(items)
# order_id is not needed in the primary dataset; keep user_id, the order date, and the target.
multi_label_product_training = multi_label_product_training.drop("order_id", axis=1)
multi_label_product_training.order_time = (
    multi_label_product_training.order_time.dt.date
)
multi_label_product_training.user_id = multi_label_product_training.user_id.astype(int)
In [9]:
multi_label_product_training.head(10)
Out[9]:
user_id | order_time | items | |
---|---|---|---|
0 | 79 | 2015-01-01 | ["Organic Extra Large Grade AA Brown Eggs", "O… |
1 | 79 | 2015-01-09 | ["Organic Extra Large Grade AA Brown Eggs", "O… |
2 | 79 | 2015-01-17 | ["Organic Extra Large Grade AA Brown Eggs", "W… |
3 | 79 | 2015-01-31 | ["Organic Greek Plain Nonfat Yogurt", "Organic… |
4 | 79 | 2015-02-21 | ["Organic Extra Large Grade AA Brown Eggs", "O… |
5 | 79 | 2015-03-24 | ["Organic Greek Plain Nonfat Yogurt", "Shredde… |
6 | 79 | 2015-04-23 | ["Organic Extra Large Grade AA Brown Eggs", "T… |
7 | 79 | 2015-05-24 | [] |
8 | 80 | 2015-01-01 | ["Bag of Organic Bananas"] |
9 | 80 | 2015-01-22 | ["Hass Avocado", "Seedless Red Grapes", "Bag o… |
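The row-by-row loop in cell 8 filters `order_products` once per order, which can become slow on larger tables. A possible alternative, sketched here under the assumption that the same columns (`order_id`, `product_name`) are present, groups the frequent items once and looks them up per order. `build_multilabel_target` is a hypothetical helper name, and the toy frames are illustrative rather than the Instacart data.

```python
import json

import pandas as pd


def build_multilabel_target(orders, order_products, frequent_items):
    """Return a copy of `orders` with a JSON-encoded list of frequent items per order."""
    frequent = order_products[order_products["product_name"].isin(frequent_items)]
    # One list of product names per order_id, in the original row order.
    lookup = frequent.groupby("order_id")["product_name"].agg(list).to_dict()
    out = orders.copy()
    # Orders without any frequent item are missing from the lookup and get "[]".
    out["items"] = [json.dumps(lookup.get(oid, [])) for oid in out["order_id"]]
    return out


# Toy stand-ins for the notebook's tables (hypothetical rows).
toy_orders = pd.DataFrame({"order_id": [1, 2, 3], "user_id": [79, 79, 80]})
toy_products = pd.DataFrame(
    {
        "order_id": [1, 1, 2, 3],
        "product_name": ["Banana", "Garbage Bags", "Garbage Bags", "Strawberries"],
    }
)
result = build_multilabel_target(toy_orders, toy_products, ["Banana", "Strawberries"])
print(result)
```

The remaining steps of cell 8 (dropping `order_id`, truncating `order_time` to a date) would apply to the returned frame unchanged.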
DataRobot API
To pass all of the datasets to DataRobot, we use Feature Discovery, which lets DataRobot automatically generate the necessary features for us. For more information about Feature Discovery, refer to:
Mastering many tables in production ML Accelerator: https://community.datarobot.com/t5/ai-accelerators/mastering-many-tables-in-production-ml-complete-workflow/td-p/15905
Feature Discovery Documentation: https://docs.datarobot.com/en/docs/data/transform-data/feature-discovery/index.html#feature-discovery
In [10]:
import datarobot as dr
from datarobot.utils import dataframe_to_buffer
dr.Client()
Out [10]:
<datarobot.rest.RESTClientObject at 0x7f4875b958d0>
Upload Datasets to AI Catalog
In [11]:
buff = dataframe_to_buffer(multi_label_product_training)
buff.name = "multi_label_product_training_dr"
multi_label_product_training_dr = dr.Dataset.create_from_file(filelike=buff)
buff = dataframe_to_buffer(order_products)
buff.name = "order_products_dr"
order_products_dr = dr.Dataset.create_from_file(filelike=buff)
buff = dataframe_to_buffer(orders)
buff.name = "orders_dr"
orders_dr = dr.Dataset.create_from_file(filelike=buff)
Define Relationships
In [12]:
dataset_definitions = [
    {
        "identifier": "orders_dr",
        "catalogVersionId": orders_dr.version_id,
        "catalogId": orders_dr.id,
        "snapshotPolicy": "latest",
        "primary_temporal_key": "order_time",
    },
    {
        "identifier": "order_products_dr",
        "catalogVersionId": order_products_dr.version_id,
        "catalogId": order_products_dr.id,
        "snapshotPolicy": "latest",
    },
]
In [13]:
relationships = [
    {
        "dataset2Identifier": "orders_dr",
        "dataset1Keys": ["user_id"],
        "dataset2Keys": ["user_id"],
        "feature_derivation_window_start": -31,
        "feature_derivation_window_end": -1,
        "feature_derivation_window_time_unit": "DAY",
        "prediction_point_rounding": 1,
        "prediction_point_rounding_time_unit": "MINUTE",
    },
    {
        "dataset1Identifier": "orders_dr",
        "dataset2Identifier": "order_products_dr",
        "dataset1Keys": ["order_id"],
        "dataset2Keys": ["order_id"],
    },
]
relationship_config = dr.RelationshipsConfiguration.create(
    dataset_definitions=dataset_definitions, relationships=relationships
)
Start Project
In [14]:
project = dr.Project.create_from_dataset(
    multi_label_product_training_dr.id,
    project_name="AI Accelerator - Recommendation Engine",
)
project.analyze_and_model(
    target="items",
    relationships_configuration_id=relationship_config.id,
    partitioning_method=dr.GroupCV(
        holdout_pct=20, reps=5, partition_key_cols=["user_id"]
    ),
    # metric='MAPE',
    feature_engineering_prediction_point="order_time",
    mode=dr.enums.AUTOPILOT_MODE.QUICK,
    max_wait=36000,
    worker_count=-1,
)
Out [14]:
Project(AI Accelerator - Recommendation Engine)
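The project above partitions by `user_id`, so all orders from one customer land in the same partition and a customer's history cannot leak between training and validation. The sketch below illustrates that idea with plain NumPy and pandas. It is not DataRobot's implementation, `assign_group_folds` is a hypothetical helper, and the holdout split is left out for brevity.

```python
import numpy as np
import pandas as pd


def assign_group_folds(df, group_col, n_folds, seed=0):
    """Assign each row a fold id so that all rows of a group share the same fold."""
    groups = df[group_col].unique()
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(groups)
    # Deal the shuffled groups round-robin across the folds.
    fold_of_group = {group: i % n_folds for i, group in enumerate(shuffled)}
    return df[group_col].map(fold_of_group)


# Toy training rows: several orders per user (hypothetical values).
toy = pd.DataFrame({"user_id": [79, 79, 79, 80, 80, 81, 82, 82]})
toy["fold"] = assign_group_folds(toy, "user_id", n_folds=2)
print(toy)
```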
In [15]:
project.wait_for_autopilot()
In progress: 14, queued: 0 (waited: 0s)
In progress: 14, queued: 0 (waited: 1s)
In progress: 14, queued: 0 (waited: 3s)
In progress: 14, queued: 0 (waited: 4s)
In progress: 14, queued: 0 (waited: 6s)
...
In progress: 0, queued: 0 (waited: 7654s)
In progress: 0, queued: 0 (waited: 7675s)
In progress: 0, queued: 0 (waited: 7696s)
In progress: 0, queued: 0 (waited: 7718s)
In progress: 0, queued: 0 (waited: 7739s)
Predictions
Once the model is ready, we can pass the prediction dataset through the model to get a feel for how the predictions look.
For every purchase, DataRobot outputs the class probabilities for each of the 500 most common products.
From here, we can take the top n probabilities and use the corresponding products as recommendations. In this example, we look at the top 3 products to recommend for an incoming user.
In [16]:
predict_dataset = test.sample(10)
In [17]:
model = dr.ModelRecommendation.get(
    project.id, dr.enums.RECOMMENDED_MODEL_TYPE.RECOMMENDED_FOR_DEPLOYMENT
).get_model()
dataset = project.upload_dataset(predict_dataset)
pred_job = model.request_predictions(dataset.id)
preds = pred_job.get_result_when_complete()
In [18]:
preds = preds[[c for c in preds.columns if "class" in c]]
preds.columns = [c.replace("class_", "") for c in preds.columns]
In [19]:
top_recommendations = pd.DataFrame(
    # For each row, sort the class probabilities in descending order and keep the top 3 labels.
    preds.apply(
        lambda x: list(preds.columns[np.array(x).argsort()[::-1][:3]]), axis=1
    ).to_list(),
    columns=["recommendation 1", "recommendation 2", "recommendation 3"],
)
output = pd.concat(
    [
        predict_dataset[["user_id", "order_time"]].reset_index(drop=True),
        top_recommendations,
    ],
    axis=1,
)
Below are two datasets. The first, predict_dataset, is the one we passed to the DataRobot model. The second, output, contains the original order time for each user ID, along with the recommendations for that customer.
In [20]:
predict_dataset
Out[20]:
user_id | order_time | |
---|---|---|
356 | 14315 | 2015-03-15 |
123 | 4892 | 2015-03-15 |
212 | 8619 | 2015-03-15 |
962 | 39036 | 2015-03-15 |
98 | 3572 | 2015-03-15 |
692 | 27200 | 2015-03-15 |
23 | 692 | 2015-03-15 |
258 | 10860 | 2015-03-15 |
986 | 39997 | 2015-03-15 |
522 | 20748 | 2015-03-15 |
In [21]:
output
Out[21]:
user_id | order_time | recommendation 1 | recommendation 2 | recommendation 3 | |
---|---|---|---|---|---|
0 | 14315 | 2015-03-15 | Original Whipped Cream Cheese | Banana | Sour Cream |
1 | 4892 | 2015-03-15 | Total 2% Lowfat Greek Strained Yogurt With Blu… | Organic Avocado | Marinara Sauce |
2 | 8619 | 2015-03-15 | Organic Strawberries | Total 0% Nonfat Plain Greek Yogurt | Organic Baby Arugula |
3 | 39036 | 2015-03-15 | Tilapia Filet | Organic Garnet Sweet Potato (Yam) | Banana |
4 | 3572 | 2015-03-15 | Lemon Fruit & Nut Food Bar | Organic Baby Spinach | Bag of Organic Bananas |
5 | 27200 | 2015-03-15 | Grapefruit Sparkling Water | Banana | Marinara Sauce |
6 | 692 | 2015-03-15 | Banana | Organic Red Onion | Red Vine Tomato |
7 | 10860 | 2015-03-15 | Bag of Organic Bananas | Organic Hass Avocado | Organic Lowfat 1% Milk |
8 | 39997 | 2015-03-15 | Original Whipped Cream Cheese | Banana | Blueberries |
9 | 20748 | 2015-03-15 | Golden Delicious Apple | Bartlett Pears | Banana |
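The ranking step in cell 19 can also be wrapped in a small reusable helper that works for any n. The sketch below is one possible version on a toy probability table: `top_n_labels` is a hypothetical name, the probabilities are made up for illustration, and it sorts with NumPy across all rows at once rather than applying a lambda per row.

```python
import numpy as np
import pandas as pd


def top_n_labels(probabilities, n=3):
    """For each row, return the n column names with the highest probability."""
    labels = probabilities.columns.to_numpy()
    # Sorting the negated values ascending yields descending probability order.
    order = np.argsort(-probabilities.to_numpy(), axis=1, kind="stable")[:, :n]
    return pd.DataFrame(
        labels[order],
        index=probabilities.index,
        columns=[f"recommendation {i + 1}" for i in range(n)],
    )


# Toy class probabilities, one column per product (hypothetical values).
toy_preds = pd.DataFrame(
    [[0.9, 0.2, 0.5, 0.1], [0.1, 0.8, 0.3, 0.7]],
    columns=["Banana", "Strawberries", "Organic Avocado", "Large Lemon"],
)
recommendations = top_n_labels(toy_preds, n=3)
print(recommendations)
```

Because the helper keeps the index of the input frame, the result can be joined back to the prediction rows without resetting indices first.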