Recommendation Engine with the DataRobot MultiLabel

This notebook explores how to use historical customer purchase data to build a recommendation model that predicts which products, out of a fixed pool of items, a customer is likely to purchase at a given point in time.

At DataRobot, multilabel modeling is a kind of classification task that, while similar to multiclass modeling, provides more flexibility. In multilabel modeling, each row in a dataset is associated with one or several labels. Extending this framework with unlimited label mode and pairing it with Feature Discovery allows the user to frame a model that can serve recommendations. Depending on the use case, this recommendation model can provide rank-ordered suggestions of content, products, or services that a specific customer might like.

As an example, given the historic purchases of a sample of customers, we can look at common spending habits across demographics and shopping baskets, identify new features, and rank-order anticipated items at the customer level. Some of the automatically generated features might be the most common category of item in a specific geography, or the degree of a customer's proclivity to try new things.

In [1]:
import json

import numpy as np
import pandas as pd

Load in relevant datasets

In [2]:

orders = pd.read_csv(
    "https://s3.amazonaws.com/datarobot_public_datasets/ai_accelerators/instacart/orders.csv",
    parse_dates=["order_time"],
    infer_datetime_format=True,
)
order_products = pd.read_csv(
    "https://s3.amazonaws.com/datarobot_public_datasets/ai_accelerators/instacart/order_products.csv"
)
test = pd.read_csv(
    "https://s3.amazonaws.com/datarobot_public_datasets/ai_accelerators/instacart/test.csv"
)
In [3]:

orders.head()

Out [3]:

   order_id  user_id           order_time
0   1913204       79  2015-01-01 12:00:00
1   3189274       79   2015-01-09 2:00:00
2   2667939       79  2015-01-17 19:00:00
3   2246974       79  2015-01-31 11:00:00
4   1390523       79  2015-02-21 13:00:00

Products

This dataset contains the products, including the product details, that made up all the orders. The total number of orders in the dataset is 22064. As an example, below is what the products from a couple of orders look like:

In [4]:
order_products.order_id = order_products.order_id.astype(int)
print("total_number of orders: {}".format(order_products.order_id.nunique()))
order_products[order_products.order_id.isin([518, 4418.0])].sort_values("order_id")

total_number of orders: 22064

Out[4]:

        order_id  reordered  order_product_id  product_name                                     aisle                       department
 94991       518          1  518_1             Organic Avocado                                  fresh fruits                produce
 84002       518          1  518_5             Bag of Organic Bananas                           fresh fruits                produce
137477       518          1  518_3             Smoked Fresh Turkey Kielbasa                     hot dogs bacon sausage      meat seafood
 43512       518          0  518_7             Carrots                                          fresh vegetables            produce
203134       518          1  518_2             Organic Extra Firm Tofu                          tofu meat alternatives      deli
 59407       518          1  518_4             Organic Zucchini                                 fresh vegetables            produce
136005       518          0  518_6             Beef Short Ribs                                  meat counter                meat seafood
 57576       518          1  518_8             Russet Potato                                    fresh vegetables            produce
141401      4418          1  4418_12           Gluten Free Millet-Chia Bread                    bread                       bakery
145245      4418          0  4418_16           Gluten free Sesame Bagels                        breakfast bakery            bakery
  1002      4418          1  4418_11           Organic Grade A Free Range Large Brown Eggs      eggs                        dairy eggs
196537      4418          1  4418_13           Direct Trade Organic El Gallo Breakfast Blend …  coffee                      beverages
225315      4418          1  4418_6            Light and Lean Quinoa Black Beans with Buttern…  frozen meals                frozen
226139      4418          0  4418_17           Organic Gluten Free Non-Dairy Beans & Rice Bur…  frozen meals                frozen
227244      4418          0  4418_18           Vegetable Lasagna                                frozen meals                frozen
166939      4418          0  4418_22           Dark Chocolate Sea Salt Caramels                 candy chocolate             snacks
146465      4418          1  4418_15           Blueberry Cinnamon Bread                         bakery desserts             bakery
 87980      4418          1  4418_2            Organic Hass Avocado                             fresh fruits                produce
231478      4418          0  4418_20           Gluten Free Apple Cinnamon Waffles               frozen breakfast            frozen
 80888      4418          0  4418_14           Organic Spring Mix                               packaged vegetables fruits  produce
 73518      4418          1  4418_7            Watermelon Chunks                                packaged vegetables fruits  produce
 72267      4418          0  4418_1            Organic Raspberries                              packaged vegetables fruits  produce
 55016      4418          1  4418_10           Red Peppers                                      fresh vegetables            produce
 54610      4418          0  4418_19           Organic Roma Tomato                              fresh vegetables            produce
 48179      4418          1  4418_9            Green Bell Pepper                                fresh vegetables            produce
 46515      4418          1  4418_3            Asparagus                                        fresh vegetables            produce
 38967      4418          0  4418_23           Whipped Cream Cheese                             other creams cheeses        dairy eggs
 12282      4418          1  4418_8            Strawberry Rhubarb Yoghurt                       yogurt                      dairy eggs
  7733      4418          1  4418_4            Raspberry Yoghurt                                yogurt                      dairy eggs
 89873      4418          0  4418_21           Banana                                           fresh fruits                produce
231487      4418          1  4418_5            Gluten Free Tofu Scramble Breakfast Wrap         frozen breakfast            frozen

Orders

This dataset contains the orders, along with the order time and the user_id of whoever placed the order. Below is a sample of orders from March 1st in this example dataset.

In [5]:

orders.loc[orders.order_time.dt.strftime("%Y-%m-%d") == "2015-03-01"].head(10)

Out[5]:

      order_id  user_id           order_time
  53   3000001      180  2015-03-01 07:00:00
 178   1672206      311  2015-03-01 22:00:00
 260   1785222      606  2015-03-01 04:00:00
 435    822880      800  2015-03-01 13:00:00
 591    359937     1169  2015-03-01 00:00:00
 619    444002     1199  2015-03-01 02:00:00
 949   2396674     1866  2015-03-01 02:00:00
1692    228107     3190  2015-03-01 18:00:00
1796   1036072     3369  2015-03-01 11:00:00
1963   1428014     3659  2015-03-01 19:00:00

Primary Training Data

Most common products

The next step is to create the training dataset. The target for each row records which of the most common items overall this customer purchased on a specific date. This list of the most common items is the pool of items to recommend. The rationale is that rarely purchased items are difficult to recommend, so we omit them from modeling. The code below counts how often each product appears in the order_products dataset.

In [6]:

order_counts = (
    pd.DataFrame(order_products.groupby("product_name").size(), columns=["count"])
    .reset_index()
    .sort_values("count", ascending=False)
    .reset_index()
)
order_counts.head(10)

Out[6]:

   index  product_name            count
0   1201  Banana                   5124
1   1118  Bag of Organic Bananas   3975
2  11241  Organic Strawberries     2202
3   9553  Organic Baby Spinach     1957
4  10348  Organic Hass Avocado     1842
5   9535  Organic Avocado          1529
6  15039  Strawberries             1130
7  11596  Organic Yellow Onion     1088
8  11612  Organic Zucchini         1082
9   7408  Large Lemon              1066

For this example we will set the maximum number of items to 500. In a multilabel framework, the maximum number of distinct classes DataRobot can track is 1000.

In [7]:

max_number_of_items = 500
frequent_orders = order_counts.product_name.head(max_number_of_items).to_list()

print(frequent_orders[0:9])
['Banana', 'Bag of Organic Bananas', 'Organic Strawberries', 'Organic Baby Spinach', 'Organic Hass Avocado', 'Organic Avocado', 'Strawberries', 'Organic Yellow Onion', 'Organic Zucchini']
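
Choosing the size of the item pool trades the number of labels against how much purchasing behavior the labels cover. As a rough illustration, the sketch below measures the share of purchased line items that fall within the n most common products. The function name coverage_of_top_n and the toy purchases are hypothetical; with the real data the input would be the product_name column of order_products.

```python
from collections import Counter


def coverage_of_top_n(product_names, n):
    """Share of purchased line items that fall within the n most common products."""
    counts = Counter(product_names)
    if not counts:
        return 0.0
    covered = sum(count for _, count in counts.most_common(n))
    return covered / sum(counts.values())


# Toy purchases (made up for illustration).
toy = ["Banana"] * 5 + ["Strawberries"] * 3 + ["Garbage Bags"] * 2
for n in (1, 2, 3):
    print(n, coverage_of_top_n(toy, n))
```

A curve of coverage against n, computed on the real data, is one way to sanity-check a cutoff such as 500.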

DataRobot MultiLabel

Multilabel modeling is a kind of classification task that, while similar to multiclass modeling, provides more flexibility. In multilabel modeling, each row in a dataset is associated with one, several, or zero labels. One common multilabel classification problem is text categorization (e.g., a movie description can include “Crime”, “Drama”, and “Black and White”).

More information: https://docs.datarobot.com/en/docs/modeling/special-workflows/multilabel.html#multilabel-modeling
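
To make the label format concrete, the sketch below turns a few toy rows into a multi-hot matrix with one column per label. The rows are made up, and this is only an illustration of the idea of "zero, one, or several labels per row"; it does not claim to reproduce how DataRobot encodes the target internally.

```python
import json

# Toy rows: each target is a JSON-encoded list of labels (possibly empty).
rows = ['["Crime", "Drama"]', '["Drama"]', "[]"]

# Collect the label vocabulary in a stable, sorted order.
parsed = [json.loads(r) for r in rows]
labels = sorted({label for row in parsed for label in row})

# One binary indicator per (row, label) pair.
multi_hot = [[int(label in row) for label in labels] for row in parsed]

print(labels)
for vector in multi_hot:
    print(vector)
```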

In this case we use this framework so that the target of each row holds the most common items found in that customer's basket. So if a person bought Bananas, Strawberries, and garbage bags, the target variable for that row would be ['Banana', 'Strawberries'].

In [8]:

multi_label_product_training = orders.copy()
multi_label_product_training["items"] = "[]"
for row in multi_label_product_training.itertuples():
    order_id = row.order_id
    order_data = order_products.loc[order_products.order_id == order_id]
    items = order_data.loc[order_data.product_name.isin(frequent_orders)][
        "product_name"
    ].to_list()
    multi_label_product_training.loc[row.Index, "items"] = json.dumps(items)
multi_label_product_training = multi_label_product_training.drop("order_id", axis=1)
multi_label_product_training.order_time = (
    multi_label_product_training.order_time.dt.date
)
multi_label_product_training.user_id = multi_label_product_training.user_id.astype(int)
In [9]:

multi_label_product_training.head(10)

Out[9]:

   user_id  order_time  items
0       79  2015-01-01  ["Organic Extra Large Grade AA Brown Eggs", "O…
1       79  2015-01-09  ["Organic Extra Large Grade AA Brown Eggs", "O…
2       79  2015-01-17  ["Organic Extra Large Grade AA Brown Eggs", "W…
3       79  2015-01-31  ["Organic Greek Plain Nonfat Yogurt", "Organic…
4       79  2015-02-21  ["Organic Extra Large Grade AA Brown Eggs", "O…
5       79  2015-03-24  ["Organic Greek Plain Nonfat Yogurt", "Shredde…
6       79  2015-04-23  ["Organic Extra Large Grade AA Brown Eggs", "T…
7       79  2015-05-24  []
8       80  2015-01-01  ["Bag of Organic Bananas"]
9       80  2015-01-22  ["Hass Avocado", "Seedless Red Grapes", "Bag o…
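
The row-by-row loop used to build the items column can become slow on larger order tables. A possible alternative, sketched here on toy stand-ins for the notebook's tables, filters to the frequent products once and collects one list per order with a groupby. The names orders_toy, order_products_toy, and frequent are hypothetical, and the values are made up.

```python
import json

import pandas as pd

# Toy stand-ins for the notebook's tables (values are made up).
orders_toy = pd.DataFrame({"order_id": [1, 2, 3], "user_id": [79, 79, 80]})
order_products_toy = pd.DataFrame(
    {
        "order_id": [1, 1, 1, 2],
        "product_name": ["Banana", "Strawberries", "Garbage Bags", "Banana"],
    }
)
frequent = ["Banana", "Strawberries"]

# Keep frequent products only, then collect one list of names per order.
items_per_order = (
    order_products_toy[order_products_toy["product_name"].isin(frequent)]
    .groupby("order_id")["product_name"]
    .agg(list)
)

# Orders without any frequent product map to NaN and become an empty list;
# every list is JSON-encoded, matching the format of the items column.
mapped = orders_toy["order_id"].map(items_per_order)
orders_toy["items"] = [
    json.dumps(value) if isinstance(value, list) else "[]" for value in mapped
]
print(orders_toy)
```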

DataRobot API

In order to pass all of the datasets to DataRobot, we will use the feature discovery technique to allow DataRobot to automatically generate the necessary features for us. For more information about Feature Discovery please refer to:

Mastering many tables in production ML Accelerator: https://community.datarobot.com/t5/ai-accelerators/mastering-many-tables-in-production-ml-complete-workflow/td-p/15905

Feature Discovery Documentation: https://docs.datarobot.com/en/docs/data/transform-data/feature-discovery/index.html#feature-discovery

In [10]:

import datarobot as dr
from datarobot.utils import dataframe_to_buffer

dr.Client()
Out [10]:

<datarobot.rest.RESTClientObject at 0x7f4875b958d0>

Upload Datasets to AI Catalog

In [11]:

buff = dataframe_to_buffer(multi_label_product_training)
buff.name = "multi_label_product_training_dr"
multi_label_product_training_dr = dr.Dataset.create_from_file(filelike=buff)

buff = dataframe_to_buffer(order_products)
buff.name = "order_products_dr"
order_products_dr = dr.Dataset.create_from_file(filelike=buff)

buff = dataframe_to_buffer(orders)
buff.name = "orders_dr"
orders_dr = dr.Dataset.create_from_file(filelike=buff)

Define Relationships

In [12]:

dataset_definitions = [
    {
        "identifier": "orders_dr",
        "catalogVersionId": orders_dr.version_id,
        "catalogId": orders_dr.id,
        "snapshotPolicy": "latest",
        "primary_temporal_key": "order_time",
    },
    {
        "identifier": "order_products_dr",
        "catalogVersionId": order_products_dr.version_id,
        "catalogId": order_products_dr.id,
        "snapshotPolicy": "latest",
    },
]
In [13]:

relationships = [
    {
        "dataset2Identifier": "orders_dr",
        "dataset1Keys": ["user_id"],
        "dataset2Keys": ["user_id"],
        "feature_derivation_window_start": -31,
        "feature_derivation_window_end": -1,
        "feature_derivation_window_time_unit": "DAY",
        "prediction_point_rounding": 1,
        "prediction_point_rounding_time_unit": "MINUTE",
    },
    {
        "dataset1Identifier": "orders_dr",
        "dataset2Identifier": "order_products_dr",
        "dataset1Keys": ["order_id"],
        "dataset2Keys": ["order_id"],
    },
]


relationship_config = dr.RelationshipsConfiguration.create(
    dataset_definitions=dataset_definitions, relationships=relationships
)
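
The first relationship restricts derived features to orders placed between 31 days and 1 day before the prediction point. The sketch below illustrates that idea with plain datetime arithmetic. The helper rows_in_window is hypothetical, and the half-open boundary it uses is an assumption made for this example rather than a statement about how DataRobot treats the window edges.

```python
from datetime import datetime, timedelta


def rows_in_window(timestamps, prediction_point, start_days=-31, end_days=-1):
    """Return timestamps inside [prediction_point + start, prediction_point + end).

    The half-open boundary is an assumption made for this sketch.
    """
    lower = prediction_point + timedelta(days=start_days)
    upper = prediction_point + timedelta(days=end_days)
    return [t for t in timestamps if lower <= t < upper]


history = [
    datetime(2015, 1, 1, 12),   # older than 31 days before the prediction point
    datetime(2015, 2, 21, 13),  # inside the window
    datetime(2015, 3, 1, 7),    # after the window has closed
]
print(rows_in_window(history, datetime(2015, 3, 1)))
```

Keeping the window end before the prediction point is what prevents the basket being predicted from leaking into its own features.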

Start Project

In [14]:

project = dr.Project.create_from_dataset(
    multi_label_product_training_dr.id,
    project_name="AI Accelerator - Recommendation Engine",
)

project.analyze_and_model(
    target="items",
    relationships_configuration_id=relationship_config.id,
    partitioning_method=dr.GroupCV(
        holdout_pct=20, reps=5, partition_key_cols=["user_id"]
    ),
    # metric='MAPE',
    feature_engineering_prediction_point="order_time",
    mode=dr.enums.AUTOPILOT_MODE.QUICK,
    max_wait=36000,
    worker_count=-1,
)
Out [14]:
Project(AI Accelerator - Recommendation Engine)
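
Partitioning by user_id keeps every row of a given customer in the same fold, so a model is never validated on a customer it has already seen in training. The sketch below shows the property on toy data with a simple round-robin assignment. The function assign_group_folds is hypothetical and is not DataRobot's partitioning algorithm.

```python
def assign_group_folds(user_ids, n_folds=5):
    """Map each row to a fold so that all rows of a user share one fold."""
    unique_users = sorted(set(user_ids))
    fold_of_user = {user: i % n_folds for i, user in enumerate(unique_users)}
    return [fold_of_user[user] for user in user_ids]


# Toy user ids, one entry per order (made up for illustration).
user_ids = [79, 79, 79, 80, 80, 180, 311, 311]
folds = assign_group_folds(user_ids, n_folds=3)
print(list(zip(user_ids, folds)))
```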
In [15]:
project.wait_for_autopilot()
In progress: 14, queued: 0 (waited: 0s)
In progress: 14, queued: 0 (waited: 1s)
In progress: 14, queued: 0 (waited: 3s)
...
In progress: 0, queued: 0 (waited: 7718s)
In progress: 0, queued: 0 (waited: 7739s)

Predictions

Once the model is ready, we can pass the prediction dataset through the model to get a feel for how the predictions look.

For every purchase, DataRobot outputs the class probabilities for each of the 500 most common products.

From here we can take the n highest probabilities as the recommendations. In this example we will look at the top 3 products to recommend for an incoming user.

In [16]:

predict_dataset = test.sample(10)
In [17]:
model = dr.ModelRecommendation.get(
    project.id, dr.enums.RECOMMENDED_MODEL_TYPE.RECOMMENDED_FOR_DEPLOYMENT
).get_model()


dataset = project.upload_dataset(predict_dataset)

pred_job = model.request_predictions(dataset.id)
preds = pred_job.get_result_when_complete()
In [18]:

preds = preds[[c for c in preds.columns if "class" in c]]
preds.columns = [c.replace("class_", "") for c in preds.columns]
In [19]:

top_recommendations = pd.DataFrame(
    preds.apply(
        lambda x: list(preds.columns[np.array(x).argsort()[::-1][:3]]), axis=1
    ).to_list(),
    columns=["recommendation 1", "recommendation 2", "recommendation 3"],
)

output = pd.concat(
    [
        predict_dataset[["user_id", "order_time"]].reset_index(drop=True),
        top_recommendations,
    ],
    axis=1,
)
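
The top-n selection can be checked on a small example. The sketch below applies the same argsort idea to a toy probability table in which the columns are product names. The helper top_n and the probabilities are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy class probabilities (made up); columns are product names.
probs = pd.DataFrame(
    {
        "Banana": [0.60, 0.10],
        "Strawberries": [0.25, 0.70],
        "Large Lemon": [0.05, 0.15],
        "Organic Avocado": [0.40, 0.20],
    }
)


def top_n(probabilities, n=3):
    """Return, per row, the n column names with the highest probability."""
    # argsort on the negated values gives descending order within each row.
    order = np.argsort(-probabilities.to_numpy(), axis=1)[:, :n]
    names = probabilities.columns.to_numpy()[order]
    return pd.DataFrame(
        names, columns=[f"recommendation {i + 1}" for i in range(n)]
    )


print(top_n(probs, n=2))
```

Sorting the whole array at once avoids a Python-level function call per row, which may matter when scoring many customers.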

Below we see two datasets. The first, predict_dataset, is the one we passed to the DataRobot model. The second, output, contains the original order time for each user_id, along with the associated recommendations for each of those customers.

In [20]:

predict_dataset

Out[20]:

     user_id  order_time
356    14315  2015-03-15
123     4892  2015-03-15
212     8619  2015-03-15
962    39036  2015-03-15
 98     3572  2015-03-15
692    27200  2015-03-15
 23      692  2015-03-15
258    10860  2015-03-15
986    39997  2015-03-15
522    20748  2015-03-15
In [21]:
output

Out[21]:

   user_id  order_time  recommendation 1                                 recommendation 2                    recommendation 3
0    14315  2015-03-15  Original Whipped Cream Cheese                    Banana                              Sour Cream
1     4892  2015-03-15  Total 2% Lowfat Greek Strained Yogurt With Blu…  Organic Avocado                     Marinara Sauce
2     8619  2015-03-15  Organic Strawberries                             Total 0% Nonfat Plain Greek Yogurt  Organic Baby Arugula
3    39036  2015-03-15  Tilapia Filet                                    Organic Garnet Sweet Potato (Yam)   Banana
4     3572  2015-03-15  Lemon Fruit & Nut Food Bar                       Organic Baby Spinach                Bag of Organic Bananas
5    27200  2015-03-15  Grapefruit Sparkling Water                       Banana                              Marinara Sauce
6      692  2015-03-15  Banana                                           Organic Red Onion                   Red Vine Tomato
7    10860  2015-03-15  Bag of Organic Bananas                           Organic Hass Avocado                Organic Lowfat 1% Milk
8    39997  2015-03-15  Original Whipped Cream Cheese                    Banana                              Blueberries
9    20748  2015-03-15  Golden Delicious Apple                           Bartlett Pears                      Banana