Predicting a user's next action based on current day and time

I'm using Microsoft Azure Machine Learning Studio to run an experiment where I use previous analytics captured about a user (at a time, on a day) to try to predict their next action (based on day and time), so that I can adjust the UI accordingly. For example, if a user normally visits a certain page every Thursday at 1pm, I would like to predict that behaviour.

Warning: I am a complete novice with ML, but have watched quite a few videos and worked through tutorials like the movie recommendations example.

I have a CSV dataset with userid,action,datetime and would like to train a Matchbox recommendation model, which, from my research, appears to be the best model to use. However, I can't see a way to use the date/time in the training. The idea is that if I pass in a userid and a date, the recommendation model should give me a probable result of what that user is most likely to do.
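Note that the error message below refers to user-item-rating triples, while a userid,action,datetime file has no rating column. A common workaround for implicit feedback (an assumption here, not something stated in the question) is to use how often a user performed an action as the rating. A minimal sketch, using a hypothetical sample and a hypothetical `to_triples` helper:

```python
import csv
import io
from collections import Counter

# Hypothetical sample in the question's userid,action,datetime layout.
SAMPLE = """userid,action,datetime
1,view_reports,2018-03-15 13:02:00
1,view_reports,2018-03-08 13:05:00
1,edit_profile,2018-03-09 09:30:00
2,view_reports,2018-03-12 10:00:00
"""


def to_triples(csv_text):
    """Turn raw log rows into (user, item, rating) triples.

    The "rating" is simply the number of times the user performed the
    action: an implicit-feedback assumption, not a Matchbox requirement.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter((row["userid"], row["action"]) for row in reader)
    return [(user, action, n) for (user, action), n in sorted(counts.items())]


for triple in to_triples(SAMPLE):
    print(triple)
```

The datetime column is dropped here; encoding it separately (for example as weekday and time-of-day columns) is a different step.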

I get results from the predictive endpoint, but the training endpoint gives the following error:

{
    "error": {
        "code": "ModuleExecutionError",
        "message": "Module execution encountered an error.",
        "details": [
            {
                "code": "18",
                "target": "Train Matchbox Recommender",
                "message": "Error 0018: Training dataset of user-item-rating triples contains invalid data."
            }
        ]
    }
}
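One way to narrow down Error 0018 is to check each row before training. The sketch below is illustrative only: the exact rules Train Matchbox Recommender applies are not listed in the error, so the checks (three columns, non-empty user and item, numeric rating) are assumptions, and `find_invalid_rows` is a hypothetical helper. The last sample row shows what happens if a datetime ends up in the rating position:

```python
import math


def find_invalid_rows(triples):
    """Return (row_index, reason) pairs for rows that do not look like
    clean user-item-rating triples. The checks are assumptions."""
    problems = []
    for i, row in enumerate(triples):
        if len(row) != 3:
            problems.append((i, "expected 3 columns"))
            continue
        user, item, rating = row
        if user in (None, "") or item in (None, ""):
            problems.append((i, "missing user or item"))
            continue
        try:
            value = float(rating)
        except (TypeError, ValueError):
            problems.append((i, "rating is not numeric"))
            continue
        if math.isnan(value):
            problems.append((i, "rating is NaN"))
    return problems


rows = [
    ("1", "view_reports", 2),
    ("2", "", 1),                                   # empty item
    ("3", "edit_profile", "2018-03-15 13:02"),      # datetime used as rating
]
print(find_invalid_rows(rows))
```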

Here is a link to a public version of the experiment

Any help would be appreciated.

Thanks.


asked Mar 19 '18 13:03 by BigBadOwl


2 Answers

After experimenting with this for a while, I think I see where the issue may lie: the first three inputs of Train Matchbox Recommender probably all need to be filled in for an accurate prediction. I'll include screenshots from the restaurant recommendation sample as well.

The first input would be the dataset of user, item, and rating triples. [Screenshot: ratings data]

The second input would be the features of each user. [Screenshot: user data]

And the third input would be the features of each item (a restaurant in this case). [Screenshot: restaurant data]

So, for the date/time issue, I wonder whether the data needs to be munged into something similar to the restaurant and user data.

I know it's not much, but I hope it helps lead you down the right track.

answered Nov 02 '22 22:11 by Jon

This answer may be helpful; you could also take a look at this, where you can read:

The problem is probably with the range of rating data. There's an upper limit for rating range, because the training gets expensive if the range between smallest and largest rating is too large.

[...]

One option would be to scale the ratings to a narrower range.

According to this MSDN link, the gap between the minimum and maximum rating cannot be higher than 100.
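To check whether a rating column runs into this limit before training, a small sketch like the following could help. It takes the limit of 100 from the statement above (treating a spread of exactly 100 as allowed, which is one reading of "not higher than 100"); `rating_gap` and `rating_gap_ok` are hypothetical helpers, not part of Azure ML Studio:

```python
def rating_gap(ratings):
    """Return max(ratings) - min(ratings) as a float."""
    values = [float(r) for r in ratings]
    return max(values) - min(values)


def rating_gap_ok(ratings, max_gap=100):
    # max_gap = 100 comes from the MSDN statement quoted above (assumption).
    return rating_gap(ratings) <= max_gap


raw = [-250, -2350, 850, 5482, 1, 5]  # a few values from the example array below
print(rating_gap(raw), rating_gap_ok(raw))  # 7832.0 False
```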

So you need to pre-process the column data in your CSV file (userid, action, datetime, etc.) so that all values stay within the [0, 99] range.

Below is a Python implementation that illustrates the logic (min-max scaling):

#!/usr/bin/env python3

big_gap_arr = [-250, -2350, 850, -120, -1235, 3212, 1, 5, 65, 48, 265, 1204, 65, 23, 45, 895, 5000, 3, 325, 3244, 5482]  # data with a big gap

lo = min(big_gap_arr)  # smallest value (works whether it is negative or not)
hi = max(big_gap_arr)  # largest value
span = hi - lo         # full range of the data

specific_range_arr = []
for each_value in big_gap_arr:
    # Min-max scaling: lo maps to 0.0 and hi maps to 99.0
    new_value = 99.0 * (each_value - lo) / span
    specific_range_arr.append(new_value)

print(specific_range_arr)  # all values now lie in [0, 99]

Which gives you:

[26.54494382022472, 0.0, 40.449438202247194, 28.18820224719101, 14.094101123595506, 70.3061797752809, 29.71769662921348, 29.76825842696629, 30.526685393258425, 30.31179775280899, 33.05477528089887, 44.924157303370784, 30.526685393258425, 29.995786516853933, 30.27387640449438, 41.01825842696629, 92.90730337078652, 29.742977528089888, 33.813202247191015, 70.71067415730337, 99.0]

Note that all values are now in the [0, 99] range.


Following this process:

  • User id could be a float instead of an integer

  • Action is an integer (if you have fewer than 100 actions) or a float (if you have more than 100 actions)

  • Datetime will be split into two integers (or one integer and one float); see below:


Concerning "a way to use date/time in the training":

You could split your datetime into two columns, something like:

  • one column for the weekday:

    • 0: Sunday
    • 1: Monday
    • 2: Tuesday
    • [...]
    • 6: Saturday
  • one column for the time in the day:

    • 0: Between 00:00 & 00:15
    • 1: Between 00:15 & 00:30
    • 2: Between 00:30 & 00:45
    • [...]
    • 95: Between 23:45 & 24:00

If you need finer granularity than this 15-minute window, you can also use a float for the time column.
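The weekday and time-slot encoding described above can be sketched in Python as follows. `split_datetime` is a hypothetical helper; it uses `isoweekday()` (Monday = 1 ... Sunday = 7) modulo 7 to get the Sunday = 0 numbering from the list, and `fractional=True` returns the time column as a float, as suggested for finer granularity:

```python
from datetime import datetime


def split_datetime(dt, fractional=False):
    """Return (weekday, time_slot) for a datetime.

    weekday:   0 = Sunday, 1 = Monday, ..., 6 = Saturday
    time_slot: index of the 15-minute window, 0..95
               (a float within that window if fractional=True)
    """
    weekday = dt.isoweekday() % 7  # isoweekday(): Monday=1 ... Sunday=7
    if fractional:
        minutes = dt.hour * 60 + dt.minute + dt.second / 60
        return weekday, minutes / 15
    return weekday, dt.hour * 4 + dt.minute // 15


# Thursday 15 March 2018, 13:05 (the "every Thursday at 1pm" case)
print(split_datetime(datetime(2018, 3, 15, 13, 5)))                   # (4, 52)
print(split_datetime(datetime(2018, 3, 15, 13, 5), fractional=True))
```

Both columns stay inside the [0, 99] range discussed above (weekday 0..6, time slot 0..95).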

answered Nov 02 '22 22:11 by A STEFANI