Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Splitting a Graphlab SFrame Date column into three columns (Year Month Day)

Given a graphlab SFrame where there's a column with dates, e.g.:

+-------+------------+---------+-----------+
| Store |    Date    |  Sales  | Customers |
+-------+------------+---------+-----------+
|   1   | 2015-07-31 |  5263.0 |   555.0   |
|   2   | 2015-07-31 |  6064.0 |   625.0   |
|   3   | 2015-07-31 |  8314.0 |   821.0   |
|   4   | 2015-07-31 | 13995.0 |   1498.0  |
|   3   | 2015-07-20 |  4822.0 |   559.0   |
|   2   | 2015-07-10 |  5651.0 |   589.0   |
|   4   | 2015-07-11 | 15344.0 |   1414.0  |
|   5   | 2015-07-23 |  8492.0 |   833.0   |
|   2   | 2015-07-19 |  8565.0 |   687.0   |
|   10  | 2015-07-09 |  7185.0 |   681.0   |
+-------+------------+---------+-----------+
[986159 rows x 4 columns]

Is there an easy way in graphlab / other python function to convert the Date column to Year|Month|Day?

+-------+------+----+----+---------+-----------+
| Store | YYYY | MM | DD |  Sales  | Customers |
+-------+------+----+----+---------+-----------+
|   1   | 2015 | 07 | 31 |  5263.0 |   555.0   |
|   2   | 2015 | 07 | 31 |  6064.0 |   625.0   |
|   3   | 2015 | 07 | 31 |  8314.0 |   821.0   |
+-------+------------+---------+-----------+
[986159 rows x 4 columns]

In pandas, I can do this: Which is the fastest way to extract day, month and year from a given date?

But to convert an SFrame into Panda to split date and convert back into SFrame is quite a chore.

like image 763
alvas Avatar asked Nov 18 '15 23:11

alvas


2 Answers

You could also do it with the split-datetime method. It gives you a bit more flexibility.

sf.add_columns(sf['Date'].split_datetime(column_name_prefix = ''))

The split_datetime method itself is on the SArray (a single column of the SFrame) and it returns an SFrame which you can then add back to the original data (at basically 0 cost)

like image 118
srikris Avatar answered Oct 05 '22 22:10

srikris


A quick and dirty way to do this is

sf['date2'] = sf['Date'].apply(lambda x: x.split('-'))
sf = sf.unpack('date2')

Another option would be to convert the Date column to a datetime type, then use the graphlab.SArray.split_datetime function.

like image 45
papayawarrior Avatar answered Oct 06 '22 00:10

papayawarrior