Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python Pandas Dataframe assignment

I am following a Lynda tutorial where they use the following code:

import pandas as pd
import seaborn
flights = seaborn.load_dataset('flights')
flights_indexed = flights.set_index(['year','month'])
flights_unstacked = flights_indexed.unstack()
flights_unstacked['passengers','total']  = flights_unstacked.sum(axis=1)

and it works perfectly. However, in my case it seems that the code is not compiling, for the last line I keep getting an error.

TypeError: cannot insert an item into a CategoricalIndex that is not already an existing category

I know in the video they are using Python 2, however I have Python 3 since I am learning for work (which uses Python 3). Most of the differences I have been able to figure out, however I cannot figure out how to create this new column called 'total' with the sums of the passengers.

like image 671
Mathephysicist Avatar asked Feb 06 '17 20:02

Mathephysicist


1 Answers

The root cause of this error message is the categorical nature of the month column:

In [42]: flights.dtypes
Out[42]:
year             int64
month         category
passengers       int64
dtype: object

 In [43]: flights.month.cat.categories
Out[43]: Index(['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December'], d
type='object')

and you are trying to add a category total - Pandas doesn't like that.

Workaround:

In [45]: flights.month.cat.add_categories('total', inplace=True)

In [46]: x = flights.pivot(index='year', columns='month', values='passengers')

In [47]: x['total'] = x.sum(1)

In [48]: x
Out[48]:
month  January  February  March  April    May   June   July  August  September  October  November  December   total
year
1949     112.0     118.0  132.0  129.0  121.0  135.0  148.0   148.0      136.0    119.0     104.0     118.0  1520.0
1950     115.0     126.0  141.0  135.0  125.0  149.0  170.0   170.0      158.0    133.0     114.0     140.0  1676.0
1951     145.0     150.0  178.0  163.0  172.0  178.0  199.0   199.0      184.0    162.0     146.0     166.0  2042.0
1952     171.0     180.0  193.0  181.0  183.0  218.0  230.0   242.0      209.0    191.0     172.0     194.0  2364.0
1953     196.0     196.0  236.0  235.0  229.0  243.0  264.0   272.0      237.0    211.0     180.0     201.0  2700.0
1954     204.0     188.0  235.0  227.0  234.0  264.0  302.0   293.0      259.0    229.0     203.0     229.0  2867.0
1955     242.0     233.0  267.0  269.0  270.0  315.0  364.0   347.0      312.0    274.0     237.0     278.0  3408.0
1956     284.0     277.0  317.0  313.0  318.0  374.0  413.0   405.0      355.0    306.0     271.0     306.0  3939.0
1957     315.0     301.0  356.0  348.0  355.0  422.0  465.0   467.0      404.0    347.0     305.0     336.0  4421.0
1958     340.0     318.0  362.0  348.0  363.0  435.0  491.0   505.0      404.0    359.0     310.0     337.0  4572.0
1959     360.0     342.0  406.0  396.0  420.0  472.0  548.0   559.0      463.0    407.0     362.0     405.0  5140.0
1960     417.0     391.0  419.0  461.0  472.0  535.0  622.0   606.0      508.0    461.0     390.0     432.0  5714.0

UPDATE: alternatively if you don't want to touch the original DF you can get rid of categorical columns in the flights_unstacked DF:

In [76]: flights_unstacked.columns = \
    ...:     flights_unstacked.columns \
    ...:     .set_levels(flights_unstacked.columns.get_level_values(1).categories,
    ...:                 level=1)
    ...:

In [77]: flights_unstacked['passengers','total']  = flights_unstacked.sum(axis=1)

In [78]: flights_unstacked
Out[78]:
      passengers
month    January February March April  May June July August September October November December total
year
1949         112      118   132   129  121  135  148    148       136     119      104      118  1520
1950         115      126   141   135  125  149  170    170       158     133      114      140  1676
1951         145      150   178   163  172  178  199    199       184     162      146      166  2042
1952         171      180   193   181  183  218  230    242       209     191      172      194  2364
1953         196      196   236   235  229  243  264    272       237     211      180      201  2700
1954         204      188   235   227  234  264  302    293       259     229      203      229  2867
1955         242      233   267   269  270  315  364    347       312     274      237      278  3408
1956         284      277   317   313  318  374  413    405       355     306      271      306  3939
1957         315      301   356   348  355  422  465    467       404     347      305      336  4421
1958         340      318   362   348  363  435  491    505       404     359      310      337  4572
1959         360      342   406   396  420  472  548    559       463     407      362      405  5140
1960         417      391   419   461  472  535  622    606       508     461      390      432  5714
like image 68
MaxU - stop WAR against UA Avatar answered Sep 20 '22 02:09

MaxU - stop WAR against UA