When I do a data[genres].sum() I get the following result
Action        1891
Adult            9
Adventure     1313
Animation      314
Biography      394
Comedy        3922
Crime         1867
Drama         5697
Family         754
Fantasy        916
Film-Noir       40
History        358
Horror        1215
Music          371
Musical        260
Mystery       1009
News             1
Reality-TV       1
Romance       2441
Sci-Fi         897
Sport          288
Thriller      2832
War            512
Western        235
dtype: int64
But when I try to sort on the sum using np.sort
genre_count = np.sort(data[genres].sum())[::-1]
pd.DataFrame({'Genre Count': genre_count})`
I get the following result
`Out[19]:
    Genre Count
0   5697
1   3922
2   2832
3   2441
4   1891
5   1867
6   1313
7   1215
8   1009
9   916
10  897
11  754
12  512
13  394
14  371
15  358
16  314
17  288
18  260
19  235
20  40
21  9
22  1
23  1
The expected result should be like this:
Genre Count
Drama   5697
Comedy  3922
Thriller    2832
Romance     2441
Action  1891
Crime   1867
Adventure   1313
Horror  1215
Mystery     1009
Fantasy     916
Sci-Fi  897
Family  754
War     512
Biography   394
Music   371
History     358
Animation   314
Sport   288
Musical     260
Western     235
Film-Noir   40
Adult   9
News    1
Reality-TV  1
It seems like numpy is ignoring the genre column.
Could somebody help me understand where I am going wrong?
Python lists are better optimized for "plain Python" code: reading or writing to a list element is faster than it is for a NumPy array. The benefit of NumPy array comes from "whole array operations" (so called array operations) and from compiled extensions.
To sort the DataFrame based on the values in a single column, you'll use . sort_values() . By default, this will return a new DataFrame sorted in ascending order. It does not modify the original DataFrame.
sort() function sorts the NumPy array in ascending order. Let's see how to sort NumPy arrays in descending order. By Sorting a NumPy array in descending order sorts the elements from largest to smallest value. You can use the syntax array[::-1] to reverse the array.
Pandas has a better performance when a number of rows is 500K or more. Numpy has a better performance when number of rows is 50K or less. Indexing of the pandas series is very slow as compared to numpy arrays.
data[genres].sum() returns a Series. The genre column isn't actually a column - it's an index.
np.sort just looks at the values of the DataFrame or Series, not at the index, and it returns a new NumPy array with the sorted data[genres].sum() values. The index information is lost.
The way to sort data[genres].sum() and keep the index information would be to do something like:
genre_count = data[genres].sum()
genre_count.sort(ascending=False) # in-place sort of genre_count, high to low
You can then turn the sorted genre_count Series back into a DataFrame if you like:
pd.DataFrame({'Genre Count': genre_count})
                        data[genres].sum() returns a Series.
And if you're using pandas version 0.2, the command have little bit changes.
    genre_count = data[genres].sum()
    genre_count.sort_values(ascending=False)`
You could find reference on pandas documentation
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With