RuntimeWarning: invalid value encountered in longlong_scalars

Tags: python, pandas

What I'm trying to do

I want to report the weekly rejection rate for multiple users. I use a for loop to go through a monthly dataset to get the numbers for every user. The final dataframe, rates, should look something like:

[Image: the end product, rates]

Description

I have an initial dataframe (numbers), that contains only the ACCEPT, REJECT and REVIEW numbers, where I added these rows and columns:

  • Rows: Grand Total, Rejection Rate
  • Columns: Grand Total

Here's what numbers looks like:

|---|--------|--------|--------|--------|-------------|
|   | Week 1 | Week 2 | Week 3 | Week 4 | Grand Total |
|---|--------|--------|--------|--------|-------------|
| 0 |  994   |  699   |  529   |  877   |    3099     |
|---|--------|--------|--------|--------|-------------|
| 1 |   27   |    7   |    8   |   13   |      55     |
|---|--------|--------|--------|--------|-------------|
| 2 |  100   |   86   |   64   |  107   |     357     |
|---|--------|--------|--------|--------|-------------|
| 3 |  1121  |  792   |  601   |  997   |    3511     |
|---|--------|--------|--------|--------|-------------|

The indexes represent the following values:

  • 0 - ACCEPT
  • 1 - REJECT
  • 2 - REVIEW
  • 3 - TOTAL (Accept+Reject+Review)
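
For anyone who wants to reproduce the table, here is a minimal sketch that rebuilds it from the three count rows. It assumes a recent pandas (so it uses pd.concat rather than the append call shown further down), and the name counts is only an illustration, not part of the original script.

```python
import pandas as pd

# ACCEPT, REJECT and REVIEW counts per week, copied from the table above.
counts = pd.DataFrame(
    {
        "Week 1": [994, 27, 100],
        "Week 2": [699, 7, 86],
        "Week 3": [529, 8, 64],
        "Week 4": [877, 13, 107],
    }
)

# Row 3: column sums (Accept + Reject + Review).
totals_row = counts.sum(axis=0).to_frame().T
numbers = pd.concat([counts, totals_row], ignore_index=True)

# "Grand Total" column: row sums across the four weeks.
numbers["Grand Total"] = numbers.sum(axis=1)

# Rejection rate per column, in percent: REJECT (row 1) / TOTAL (row 3).
rates_pct = numbers.loc[1] / numbers.loc[3] * 100
print(numbers)
print(rates_pct.round(1))
```

For Week 1 this works out to 27 / 1121, or roughly 2.4%.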

I wrote 2 pre-defined functions:

  1. get_decline_rates(df): To get the decline rates by week in the numbers dataframe.
  2. copy(empty_df, data): To transfer all data to a new dataframe with "double" headers (for reporting purposes).

Here's my code where I add rows and columns to numbers, then re-format it:

# Adding "Grand Total" column and rows
totals = numbers.sum(axis=0) # column sum
numbers = numbers.append(totals, ignore_index=True)
grand_total = numbers.sum(axis=1) # row sum
numbers.insert(len(numbers.columns), "Grand Total", grand_total)

# Adding "Rejection Rate" and re-indexing numbers
decline_rates = get_decline_rates(numbers)
numbers = numbers.append(decline_rates, ignore_index=True)
numbers.index = ["ACCEPT","REJECT","REVIEW","Grand Total","Rejection Rate"]

# Creating a new df with report format requirements 
final = pd.DataFrame(0, columns=numbers.columns, index=["User A"]+list(numbers.index))
final.ix["User A",:] = final.columns

# Copying data from numbers to newly formatted df
copy(final,numbers) 

# Append final df of this user to the final dataframe
rates = rates.append(final)

I'm using Python 3.5.2 and Pandas 0.19.2. If it helps, here's what the initial dataset looks like:

[Image: data format]

I do a resampling on the date column to get the data by week.

What's going wrong

Here's the funny part - the code runs fine and I get all the required information in rates. However, I'm seeing this warning message:

RuntimeWarning: invalid value encountered in longlong_scalars

If I break the code down and run it line by line, this message does not appear. Even the message looks weird (what does longlong_scalars even mean?). Does anyone know what this warning message means, and what's causing it?

UPDATE:

I just ran a similar script that takes in exactly the same input and produces a similar output (except I get daily rejection rates instead of weekly). I get the same Runtime warning, except more information is given:

RuntimeWarning: invalid value encountered in longlong_scalars

rej_rate = str(int(round((col.ix[1 ]/col.ix[3 ])*100))) + "%"

I suspect something went wrong when I calculated the decline rates with my pre-defined function, get_decline_rates(df). Could it be due to the dtype of the values? All columns in the input df, numbers, are int64.

Here's the code for my pre-defined function (the input, numbers, can be found under Description):

# Description: Get rejection rates for all weeks.
# Parameters: Pandas Dataframe with ACCEPT, REJECT, REVIEW count by week.
# Output: Pandas Series with rejection rates for all columns (weeks) in input df.
def get_decline_rates(df):
    decline_rates = []
    for i in range(len(df.columns)):
        col = df.ix[:,i]

        try:
            rej_rate = str(int(round((col[1]/col[3])*100))) + "%"
        except ValueError:
            rej_rate = "0%"

        decline_rates.append(rej_rate)

    return pd.Series(decline_rates, index=df.columns)

Jing Yu asked Dec 23 '22 19:12



1 Answer

I had the same RuntimeWarning, and after looking into the data, it turned out to be caused by a division by zero (0 divided by 0). I did not have the time to look into your sample, but you could look around id=0, or at other records where the total is zero and such a division could occur.
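
To illustrate, here is a minimal sketch that reproduces the warning with a column whose counts are all zero, then avoids it by checking the denominator first. The helper reject_rate is a hypothetical name, not part of the question's code, and it assumes "0%" is the wanted result for an empty column (as in the question's except branch). As far as I know, "longlong" is simply the C type name behind NumPy's int64 scalars on some platforms, and newer NumPy versions word the message differently, so the sketch only checks the warning category.

```python
import warnings

import numpy as np
import pandas as pd


def reject_rate(rejects, total):
    """Return the rejection rate as a percentage string, or "0%" if total is 0.

    Hypothetical helper: it tests the denominator before dividing, so the
    0/0 case never reaches NumPy's scalar division.
    """
    if total == 0:
        return "0%"
    return str(int(round(rejects / total * 100))) + "%"


# A column with no activity: ACCEPT, REJECT, REVIEW and TOTAL are all zero.
col = pd.Series([0, 0, 0, 0], dtype="int64")

# Unguarded division of two int64 scalars: yields nan and a RuntimeWarning.
with warnings.catch_warnings(record=True) as unguarded_warnings:
    warnings.simplefilter("always")
    unguarded = col.iloc[1] / col.iloc[3]

# Guarded version: no division happens, so no warning is raised.
with warnings.catch_warnings(record=True) as guarded_warnings:
    warnings.simplefilter("always")
    guarded = reject_rate(col.iloc[1], col.iloc[3])

print(unguarded, [str(w.message) for w in unguarded_warnings])
print(guarded)
```

In get_decline_rates the nan result is what later triggers the ValueError in int(round(...)), which is why the code still produces "0%" and appears to run fine despite the warning.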

lnksz answered Mar 11 '23 19:03
