Pandas Dataframe: Multiplying Two Columns

Tags: python, pandas, matrix-multiplication

I am trying to multiply two columns (ActualSalary * FTE) within the dataframe (OPR) to create a new column (FTESalary), but somehow it stops at row 21357. I don't understand what went wrong or how to fix it. Both columns came from importing a CSV file with the line: OPR = pd.read_csv('OPR.csv', encoding='latin1')

[In] OPR
[out]
ActualSalary    FTE
44600           1
58,000.00       1
70,000.00       1
17550           1
34693           1
15674           0.4

[In] OPR["FTESalary"] = OPR["ActualSalary"].str.replace(",", "").astype("float")*OPR["FTE"]
[In] OPR
[out]
ActualSalary    FTE FTESalary
44600           1   44600
58,000.00       1   58000
70,000.00       1   70000
17550           1   NaN
34693           1   NaN
15674           0.4 NaN

I am not expecting any null (NaN) values in the output at all, and I am really struggling with this. I would really appreciate the help. Many thanks in advance! (I am new to both coding and this site, so please let me know via message if I have made mistakes or can improve the way I post questions here.)

Sharing the data @oppresiveslayer

[In] OPR[0:6].to_dict()
[out]
{'ActualSalary': {0: '44600',
1: '58,000.00',
2: '70,000.00',
3: '39,780.00',
4: '0.00',
5: '78,850.00'},
 'FTE': {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0}}

For more information on the two columns @charlesreid1

[In] OPR['ActualSalary'].astype
[out]
Name: ActualSalary, Length: 21567, dtype: object>

[In] OPR['FTE'].astype
[out]
Name: FTE, Length: 21567, dtype: float64>

The versions I am using: Python 3.7.3 and pandas 0.25.1, on Jupyter Notebook 6.0.0.


asked Dec 12 '19 23:12

SyLviA

1 Answer

I believe that your ActualSalary column is a mix of strings and integers. That is the only way I've been able to recreate your error:

import pandas as pd

df = pd.DataFrame(
    {'ActualSalary': ['44600', '58,000.00', '70,000.00', 17550, 34693, 15674],
     'FTE': [1, 1, 1, 1, 1, 0.4]})

>>> df['ActualSalary'].str.replace(',', '').astype(float) * df['FTE']
0    44600.0
1    58000.0
2    70000.0
3        NaN
4        NaN
5        NaN
dtype: float64

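The issue arises when you try to remove the commas. The .str accessor only operates on string values, so every non-string entry (the integers here) comes back as NaN: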

>>> df['ActualSalary'].str.replace(',', '')
0       44600
1    58000.00
2    70000.00
3         NaN
4         NaN
5         NaN
Name: ActualSalary, dtype: object
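
One way to check this diagnosis on your own data is to count the Python types stored in the column. The sketch below rebuilds the small example frame from above; the names type_counts and non_string are only illustrative. On the real data you would run the same calls on OPR['ActualSalary'].

```python
import pandas as pd

df = pd.DataFrame(
    {'ActualSalary': ['44600', '58,000.00', '70,000.00', 17550, 34693, 15674],
     'FTE': [1, 1, 1, 1, 1, 0.4]})

# Count how many values of each Python type the object column holds.
type_counts = df['ActualSalary'].map(lambda v: type(v).__name__).value_counts()
print(type_counts)

# Boolean mask of the rows that are not strings; these are the rows
# for which the .str accessor returns NaN.
non_string = ~df['ActualSalary'].map(lambda v: isinstance(v, str))
print(df[non_string])
```

If more than one type shows up in the counts, the column is mixed and the .str methods will silently skip the non-string rows.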

Convert every value to a string first, then remove the commas and convert the result to floats.

fte_salary = (
    df['ActualSalary'].astype(str).str.replace(',', '')  # Remove commas in string, e.g. '55,000.00' -> '55000.00'
    .astype(float)  # Convert string column to floats.
    .mul(df['FTE'])  # Multiply the cleaned salary column by the Full-Time-Equivalent (FTE) column.
)
>>> df.assign(FTESalary=fte_salary)  # Assign new column to dataframe.
  ActualSalary  FTE  FTESalary
0        44600  1.0    44600.0
1    58,000.00  1.0    58000.0
2    70,000.00  1.0    70000.0
3        17550  1.0    17550.0
4        34693  1.0    34693.0
5        15674  0.4     6269.6
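
If re-reading the CSV is an option, a possible alternative is to let read_csv strip the thousands separators itself through its thousands parameter, so the column arrives as floats and needs no string cleanup. This is only a minimal sketch: the inline CSV text stands in for OPR.csv, and its layout is an assumption (in particular, values that contain commas must be quoted in the file for this to work).

```python
import io

import pandas as pd

# Stand-in for OPR.csv (hypothetical contents mirroring the question's rows).
csv_text = '''ActualSalary,FTE
44600,1
"58,000.00",1
"70,000.00",1
17550,1
34693,1
15674,0.4
'''

# thousands=',' tells the parser to ignore commas inside numeric fields.
OPR = pd.read_csv(io.StringIO(csv_text), thousands=',')

# Both columns are numeric now, so a plain multiplication works.
OPR['FTESalary'] = OPR['ActualSalary'] * OPR['FTE']
print(OPR)
```

On the real file you would pass 'OPR.csv' and encoding='latin1' as in the question, in place of the StringIO object.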


answered Oct 16 '22 10:10

Alexander
