How to define a user-defined function in pandas

Q: How do you define a user-defined function?

A user-defined function is a function you write yourself to organize your code. In Python you define one with def; once defined, you call it the same way as a built-in function. Arguments are passed as object references rather than copied, so mutating a mutable argument inside the function is visible to the caller.

Q: How do user-defined functions work in pandas?

In PySpark, a Series to Series pandas UDF vectorizes scalar operations. You can use one with Spark DataFrame APIs such as select and withColumn. The Python function takes a pandas Series as input and returns a pandas Series of the same length, and you declare both in the Python type hints.

Q: How do you define a pandas UDF?

Scalar pandas UDFs (a PySpark feature) are used for vectorizing scalar operations. To define one, annotate a Python function with @pandas_udf; the function takes pandas.Series arguments and returns another pandas.Series of the same size.

Tags:

python

pandas

I have a CSV file that contains data like this:

name    salary  department
a        2500      x
b        5000      y
c        10000      y
d        20000      x

I need to convert this, using pandas, into the following form:

dept    name    position
x        a       Normal Employee
x        b       Normal Employee
y        c       Experienced Employee
y        d       Experienced Employee

If the salary is <= 8000, the position is Normal Employee.

If the salary is > 8000 and <= 25000, the position is Experienced Employee.

My current code for the group by:

import pandas
pandas.set_option('display.max_rows', 999)
data_df = pandas.read_csv('employeedetails.csv')
#print(data_df.columns)
t = data_df.groupby(['department'])  # the column is named 'department', not 'dept'
print(t)

What changes do I need to make to this code to get the output shown above?

asked Feb 15 '16 16:02

Edwin Baby

2 Answers

I would use a simple function like:

def f(x):
    # Map a salary to a position label.
    if x <= 8000:
        return 'Normal Employee'
    elif x <= 25000:
        return 'Experienced Employee'
    return x  # salaries above 25000 are returned unchanged

and then apply it to the df:

df['position'] = df['salary'].apply(f)
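To get all the way to the layout asked for in the question, the apply step can be followed by a rename, a sort and a column selection. The following is only a minimal sketch: the DataFrame is built from the question's sample rows instead of reading employeedetails.csv, and the 'Unknown' label for salaries above 25000 is an assumption, since the question does not say what should happen there.

```python
import pandas as pd

# Sample rows copied from the question; in practice this would come from
# pd.read_csv('employeedetails.csv').
df = pd.DataFrame({
    'name': ['a', 'b', 'c', 'd'],
    'salary': [2500, 5000, 10000, 20000],
    'department': ['x', 'y', 'y', 'x'],
})

def f(salary):
    """Map a single salary value to a position label."""
    if salary <= 8000:
        return 'Normal Employee'
    elif salary <= 25000:
        return 'Experienced Employee'
    # Assumption: the question does not cover salaries above 25000.
    return 'Unknown'

# apply calls f once per value in the salary column.
df['position'] = df['salary'].apply(f)

# Rename, sort and select columns to match the requested layout.
result = (
    df.rename(columns={'department': 'dept'})
      .sort_values(['dept', 'name'])
      [['dept', 'name', 'position']]
      .reset_index(drop=True)
)
print(result)
```

Note that no groupby is needed here: the position depends only on each row's own salary, so sorting by dept is enough to group the rows visually.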

answered Nov 14 '22 21:11

Fabio Lamanna

You could define 2 masks and pass these to np.where:

In [91]:
import numpy as np
normal = df['salary'] <= 8000
experienced = (df['salary'] > 8000) & (df['salary'] <= 25000)
df['position'] = np.where(normal, 'normal employee', np.where(experienced, 'experienced employee', 'unknown'))
df

Out[91]:
  name  salary department              position
0    a    2500          x       normal employee
1    b    5000          y       normal employee
2    c   10000          y  experienced employee
3    d   20000          x  experienced employee
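If the nested np.where calls get hard to read as more salary bands are added, the same two masks can also be handed to np.select, which takes a list of conditions and a matching list of choices. This is a sketch rather than part of the original answer; the DataFrame is rebuilt from the question's sample rows so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    'name': ['a', 'b', 'c', 'd'],
    'salary': [2500, 5000, 10000, 20000],
    'department': ['x', 'y', 'y', 'x'],
})

# Boolean masks, one per salary band.
normal = df['salary'] <= 8000
experienced = (df['salary'] > 8000) & (df['salary'] <= 25000)

# For each row, np.select takes the choice belonging to the first
# condition that is True; rows matching no condition get the default.
df['position'] = np.select(
    [normal, experienced],
    ['normal employee', 'experienced employee'],
    default='unknown',
)
print(df)
```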

Or, slightly more readably, pass the masks to loc:

In [92]:
df.loc[normal, 'position'] = 'normal employee'
df.loc[experienced,'position'] = 'experienced employee'
df

Out[92]:
  name  salary department              position
0    a    2500          x       normal employee
1    b    5000          y       normal employee
2    c   10000          y  experienced employee
3    d   20000          x  experienced employee

answered Nov 14 '22 21:11

EdChum
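A further variation, not taken from either answer above: since the rule is really a binning of salaries into ranges, pd.cut can probably do the labelling in one call. This is a rough sketch under a couple of assumptions: the lower bin edge of negative infinity is chosen here so that every salary up to 8000 falls into the first band, and salaries above 25000 are simply left as NaN.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    'name': ['a', 'b', 'c', 'd'],
    'salary': [2500, 5000, 10000, 20000],
    'department': ['x', 'y', 'y', 'x'],
})

# Bins are right-inclusive by default, giving (-inf, 8000] and (8000, 25000].
# Assumption: -inf as the lower edge; values above 25000 become NaN.
df['position'] = pd.cut(
    df['salary'],
    bins=[float('-inf'), 8000, 25000],
    labels=['Normal Employee', 'Experienced Employee'],
)
print(df)
```

The resulting position column has a categorical dtype, which may or may not be what you want; df['position'].astype(str) turns it into plain strings.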
