classifying a series to a new column in pandas

Tags:

pandas

I want to be able to take my current set of data, which is filled with ints, and classify them according to certain criteria. The table looks something like this:

[in]> df = pd.DataFrame({'A':[0,2,3,2,0,0],'B': [1,0,2,0,0,0],'C': [0,0,1,0,1,0]})
[out]>
   A  B  C
0  0  1  0
1  2  0  0
2  3  2  1
3  2  0  0
4  0  0  1
5  0  0  0

I'd like to classify each row with a string label in a separate column. Being more familiar with R, I first tried to create a new column with the rules written directly in that column's definition. After that I tried .ix and lambdas, both of which resulted in type errors (between ints and Series). I'm under the impression that this is a fairly simple question. Although the following is completely wrong, here is the logic from attempt 1:

df['D']=(
if ((df['A'] > 0) & (df['B'] == 0) & df['C']==0): 
    return "c1";
elif ((df['A'] == 0) & ((df['B'] > 0) | df['C'] >0)): 
    return "c2";
else:
    return "c3";)

for a final result of:

   A  B  C     D
0  0  1  0  "c2"
1  2  0  0  "c1"
2  3  2  1  "c3"
3  2  0  0  "c1"
4  0  0  1  "c2"
5  0  0  0  "c3"

If someone could help me figure this out it would be much appreciated.

944

asked Mar 07 '13 20:03

stites

1 Answer

I can think of two ways. The first is to write a classifier function and then .apply it row-wise:

>>> import pandas as pd
>>> df = pd.DataFrame({'A':[0,2,3,2,0,0],'B': [1,0,2,0,0,0],'C': [0,0,1,0,1,0]})
>>> 
>>> def classifier(row):
...     # row is a Series holding one row's values, so plain and/or work here
...     if row["A"] > 0 and row["B"] == 0 and row["C"] == 0:
...         return "c1"
...     elif row["A"] == 0 and (row["B"] > 0 or row["C"] > 0):
...         return "c2"
...     else:
...         return "c3"
... 
>>> df["D"] = df.apply(classifier, axis=1)
>>> df
   A  B  C   D
0  0  1  0  c2
1  2  0  0  c1
2  3  2  1  c3
3  2  0  0  c1
4  0  0  1  c2
5  0  0  0  c3

The second is to set a default value and then overwrite the matching rows using boolean indexing:

>>> df = pd.DataFrame({'A':[0,2,3,2,0,0],'B': [1,0,2,0,0,0],'C': [0,0,1,0,1,0]})
>>> df["D"] = "c3"
>>> # use .loc for the assignment; chained df["D"][mask] = ... may write to a copy
>>> df.loc[(df["A"] > 0) & (df["B"] == 0) & (df["C"] == 0), "D"] = "c1"
>>> df.loc[(df["A"] == 0) & ((df["B"] > 0) | (df["C"] > 0)), "D"] = "c2"
>>> df
   A  B  C   D
0  0  1  0  c2
1  2  0  0  c1
2  3  2  1  c3
3  2  0  0  c1
4  0  0  1  c2
5  0  0  0  c3

Which one is clearer depends upon the situation. Usually the more complex the logic the more likely I am to wrap it up in a function I can then document and test.
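As a possible third option, not one of the two approaches above, NumPy's np.select can express the same rules in vectorized form. This is only a minimal sketch: it assumes NumPy is installed alongside pandas, and the names conditions and choices are illustrative rather than anything from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0, 2, 3, 2, 0, 0],
                   "B": [1, 0, 2, 0, 0, 0],
                   "C": [0, 0, 1, 0, 1, 0]})

# Each condition is a boolean Series; they are checked in order and
# the first one that is True for a row decides that row's label.
conditions = [
    (df["A"] > 0) & (df["B"] == 0) & (df["C"] == 0),
    (df["A"] == 0) & ((df["B"] > 0) | (df["C"] > 0)),
]
choices = ["c1", "c2"]

# Rows matching none of the conditions fall back to the default label.
df["D"] = np.select(conditions, choices, default="c3")
print(df)
```

This keeps the rules and their labels side by side, which may be easier to extend than a chain of assignments, though a plain function is still simpler to document and test when the logic grows.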

173

answered Sep 28 '22 09:09

DSM
