
Pandas Dataframe: How to parse integers into string of 0s and 1s?

I have the following pandas DataFrame.

import pandas as pd
df = pd.read_csv('filename.csv')

print(df)

      sample      column_A         
0     sample1        6/6    
1     sample2        0/4
2     sample3        2/6    
3     sample4       12/14   
4     sample5       15/21   
5     sample6       12/12   
..    ....

The values in column_A are not fractions. I need to convert each value into a string of 0s and 1s (this is not a conversion of the integers into their binary representations).

The "numerator" above gives the total number of 1s, while the "denominator" gives the total number of 0s and 1s together.

So, the table should actually be in the following format:

      sample      column_A         
0     sample1     111111    
1     sample2     0000
2     sample3     110000    
3     sample4     11111111111100    
4     sample5     111111111111111000000 
5     sample6     111111111111  
..    ....

I've never parsed integers into strings of 0s and 1s like this. How does one do this? Is there a "pandas method" to use with lambda expressions, or is Pythonic string parsing or a regex the better approach?

ShanZhengYang asked Dec 19 '22 14:12


1 Answer

First, suppose you write a function:

def to_binary(s):
    n_d = s.split('/')
    n, d = int(n_d[0]), int(n_d[1])
    return '1' * n + '0' * (d - n)

So that,

>>> to_binary('4/5')
'11110'

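Now you just need to use pandas.Series.apply. It returns a new Series, so assign the result back to the column if you want to overwrite it:

df['column_A'] = df.column_A.apply(to_binary)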
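For reference, here is a minimal end-to-end sketch of the approach above. It assumes the column holds plain "n/d" strings, and it builds the DataFrame inline from the six rows shown in the question instead of reading filename.csv, so the inline data is a stand-in, not the real file.

```python
import pandas as pd


def to_binary(s):
    # "n/d" -> n ones followed by (d - n) zeros
    n, d = (int(part) for part in s.split('/'))
    return '1' * n + '0' * (d - n)


# Stand-in for pd.read_csv('filename.csv'), using the rows from the question
df = pd.DataFrame({
    'sample': ['sample1', 'sample2', 'sample3', 'sample4', 'sample5', 'sample6'],
    'column_A': ['6/6', '0/4', '2/6', '12/14', '15/21', '12/12'],
})

# apply returns a new Series; assign it back to overwrite the column
df['column_A'] = df['column_A'].apply(to_binary)
print(df)
```

Note that int() tolerates surrounding whitespace, so values such as ' 6/6 ' would still parse, but missing values (NaN) or a numerator larger than the denominator would need separate handling.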
Ami Tavory answered Dec 21 '22 10:12