I have the following pandas DataFrame in Python 3.x:
import pandas as pd
dict1 = {
    'ID': ['first', 'second', 'third', 'fourth', 'fifth'],
    'pattern': ['AAABCDEE', 'ABBBBD', 'CCCDE', 'AA', 'ABCDE']
}
df = pd.DataFrame(dict1)
>>> df
       ID   pattern
0   first  AAABCDEE
1  second    ABBBBD
2   third     CCCDE
3  fourth        AA
4   fifth     ABCDE
There are two columns, ID and pattern. The string in pattern with the longest length is in the first row, len('AAABCDEE'), which is length 8.
My goal is to standardize the strings such that these are the same length, with the trailing spaces as ?.
Here is what the output should look like:
>>> df
       ID   pattern
0   first  AAABCDEE
1  second  ABBBBD??
2   third  CCCDE???
3  fourth  AA??????
4   fifth  ABCDE???
If I were able to make the trailing spaces NaN, then I could try something like:
df = df.applymap(lambda x: int(x) if pd.notnull(x) else str("?"))
But I'm not sure how one would efficiently (1) find the longest string in pattern and (2) then add NaN at the end of the strings up to this length. This may be a convoluted approach...
You can use Series.str.ljust for this, after acquiring the max string length in the column.
df.pattern.str.ljust(df.pattern.str.len().max(), '?')
# 0    AAABCDEE
# 1    ABBBBD??
# 2    CCCDE???
# 3    AA??????
# 4    ABCDE???
# Name: pattern, dtype: object
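Note that the expression above returns a new Series and leaves df untouched. To get the output shown in the question, the result would need to be assigned back to the column. A minimal sketch, rebuilding the question's df so it runs on its own (max_len and padded are just illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['first', 'second', 'third', 'fourth', 'fifth'],
    'pattern': ['AAABCDEE', 'ABBBBD', 'CCCDE', 'AA', 'ABCDE'],
})

# Length of the longest string in the column.
max_len = df['pattern'].str.len().max()

# Right-pad every string with '?' up to max_len and write it back.
df['pattern'] = df['pattern'].str.ljust(max_len, '?')

padded = df['pattern'].tolist()
print(df)
```

Strings that are already max_len characters long, such as the first row, come back unchanged.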
In the source for pandas 0.22.0, it can be seen that ljust is entirely equivalent to pad with side='right', so pick whichever you find clearer.
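As a quick check of that equivalence, the two calls can be compared directly. This is only a sketch on the question's data (the names patterns, with_ljust, with_pad and same are illustrative), not a statement about every pandas version:

```python
import pandas as pd

patterns = pd.Series(['AAABCDEE', 'ABBBBD', 'CCCDE', 'AA', 'ABCDE'])
width = patterns.str.len().max()

# Same padding expressed two ways.
with_ljust = patterns.str.ljust(width, '?')
with_pad = patterns.str.pad(width, side='right', fillchar='?')

# True if both Series hold identical values.
same = with_ljust.equals(with_pad)
print(same)
```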