I have the following pandas DataFrame in Python 3.x:
import pandas as pd
dict1 = {
'ID':['first', 'second', 'third', 'fourth', 'fifth'],
'pattern':['AAABCDEE', 'ABBBBD', 'CCCDE', 'AA', 'ABCDE']
}
df = pd.DataFrame(dict1)
>>> df
ID pattern
0 first AAABCDEE
1 second ABBBBD
2 third CCCDE
3 fourth AA
4 fifth ABCDE
There are two columns, ID and pattern. The longest string in pattern is in the first row: 'AAABCDEE', which has length 8.
My goal is to standardize the strings so that they are all the same length, padding the shorter ones on the right with ?.
Here is what the output should look like:
>>> df
ID pattern
0 first AAABCDEE
1 second ABBBBD??
2 third CCCDE???
3 fourth AA??????
4 fifth ABCDE???
If I were able to make the trailing spaces NaN, then I could try something like:
df = df.applymap(lambda x: int(x) if pd.notnull(x) else str("?"))
But I'm not sure how one would efficiently (1) find the longest string in pattern and (2) then add NaN at the end of the strings up to this length? This may be a convoluted approach...
You can use Series.str.ljust for this, after acquiring the max string length in the column.
df.pattern.str.ljust(df.pattern.str.len().max(), '?')
# 0 AAABCDEE
# 1 ABBBBD??
# 2 CCCDE???
# 3 AA??????
# 4 ABCDE???
# Name: pattern, dtype: object
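Note that the expression above returns a new Series and leaves df unchanged. A minimal sketch of writing the padded strings back into the column, splitting the two steps from the question apart (the name max_len is only illustrative, and the int() cast is a defensive assumption rather than something the original one-liner needs):

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['first', 'second', 'third', 'fourth', 'fifth'],
    'pattern': ['AAABCDEE', 'ABBBBD', 'CCCDE', 'AA', 'ABCDE'],
})

# Step 1: length of the longest string in the column.
# int() converts the NumPy integer returned by max() to a plain Python int.
max_len = int(df['pattern'].str.len().max())

# Step 2: pad every string on the right with '?' up to that length
# and assign the result back to the column.
df['pattern'] = df['pattern'].str.ljust(max_len, '?')

print(df)
```

Strings that are already max_len characters long are returned as they are, so the first row stays 'AAABCDEE'.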
In the source for pandas 0.22.0, ljust is implemented as a call to pad with side='right', so the two are equivalent; pick whichever you find clearer.
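For comparison, here is a short sketch of the same padding written both ways on the example data. The variable names are illustrative only; the check at the end simply confirms that the two calls agree on this input:

```python
import pandas as pd

s = pd.Series(['AAABCDEE', 'ABBBBD', 'CCCDE', 'AA', 'ABCDE'], name='pattern')

# Target width: the length of the longest string in the Series.
width = int(s.str.len().max())

# Variant 1: ljust left-justifies, i.e. fills on the right.
left_justified = s.str.ljust(width, '?')

# Variant 2: pad with side='right' and an explicit fill character.
padded = s.str.pad(width, side='right', fillchar='?')

# Both variants should produce identical Series.
print(left_justified.equals(padded))
```

Swapping in side='left' (or str.rjust) would put the ? characters in front of each string instead.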