My question is similar to this, but for strings.
I have a data frame in which each column contains strings of different lengths. How can I find the maximum string length per column?
Then, how can I select the columns whose maximum length is > 1, using sapply or similar?
A typical column of the dataframe looks like this:
clmn=c("XDX", "GUV", "FQ", "ACUE", "HIT", "AYX", "NFD", "AHBW", "GKQ", "PYF")
Thanks
In pandas, string lengths are available through the len() method on the str accessor. It must be called on the column (Series) that holds the string data, not on the whole DataFrame.
The max() method of a DataFrame returns a Series with the maximum value of each column. With axis='columns', max() instead searches across the columns and returns the maximum value for each row.
To find the length of the longest string in a DataFrame column, use the expression df.COL.str.len().max().
str.len() computes the length of each element in the Series/Index. The element may be a sequence (such as a string, tuple or list) or a collection (such as a dictionary).
In R, we can use nchar, which returns the number of characters in each string:
max(nchar(clmn))
To find the maximum character length for each column (lapply returns a list; sapply would return a named vector):
lapply(df1, function(x) max(nchar(x)))
If we need to keep only the columns whose maximum string length is greater than 1:
df1[sapply(df1, function(x) max(nchar(x)))>1]
Or
Filter(function(x) max(nchar(x)) >1, df1)
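As a rough end-to-end sketch of the steps above, the following builds a small data frame and applies the same nchar approach. The data frame contents and the names max_len and wide_cols are made up for illustration; the as.character() call is an extra precaution, since nchar() raises an error on factor columns.

```r
# Hypothetical example data; the column names and values are assumptions.
df1 <- data.frame(
  a = c("XDX", "GUV", "FQ", "ACUE"),
  b = c("A", "B", "C", "D"),
  c = c("HIT", "AYX", "NFD", "AHBW"),
  stringsAsFactors = FALSE
)

# Maximum string length per column, as a named integer vector.
# as.character() guards against factor columns, which nchar() rejects.
max_len <- sapply(df1, function(x) max(nchar(as.character(x))))

# Keep only the columns whose longest string has more than one character.
# drop = FALSE keeps the result a data frame even if one column remains.
wide_cols <- df1[, max_len > 1, drop = FALSE]

print(max_len)
print(names(wide_cols))
```

Here column b holds only single-character strings, so it is the one that gets dropped.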