I'm reading large CSV files into pandas, some of them with string columns that are thousands of characters long. Is there a quick way to limit the width of a column, i.e. keep only the first 100 characters?
If you can read the whole file into memory, you can use the str accessor, which applies string operations to the entire column at once:
>>> df = pd.read_csv("toolong.csv")
>>> df
   a                       b  c
0  1  1256378916212378918293  2
[1 rows x 3 columns]
>>> df["b"] = df["b"].str[:10]
>>> df
   a           b  c
0  1  1256378916  2
[1 rows x 3 columns]
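If the file does not fit into memory, the same slicing can be applied chunk by chunk. This is a minimal sketch, assuming inline sample data in place of toolong.csv; the helper name `truncate_columns` and its `max_len` parameter are made up for illustration and are not part of pandas.

```python
import io

import pandas as pd


def truncate_columns(df, columns, max_len=100):
    """Return a copy of df with the named string columns cut to max_len characters.

    Hypothetical helper for illustration; not part of pandas.
    """
    out = df.copy()
    for col in columns:
        # .str[:max_len] slices every string in the column; missing values stay missing
        out[col] = out[col].str[:max_len]
    return out


# Inline sample data standing in for a large CSV file (assumption)
csv_text = "a,b,c\n1,abcdefghijklmnop,2\n3,qrstuvwxyz,4\n5,xy,6\n"

# chunksize makes read_csv return an iterator of DataFrames instead of one big frame
chunks = pd.read_csv(io.StringIO(csv_text), chunksize=2)
df = pd.concat(
    [truncate_columns(chunk, ["b"], max_len=5) for chunk in chunks],
    ignore_index=True,
)
print(df)
```

Each chunk is truncated before it is concatenated, so the long strings are only held in memory one chunk at a time.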
Also note that you can get a Series of string lengths using str.len():
>>> df["b"].str.len()
0 10
Name: b, dtype: int64
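The lengths can also be used to find out which columns actually need truncating. This is a rough sketch with made-up sample data; the list `text_columns` and the limit of 100 are assumptions chosen for the example.

```python
import pandas as pd

# Made-up sample frame; "b" holds one very long string, "c" has a missing value
df = pd.DataFrame({"a": [1, 2], "b": ["x" * 250, "short"], "c": ["ok", None]})

text_columns = ["b", "c"]  # columns known to contain text (assumption)

# Longest string per column; missing values are skipped by max()
max_lengths = {col: df[col].str.len().max() for col in text_columns}

# Columns whose longest entry exceeds the limit
too_long = [col for col, longest in max_lengths.items() if longest > 100]
print(too_long)
```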
I was originally wondering if
>>> pd.read_csv("toolong.csv", converters={"b": lambda x: x[:5]})
   a      b  c
0  1  12563  2
[1 rows x 3 columns]
would be better, but I don't actually know whether the converters are called row by row or after the fact on the whole column.
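One way to check is to record what the converter receives. This is a minimal sketch, assuming inline sample data; `first_five` and the list `seen` are names invented for this example. If the converter is called once per cell, `seen` ends up holding one raw string per data row rather than a single Series.

```python
import io

import pandas as pd

seen = []  # every value the converter receives, in call order


def first_five(value):
    """Hypothetical converter: log the raw value, then keep its first 5 characters."""
    seen.append(value)
    return value[:5]


# Inline sample data standing in for toolong.csv (assumption)
csv_text = "a,b,c\n1,abcdefghij,2\n3,klmnopqrst,4\n5,uv,6\n"

df = pd.read_csv(io.StringIO(csv_text), converters={"b": first_five})

print(seen)               # what the converter was handed
print(df["b"].tolist())   # what ended up in the column
```

A converter that runs per value is a plain Python call for every cell, so on large files the vectorized str slicing after loading may well be faster; timing both on your own data is the only reliable way to tell.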