
Truncating column width in pandas

Tags:

python

pandas

I'm reading large CSV files into pandas, and some of them have string columns that run to thousands of characters. Is there a quick way to limit the width of a column, i.e. keep only the first 100 characters?

Luke asked Apr 01 '14 17:04




1 Answer

If you can read the whole thing into memory, you can use the .str accessor, which applies string operations such as slicing to the whole column at once:

>>> import pandas as pd
>>> df = pd.read_csv("toolong.csv")
>>> df
   a                       b  c
0  1  1256378916212378918293  2

[1 rows x 3 columns]
>>> df["b"] = df["b"].str[:10]
>>> df
   a           b  c
0  1  1256378916  2

[1 rows x 3 columns]
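For the 100-character limit in the question, the same slice can be applied to several columns in one pass. The following is only a minimal sketch: the helper name truncate_columns, the column names, and the inline sample data are all made up for illustration, and it assumes the listed columns really do hold strings.

```python
import io

import pandas as pd


def truncate_columns(df, columns, width=100):
    """Return a copy of df with the named string columns cut to `width` characters."""
    out = df.copy()
    for col in columns:
        # .str[:width] slices every value in the column; missing values stay missing.
        out[col] = out[col].str[:width]
    return out


# Hypothetical sample data standing in for a CSV with very long text fields.
csv_text = (
    "id,comment,notes\n"
    "1," + "x" * 250 + "," + "y" * 40 + "\n"
    "2,short," + "z" * 180 + "\n"
)
df = pd.read_csv(io.StringIO(csv_text))

trimmed = truncate_columns(df, ["comment", "notes"], width=100)
print(trimmed["comment"].str.len().tolist())
print(trimmed["notes"].str.len().tolist())
```

Listing the columns explicitly avoids guessing which dtypes count as "string", which differs between pandas versions. Values shorter than the limit are left unchanged.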

Also note that you can get a Series of string lengths using:

>>> df["b"].str.len()
0    10
Name: b, dtype: int64
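The lengths are also handy before truncating, for example to see how many values would actually be cut. A small sketch, using a made-up column and an arbitrary limit of 10:

```python
import pandas as pd

# Hypothetical data: one long value, one short value, one missing value.
df = pd.DataFrame({"b": ["1256378916212378918293", "42", None]})

limit = 10
lengths = df["b"].str.len()   # missing values have no length
too_long = lengths > limit    # marks the values that exceed the limit
n_long = int(too_long.sum())  # missing values are not counted

print(n_long)
```

Only the first value is longer than the limit here, so that is the only one a df["b"].str[:10] slice would shorten.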

I was originally wondering if

>>> pd.read_csv("toolong.csv", converters={"b": lambda x: x[:5]})
   a      b  c
0  1  12563  2

[1 rows x 3 columns]

would be better, but I don't actually know whether the converters are called row by row or after the fact on the whole column.
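One way to check is to record what the converter receives. This is just an experiment sketch with inline sample data and a hypothetical first_five helper. What it shows is observed behaviour with the default parser on a small input, not a documented guarantee.

```python
import io

import pandas as pd

calls = []  # every argument the converter is called with


def first_five(value):
    # Record the raw argument so we can inspect how pandas invokes the converter.
    calls.append(value)
    return value[:5]


# Hypothetical two-row CSV, same shape as toolong.csv above.
csv_text = (
    "a,b,c\n"
    "1,1256378916212378918293,2\n"
    "3,99887766554433221100,4\n"
)
df = pd.read_csv(io.StringIO(csv_text), converters={"b": first_five})

print(len(calls))
print([type(v).__name__ for v in calls])
```

If the converter is invoked once per cell, calls ends up holding one plain string for each data row, which would mean the slicing happens value by value while the file is parsed rather than on a finished column.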

DSM answered Nov 14 '22 21:11