
How can I manage units in pandas data?

I'm trying to figure out if there is a good way to manage units in my pandas data. For example, I have a DataFrame that looks like this:

       length (m)  width (m)  thickness (cm)
    0         1.2        3.4             5.6
    1         7.8        9.0             1.2
    2         3.4        5.6             7.8

Currently, the measurement units are encoded in column names. Downsides include:

  1. column selection is awkward -- df['width (m)'] vs. df['width']
  2. things will likely break if the units of my source data change

If I wanted to strip the units out of the column names, is there somewhere else that the information could be stored?

ajwood asked Sep 09 '16 20:09



2 Answers

There isn't any great way to do this right now; there is a GitHub issue with some discussion.

As a quick hack, you could do something like this, maintaining a separate dict with the units:

    In [3]: units = {}

    In [5]: newcols = []
       ...: for col in df:
       ...:     name, unit = col.split(' ')
       ...:     units[name] = unit
       ...:     newcols.append(name)

    In [6]: df.columns = newcols

    In [7]: df
    Out[7]:
       length  width  thickness
    0     1.2    3.4        5.6
    1     7.8    9.0        1.2
    2     3.4    5.6        7.8

    In [8]: units['length']
    Out[8]: '(m)'
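One limitation of splitting on a single space is that the stored unit keeps its parentheses ('(m)') and a column name that itself contains a space would not unpack into two parts. A minimal sketch of a more tolerant variant follows. The helper name `split_units` and the regular expression are illustrative assumptions, not part of pandas; the function works on plain column labels, so it does not need a DataFrame to run.

```python
import re

# Hypothetical pattern: everything before the final "(...)" group is the name,
# and the text inside that group is the unit.
_COLUMN_RE = re.compile(r"^\s*(?P<name>.*?)\s*\((?P<unit>[^()]*)\)\s*$")


def split_units(columns):
    """Split labels like 'length (m)' into bare names and a name -> unit dict."""
    names, units = [], {}
    for col in columns:
        match = _COLUMN_RE.match(str(col))
        if match is None:
            # No "(unit)" suffix: keep the label unchanged and record no unit.
            names.append(col)
            continue
        name = match.group("name")
        names.append(name)
        units[name] = match.group("unit").strip()
    return names, units


names, units = split_units(["length (m)", "width (m)", "thickness (cm)"])
print(names)
print(units)
```

With a DataFrame, this could be applied as `names, units = split_units(df.columns)` followed by `df.columns = names`. The resulting dict still lives outside the DataFrame, so it has to be kept in sync by hand. Recent pandas versions also offer `df.attrs` as a place to attach such metadata, but it is documented as experimental and may not survive every operation.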
chrisb answered Sep 25 '22 03:09


I was searching for this, too. Here is what pint and the (experimental) pint_pandas are capable of today:

    import pandas as pd
    import pint
    import pint_pandas

    ureg = pint.UnitRegistry()
    ureg.Unit.default_format = "~P"
    pint_pandas.PintType.ureg.default_format = "~P"

    df = pd.DataFrame({
        "length": pd.Series([1.2, 7.8, 3.4], dtype="pint[m]"),
        "width": pd.Series([3.4, 9.0, 5.6], dtype="pint[m]"),
        "thickness": pd.Series([5.6, 1.2, 7.8], dtype="pint[cm]"),
    })

    print(df.pint.dequantify())

         length width thickness
    unit      m     m        cm
    0       1.2   3.4       5.6
    1       7.8   9.0       1.2
    2       3.4   5.6       7.8

    df['width'] = df['width'].pint.to("inch")

    print(df.pint.dequantify())

         length       width thickness
    unit      m          in        cm
    0       1.2  133.858268       5.6
    1       7.8  354.330709       1.2
    2       3.4  220.472441       7.8
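As a sanity check on the converted column, the numbers can be reproduced by hand. The sketch below uses only the exact definition 1 in = 0.0254 m and plain Python; it does not call pint, and the names `METERS_PER_INCH` and `meters_to_inches` are made up for illustration.

```python
METERS_PER_INCH = 0.0254  # exact, by the international definition of the inch


def meters_to_inches(values):
    """Convert an iterable of lengths in meters to a list of lengths in inches."""
    return [v / METERS_PER_INCH for v in values]


width_m = [3.4, 9.0, 5.6]          # the original "width" column, in meters
width_in = meters_to_inches(width_m)

for m, inch in zip(width_m, width_in):
    print(f"{m} m = {inch:.6f} in")
```

The printed values agree with the `width` column shown after `.pint.to("inch")`, which suggests the unit conversion is a plain rescaling of the underlying numbers.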
P. B. answered Sep 26 '22 03:09