I'm trying to figure out if there is a good way to manage units in my pandas data. For example, I have a DataFrame
that looks like this:
       length (m)  width (m)  thickness (cm)
    0         1.2        3.4             5.6
    1         7.8        9.0             1.2
    2         3.4        5.6             7.8
Currently, the measurement units are encoded in column names. Downsides include:
    df['width (m)'] vs. df['width']
If I wanted to strip the units out of the column names, is there somewhere else that the information could be stored?
There isn't any great way to do this right now; see the GitHub issue here for some discussion.
As a quick hack, you could do something like this, maintaining a separate dict with the units.
    In [3]: units = {}

    In [5]: newcols = []
       ...: for col in df:
       ...:     name, unit = col.split(' ')
       ...:     units[name] = unit
       ...:     newcols.append(name)

    In [6]: df.columns = newcols

    In [7]: df
    Out[7]:
       length  width  thickness
    0     1.2    3.4        5.6
    1     7.8    9.0        1.2
    2     3.4    5.6        7.8

    In [8]: units['length']
    Out[8]: '(m)'
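For anyone who wants the same hack as a standalone script, here is a minimal sketch. The helper name `split_units` and the regular expression are my own assumptions, not anything pandas provides; it also drops the parentheses from the stored unit and leaves columns without a "(unit)" suffix untouched.

```python
import re

import pandas as pd

# Same sample data as in the question, with units embedded in the column names.
df = pd.DataFrame({
    "length (m)": [1.2, 7.8, 3.4],
    "width (m)": [3.4, 9.0, 5.6],
    "thickness (cm)": [5.6, 1.2, 7.8],
})


def split_units(frame):
    """Hypothetical helper: return (renamed copy of frame, {column: unit})."""
    # Matches names of the form "<name> (<unit>)", e.g. "length (m)".
    pattern = re.compile(r"^(?P<name>.+?)\s*\((?P<unit>[^)]*)\)$")
    units = {}
    renames = {}
    for col in frame.columns:
        match = pattern.match(col)
        if match is None:
            continue  # no "(unit)" suffix: keep the column name as it is
        renames[col] = match["name"]
        units[match["name"]] = match["unit"]
    # rename() returns a new DataFrame; the original is not modified.
    return frame.rename(columns=renames), units


clean, units = split_units(df)
print(list(clean.columns))
print(units)
```

The dict still lives outside the DataFrame, so it has to be passed around and kept in sync by hand, which is the main weakness of this approach.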
I was searching for this, too. Here is what pint and the (experimental) pint_pandas are capable of today:
    import pandas as pd
    import pint
    import pint_pandas

    ureg = pint.UnitRegistry()
    ureg.Unit.default_format = "~P"
    pint_pandas.PintType.ureg.default_format = "~P"

    df = pd.DataFrame({
        "length": pd.Series([1.2, 7.8, 3.4], dtype="pint[m]"),
        "width": pd.Series([3.4, 9.0, 5.6], dtype="pint[m]"),
        "thickness": pd.Series([5.6, 1.2, 7.8], dtype="pint[cm]"),
    })

    print(df.pint.dequantify())

         length width thickness
    unit      m     m        cm
    0       1.2   3.4       5.6
    1       7.8   9.0       1.2
    2       3.4   5.6       7.8

    df['width'] = df['width'].pint.to("inch")
    print(df.pint.dequantify())

         length       width thickness
    unit      m          in        cm
    0       1.2  133.858268       5.6
    1       7.8  354.330709       1.2
    2       3.4  220.472441       7.8
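As a sanity check on the converted width column, the same numbers fall out of plain Python arithmetic. This sketch does not use pint at all; it only relies on the definition 1 inch = 0.0254 m, and the names `METERS_PER_INCH`, `widths_m` and `widths_in` are mine.

```python
METERS_PER_INCH = 0.0254  # exact by definition of the international inch

# Width values in metres, taken from the example DataFrame above.
widths_m = [3.4, 9.0, 5.6]

# Dividing a length in metres by metres-per-inch gives the length in inches.
widths_in = [w / METERS_PER_INCH for w in widths_m]

for w_m, w_in in zip(widths_m, widths_in):
    print(f"{w_m} m = {w_in:.6f} in")
```

The printed values agree with the width column shown after the conversion to inches.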