When I have a pd.DataFrame with paths, I end up doing a lot of .map(lambda path: Path(path).{method_name}) or apply(axis=1), e.g.:
from pathlib import Path
import pandas as pd

(
pd.DataFrame({'base_dir': ['dir_A', 'dir_B'], 'file_name': ['file_0', 'file_1']})
.assign(full_path=lambda df: df.apply(lambda row: Path(row.base_dir) / row.file_name, axis=1))
)
  base_dir file_name     full_path
0    dir_A    file_0  dir_A/file_0
1    dir_B    file_1  dir_B/file_1
It seems odd to me, especially because pathlib does implement /, so something like df.base_dir / df.file_name would be more Pythonic and natural. I have not found any path type implemented in pandas. Is there something I am missing?
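I have found it may be better to do the conversion once and for all, a sort of astype(path). Then, at least for path concatenation with pathlib, the expression can be written in vectorized form (it still runs element by element, since the columns are object dtype):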
(
pd.DataFrame({'base_dir': ['dir_A', 'dir_B'], 'file_name': ['file_0', 'file_1']})
# this is where I would expect `astype({'base_dir': Path})`
# c=col_name binds the name per lambda; a bare closure would see only the last col_name
.assign(**{col_name: (lambda df, c=col_name: df[c].map(Path)) for col_name in ["base_dir", "file_name"]})
.assign(full_path=lambda df: df.base_dir / df.file_name)
)
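For reference, here is a minimal self-contained sketch of the same one-time conversion without lambdas, assuming the frame is already bound to a name. The names path_cols, as_paths and result are only illustrative. It relies on an object-dtype Series applying / element by element, so pathlib's own operator does the joining.

```python
from pathlib import Path

import pandas as pd

df = pd.DataFrame({"base_dir": ["dir_A", "dir_B"], "file_name": ["file_0", "file_1"]})

# Hypothetical list of the columns that hold paths
path_cols = ["base_dir", "file_name"]

# Convert each path column once; the values are computed eagerly,
# so no lambda (and no closure over the loop variable) is involved
as_paths = df.assign(**{col: df[col].map(Path) for col in path_cols})

# Object-dtype Series apply "/" element by element, calling Path.__truediv__
result = as_paths.assign(full_path=as_paths.base_dir / as_paths.file_name)
print(result)
```

The columns stay object dtype, so this is a convenience in how the expression reads rather than a true vectorized operation.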
It seems like the easiest way would be:
df.base_dir.map(Path) / df.file_name.map(Path)
It saves the need for a lambda function, but you still need to map the columns to Path.
Alternatively, just do:
df.base_dir.str.cat(df.file_name, sep="/")
The latter hard-codes "/" as the separator, so it won't give native paths on Windows (who cares, right? :) but it will probably run faster.
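If the hard-coded separator is a concern, two possible variations are sketched below, assuming the same two-column frame. One passes os.sep to str.cat so the result uses the platform's separator. The other builds Path objects with a plain list comprehension instead of Series.map. The names joined_str and joined_path are only illustrative, and neither variant has been timed here.

```python
import os
from pathlib import Path

import pandas as pd

df = pd.DataFrame({"base_dir": ["dir_A", "dir_B"], "file_name": ["file_0", "file_1"]})

# String join with the platform separator ("/" on POSIX, "\\" on Windows)
joined_str = df.base_dir.str.cat(df.file_name, sep=os.sep)

# Path objects built row by row with a list comprehension (no Series.map)
joined_path = pd.Series(
    [Path(base) / name for base, name in zip(df.base_dir, df.file_name)],
    index=df.index,
)

print(joined_str.tolist())
print(joined_path.tolist())
```

The first variant keeps plain strings, which is usually enough when the paths are only passed on to open() or similar. The second gives real Path objects when you need methods such as .suffix or .exists() afterwards.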