I have a CSV file with a multi-level header that I would like to read in.
I currently read the file in as follows:
import pandas as pd
import numpy as np
dfcsv = pd.read_csv("/FilePath/MultiIndex_Example.csv")
dfcsv
This essentially produces the data frame constructed below (for easy reconstruction):
d = {'Country': ['City', 'PostCode','Day1','Day2','Day3'], 'UK': ['London', '123',47,42,40],'USA': ['New York', '456',31,22,58]}
dfstd = pd.DataFrame(data=d)
However, when I read in the data, I need the first column to supply the names of the MultiIndex levels, essentially creating a data frame like the one below:
arrays = [['UK','USA'],['London','New York'],['123','456']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['Country', 'City','Postcode'])
df = pd.DataFrame(np.random.randn(3, 2), index=['Day1', 'Day2', 'Day3'], columns=index)
df.columns
I was wondering if there is a simple way of achieving this via pd.read_csv or a pd.MultiIndex construction?
FYI, I tried the approach from "Load CSV to Pandas MultiIndex DataFrame" but couldn't get it working.
I think the following is what you need:
dfcsv = pd.read_csv("/FilePath/MultiIndex_Example.csv", index_col=[0], header=[0,1,2])
Here, index_col=[0] uses the first column (position 0) as the row index, and header=[0,1,2] uses the first three rows (positions 0, 1 and 2) as the column header levels. Both arguments are 0-indexed.
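To try this without the original file, here is a minimal sketch that feeds an in-memory CSV to pd.read_csv. The CSV text is an assumption reconstructed from the dfstd frame in the question, not the asker's actual file, and the variable names csv_text and df are only illustrative.

```python
import io

import pandas as pd

# Assumed file contents, rebuilt from the question's `dfstd` example.
csv_text = """Country,UK,USA
City,London,New York
PostCode,123,456
Day1,47,31
Day2,42,22
Day3,40,58
"""

# header=[0, 1, 2]: the first three rows become the column levels.
# index_col=[0]: the first column becomes the row index, and its
# entries in the header rows become the names of the column levels.
df = pd.read_csv(io.StringIO(csv_text), index_col=[0], header=[0, 1, 2])

print(df)
print(df.columns.names)

# Select one column by its full (Country, City, PostCode) key.
# Header values are read as strings, so the post code is '123', not 123.
uk = df[("UK", "London", "123")]
print(uk)
```

Because header cells are parsed as text, the post codes stay strings in the column index, which matches the '123' and '456' used in the question's MultiIndex construction.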