I have loaded a CSV file into a pandas DataFrame and want to do some simple manipulations on it. I cannot figure out how to create a new DataFrame from selected columns of my original DataFrame. My attempt:
names = ['A','B','C','D']
dataset = pandas.read_csv('file.csv', names=names)
new_dataset = dataset['A','D']
I would like to create a new DataFrame with the columns A and D from the original DataFrame.
You can also build a new DataFrame with the DataFrame.assign() method. The assign() method assigns new columns to a DataFrame, returning a new object (a copy) with the new columns added to the original ones, so it extends a DataFrame rather than selecting a subset of its columns.
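A minimal sketch of what assign() does, assuming a small made-up DataFrame and a hypothetical new column named E (neither comes from the original question):

```python
import pandas as pd

# Hypothetical sample data; the column values are made up for illustration.
dataset = pd.DataFrame({"A": [1, 2], "D": [3, 4]})

# assign() returns a new DataFrame with the extra column E appended;
# the original `dataset` is left unchanged.
with_sum = dataset.assign(E=dataset["A"] + dataset["D"])

print(list(dataset.columns))
print(list(with_sum.columns))
```

Note that the result keeps every original column, which is why assign() on its own is not a way to pick out only A and D.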
If you have a DataFrame and would like to access or select a specific few rows/columns from that DataFrame, you can use square brackets or other advanced methods such as loc and iloc.
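The three selection styles mentioned above can be compared side by side. This is only a sketch: the in-memory DataFrame stands in for file.csv, and its values are made up.

```python
import pandas as pd

# Hypothetical stand-in for the data read from file.csv.
dataset = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [4, 5, 6],
    "C": [7, 8, 9],
    "D": [10, 11, 12],
})

# Square brackets with a list of column names (note the double brackets).
by_brackets = dataset[["A", "D"]]

# Label-based selection: all rows, columns A and D.
by_loc = dataset.loc[:, ["A", "D"]]

# Position-based selection: all rows, first and fourth columns.
by_iloc = dataset.iloc[:, [0, 3]]

# loc and iloc can also restrict rows, e.g. the first two rows only.
first_two_rows = dataset.iloc[:2, [0, 3]]
```

All three column selections here should produce the same two-column DataFrame; iloc depends on column order, so the label-based forms are usually safer when the column names are known.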
Selecting columns like this is called subsetting: pass a list of column names inside []:
dataset = pandas.read_csv('file.csv', names=names)
new_dataset = dataset[['A','D']]
which is the same as:
new_dataset = dataset.loc[:, ['A','D']]
If you only need the filtered output, add the usecols parameter to read_csv:
new_dataset = pandas.read_csv('file.csv', names=names, usecols=['A','D'])
EDIT:
If you use only:
new_dataset = dataset[['A','D']]
and then modify the data, you will get this warning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
If you modify values in new_dataset later, you will find that the modifications do not propagate back to the original data (dataset), and that pandas issues a warning.
As EdChum pointed out, add copy() to remove the warning:
new_dataset = dataset[['A','D']].copy()
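A short sketch of the effect of copy(), assuming a made-up in-memory DataFrame in place of file.csv. Whether the warning appears without copy() depends on the pandas version in use.

```python
import pandas as pd

# Hypothetical stand-in for the data read from file.csv.
dataset = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6], "D": [7, 8]})

# An explicit copy makes new_dataset independent of dataset.
new_dataset = dataset[["A", "D"]].copy()

# Modify the copy; the original DataFrame is not affected.
new_dataset["A"] = new_dataset["A"] * 10

print(new_dataset["A"].tolist())
print(dataset["A"].tolist())
```

Making the copy explicit states the intent clearly: new_dataset is a separate object, and changes to it are never expected to reach dataset.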