I have loaded a CSV file into a pandas DataFrame and want to do some simple manipulations on it. I cannot figure out how to create a new DataFrame based on selected columns from my original DataFrame. My attempt:
names = ['A','B','C','D']
dataset = pandas.read_csv('file.csv', names=names)
new_dataset = dataset['A','D']
I would like to create a new dataframe with the columns A and D from the original dataframe.
DataFrame.assign() is not the right tool for selecting columns: it assigns new columns to a DataFrame, returning a new object (a copy) with the new columns added to the original ones.
If you have a DataFrame and would like to access or select a few specific rows/columns from it, you can use square brackets or more advanced indexers such as loc and iloc.
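As a rough illustration of the three access styles mentioned above, here is a minimal sketch. The DataFrame contents are made up for the example; only the column names mirror the question.

```python
import pandas as pd

# Hypothetical example data; the column names mirror the question.
df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [4, 5, 6],
    "C": [7, 8, 9],
    "D": [10, 11, 12],
})

one_column = df["A"]                 # single label -> Series
two_columns = df[["A", "D"]]         # list of labels -> DataFrame
by_label = df.loc[0:1, ["A", "D"]]   # loc: label-based, end label included
by_position = df.iloc[0:2, [0, 3]]   # iloc: position-based, end position excluded
```

With the default integer index, by_label and by_position select the same two rows and the same two columns.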
Selecting columns like this is called subsetting: pass a list of column names inside []:
dataset = pandas.read_csv('file.csv', names=names)
new_dataset = dataset[['A','D']]
which is the same as:
new_dataset = dataset.loc[:, ['A','D']]
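To see why the attempt in the question fails and that the two forms above agree, here is a small sketch on an in-memory DataFrame (the values are hypothetical, standing in for file.csv):

```python
import pandas as pd

# Hypothetical stand-in for the data read from file.csv.
dataset = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6], "D": [7, 8]})

# dataset['A', 'D'] passes the tuple ('A', 'D') as one column label.
# No such column exists, so pandas raises KeyError.
try:
    dataset['A', 'D']
    raised = False
except KeyError:
    raised = True

with_brackets = dataset[['A', 'D']]       # list of labels inside []
with_loc = dataset.loc[:, ['A', 'D']]     # all rows, listed columns
same = with_brackets.equals(with_loc)     # True when both selections match
```

The extra pair of brackets is what turns the two labels into a list rather than a single tuple key.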
If you only need the filtered output, add the usecols parameter to read_csv:
new_dataset = pandas.read_csv('file.csv', names=names, usecols=['A','D'])
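A self-contained way to try this without a file on disk is to read from an in-memory buffer. The CSV text below is invented for the example and assumes, like the question, a file with four columns and no header row:

```python
import io
import pandas as pd

# Hypothetical stand-in for file.csv: four columns, no header row.
csv_text = "1,2,3,4\n5,6,7,8\n"
names = ['A', 'B', 'C', 'D']

# usecols refers to the labels supplied through names,
# so only columns A and D are parsed into the result.
new_dataset = pd.read_csv(io.StringIO(csv_text), names=names, usecols=['A', 'D'])
```

This avoids loading columns B and C at all, which can matter for large files.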
EDIT:
If you use only:
new_dataset = dataset[['A','D']]
and then manipulate the data, you may get the SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
If you modify values in new_dataset later, you will find that the modifications do not propagate back to the original data (dataset), and that pandas issues a warning.
As EdChum pointed out, add copy to remove the warning:
new_dataset = dataset[['A','D']].copy()
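A short sketch of what the explicit copy gives you, again using made-up values in place of file.csv: the new DataFrame can be modified freely and the original stays untouched.

```python
import pandas as pd

# Hypothetical stand-in for the data read from file.csv.
dataset = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6], "D": [7, 8]})

new_dataset = dataset[['A', 'D']].copy()   # explicit, independent copy
new_dataset.loc[0, 'A'] = 100              # modify only the copy

original_value = dataset.loc[0, 'A']       # still the original value
copied_value = new_dataset.loc[0, 'A']     # the updated value
```

Making the copy explicit states the intent that new_dataset is a separate object, not a view into dataset.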