
Name columns when importing csv to dataframe in dask

I would like to name the columns when I import a CSV into a dataframe with dask in Python. The code I use looks like this:

import dask.dataframe as dd

for i in range(1, files + 1):
    filename = str(i) + 'GlobalActorsHeatMap.csv'
    runs[i] = dd.read_csv(filename, header=None)

I would like to use a list with a name for each column:

names = ['tribute', 'percent_countries_active', 'num_wars', 'num_tributes', 'war', 'war_to_tribute_ratio', 'US_wealth', 'UK_wealth', 'NZ_wealth']

Is this possible to do directly?

Jim Caton asked Mar 17 '16 13:03


People also ask

How do I name a column in pandas CSV?

Call pandas.read_csv(filepath_or_buffer, names=None) with filepath_or_buffer set to the filename of the .csv and names set to the list of column names. The names are assigned to the columns of the resulting DataFrame in the order they appear in names.

What is names in PD read_csv?

The names parameter of read_csv defines the column names. If the list contains more names than the file has columns, the extra columns are added and filled with NaN. header=None tells pandas that the file has no header row; if the CSV file already has one, pass header=0 together with names to replace it.
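Here is a small sketch of how names and header interact in pandas. The inline CSV strings and the column names x, y, z are made up for illustration; they are not from the question.

```python
import io

import pandas as pd

# Hypothetical sample data: one CSV without a header row, one with.
no_header = "1,2\n3,4\n"
with_header = "a,b\n1,2\n3,4\n"

# No header row in the file: header=None plus names assigns the column names.
df1 = pd.read_csv(io.StringIO(no_header), header=None, names=['x', 'y'])

# More names than columns: the extra column 'z' is filled with NaN.
df2 = pd.read_csv(io.StringIO(no_header), header=None, names=['x', 'y', 'z'])

# The file has its own header row: header=0 plus names replaces it.
df3 = pd.read_csv(io.StringIO(with_header), header=0, names=['x', 'y'])

print(df1)
print(df2)
print(df3)
```

If header=None were used on the file that already has a header row, the old header line would be read as a data row instead of being replaced.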

Is Dask faster than pandas?

Dask can run faster than pandas on queries that parallelize well, because it splits the computation across CPU cores, whereas pandas runs such a query on a single core. On a 4-core machine, for example, Dask can use all four cores for the computation.


1 Answer

Just use the names argument of read_csv:

names = [...]
dd.read_csv(filename, header=None, names=names)

Read more here

Sevanteri answered Nov 14 '22 23:11