I would like to name the columns when I import a CSV into a dataframe with Dask in Python. The code I use looks like this:

for i in range(1, files + 1):
    filename = str(i) + 'GlobalActorsHeatMap.csv'
    runs[i] = dd.read_csv(filename, header=None)
I would like to use an array with names for each column:
names = ['tribute', 'percent_countries_active', 'num_wars', 'num_tributes', 'war', 'war_to_tribute_ratio', 'US_wealth', 'UK_wealth', 'NZ_wealth' ]
Is this possible to do directly?
Call pandas.read_csv(filepath_or_buffer, names=None) with filepath_or_buffer set to the filename of the .csv and names set to the list of column names. The column names will be assigned to the columns of the resulting DataFrame in the order they appear in names. Dask's dd.read_csv accepts the same parameters.
The names parameter of read_csv defines the column names. If you pass an extra name in this list, an additional column with that name is added and filled with NaN values. header=None tells the parser the file has no header row, so the first line is read as data rather than as column names.
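A minimal sketch of that behaviour, using a made-up two-column CSV and an extra hypothetical column name 'c':

```python
import io

import pandas as pd

# A two-column CSV with no header row
csv_data = "1,2\n3,4\n"

# Pass three names for two columns of data: 'a' and 'b' receive the values,
# and the extra 'c' column is created and filled with NaN
df = pd.read_csv(io.StringIO(csv_data), header=None, names=['a', 'b', 'c'])
print(df.columns.tolist())
```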
Dask runs this query faster than pandas, even when the least efficient column type is used, because it parallelizes the computation. pandas uses only one CPU core; my machine has 4 cores, and Dask uses all of them.
Just use the names argument of read_csv:

names = [...]
dd.read_csv(filename, header=None, names=names)

Read more in the dd.read_csv documentation.