
Julia: read many files in the working directory

Tags: julia

I just started learning Julia and I want to read many csv files in my directory. How can I do that?

My directory has the files below and I want to read in all files from trip_data_1 to trip_data_12.

"trip_data_1.csv" "trip_data_10.csv" "trip_data_11.csv" "trip_data_12.csv" "trip_data_2.csv" "trip_data_3.csv" "trip_data_4.csv" "trip_data_5.csv" "trip_data_6.csv" "trip_data_7.csv" "trip_data_8.csv" "trip_data_9.csv" "trip_fare_1.csv" "trip_fare_10.csv" "trip_fare_11.csv" "trip_fare_12.csv" "trip_fare_2.csv" "trip_fare_3.csv" "trip_fare_4.csv" "trip_fare_5.csv" "trip_fare_6.csv" "trip_fare_7.csv" "trip_fare_8.csv" "trip_fare_9.csv"

This is what I have tried:

using DataFrames
df = readtable(filter!(r"^trip_data", readdir()))

But I get MethodError: no method matching readtable(::Array{String,1})

Fisseha Berhane asked Mar 01 '17 03:03


3 Answers

I'm a big fan of the dot (.) broadcasting syntax in this type of situation.

I.e. df = readtable.(filter(r"^trip_data", readdir())) will give you an array of data frames (@avysk is correct that you probably want filter, not filter!).

If you want one single data frame, then the mapreduce option is good.

Or you can splat the array into vcat: vcat(readtable.(filter(r"^trip_data", readdir()))...)

NB: All of these are general solutions to the problem "I have a function (method) f that applies to a single x, and now I want to apply it to many instances, or an array, of x".

So if you get another error that indicates that you cannot apply a function directly to any array or collection, but you can to a single element, then map, broadcast/. & list comprehensions are your friends!
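A minimal sketch of those three options side by side, using only Base Julia so it runs without any packages. Here count_rows is a hypothetical stand-in for readtable, and the temporary directory and its files are made up for illustration. An anonymous function is used as the filter predicate because recent Julia versions may not accept a bare regex there.

```julia
# Hypothetical stand-in for readtable: number of data rows in a CSV file.
count_rows(path) = length(readlines(path)) - 1   # subtract the header line

# Build a throwaway directory with made-up files so the sketch is self-contained.
dir = mktempdir()
for i in 1:3
    body = join(("$(j),$(j * i)" for j in 1:i), "\n")
    write(joinpath(dir, "trip_data_$(i).csv"), "id,fare\n" * body * "\n")
    write(joinpath(dir, "trip_fare_$(i).csv"), "id,tip\n1,0\n")
end

# Keep only the trip_data files, then turn the names into full paths.
files = filter(n -> startswith(n, "trip_data"), readdir(dir))
paths = joinpath.(dir, files)

via_broadcast     = count_rows.(paths)               # dot syntax
via_map           = map(count_rows, paths)           # map
via_comprehension = [count_rows(p) for p in paths]   # list comprehension
```

All three produce the same array, one element per file; which one to use is mostly a matter of taste.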

Alexander Morley answered Sep 25 '22 06:09


You can do it like this:

reduce(vcat,  map(readtable, filter(r"^trip_data", readdir())))

Here map applies readtable to every filename matched by filter (you don't need filter! here), and reduce joins all the resulting data frames together with vcat.

The same can be written with mapreduce:

mapreduce(readtable, vcat, filter(r"^trip_data", readdir()))
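
The equivalence of the two forms can be checked with a small Base-only sketch. read_rows is a hypothetical stand-in for readtable that returns a vector of rows, and the files are made up. One thing to be aware of: readdir returns names sorted as strings, so trip_data_10.csv comes before trip_data_2.csv; the sort below is only needed if row order matters.

```julia
# Hypothetical stand-in for readtable: parse a CSV file into a vector of rows,
# skipping the header line. Each row is a vector of strings.
read_rows(path) = [split(line, ',') for line in readlines(path)[2:end]]

# Made-up input files in a temporary directory.
dir = mktempdir()
for i in (1, 2, 10)
    write(joinpath(dir, "trip_data_$(i).csv"), "id,month\n$(i),a\n$(i),b\n")
end

files = filter(n -> startswith(n, "trip_data"), readdir(dir))
# Sort by the number embedded in the file name to get 1, 2, 10 instead of 1, 10, 2.
files = sort(files, by = n -> parse(Int, match(r"\d+", n).match))
paths = joinpath.(dir, files)

two_step = reduce(vcat, map(read_rows, paths))   # read every file, then join
one_step = mapreduce(read_rows, vcat, paths)     # same result in a single call
```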

avysk answered Sep 26 '22 06:09


Another method, which moves the concatenation to the input String level instead of the DataFrame level and uses the Iterators package:

readtable(IOBuffer(join(chain([drop((l for l in readlines(fn)),i>1?1:0) for (i,fn) in enumerate(filter!(r"^trip_data", readdir()))]...))))

This may actually save some time and allocations (in my pet example it did), but it depends on the parameters of the input files.
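
The same idea can be sketched with a plain loop and no extra packages. This is not the one-liner above, just an illustration of text-level concatenation on Julia 1.x, where readlines strips line endings by default. concat_csv_text is a hypothetical helper name and the files are made up; the resulting IOBuffer could then be handed to whichever CSV reader you use, assuming it accepts an IO object.

```julia
# Concatenate CSV files as text, keeping the header from the first file only.
function concat_csv_text(paths)
    io = IOBuffer()
    for (i, path) in enumerate(paths)
        lines = readlines(path)        # line endings are stripped here
        start = i > 1 ? 2 : 1          # skip the header after the first file
        for line in lines[start:end]
            println(io, line)          # re-add a newline after every line
        end
    end
    seekstart(io)                      # rewind so a reader starts at the top
    return io
end

# Made-up input files in a temporary directory.
dir = mktempdir()
for i in 1:2
    write(joinpath(dir, "trip_data_$(i).csv"), "id,fare\n$(i),10\n")
end
paths = joinpath.(dir, filter(n -> startswith(n, "trip_data"), readdir(dir)))

combined = read(concat_csv_text(paths), String)
```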

Dan Getz answered Sep 23 '22 06:09