I just started learning Julia and I want to read many csv files in my directory. How can I do that?
My directory has the files below and I want to read in all files from trip_data_1 to trip_data_12.
"trip_data_1.csv" "trip_data_10.csv" "trip_data_11.csv" "trip_data_12.csv" "trip_data_2.csv" "trip_data_3.csv" "trip_data_4.csv" "trip_data_5.csv" "trip_data_6.csv" "trip_data_7.csv" "trip_data_8.csv" "trip_data_9.csv" "trip_fare_1.csv" "trip_fare_10.csv" "trip_fare_11.csv" "trip_fare_12.csv" "trip_fare_2.csv" "trip_fare_3.csv" "trip_fare_4.csv" "trip_fare_5.csv" "trip_fare_6.csv" "trip_fare_7.csv" "trip_fare_8.csv" "trip_fare_9.csv"
This is what I have tried:
using DataFrames
df = readtable(filter!(r"^trip_data", readdir()))
But I get MethodError: no method matching readtable(::Array{String,1})
I'm a big fan of the dot (.) broadcasting syntax in this type of situation.
I.e.
df = readtable.(filter(r"^trip_data", readdir()))
will give you an array of data frames (@avysk is correct that you probably want filter, not filter!).
If you want one single data frame, then the mapreduce option is good.
Or you can splat the array into vcat:
vcat(readtable.(filter(r"^trip_data", readdir()))...)
NB: All of these are general solutions to the problem "I have a function (method) f that applies to x, and now I want to apply it to many instances, or an array, of x."
So if you get another error that indicates that you cannot apply a function directly to an array or collection, but you can to a single element, then map, broadcast (.) and list comprehensions are your friends!
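As a rough illustration of that pattern, here is a minimal sketch using only Base Julia. It assumes Julia 1.x (where a regex is tested with occursin rather than passed straight to filter), and file_number is a hypothetical stand-in for readtable: a function that accepts one filename but has no method for an array of them.

```julia
# Hypothetical single-element function: extracts the number from one filename.
# Like readtable, it has no method for an Array of Strings.
file_number(name::String) = parse(Int, match(r"_(\d+)\.csv$", name).captures[1])

files = ["trip_data_1.csv", "trip_data_10.csv", "trip_data_2.csv", "trip_fare_1.csv"]

# Keep only the trip_data files (non-mutating filter, Julia 1.x style).
trip_files = filter(f -> occursin(r"^trip_data", f), files)

# Three equivalent ways to apply the function to every element.
via_map           = map(file_number, trip_files)
via_broadcast     = file_number.(trip_files)
via_comprehension = [file_number(f) for f in trip_files]
```

Calling file_number(trip_files) directly would raise the same kind of MethodError as in the question, which is the cue to reach for one of the three forms above.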
You can do it like this:
reduce(vcat, map(readtable, filter(r"^trip_data", readdir())))
Here map applies readtable to every filename matched by filter (you don't need filter! here), and reduce joins all the resulting dataframes together with vcat.
The same can be written with mapreduce:
mapreduce(readtable, vcat, filter(r"^trip_data", readdir()))
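To see that the two forms agree, here is a small sketch in Base Julia only. It assumes Julia 1.x and uses readlines on throwaway files in a temporary directory, so plain vectors of lines stand in for data frames; the file contents are made up for illustration.

```julia
# Create a few tiny files in a temporary directory (illustrative data only).
dir = mktempdir()
for (i, rows) in enumerate((["a", "b"], ["c"], ["d", "e"]))
    write(joinpath(dir, "trip_data_$i.csv"), join(rows, "\n"))
end
write(joinpath(dir, "trip_fare_1.csv"), "x")  # should be skipped by the filter

# Full paths of the matching files; readdir returns names in sorted order.
paths = [joinpath(dir, f) for f in readdir(dir) if occursin(r"^trip_data", f)]

two_step = reduce(vcat, map(readlines, paths))  # map first, then concatenate
one_step = mapreduce(readlines, vcat, paths)    # the same thing in one call
```

Both calls produce the same concatenated vector; swapping readlines for a function that returns a data frame gives the pattern described above.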
Another method, which moves concatenation to the input String level instead of the DataFrame level and uses the Iterators package:
readtable(IOBuffer(join(chain([drop((l for l in readlines(fn)),i>1?1:0) for (i,fn) in enumerate(filter!(r"^trip_data", readdir()))]...))))
This may actually save some time and allocations (in my pet example it did), but it depends on the parameters of the input files.
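The string-level idea can be sketched with Base Julia alone. This assumes Julia 1.x, where drop and flatten live in Base.Iterators and eachline strips line endings by default; the two sample files are made up for illustration, and the result is left as a String rather than passed to a CSV reader.

```julia
# Two small files that share a header row (illustrative data only).
dir = mktempdir()
write(joinpath(dir, "trip_data_1.csv"), "id,fare\n1,2.5\n")
write(joinpath(dir, "trip_data_2.csv"), "id,fare\n2,4.0\n")

paths = [joinpath(dir, f) for f in readdir(dir) if occursin(r"^trip_data", f)]

# Keep the header of the first file only: drop one line from every later file.
lines = Iterators.flatten(
    Iterators.drop(eachline(p), i > 1 ? 1 : 0) for (i, p) in enumerate(paths)
)

# Join with explicit newlines, since eachline removed them.
combined = join(lines, "\n")
```

The combined text can then be wrapped in an IOBuffer and handed to whichever CSV reader is in use, so parsing happens once instead of once per file.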