I was going through the pandas documentation, which says that setting infer_datetime_format=True can speed up date parsing in read_csv.
I have a sample CSV data file:
Date
22-01-1943
15-10-1932
23-11-1910
04-05-2000
02-02-1943
01-01-1943
28-08-1943
31-12-1943
22-01-1943
15-10-1932
23-11-1910
04-05-2000
02-02-1943
01-01-1943
28-08-1943
31-12-1943
22-01-1943
15-10-1932
23-11-1910
04-05-2000
02-02-1943
01-01-1943
28-08-1943
31-12-1943
22-01-1943
15-10-1932
23-11-1910
04-05-2000
02-02-1943
01-01-1943
28-08-1943
31-12-1943
Next, I timed read_csv with and without infer_datetime_format:
In [174]: %timeit df = pd.read_csv("a.csv", parse_dates=["Date"])
1.5 ms ± 178 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [175]: %timeit df = pd.read_csv("a.csv", parse_dates=["Date"], infer_datetime_format=True)
1.73 ms ± 45 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
According to the documentation, the second call should take less time, yet here it is slower. Is my understanding correct? Or on what kind of data does the statement hold?
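A rough way to probe when the documented speed-up applies is to time parsing on inputs of different sizes. The sketch below is a stand-in under stated assumptions, not the original benchmark: it calls pd.to_datetime directly instead of read_csv, and it compares letting pandas work out the format (only dayfirst=True given) against passing format="%d-%m-%Y" explicitly, rather than using infer_datetime_format, which pandas 2.0 deprecated when strict format inference became the default. It generates distinct dates, because repeating the eight sample dates would let to_datetime's cache of unique strings (cache=True by default) hide most of the per-row cost. The helper name per_call_ms is hypothetical, and absolute numbers will depend on the machine and pandas version.

```python
import timeit

import pandas as pd


def per_call_ms(func, number=3):
    """Hypothetical helper: average wall-clock time of one call, in milliseconds."""
    return timeit.timeit(func, number=number) / number * 1e3


results = {}
for n_rows in (32, 20_000):  # sample-sized input vs. a larger one
    # Distinct DD-MM-YYYY strings, starting at 01-01-1900, one day apart.
    s = pd.Series(
        pd.date_range("1900-01-01", periods=n_rows, freq="D").strftime("%d-%m-%Y")
    )
    # Let pandas decide how to read the strings, told only that the day comes first.
    guessed_ms = per_call_ms(lambda: pd.to_datetime(s, dayfirst=True))
    # Spell the format out so nothing has to be inferred.
    explicit_ms = per_call_ms(lambda: pd.to_datetime(s, format="%d-%m-%Y"))
    results[n_rows] = (guessed_ms, explicit_ms)
    print(f"{n_rows:>6} rows: dayfirst only {guessed_ms:.2f} ms, "
          f"explicit format {explicit_ms:.2f} ms")
```

On 32 rows, fixed per-call overhead is likely to dominate either way. On pandas 1.x, where strings without a known format fall back to a slower per-element parser, the gap is expected to widen as the row count grows; on 2.x, which infers a format by default, it may largely disappear.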
Update: Pandas version - '1.0.5'
What you actually want to do is add dayfirst=True, since the dates in the file are in DD-MM-YYYY format:
%timeit df = pd.read_csv("C:/Users/k_sego/Dates.csv", parse_dates=["Date"], dayfirst=True, infer_datetime_format=True)
1.96 ms ± 115 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Compared to parse_dates alone:
%timeit df = pd.read_csv("C:/Users/k_sego/Dates.csv", parse_dates=["Date"])
2.38 ms ± 182 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
and to infer_datetime_format without dayfirst:
%timeit df = pd.read_csv("C:/Users/k_sego/Dates.csv", parse_dates=["Date"], infer_datetime_format=True)
3.02 ms ± 670 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
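Besides speed, dayfirst also decides how ambiguous strings are read. A minimal illustration with one date from the sample file, parsed as a single string with pd.to_datetime:

```python
import pandas as pd

ambiguous = "04-05-2000"  # appears in the sample file

# Default (dayfirst=False): read month first, giving 5 April 2000.
print(pd.to_datetime(ambiguous))

# dayfirst=True: read day first, giving 4 May 2000.
print(pd.to_datetime(ambiguous, dayfirst=True))
```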
The solution is to reduce the number of choices read_csv has to make when parsing the dates.
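Taken to its limit, that means spelling out the format completely so nothing has to be inferred. A sketch assuming the DD-MM-YYYY layout of the sample, with io.StringIO standing in for the CSV file; on pandas 2.0 and later, read_csv's date_format parameter is another way to pass the same format string.

```python
import io

import pandas as pd

# A few rows from the sample; io.StringIO replaces the file on disk.
csv_text = "Date\n22-01-1943\n15-10-1932\n04-05-2000\n31-12-1943\n"

# Read the column as plain strings, then convert with an explicit format.
df = pd.read_csv(io.StringIO(csv_text))
df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%Y")
print(df)
```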