I want to specify data types for pandas read_csv. Here's a quick look at something that does work and then doesn't when types are specified. Why doesn't the latter work?
import io
import pandas as pd
csv = """foo,1234567,a,1
foo,2345678,b,3
bar,3456789,b,5
"""
df = pd.read_csv(io.StringIO(csv),
names=["fb", "num", "loc", "x"])
print(df)
df = pd.read_csv(io.StringIO(csv),
names=["fb", "num", "loc", "x"],
dtype=["|S3", "np.int64", "|S1", "np.int8"])
print(df)
I've updated to make this much simpler and, hopefully, clearer on BrenBarn's suggestion. My real dataset is much larger, but I'd like to use the method to generate types for all my data on import.
As Jeff indicated, my syntax was bad. The names and types have to be zipped into a dic style list of relationships. The code below works, but note that you can't dtype a string width; you can only define it as an object.
import pandas as pd
import io
csv = """foo,1234567,a,1
foo,2345678,b,3
bar,3456789,b,5
"""
df = pd.read_csv(io.StringIO(csv),
names = ["fb", "num", "ab", "x"],
dtype = {"fb" : object, "num" : np.int64, "ab" : object, "x" : np.int8})
print(df)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With