Currently, I am using a for loop to read CSV files from a folder. After reading each CSV file, I store its data in one entry of a dictionary. When I print the data types using "print(list_of_dfs.dtypes)", I receive:
DATETIME     object
VALUE       float64
ID            int64
ID Name      object
dtype: object
Note that this is a nested dictionary with thousands of values stored in each of these fields. I have 26 entries with the structure listed above. I am trying to combine the dictionary entries into a single DataFrame with one header row made up of the fields:
Index DATETIME VALUE ID ID Name
Note: I am learning Python as I go. I tried storing the data in an array and then converting the array to a DataFrame, but I could not append the rows to the DataFrame.
Using the dictionary method, I attempted "df = pd.DataFrame(list_of_dfs)". This throws an error.
import pandas as pd

list_of_dfs = {}
# regionLoadArray contains my file names from the directory listing.
for I in range(0, len(regionLoadArray)):
    list_of_dfs[I] = pd.read_csv(regionLoadArray[I])
# This method was suggested at thispoint.com for nested dictionaries.
dataframe = pd.DataFrame(list_of_dfs)  # <- this is where my error occurs
ValueError: If using all scalar values, you must pass an index
I appreciate any assistance with this issue, as I am new to Python. My current goal is simply to produce a DataFrame with my headers that I can then write to a CSV file.
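If the end goal is one combined table written back out as a CSV, a common pandas idiom is to read each file into a DataFrame, collect the frames in a list, and stack them with pd.concat. This is a different technique from the pd.DataFrame(...) constructor call in the question. A minimal sketch, assuming every file shares the same header (DATETIME, VALUE, ID, ID Name); the temporary folder, the sample rows, region_files and the output name combined.csv are all hypothetical stand-ins for illustration.

```python
import glob
import os
import tempfile

import pandas as pd

# Hypothetical sample data: two small CSV files with the same header
# as in the question, written to a temporary folder.
folder = tempfile.mkdtemp()
sample = {
    "region_a.csv": "DATETIME,VALUE,ID,ID Name\n2020-01-01 00:00,1.5,1,North\n",
    "region_b.csv": "DATETIME,VALUE,ID,ID Name\n2020-01-01 00:00,2.5,2,South\n",
}
for name, text in sample.items():
    with open(os.path.join(folder, name), "w") as f:
        f.write(text)

# Hypothetical stand-in for regionLoadArray: the CSV file paths in the folder.
region_files = sorted(glob.glob(os.path.join(folder, "*.csv")))

# Read each file into its own DataFrame and keep them in a list.
frames = [pd.read_csv(path) for path in region_files]

# Stack the frames vertically; ignore_index=True renumbers the rows 0..n-1
# instead of repeating each file's own 0-based index.
combined = pd.concat(frames, ignore_index=True)

# Write the combined table to a single CSV with one header row.
out_path = os.path.join(folder, "combined.csv")
combined.to_csv(out_path, index=False)

print(combined)
```

Collecting the frames in a list and concatenating once at the end is usually simpler than appending row by row.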
Depending on your needs, a simple workaround could be:
import pandas as pd

dct = {'col1': 'abc', 'col2': 123}
dct = {k: [v] for k, v in dct.items()}  # WORKAROUND: wrap each scalar in a list
df = pd.DataFrame(dct)
which results in
print(df)
  col1  col2
0  abc   123
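To see why the workaround helps, this sketch reproduces the error with a dictionary of plain scalars and then builds the same frame after wrapping each value in a one-element list. The names error_message and wrapped are illustrative, and the exact wording of the error message may differ between pandas versions.

```python
import pandas as pd

dct = {'col1': 'abc', 'col2': 123}

# Every value is a scalar, so pandas cannot tell how many rows to create.
try:
    pd.DataFrame(dct)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Wrapping each scalar in a one-element list gives pandas a row count of 1.
wrapped = {k: [v] for k, v in dct.items()}
df = pd.DataFrame(wrapped)

print(error_message)
print(df)
```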
This error occurs because pandas needs an index: when every value in the dictionary is a scalar, pandas cannot work out how many rows to create. At first this may seem confusing because you think of list indexing. What it is essentially asking for is a label for each row of the DataFrame. You can set this like so:
import pandas as pd
values = ['a', 'b', 'c', 'd']  # avoid naming a variable "list": it shadows the built-in
df = pd.DataFrame(values, index=[0, 1, 2, 3])
The DataFrame then looks like this:
   0
0  a
1  b
2  c
3  d
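Applied to the original error, passing index lets a dictionary of scalars work: pandas broadcasts each scalar across every label in the index, so the length of the index decides how many rows you get. A minimal sketch; the dictionary and labels are illustrative.

```python
import pandas as pd

dct = {'col1': 'abc', 'col2': 123}

# One index label -> one row holding the scalars.
one_row = pd.DataFrame(dct, index=[0])

# Three labels -> each scalar is repeated down three rows.
three_rows = pd.DataFrame(dct, index=[0, 1, 2])

print(one_row)
print(three_rows)
```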
For you specifically, this might look something like this using numpy (not tested):
import numpy as np
import pandas as pd

list_of_dfs = {}
for I in range(0, len(regionLoadArray)):
    list_of_dfs[I] = pd.read_csv(regionLoadArray[I])
# One index label per dictionary entry: 0, 1, ..., n-1
ind = np.arange(len(list_of_dfs))
dataframe = pd.DataFrame(list_of_dfs, index=ind)
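Because list_of_dfs holds whole DataFrames rather than scalars, another option is to hand the dictionary straight to pd.concat, which accepts a mapping and uses its keys (the file numbers I) as the outer level of a MultiIndex. This is a different technique from the constructor call above. In this sketch the two small frames are hypothetical stand-ins for the files read with pd.read_csv, and the level names 'file' and 'row' are arbitrary choices.

```python
import pandas as pd

# Hypothetical stand-ins for pd.read_csv(regionLoadArray[I]).
list_of_dfs = {
    0: pd.DataFrame({'DATETIME': ['2020-01-01 00:00'], 'VALUE': [1.5],
                     'ID': [1], 'ID Name': ['North']}),
    1: pd.DataFrame({'DATETIME': ['2020-01-01 00:00'], 'VALUE': [2.5],
                     'ID': [2], 'ID Name': ['South']}),
}

# The dict keys become the outer index level ("file"), so each row remembers
# which file it came from; the inner level ("row") is each file's own row number.
stacked = pd.concat(list_of_dfs, names=['file', 'row'])

# Move the file number into an ordinary column and renumber the rows 0..n-1.
flat = stacked.reset_index(level='file').reset_index(drop=True)

print(flat)
```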
Pandas unfortunately always needs an index when creating a DataFrame. You can either set it yourself, or use an object with the following structure so pandas can determine the index itself:
data = {'a': [1], 'b': [2]}
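With that structure pandas takes the row count from the list lengths, so no index argument is needed, provided all the lists have the same length. A short sketch with illustrative values; the exact error text for mismatched lengths may vary by pandas version.

```python
import pandas as pd

# Lists of length 1 -> one row, index 0.
data = {'a': [1], 'b': [2]}
single = pd.DataFrame(data)

# Lists of length 3 -> three rows, index 0..2.
multi = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# Lists of different lengths cannot be lined up into rows.
try:
    pd.DataFrame({'a': [1, 2], 'b': [3]})
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)

print(single)
print(multi)
print(mismatch_error)
```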
Since it won't be easy to edit the data in your case, a hacky solution is to wrap the data in a list:
dataframe = pd.DataFrame([list_of_dfs])
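Wrapping works because a list passed to the DataFrame constructor is read as a sequence of rows: each dictionary in it becomes one row whose keys are the column names. The sketch below shows this with plain dictionaries of scalars (illustrative values only), including what happens when several dictionaries are passed.

```python
import pandas as pd

row = {'col1': 'abc', 'col2': 123}

# A list holding one dict -> a DataFrame with exactly one row.
one = pd.DataFrame([row])

# A list of several dicts -> one row per dict; a missing key becomes NaN.
rows = [
    {'col1': 'abc', 'col2': 123},
    {'col1': 'def', 'col2': 456},
    {'col1': 'ghi'},
]
many = pd.DataFrame(rows)

print(one)
print(many)
```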