Dictionary to Dataframe Error: "If using all scalar values, you must pass an index"

Currently, I am using a for loop to read CSV files from a folder. After reading each CSV file, I store its data as one entry of a dictionary. When I print the data types using "print(list_of_dfs.dtypes)" I receive:

DATETIME     object
VALUE       float64
ID            int64
ID Name      object
dtype: object

Note that this is a nested dictionary with thousands of values stored in each of these data fields. I have 26 entries with the structure listed above. I am trying to append the dictionary entries into a single dataframe consisting of the data fields:

Index DATETIME VALUE ID ID Name.

Note: I am learning Python as I go. I tried using an array to store the data and then converting the array to a dataframe, but I could not append the rows of the dataframe.

Using the dictionary method, I attempted "df = pd.DataFrame(list_of_dfs)". This throws an error.

import pandas as pd

list_of_dfs = {}

for I in range(0, len(regionLoadArray)):
    list_of_dfs[I] = pd.read_csv(regionLoadArray[I])

# regionLoadArray contains my file names from a directory listing.

dataframe = pd.DataFrame(list_of_dfs)
# This method was suggested at thispoint.com for nested dictionaries.
# This is where my error occurs ^

ValueError: If using all scalar values, you must pass an index

I appreciate any assistance with this issue as I am new to Python. My current goal is simply to produce a dataframe with my headers that I can then write to a CSV file.

Lonsdale_Energy asked Aug 23 '19 19:08


3 Answers

Depending on your needs, a simple workaround could be:

dct = {'col1': 'abc', 'col2': 123}
dct = {k:[v] for k,v in dct.items()}  # WORKAROUND
df = pd.DataFrame(dct)

which results in

print(df)

  col1  col2
0  abc   123
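
For comparison, here is a minimal sketch of what triggers the error and of a second way around it: passing an explicit one-element `index` instead of wrapping each value in a list. It reuses the illustrative dictionary from the workaround above; the variable `message` is only there to capture the error text.

```python
import pandas as pd

dct = {'col1': 'abc', 'col2': 123}

# All values are scalars, so pandas cannot infer how many rows to create
message = None
try:
    pd.DataFrame(dct)
except ValueError as err:
    message = str(err)

# Supplying a one-element index tells pandas to build a single row
df = pd.DataFrame(dct, index=[0])
```

Either form should give the same one-row result; which one reads better probably depends on whether the dictionary is built by hand or arrives from elsewhere.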

gebbissimo answered Sep 20 '22 12:09


This error occurs because pandas needs an index. At first this can be confusing, because you might think of list indexing. What pandas is really asking for is a set of row labels, so that each value can be assigned to a row. You can set this like so:

import pandas as pd
values = ['a', 'b', 'c', 'd']  # avoid shadowing the built-in name "list"
df = pd.DataFrame(values, index=[0, 1, 2, 3])

The data frame then yields:

   0
0  a
1  b
2  c
3  d

For you specifically, this might look something like this using numpy (not tested):

import numpy as np
import pandas as pd

list_of_dfs = {}

for I in range(0, len(regionLoadArray)):
    list_of_dfs[I] = pd.read_csv(regionLoadArray[I])

# np.arange is a function, so it is called with parentheses
ind = np.arange(len(list_of_dfs))

dataframe = pd.DataFrame(list_of_dfs, index=ind)
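
If the goal is a single table with the shared headers, one alternative that may be worth trying is `pd.concat`, which stacks the per-file DataFrames row-wise. This is an assumption about what the asker wants, not something confirmed in the question. The sketch below uses two small in-memory CSVs as stand-ins for the real files, and the sample values are made up.

```python
import io
import pandas as pd

# Hypothetical stand-ins for the files named in regionLoadArray
csv_a = "DATETIME,VALUE,ID,ID Name\n2019-08-01 00:00,1.5,1,North\n"
csv_b = "DATETIME,VALUE,ID,ID Name\n2019-08-01 00:00,2.5,2,South\n"
regionLoadArray = [io.StringIO(csv_a), io.StringIO(csv_b)]

# Read every source into its own DataFrame
frames = [pd.read_csv(source) for source in regionLoadArray]

# Stack the frames vertically and renumber the rows 0..n-1
combined = pd.concat(frames, ignore_index=True)

# Serialize to CSV text; pass a file path instead to write to disk
csv_text = combined.to_csv(index=False)
```

With real files, `regionLoadArray` would hold the file paths as in the question, and the rest should stay the same.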

angrymantis answered Sep 19 '22 12:09


Pandas unfortunately always needs an index when creating a DataFrame. You can either set it yourself, or use an object with the following structure so pandas can determine the index itself:

    data = {'a': [1], 'b': [2]}

Since it won't be easy to edit the data in your case, a hacky solution is to wrap the data in a list:

    dataframe = pd.DataFrame([list_of_dfs])
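
As a small illustration of the wrapping trick with plain scalar values (the dictionary `record` is made up for the example): a list containing one dictionary is read as one record, so each key becomes a column and pandas assigns the row index itself.

```python
import pandas as pd

# A dictionary of scalars, which would fail if passed to pd.DataFrame directly
record = {'a': 1, 'b': 2}

# Wrapping it in a list makes pandas treat it as a single row
df = pd.DataFrame([record])
```

This sketch only covers scalar values; how the same call behaves when the dictionary values are themselves DataFrames is not shown here.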

embulldogs99 answered Sep 20 '22 12:09