I have initialised a dataframe like this:
df = pd.DataFrame(columns=["stockname","timestamp","price","volume"])
df.timestamp = pd.to_datetime(df.timestamp, format = "%Y-%m-%d %H:%M:%S:%f")
df.set_index(['stockname', 'timestamp'], inplace = True)
Now I get a stream of data from somewhere, but for the sake of the example let me write it like this:
filehandle = open("datasource")
for line in filehandle:
    line = line.rstrip()
    data = line.split(",")
    stockname = data[4]
    price = float(data[3])
    timestamp = pd.to_datetime(data[0], format = "%Y-%m-%d %H:%M:%S:%f")
    volume = int(data[6])
    df.loc[stockname, timestamp] = [price, volume]
filehandle.close()
print(df)
But this gives the following error:
ValueError: cannot set using a multi-index selection indexer with a different length than the value
Specify the column names you are assigning data to, and pass the index key as a tuple, i.e.:
df = pd.DataFrame(columns=["a","b","c","d"])
df.set_index(['a', 'b'], inplace = True)
df.loc[('3','4'),['c','d']] = [4,5]
df.loc[('4','4'),['c','d']] = [3,1]
       c    d
a b
3 4  4.0  5.0
4 4  3.0  1.0
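If the rows arrive one at a time as in the question, another option is to collect the parsed rows in a list and build the frame once at the end, rather than growing it with .loc on every line. This is a different technique from the .loc assignment above, shown only as a minimal sketch. The helper name parse_line and the in-memory lines list are assumptions standing in for the real "datasource". The sample rows are borrowed from the data shown further down in this answer.

```python
import pandas as pd

def parse_line(line):
    """Turn one comma-separated line into the four fields used in the question."""
    data = line.rstrip().split(",")
    return {
        "stockname": data[4],
        "timestamp": pd.to_datetime(data[0], format="%Y-%m-%d %H:%M:%S:%f"),
        "price": float(data[3]),
        "volume": int(data[6]),
    }

# Hypothetical stand-in for iterating over open("datasource")
lines = [
    "2017-12-08 15:29:58:740657,245.0,426001,248.65,APPL,190342,2075673,249.35,244.2\n",
    "2017-12-08 16:29:58:740657,245.0,426001,248.65,GOOGL,190342,2075673,249.35,244.2\n",
]

# Parse every line first, then create the DataFrame in a single step
rows = [parse_line(line) for line in lines]
df = pd.DataFrame(rows).set_index(["stockname", "timestamp"])
print(df)
```

Building the frame once avoids the length-mismatch error entirely, because no row-by-row assignment into a MultiIndex takes place.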
Also, if you have a comma-separated file, you can use read_csv, i.e.:
import io
import pandas as pd
st = '''2017-12-08 15:29:58:740657,245.0,426001,248.65,APPL,190342,2075673,249.35,244.2
2017-12-08 16:29:58:740657,245.0,426001,248.65,GOOGL,190342,2075673,249.35,244.2
2017-12-08 18:29:58:740657,245.0,426001,248.65,GOOGL,190342,2075673,249.35,244.2
'''
# instead of io.StringIO(st), pass the path of your source file
df = pd.read_csv(io.StringIO(st),header=None)
# Now set the index and select what you want
df.set_index([0,4])[[1,7]]
                                      1       7
0                          4
2017-12-08 15:29:58.740657 APPL   245.0  249.35
2017-12-08 16:29:58.740657 GOOGL  245.0  249.35
2017-12-08 18:29:58.740657 GOOGL  245.0  249.35
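To get the index and columns the question asks for (stockname and timestamp as the index, price and volume as columns), the same read_csv idea could be extended roughly as below. The column positions (0, 3, 4, 6) are taken from the question's loop. The variable name raw is an assumption, and the timestamps are parsed explicitly because they use a colon before the microseconds.

```python
import io
import pandas as pd

# Sample rows from the answer above; a real file path could be passed to read_csv instead
raw = (
    "2017-12-08 15:29:58:740657,245.0,426001,248.65,APPL,190342,2075673,249.35,244.2\n"
    "2017-12-08 16:29:58:740657,245.0,426001,248.65,GOOGL,190342,2075673,249.35,244.2\n"
    "2017-12-08 18:29:58:740657,245.0,426001,248.65,GOOGL,190342,2075673,249.35,244.2\n"
)

# Keep only the fields the question uses: 0 = timestamp, 3 = price, 4 = stockname, 6 = volume
df = pd.read_csv(io.StringIO(raw), header=None, usecols=[0, 3, 4, 6])
df.columns = ["timestamp", "price", "stockname", "volume"]

# The source uses ':' before the microseconds, so give the format explicitly
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%Y-%m-%d %H:%M:%S:%f")

df = df.set_index(["stockname", "timestamp"]).sort_index()
print(df)
```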
You might want to use df.at[index, column_name] = value to avoid this error.