I am trying to make a simple class which would store data as a dataframe and give a certain result. I have tried to write one as below:
import numpy as np
import pandas as pd

class logdata():
    def __init__(self,size):
        self.size = size
        self.x = None
        self.timestamp = None
        self.confidence = 100
        self.occurance = 1

    def createdf(self):
        self.df = pd.DataFrame(data = None, columns = ['Timestamp','Label','Occurance', 'Confidence'])
        return self.df

    def insertdf(self, x, timestamp):
        self.occurance = self.get_occurance()
        self.confidence = self.get_confidence()
        self.df.loc[-1] = pd.Series({'Timestamp':timestamp, 'Label': x, 'Occurance':self.occurance, 'Confidence':self.confidence})
        self.df.index = self.df.index + 1
        self.df = self.df.sort_index()
        self.df = self.del_row()
        return self.df

    def get_occurance(self):
        return self.df.loc[self.df.Label == self.x, 'Label'].count()

    def get_confidence(self):
        y = self.df.shape[0]
        if y:
            conf = (self.occurance/self.df.shape[0])*100
        else:
            conf = 100
        return conf

    def del_row(self):
        if self.df.shape[0] > int(self.size):
            self.df = self.df.iloc[self.size:]
        return self.df

    def get_result(self):
        return self.df.loc[self.createdf['Confidence'].idxmax()]
The idea is this: when I pass in a data point, such as an integer, the class creates a new empty dataframe if none exists and stores the data point in the first row. I start with, say, ld = logdata() (I can also set the maximum size with ld.size = 10), followed by ld.createdf(). I then insert the first data point with ld.insertdf(x, timestamp), which computes the occurance (default = 1) and the confidence (the mean, as a percentage; default = 100) using the functions above. Finally, I want to extract the row with the highest confidence by calling ld.get_result(), which I would then send to a server using pymongo (I know how to do that part).
I am not much of a data structures person, just a beginner in Python. I searched through a lot of tutorials but only found ones on subclassing DataFrame. My code doesn't seem to work, so please point out my mistakes if you can. Constructive criticism is welcome. It will help me a lot, thanks.
Here's an example:
Suppose I have a binary Label (1 or 0) and a size of 3, so I first set ld.size = 3.
Then my first input is ld.insertdf(0,1500), which creates:
Timestamp | Label | Occurance | Confidence
1500      | 0     | 1         | 100
Then I add ld.insertdf(0,1530), which updates it to:
Timestamp | Label | Occurance | Confidence
1530      | 0     | 2         | 100
1500      | 0     | 2         | 100
Finally, when I add ld.insertdf(1,1600), it should update to:
Timestamp | Label | Occurance | Confidence
1600      | 1     | 1         | 33
1530      | 0     | 2         | 66
1500      | 0     | 2         | 66
When I add another row with ld.insertdf(0,1630), the df changes to:
Timestamp | Label | Occurance | Confidence
1630      | 0     | 2         | 66
1600      | 1     | 1         | 33
1530      | 0     | 2         | 66
because the size limit is 3. PS: in the comment I reversed the index while explaining, but the method is self-explanatory.
ld.get_result() will just give me the row with the highest confidence, which here is also the latest input, i.e. 1630, 0, 2, 66
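As a quick check of the arithmetic in the last table, here is a minimal sketch that recomputes both columns for the final three-row window. It assumes that occurance means the number of rows in the window sharing a label and that confidence is that count as a whole-number percentage of the window size; the variable name window is made up for the illustration.

```python
import pandas as pd

# The last three inputs from the example, newest first.
window = pd.DataFrame({
    'Timestamp': [1630, 1600, 1530],
    'Label': [0, 1, 0],
})

# Occurance: how many rows in the window share this row's label.
window['Occurance'] = window.groupby('Label')['Label'].transform('count')

# Confidence: that count as a whole-number percentage of the window size
# (integer division, so 2 out of 3 becomes 66).
window['Confidence'] = window['Occurance'] * 100 // len(window)

print(window)
```

The printed values should match the table for the fourth insert.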
Edit: I have edited the code, which now lets me create a dataframe, but it doesn't update the occurance and confidence values.
See the revised code below. It should give you the output you're looking for. If you need clarification on any of the code, do let me know, but it's quite self-explanatory.
import pandas as pd

class logdata:
    def __init__(self, size):
        self.size = size
        self.df = pd.DataFrame(data = None,
                               columns = ['Timestamp','Label','Occurance', 'Confidence'],
                               )

    def insertdf(self, x, timestamp):
        # new row with default values; both are recalculated below
        occurance = 1
        confidence = 100
        row = pd.DataFrame([{
            'Timestamp': timestamp,
            'Label': x,
            'Occurance': occurance,
            'Confidence': confidence
        }])
        # DataFrame.append was removed in pandas 2.0, so use pd.concat.
        # Putting the new row first keeps the newest entry at the top,
        # so del_row() always trims the oldest rows.
        if self.df.empty:
            self.df = row
        else:
            self.df = pd.concat([row, self.df], ignore_index=True)
        self.del_row()
        # Calculate the confidence and occurances of labels
        if self.df.shape[0] > 1:
            occurance = self.get_occurance()
            confidence = self.get_confidence(occurance)
            self.df['Occurance'] = self.df.Label.apply(lambda x: occurance[x])
            self.df['Confidence'] = self.df.Label.apply(lambda x: confidence[x])
        return self.df

    def get_occurance(self):
        # group by label and count the rows for each label
        occ = self.df.groupby('Label').Timestamp.count().rename('Occurance').astype(int)
        return occ

    def get_confidence(self, occurance):
        # each label's share of all rows, as a truncated integer percentage
        conf = ((occurance / sum(occurance)).rename('Confidence') * 100).astype(int)
        return conf

    def del_row(self):
        # keep only the newest `size` rows
        if self.df.shape[0] > int(self.size):
            self.df = self.df.head(self.size)

    def get_result(self):
        # idxmax returns the first maximum, i.e. the newest row on a tie
        return self.df.loc[self.df['Confidence'].idxmax()]
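If rebuilding the dataframe on every insert feels heavy, one possible alternative is to keep the raw entries in a bounded deque and only build the table when it is needed. This is a sketch of a different technique, not the code above: the class name RollingLog and its method names are hypothetical, and it assumes the same definitions of occurance and confidence as in the question's example.

```python
from collections import Counter, deque

import pandas as pd


class RollingLog:
    """Hypothetical alternative to logdata: a bounded deque holds the raw entries."""

    def __init__(self, size):
        # deque(maxlen=size) drops the oldest entry automatically once it is full
        self.entries = deque(maxlen=size)

    def insert(self, label, timestamp):
        self.entries.append((timestamp, label))
        return self.to_frame()

    def to_frame(self):
        # count how often each label appears in the current window
        counts = Counter(label for _, label in self.entries)
        total = len(self.entries)
        rows = [
            {
                'Timestamp': ts,
                'Label': label,
                'Occurance': counts[label],
                'Confidence': counts[label] * 100 // total,
            }
            for ts, label in reversed(self.entries)  # newest first
        ]
        return pd.DataFrame(rows, columns=['Timestamp', 'Label', 'Occurance', 'Confidence'])

    def get_result(self):
        # assumes at least one entry has been inserted
        df = self.to_frame()
        return df.loc[df['Confidence'].idxmax()]


# Replay the inputs from the question's example.
log = RollingLog(size=3)
for label, ts in [(0, 1500), (0, 1530), (1, 1600), (0, 1630)]:
    table = log.insert(label, ts)
print(table)
print(log.get_result().to_dict())
```

The deque takes care of the size limit, so there is no separate del_row step, and the counts are always computed from the current window rather than from stored column values.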