Make a Class(updating) using pandas data frame

Question

I am trying to make a simple class which would store data as a dataframe and give a certain result. I have tried to write one as below:

import numpy as np
import pandas as pd

class logdata():
    def __init__(self,size):
        self.size = size
        self.x = None
        self.timestamp = None
        self.confidence = 100
        self.occurance = 1


    def createdf(self):
        self.df = pd.DataFrame(data = None, columns = ['Timestamp','Label','Occurance', 'Confidence'])
        return self.df


    def insertdf(self, x, timestamp):
        self.occurance = self.get_occurance()
        self.confidence = self.get_confidence()
        self.df.loc[-1] = pd.Series({'Timestamp':timestamp, 'Label': x, 'Occurance':self.occurance, 'Confidence':self.confidence})
        self.df.index = self.df.index + 1
        self.df = self.df.sort_index()
        self.df = self.del_row()
        return self.df

    def get_occurance(self):
        return self.df.loc[self.df.Label == self.x, 'Label'].count()

    def get_confidence(self):
        y = self.df.shape[0]
        if y:
            conf = (self.occurance/self.df.shape[0])*100
        else:
            conf = 100
        return conf

    def del_row(self):
        if self.df.shape[0] > int(self.size):
            self.df = self.df.iloc[self.size:]
        return self.df

    def get_result(self):
        return self.df.loc[self.createdf['Confidence'].idxmax()]

The idea is as follows. I create the object by calling, say, ld = logdata() (I can also set the maximum size with ld.size = 10), followed by ld.createdf(), which creates a new, empty dataframe if none is present. I then insert the first data point, such as an integer, by calling ld.insertdf(x,timestamp), which stores it in the first row and computes the occurrence (default = 1) and the confidence (mean as a percentage, default = 100) using the functions above. Finally, I want to extract the row with the highest confidence by calling ld.get_result(), and send it to a server using pymongo (I know how to do that part).

I am not much of a data structures guy, just a noob in Python. I searched through a lot of tutorials but only ended up finding ones on subclassing dataframes. My code doesn't seem to work; if possible, please help me find the mistakes. Feel free to criticise it constructively. It'll help me a lot, thanks.

Here's an example: suppose I have a binary Label (1 or 0) and a size of 3, so I'll first set ld.size = 3. Then my first input will be ld.insertdf(0,1500), which will create:

Timestamp | Label | Occurance | Confidence
1500      | 0     | 1         | 100

Then I add ld.insertdf(0,1530), which updates it to:

Timestamp | Label | Occurance | Confidence
1530      | 0     | 2         | 100
1500      | 0     | 2         | 100

Finally, when I add ld.insertdf(1,1600), it should update to:

Timestamp | Label | Occurance | Confidence
1600      | 1     | 1         | 33
1530      | 0     | 2         | 66
1500      | 0     | 2         | 66

When I add another ld.insertdf(0,1630), it will change the df to:

Timestamp | Label | Occurance | Confidence
1630      | 0     | 2         | 66
1600      | 1     | 1         | 33
1530      | 0     | 2         | 66

as the size limit is 3. PS: In the comment I reversed the index while explaining, but the method is self-explanatory.

ld.get_result() will just give me the row with the highest confidence, which here is also the latest input, i.e. 1630,0,2,66

Edit: I have edited the code, which now allows me to create a dataframe, but it doesn't update the occurrence and confidence values.

gyx-hh · Accepted Answer

See the revised code below. This should give you the output you're looking for. If you need clarification on any of the code, do let me know, but it's quite self-explanatory.

import pandas as pd

class logdata:
    def __init__(self, size):
        self.size = size
        self.df = pd.DataFrame(data = None, 
                               columns = ['Timestamp','Label','Occurance', 'Confidence'],
                              )

    def insertdf(self, x, timestamp):
        # A new row starts with the default values; they are recomputed below
        new_row = pd.DataFrame([{
            'Timestamp': timestamp,
            'Label': x,
            'Occurance': 1,
            'Confidence': 100
        }])

        # DataFrame.append was removed in pandas 2.0, so use pd.concat.
        # Putting the new row first keeps the frame ordered newest to oldest,
        # so del_row() always drops the oldest rows.
        if self.df.empty:
            self.df = new_row
        else:
            self.df = pd.concat([new_row, self.df], ignore_index=True)
        self.del_row()

        # Recalculate the occurrence and confidence of every stored label
        occurance = self.get_occurance()
        confidence = self.get_confidence(occurance)
        self.df['Occurance'] = self.df['Label'].map(occurance)
        self.df['Confidence'] = self.df['Label'].map(confidence)

        return self.df

    def get_occurance(self):
        # group by label and count
        occ = self.df.groupby('Label').Timestamp.count().rename('Occurance').astype(int)
        return occ

    def get_confidence(self, occurance):
        conf = ((occurance / sum(occurance)).rename('Confidence') * 100).astype(int)
        return conf

    def del_row(self):
        if self.df.shape[0] > int(self.size):
            self.df = self.df.head(self.size)

    def get_result(self):
        return self.df.loc[self.df['Confidence'].idxmax()]
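
To check the behaviour against the walk-through in the question, here is a minimal, self-contained sketch. It is not the class above but a condensed variant under a hypothetical name, RollingLabelLog, with a hypothetical insert method. It assumes a recent pandas version where pd.concat is the way to add rows, and it computes the counts with value_counts instead of the two helper methods. Treat it as an illustration under those assumptions rather than the definitive implementation.

```python
import pandas as pd


class RollingLabelLog:
    """Hypothetical condensed variant of logdata, for illustration only."""

    COLUMNS = ['Timestamp', 'Label', 'Occurance', 'Confidence']

    def __init__(self, size):
        self.size = int(size)
        self.df = pd.DataFrame(columns=self.COLUMNS)

    def insert(self, label, timestamp):
        new_row = pd.DataFrame([{'Timestamp': timestamp, 'Label': label,
                                 'Occurance': 1, 'Confidence': 100}])
        if self.df.empty:
            df = new_row
        else:
            # Newest row goes first, so the oldest rows sit at the bottom.
            df = pd.concat([new_row, self.df], ignore_index=True)
        # Keep only the `size` most recent rows.
        df = df.head(self.size).copy()

        # Occurrence: how many stored rows share each row's label.
        df['Occurance'] = df['Label'].map(df['Label'].value_counts())
        # Confidence: that count as a truncated percentage of the stored rows.
        df['Confidence'] = (df['Occurance'] * 100 // len(df)).astype(int)

        self.df = df
        return self.df

    def get_result(self):
        # idxmax returns the first maximum, i.e. the newest row among ties.
        return self.df.loc[self.df['Confidence'].idxmax()]


# Replay the four inserts from the question with a size limit of 3.
log = RollingLabelLog(size=3)
for label, timestamp in [(0, 1500), (0, 1530), (1, 1600), (0, 1630)]:
    log.insert(label, timestamp)

print(log.df)
print(log.get_result().to_dict())
```

After the fourth insert the frame should hold the rows for timestamps 1630, 1600 and 1530, in that order, and get_result() should return the 1630 row, as described in the question. Because the new row is placed first and the frame is trimmed with head, it is always the oldest entry that gets dropped.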

Tags: python, database, pandas, data-structures, mongodb

Sayan Mandal

1 Answers

gyx-hh
