
Make a Class(updating) using pandas data frame

I am trying to make a simple class which would store data as a dataframe and give a certain result. I have tried to write one as below:

import numpy as np
import pandas as pd

class logdata():
    def __init__(self,size):
        self.size = size
        self.x = None
        self.timestamp = None
        self.confidence = 100
        self.occurance = 1


    def createdf(self):
        self.df = pd.DataFrame(data = None, columns = ['Timestamp','Label','Occurance', 'Confidence'])
        return self.df


    def insertdf(self, x, timestamp):
        self.occurance = self.get_occurance()
        self.confidence = self.get_confidence()
        self.df.loc[-1] = pd.Series({'Timestamp':timestamp, 'Label': x, 'Occurance':self.occurance, 'Confidence':self.confidence})
        self.df.index = self.df.index + 1
        self.df = self.df.sort_index()
        self.df = self.del_row()
        return self.df

    def get_occurance(self):
        return self.df.loc[self.df.Label == self.x, 'Label'].count()

    def get_confidence(self):
        y = self.df.shape[0]
        if y:
            conf = (self.occurance/self.df.shape[0])*100
        else:
            conf = 100
        return conf

    def del_row(self):
        if self.df.shape[0] > int(self.size):
            self.df = self.df.iloc[self.size:]
        return self.df

    def get_result(self):
        return self.df.loc[self.createdf['Confidence'].idxmax()]

The idea is this: I create the object with ld = logdata(size), where size is the maximum number of rows (it can also be set later, e.g. ld.size = 10), and then call ld.createdf() to create an empty dataframe. Each new data point, such as an integer label, is inserted at the top of the dataframe by calling ld.insertdf(x, timestamp), which should compute the occurance (default = 1) and the confidence (occurance divided by the number of rows, as a percentage, default = 100) using the functions above. Finally, I want to extract the row with the highest confidence by calling ld.get_result(), which I would then send to a server using pymongo (I know how to do that part).

I am not much of a data structures guy, just a noob in Python. I searched through a lot of tutorials but only ended up with ones about subclassing DataFrame. My code doesn't seem to work; if possible, please help me find the mistakes. You are free to criticise this constructively. It'll help me a lot, thanks.

Here's an example: suppose I have a binary Label (1 or 0) and a size of 3, so I'll first set ld.size = 3. Then my first input will be ld.insertdf(0,1500), which will create:

Timestamp | Label | Occurance | Confidence
     1500 |     0 |         1 |        100

Then I add ld.insertdf(0,1530), which updates it to:

Timestamp | Label | Occurance | Confidence
     1530 |     0 |         2 |        100
     1500 |     0 |         2 |        100

Finally, when I add ld.insertdf(1,1600), it should update to:

Timestamp | Label | Occurance | Confidence
     1600 |     1 |         1 |         33
     1530 |     0 |         2 |         66
     1500 |     0 |         2 |         66

When I add another row with ld.insertdf(0,1630), the df should change to:

Timestamp | Label | Occurance | Confidence
     1630 |     0 |         2 |         66
     1600 |     1 |         1 |         33
     1530 |     0 |         2 |         66

since the size limit is 3. PS: in the comment I reversed the index while explaining, but the method is self-explanatory.

ld.get_result() will just give me the row with the highest confidence, which here is also the latest input, i.e. 1630,0,2,66
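
The numbers in the tables above can be reproduced by hand. A minimal sketch, assuming confidence is the label's share of the rows currently in the window, truncated to an integer percentage (the helper name window_stats is made up for illustration and is not part of the class):

```python
from collections import Counter

def window_stats(labels):
    """Return {label: (occurrence, confidence)} for the labels in the window."""
    counts = Counter(labels)
    total = len(labels)
    # Integer percentage, truncated (2 of 3 rows gives 66, not 67)
    return {label: (n, n * 100 // total) for label, n in counts.items()}

# Window after the fourth insert, newest first: labels 0, 1, 0
stats = window_stats([0, 1, 0])
print(stats)
```

For the three-row window this gives occurrence 2 and confidence 66 for label 0, and occurrence 1 and confidence 33 for label 1, matching the last table.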

Edit: I have edited the code, which now lets me create a dataframe, but it doesn't update the occurance and confidence values.

Sayan Mandal asked Oct 29 '25 17:10


1 Answer

See the revised code below. This should give you the output you're looking for. If you need clarification on any of the code, do let me know, but it's quite self-explanatory.

import pandas as pd

class logdata:
    def __init__(self, size):
        self.size = size
        self.df = pd.DataFrame(data = None, 
                               columns = ['Timestamp','Label','Occurance', 'Confidence'],
                              )

    def insertdf(self, x, timestamp):
        # new row with default values
        # (pd.concat is used because DataFrame.append was removed in pandas 2.0)
        new_row = pd.DataFrame([{
            'Timestamp': timestamp, 
            'Label': x, 
            'Occurance': 1, 
            'Confidence': 100
        }])

        # newest row goes first; skip the concat while the frame is still empty
        if self.df.empty:
            self.df = new_row
        else:
            self.df = pd.concat([new_row, self.df], ignore_index=True)
        self.del_row()

        # Calculate the confidence and occurances of labels
        occurance = self.get_occurance()
        confidence = self.get_confidence(occurance)

        # look up each row's label in the per-label counts and percentages
        self.df = self.df.assign(
            Occurance=self.df.Label.map(occurance),
            Confidence=self.df.Label.map(confidence),
        )

        return self.df

    def get_occurance(self):
        # group by label and count
        occ = self.df.groupby('Label').Timestamp.count().rename('Occurance').astype(int)
        return occ

    def get_confidence(self, occurance):
        conf = ((occurance / sum(occurance)).rename('Confidence') * 100).astype(int)
        return conf

    def del_row(self):
        if self.df.shape[0] > int(self.size):
            self.df = self.df.head(self.size)

    def get_result(self):
        return self.df.loc[self.df['Confidence'].idxmax()]
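
An alternative sketch, not part of the code above: assuming only the newest size rows ever matter, the raw (timestamp, label) pairs can be kept in a collections.deque with maxlen, and the DataFrame with its Occurance and Confidence columns built on demand. The class name RollingLog and its method names are hypothetical, and confidence is assumed to be a truncated integer percentage as in the question's tables.

```python
from collections import deque

import pandas as pd

class RollingLog:
    """Keep the newest `size` (timestamp, label) pairs; stats are derived on demand."""

    def __init__(self, size):
        # appendleft on a full deque silently drops the oldest entry on the right
        self.rows = deque(maxlen=int(size))

    def insert(self, label, timestamp):
        self.rows.appendleft((timestamp, label))
        return self.to_frame()

    def to_frame(self):
        df = pd.DataFrame(list(self.rows), columns=['Timestamp', 'Label'])
        # Occurrence = number of rows in the window that share the row's label
        df['Occurance'] = df.groupby('Label')['Label'].transform('count')
        # Confidence = share of the window, truncated to an integer percentage
        df['Confidence'] = df['Occurance'] * 100 // len(df)
        return df

    def get_result(self):
        df = self.to_frame()
        # idxmax returns the first maximum, i.e. the newest row when there is a tie
        return df.loc[df['Confidence'].idxmax()]

ld = RollingLog(3)
for label, ts in [(0, 1500), (0, 1530), (1, 1600), (0, 1630)]:
    ld.insert(label, ts)
result = ld.get_result()
print(ld.to_frame())
print(result)
```

Rebuilding the frame on every call is wasteful for large windows, but for a handful of rows it keeps the bookkeeping simple: there is no index to shift or sort, and the size limit is enforced by the deque itself.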

gyx-hh answered Oct 31 '25 07:10


