
How to get data from pickle files into a pandas dataframe

I'm working on a social media sentiment analysis for a class. I have collected all of the tweets about the Kentucky Derby over a two-month period and saved them into .pkl files, one per day.

My question is: how do I get all of these pickle dump files loaded into a dataframe?

Here is my code:

import pickle as pkl
from datetime import date, timedelta

import pandas as pd
import sklearn as sk
import got3

def daterange(start_date, end_date):
    # Yield each date from start_date up to, but not including, end_date.
    for n in range(int((end_date - start_date).days)):
        yield start_date + timedelta(n)

start_date = date(2016, 3, 31)
end_date = date(2016, 6, 1)

dates = []

for single_date in daterange(start_date, end_date):
    dates.append(single_date.strftime("%Y-%m-%d"))

# Fetch one day of tweets at a time and pickle each day to its own file.
for i in range(len(dates) - 1):
    this_date = dates[i]
    tomorrow_date = dates[i + 1]
    print("Getting tweets for " + tomorrow_date)
    tweetCriteria = got3.manager.TweetCriteria()
    tweetCriteria.setQuerySearch("Kentucky Derby")
    # Note: if setQuerySearch stores a single query, this call replaces the one above.
    tweetCriteria.setQuerySearch("KYDerby")
    tweetCriteria.setSince(this_date)
    tweetCriteria.setUntil(tomorrow_date)
    Kentucky_Derby_tweets = got3.manager.TweetManager.getTweets(tweetCriteria)
    # Use a context manager so the file is closed after each dump.
    with open(tomorrow_date + ".pkl", "wb") as f:
        pkl.dump(Kentucky_Derby_tweets, f)

Asked by Andrew Smith, Oct 21 '16 15:10


People also ask

How do you load data from pickles?

To retrieve pickled data, use the pickle.load() function. Its main argument is the file object you get by opening the file in read-binary ('rb') mode.

How do I read a pickle file in Python?

Loading a pickled file back into a Python program mirrors writing it: call open() again, but this time with 'rb' as the second argument instead of 'wb'. The r stands for read mode and the b stands for binary mode, because a pickle file is a binary file.
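As a minimal sketch of the round trip described above, the snippet below writes an object with 'wb' and reads it back with 'rb'. The helper names save_pickle and load_pickle are made up for illustration; only open(), pickle.dump() and pickle.load() come from the standard library. Unpickling can run arbitrary code, so this should only be used on files you created or trust.

```python
import pickle


def save_pickle(obj, path):
    """Write obj to path in binary mode (hypothetical helper)."""
    with open(path, "wb") as handle:
        pickle.dump(obj, handle)


def load_pickle(path):
    """Read back whatever object was pickled at path (hypothetical helper)."""
    with open(path, "rb") as handle:
        return pickle.load(handle)
```

Calling load_pickle("2016-04-01.pkl") would return the same kind of object that was dumped, for example a list of tweets, provided the classes of the pickled objects can still be imported.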

Is pickle better than CSV?

In this comparison, uncompressed pickle is around 11 times faster than CSV. Compression slows down both reading and saving considerably. The file size decrease compared to CSV is significant, but the compression doesn't save that much disk space in this case.


1 Answer

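You can:

  1. read each file with pd.read_pickle(filename), which returns whatever object was pickled
  2. add the result to a list, converting it to a DataFrame first if it is not one already (pd.concat only accepts Series or DataFrame objects)
  3. then combine them with pd.concat(thelist)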
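The steps above can be sketched as follows. This is one possible reading, not a definitive implementation: the function names tweets_to_frame and load_pickles and the source_file column are hypothetical, and the code assumes each pickle holds either a DataFrame or a list of tweet objects whose fields are plain instance attributes, so that vars() can turn each one into a row. Unpickling the real files also requires that got3 be importable.

```python
import glob
import os

import pandas as pd


def tweets_to_frame(tweets):
    """Convert one unpickled object into a DataFrame."""
    if isinstance(tweets, pd.DataFrame):
        return tweets
    # Assumption: a list of objects with plain attributes (one row per object).
    return pd.DataFrame([vars(tweet) for tweet in tweets])


def load_pickles(pattern="*.pkl"):
    """Read every file matching pattern and stack the rows into one DataFrame."""
    frames = []
    for path in sorted(glob.glob(pattern)):
        frame = tweets_to_frame(pd.read_pickle(path))
        # Record which file (and therefore which day) each row came from.
        frame = frame.assign(source_file=os.path.basename(path))
        frames.append(frame)
    if not frames:
        # pd.concat raises on an empty list, so return an empty frame instead.
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

Calling load_pickles("*.pkl") in the directory that holds the daily files would then give a single DataFrame with one row per tweet. Passing ignore_index=True renumbers the rows so that indexes from different files do not collide.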

Answered by simon, Sep 18 '22 23:09