Pandas with different length arrays

This is the code I have. Because of the content of the raw data being parsed, the user list and the tweet list end up with different lengths. When I write the lists as columns of a DataFrame, I get ValueError: arrays must all be same length. I understand why this happens, but I have been looking for a way to work around it by putting 0 or NaN in the right places of the shorter array. Any ideas?

import pandas
from bs4 import BeautifulSoup
soup = BeautifulSoup(open('#raw.html'))
chunk = soup.find_all('div', class_='content')

userlist = []
tweetlist = []

for tweet in chunk:
    username = tweet.find_all(class_='username js-action-profile-name')
    for user in username:
        user2 = user.get_text()
        userlist.append(user2)

for text in chunk:
    tweets = text.find_all(class_='js-tweet-text tweet-text')
    # Collect the tweets of every chunk, not just those of the last one
    for tweet in tweets:
        tweet2 = tweet.get_text().encode('utf-8')
        tweetlist.append('|'+tweet2)

print len(tweetlist)
print len(userlist)

#MAKE A DATAFRAME WITH THIS
data = {'tweet' : tweetlist, 'user' : userlist}
frame = pandas.DataFrame(data)
print frame

# Export dataframe to csv
frame.to_csv('#parsed.csv', index=False)


asked Mar 01 '15 20:03

DIGSUM

2 Answers

I'm not sure that this is exactly what you want, but anyway:

d = dict(tweets=tweetlist, users=userlist)
pandas.DataFrame({k : pandas.Series(v) for k, v in d.items()})
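
A minimal, self-contained sketch of the same idea in Python 3. The sample lists are made up for illustration and stand in for the scraped data. Wrapping each list in a pandas.Series lets pandas align the columns on their index and fill the shorter one with NaN:

```python
import pandas as pd

# Hypothetical stand-ins for the scraped lists; the lengths differ on purpose.
tweetlist = ['|first tweet', '|second tweet', '|third tweet']
userlist = ['@alice', '@bob']

d = dict(tweets=tweetlist, users=userlist)

# Each Series gets its own 0..n-1 index. The DataFrame constructor takes the
# union of those indexes and marks the positions with no value as NaN.
frame = pd.DataFrame({k: pd.Series(v) for k, v in d.items()})

print(frame)
```

Note that the padding always lands at the end of the shorter column. This avoids the ValueError, but it does not by itself pair each tweet with the user who wrote it.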

answered Sep 28 '22 10:09

Dmitriy Kuznetsov

Try this, where d is a dict of your lists (for example, d = dict(tweets=tweetlist, users=userlist)):

frame = pandas.DataFrame.from_dict(d, orient='index')

After that, you should transpose your frame with:

frame = frame.transpose()

Then you can export to csv:

frame.to_csv('#parsed.csv', index=False)
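
The three steps above can be sketched end to end as follows, assuming Python 3 and made-up sample lists in place of the scraped data. To keep the example free of side effects, the CSV is returned as a string rather than written to '#parsed.csv':

```python
import pandas as pd

# Hypothetical stand-ins for the scraped lists; the lengths differ on purpose.
tweetlist = ['|first tweet', '|second tweet', '|third tweet']
userlist = ['@alice', '@bob']

d = dict(tweets=tweetlist, users=userlist)

# With orient='index' each dict key becomes a row, and rows of unequal length
# are padded with a missing value.
frame = pd.DataFrame.from_dict(d, orient='index')

# Transposing turns the rows back into the 'tweets' and 'users' columns.
frame = frame.transpose()

# Calling to_csv without a path returns the CSV text instead of writing a file.
csv_text = frame.to_csv(index=False)
print(csv_text)
```

Missing entries are written as empty fields in the CSV. If zeros are preferred, calling frame.fillna(0) before the export should replace them.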

answered Sep 28 '22 10:09

Ekrem Gurdal

Tags: python, arrays, pandas, dataframe
