I'm making a program that (at least right now) retrives stream information from TwitchTV (streaming platform). This program is to self educate myself but when i run it, it's taking 2 minutes to print just the name of the streamer.
I'm using Python 2.7.3 64bit on Windows7 if that is important in anyway.
classes.py:
#imports:
import urllib
import re
#classes:
class Streamer:
#constructor:
def __init__(self, name, mode, link):
self.name = name
self.mode = mode
self.link = link
class Information:
#constructor:
def __init__(self, TWITCH_STREAMS, GAME, STREAMER_INFO):
self.TWITCH_STREAMS = TWITCH_STREAMS
self.GAME = GAME
self.STREAMER_INFO = STREAMER_INFO
def get_game_streamer_names(self):
"Connects to Twitch.TV API, extracts and returns all streams for a spesific game."
#start connection
self.con = urllib2.urlopen(self.TWITCH_STREAMS + self.GAME)
self.info = self.con.read()
self.con.close()
#regular expressions to get all the stream names
self.info = re.sub(r'"teams":\[\{.+?"\}\]', '', self.info) #remove all team names (they have the same name: parameter as streamer names)
self.streamers_names = re.findall('"name":"(.+?)"', self.info) #looks for the name of each streamer in the pile of info
#run in a for to reduce all "live_user_NAME" values
for name in self.streamers_names:
if name.startswith("live_user_"):
self.streamers_names.remove(name)
#end method
return self.streamers_names
def get_streamer_mode(self, name):
"Returns a streamers mode (on/off)"
#start connection
self.con = urllib2.urlopen(self.STREAMER_INFO + name)
self.info = self.con.read()
self.con.close()
#check if stream is online or offline ("stream":null indicates offline stream)
if self.info.count('"stream":null') > 0:
return "offline"
else:
return "online"
main.py:
#imports:
from classes import *
#consts:
TWITCH_STREAMS = "https://api.twitch.tv/kraken/streams/?game=" #add the game name at the end of the link (space = "+", eg: Game+Name)
STREAMER_INFO = "https://api.twitch.tv/kraken/streams/" #add streamer name at the end of the link
GAME = "League+of+Legends"
def main():
#create an information object
info = Information(TWITCH_STREAMS, GAME, STREAMER_INFO)
streamer_list = [] #create a streamer list
for name in info.get_game_streamer_names():
#run for every streamer name, create a streamer object and place it in the list
mode = info.get_streamer_mode(name)
streamer_name = Streamer(name, mode, 'http://twitch.tv/' + name)
streamer_list.append(streamer_name)
#this line is just to try and print something
print streamer_list[0].name, streamer_list[0].mode
if __name__ == '__main__':
main()
the program itself works perfectly, just really slow
any ideas?
Program efficiency typically falls under the 80/20 rule (or what some people call the 90/10 rule, or even the 95/5 rule). That is, 80% of the time the program is actually running in 20% of the code. In other words, there is a good shot that your code has a "bottleneck": a small area of the code that is running slow, while the rest runs very fast. Your goal is to identify that bottleneck (or bottlenecks), then fix it (them) to run faster.
The best way to do this is to profile your code. This means you are logging the time of when a specific action occurs with the logging module, use timeit like a commenter suggested, use some of the built-in profilers, or simply print out the current time at very points of the program. Eventually, you will find one part of the code that seems to be taking the most amount of time.
Experience will tell you that I/O (stuff like reading from a disk, or accessing resources over the internet) will take longer than in-memory calculations. My guess as to the problem is that you're using 1 HTTP connection to get a list of streamers, and then one HTTP connection to get the status of that streamer. Let's say that there are 10000 streamers: your program will need to make 10001 HTTP connections before it finishes.
There would be a few ways to fix this if this is indeed the case:
You are using the wrong tool here to parse the json data returned by your URL. You need to use json library provided by default rather than parsing the data using regex. This will give you a boost in your program's performance
Change the regex parser
#regular expressions to get all the stream names
self.info = re.sub(r'"teams":\[\{.+?"\}\]', '', self.info) #remove all team names (they have the same name: parameter as streamer names)
self.streamers_names = re.findall('"name":"(.+?)"', self.info) #looks for the name of each streamer in the pile of info
To json parser
self.info = json.loads(self.info) #This will parse the json data as a Python Object
#Parse the name and return a generator
return (stream['name'] for stream in data[u'streams'])
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With