Threading NodeBox execution?
Posted by Cedric on Aug 02, 2008

Hi.
I am trying to make a graph, but each node makes an HTTP query to a database to retrieve some of its information. When I run my code, NodeBox freezes until completion. I tried to thread my code using the threading module, but I am rather a newbie on this topic. Here is a summary of what I am doing:
Any idea/help would be greatly appreciated. Thanks.

Hi Cedric,
There are several points to touch on here.
First, you may want to have a look at the animation tutorial, which introduces the draw() command. Code you put in draw() is called several times per second - this allows you to create animations, or do different things at specific times and events (e.g. the user clicks the mouse, web content is done downloading).
If you don't use an animation, NodeBox first executes all of the code (this is the "freeze" you mention) and only then draws the result.
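For instance, here is a minimal sketch of an animation - the moving circle is just an arbitrary example:

size(300, 300)
speed(30) # draw() is called 30 times per second.
x = 0
def draw():
    global x
    # Each frame, nudge the circle a little to the right.
    x += 2
    oval(x, 140, 20, 20)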
Next, you may want to use the Web library. It offers support for asynchronous downloads (in the background), so you don't have to deal with threading yourself.
Here's a short example. It keeps downloading the current URL until it is ready, at which point it adds the URL to the graph and starts downloading the next one.
size(500, 500)

web = ximport("web")
urls = web.yahoo.search("nodebox")
i = 0 # The current URL to retrieve.
download = web.url.retrieve(urls[i], asynchronous=True)

# Clear the cache for live downloads each time the script runs.
#web.clear_cache()

graph = ximport("graph")
g = graph.create()
g.add_node("root")

speed(30)
def draw():
    global download, i
    # Once we are done downloading a URL,
    # add it to the graph and start downloading the next.
    # The retrieved content is stored in download.data.
    if download != None and download.done:
        html = download.data
        # Parse the stuff you need from the "html" string.
        g.add_edge(urls[i], "root")
        g.layout.refresh()
        if i < len(urls)-1:
            i += 1
            download = web.url.retrieve(urls[i], asynchronous=True)
        else:
            # There is nothing left to download.
            download = None
    g.draw()
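The point is that draw() never blocks: it just polls download.done each frame and moves on, so the canvas stays responsive while the download happens in the background.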
Dear Tom. Thanks very much for your help. The web module looks really interesting. I tweaked my code to follow your suggestions, but I stumbled upon the following problems:
The webpages I would like to retrieve are written in XML. It is simpler for me to use this than to parse a non-standard HTML webpage (which could nonetheless be done, of course, by writing a new abstraction class). No dedicated XML methods seem to be provided in this otherwise excellent module.
However, I discovered that I simply cannot use the web module behind a proxy. Therefore, I cannot even test whether the web module could be tweaked for XML...
For the moment, I have found elsewhere how to thread my webpage queries, but I will keep following this thread if you reply.
Cedric
You can retrieve all sorts of stuff with web.url.retrieve(). It will simply yield the file contents as a string. If there's XML inside, you can use Python's xml.dom.minidom module to parse the string, for example.
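For example, a quick sketch - the URL and the "record" tag here are just placeholders:

from xml.dom import minidom
web = ximport("web")
# Retrieve the XML source and parse it into a DOM tree.
xml = web.url.retrieve("http://www.example.org/data.xml").data
dom = minidom.parseString(xml)
# Print the text inside every (hypothetical) "record" tag.
for node in dom.getElementsByTagName("record"):
    print node.firstChild.data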
Proxies should not be able to influence the Web library. Are you receiving an error message?
I tried web.url.retrieve() but never managed to get anything. So I tried with just a small piece of code and used web.is_url(), which answered False (all the other is_xxx() methods also answered False). I looked inside the module itself, and the little is_url() definition states that it simply checks whether it can connect. So I concluded that it was simply failing to connect. My university's proxy is a recurrent issue, so I thought that was the source of the problem.
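In essence, the test was as simple as this (the URL is just a placeholder):

web = ximport("web")
print web.is_url("http://nodebox.net")
# Behind the university proxy this prints False;
# the other is_xxx() checks answer False as well.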
Maybe I have somewhat overlooked the problem, since I already had a WebPage class using urllib2 that was able to connect.
I was also wondering if web.url.retrieve() is able to retrieve non-static pages. I guess so, of course, but just wondering. The URL I am trying to retrieve looks like:
http://www.server.org/abs/bibcode&format=short_xml
I'll give it another try today. Thanks for your help! I'll definitely continue to explore NodeBox's capabilities; it is already helping me in my research.
Cedric
Back at work, and therefore behind a restrictive proxy, I ran some tests. Here is a simple piece of code, with an example of a URL I am trying to retrieve:
web = ximport('web')
html = web.url.retrieve('http://adsabs.harvard.edu/abs/1976PASP...88..917C&data_type=SHORT_XML')
print html, html.data

When run the first time, I get the following message:
/Users/cedric/Library/Application Support/NodeBox/web/url.py:414: Warning: in web.url.URLAccumulator for http://adsabs.harvard.edu/abs/1976PASP...88..917C&data_type=SHORT_XML
For subsequent executions, I get only:
So it returns an object, but its data attribute is empty.
Hm... the comment textbox on this website is very sensitive to any less-than and greater-than signs used in HTML tags. The object I got is this, with the problematic signs removed:
web.url.URLAccumulator instance at 0x1749a288
Well, I was able to get data from the URL in your example, so no problem there... Can you try the following:
from urllib2 import urlopen
print urlopen("http://adsabs.harvard.edu/abs/1976PASP...88..917C&data_type=SHORT_XML").read()

and tell me if that yields an error?
With no proxy it works, of course. Behind the proxy, it freezes until a timeout is raised:
Traceback (most recent call last):
  File "nodebox/gui/mac/__init__.pyo", line 358, in _execScript
  File "<string>", line 2, in <module>
  File "urllib2.pyo", line 121, in urlopen
  File "urllib2.pyo", line 374, in open
  File "urllib2.pyo", line 392, in _open
  File "urllib2.pyo", line 353, in _call_chain
  File "urllib2.pyo", line 1100, in http_open
  File "urllib2.pyo", line 1075, in do_open
URLError: <urlopen error (60, 'Operation timed out')>

I'm sending you an (old) piece of code I am using to connect through a proxy. It may help.
import urllib
from xml.dom import minidom

class ADSPage(urllib.FancyURLopener):

    def __init__(self, url=None, proxy=False):
        self.url = url
        self.proxy = proxy
        self.__config()

    def __config(self):
        if self.proxy:
            proxy_map = readConnectionConfig()
            urllib.FancyURLopener.__init__(self, proxy_map)
        else:
            urllib.FancyURLopener.__init__(self)
        self.addheader('User-Agent', 'Mozilla/5.0')

    def get(self):
        self._query()
        self.xml_dom = minidom.parseString(self.content)

    def _query(self):
        f = self.open('http://%s' % (self.url))
        self.content = f.read()
        f.close()

Then you can write a standard parsing method that reads self.xml_dom. I have an external function readConnectionConfig() that reads a config file and provides the necessary proxy map:
{'http': 'http://www.myproxyserver.fr:3128'}
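A typical call then looks like this, using one of the documents above as an example (note that the class prepends http:// itself):

page = ADSPage('adsabs.harvard.edu/abs/1976PASP...88..917C&data_type=SHORT_XML', proxy=True)
page.get()
# The parsed DOM is now available for further processing.
print page.xml_dom.toxml()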
Hi Cedric,
I've added a set_proxy() command to the latest release of the web library. Could you test if that works for you?
web = ximport('web')
web.set_proxy('http://www.myproxyserver.fr:3128', type='http')
html = web.url.retrieve('http://adsabs.harvard.edu/abs/1976PASP...88..917C&data_type=SHORT_XML')
print html, html.data
Dear Tom.
Sorry for the late reply; I just came back from vacation. Yes, it works for me behind my university proxy with this simple command (adjusting the proxy address, of course). Great job. Thanks.