| JezUK Ltd - The Coffee Grounds - February 2004 |
| << January 2004 | March 2004 >> |
House prices are going up here, as they are generally. This is something of a double-edged sword, because when we do move again we're planning to move elsewhere in Moseley. Doh! [added 27th Feb 2004]
Primarily a bug-fix release.
SAX: SAXParseException copy constructor was corrected. DefaultHandler::fatalError now throws an exception, matching its documentation. Thanks to Ulrich Heinen of the University of Freiburg for picking that up. The MSXML2 wrapper now allows exceptions thrown from ErrorHandlers to propagate properly, rather than dropping them at the COM boundary. There are a few VS.NET specific fixes. The Writer SAX filter now also writes any internal DTD subset.
DOM: Some minor DTD handling problems were fixed.
Source tar.gz download Source zip download
Build Notes
Another exceedingly quick'n'easy™ teatime winner. It's a lovely way to eat broccoli - you really get to taste the broccoli, rather than the sauce or gravy or whatever that it usually gets mixed up with.
Nip down the shops and get
With all that broccoli to work against, the pasta needs to be pretty rugged. Right now, I normally use buckwheat pasta because Natalie isn't eating wheat. Wholewheat spaghetti is something I usually avoid, but it's excellent cooked with the broccoli. Hemp and spelt pasta is good too - it cooks so quickly you can put everything in at the same time.
Spent the weekend up near Skelmersdale with my Warps chums JD, Anton and AndyB. Warps is a crappy acronym for Wargames and Role-Playing Society but WARPS made for a better name than Hull University Wargames Society. It also meant Ian could design a cool logo for the T-shirts. I'm fairly sure I came up with the Warps name, so I'm gratified that it still exists and is, no doubt, ensuring that maladjusted scientists and engineers continue to get 2:2s. Hurrah!
We talked a lot of nonsense, played Carcassonne, drank some beer and, with JD's delightful goodwyf Francine at the wheel, embarked on an extended midnight tour of Ormskirk looking for a takeaway. Top fun.
In the weeks beforehand Anton had solicited suggestions for music we listened to at university. He turned up with a CD of 119 rock tracks, roving from Biohazard through Faith No More to Lawnmower Deth. At the cheesy end of things he also threw in some Poison, Dave Lee Roth and Warrant's marvellous Cherry Pie. Sitting round the game, drinking beer, Andy outlined his objections to Unilever's latest branding strategy. The rest of us tuned out slightly and gradually we all began to nod in time to Queensryche.
I don't honestly see us not continuing to do this two or three times a year until we're all dead and buried.
I can't claim to have an exhaustive test suite, but it eats all the arbitrarily complex XPaths I've thrown at it today.
What I find amazing is the speed with which I've been able to do this. I've spent less than two working days on it - two hours transcribing EBNF from the XPath rec, half a day footling around with abstract syntax trees and what not, and the rest of the time eliminating left-recursion. Now I have a grammar that I'm extremely confident is correct.
That development speed and confidence is entirely due to the mind-boggling power of the Spirit library. Because it allows you to transcribe EBNF more or less directly into code, I can take the grammar in the recommendation and codify it. Literally codify it. The rec text is there in my code, so the code must be right. Here's a snippet:
// [1]
LocationPath = RelativeLocationPath | AbsoluteLocationPath;
// [2]
AbsoluteLocationPath = AbbreviatedAbsoluteLocationPath
| ('/' >> !RelativeLocationPath);
// [3]
RelativeLocationPath = Step >> *((boost::spirit::str_p("//") | boost::spirit::ch_p('/')) >> Step);
// [4], [5]
Step = AxisSpecifier >> NodeTest >> *Predicate | AbbreviatedStep;
AxisSpecifier = AxisName >> "::" | AbbreviatedAxisSpecifier;
The numbers in square brackets refer to the rules in the recommendation -
[1] LocationPath ::= RelativeLocationPath | AbsoluteLocationPath
[2] AbsoluteLocationPath ::= '/' RelativeLocationPath? | AbbreviatedAbsoluteLocationPath
[3] RelativeLocationPath ::= Step | RelativeLocationPath '/' Step | AbbreviatedRelativeLocationPath
[4] Step ::= AxisSpecifier NodeTest Predicate* | AbbreviatedStep
[5] AxisSpecifier ::= AxisName '::' | AbbreviatedAxisSpecifier
Even without knowing the Spirit syntax, it's easy to see the two match very closely. You can see I've had to reorder some rules slightly. RelativeLocationPath is an example of left-recursion, which I've had to refactor. But differences are minor and pale into nothing compared with a hand-coded parser. (I was going to link to Xalan's XPath grammar and I can't actually find it, which kind of demonstrates the point.)
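The left-recursion refactoring can be shown without Spirit at all. What follows is a minimal hand-rolled sketch, not the code in xpath-dev-sandbox: `parseStep`, `parseRelativeLocationPath` and `parsesFully` are made-up names, a Step is simplified to a bare name, and the `//` abbreviation is left out. The left-recursive rule `RelativeLocationPath ::= Step | RelativeLocationPath '/' Step` is rewritten as `Step ('/' Step)*`, which a top-down parser can handle with a loop.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical, simplified Step: one or more letters, digits, '-' or '_'.
// (A real XPath Step is far richer; this is for illustration only.)
bool parseStep(const std::string& in, std::size_t& pos) {
    std::size_t start = pos;
    while (pos < in.size() &&
           (std::isalnum(static_cast<unsigned char>(in[pos])) ||
            in[pos] == '-' || in[pos] == '_')) {
        ++pos;
    }
    return pos > start;
}

// Left-recursive form (a top-down parser would recurse forever on it):
//   RelativeLocationPath ::= Step | RelativeLocationPath '/' Step
// Equivalent iterative form implemented here:
//   RelativeLocationPath ::= Step ('/' Step)*
bool parseRelativeLocationPath(const std::string& in, std::size_t& pos) {
    if (!parseStep(in, pos)) return false;
    while (pos < in.size() && in[pos] == '/') {
        std::size_t save = pos;
        ++pos;
        if (!parseStep(in, pos)) {
            pos = save;  // back out of the dangling '/'
            break;
        }
    }
    return true;
}

// True only if the whole input is a relative location path.
bool parsesFully(const std::string& in) {
    std::size_t pos = 0;
    return parseRelativeLocationPath(in, pos) && pos == in.size();
}
```

Calling `parsesFully("one/two/three")` accepts the path, while `parsesFully("one/")` rejects it because the trailing slash is never consumed. The loop is the hand-written counterpart of the `Step >> *(... >> Step)` shape in the Spirit rule.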
D:\work\JezUK\Arabica\xpathic>Debug\xpathic.exe
Hello
'text' parses OK
'comment' parses OK
'text()' parses OK
'processing-instruction('poo')' parses OK
'processing-instruction("poo")' parses OK
'processing-instruction()' parses OK
'self::name' parses OK
'@fruit' parses OK
'one/two' parses OK
'one/@fruit' parses OK
'one/@fruit[1]' parses OK
'one/descendant-or-self::woot[1]' parses OK
'one/two/three' parses OK
'one/two/three[1]' parses OK
'one/two[1]/three' parses OK
'/one/two' parses OK
'/one[1]/two[2]/comment()' parses OK
'/one[1]/two[2][1]' parses OK
'//one' parses OK
'//one/two' parses OK
'//one/two//@id' parses OK
'one/two/three[@attr]' parses OK
'one/two/three[@attr][1]' parses OK
'one/two/three[four/@attr]' parses OK
'one/two/three[@attr='nob']' fails Parsing
'one/two/three[position() = first()]' fails Parsing
'one/two/three[(@attr) or (@id)]' fails Parsing
If you're interested, the code is in the CVS as xpath-dev-sandbox. You'll need a current version of Boost, or Spirit 1.6.
So far, it is just the grammar, but everything else is just a simple matter of programming ...
A friend of mine lives in Wolverhampton. Almost everyone in her street sends their children to the local, middle-ranking primary. After that, there is no automatic transfer to a similar school. The closest school is a girls-only grammar. The two nearest secondaries restrict their intakes to Catholics and Anglicans respectively. The local comprehensive has a catchment area that never reaches their street. My friend's best hope is that her children will be accepted at the specialist city technology college, which selects bands of children from an entrance test. The only choices for her children and those of everyone in the streets around her, if they want to avoid being placed in the area's failing schools, are these: tutor, find God, hope, appeal, move, or pay. They aren't choosing schools - it's schools that are choosing them.
[The Guardian, 14 Feb 2004]
What is a residential broadband IP address? [Postmaster.Info : AOL Mailer FAQ]
Residential customers of broadband services are assigned an IP address from a specific range maintained by the provider. These IP addresses may be either dynamic or static depending upon the individual provider. Residential IP addresses should use the provider's SMTP servers and should not be connecting directly to another ISP's SMTP servers. Please see our Info Center for more information on IP addresses.
Bastards, bastards, bastards, bastards, bastards, bastards, bastards, bastards, bastards, bastards, bastards, bastards, bastards, bastards, bastards, bastards, bastards, bastards, bastards, bastards, bastards, bastards, bastards, bastards, bastards, bastards, bastards, bastards, bastards.
AOL don't want to deal with it, because they consider such things as carrying the whiff of spam, either by delivering spam directly or acting as an open-relay for others. There's no doubt that a lot of machines do operate like this, largely because they've been infected with trojans.
However, statements like "residential addresses ... should not be connecting directly to another ISP's [servers]" are nonsense, and refusing connections from so-called "residential addresses" won't do anything very much to reduce spam. [added 16th Feb 2004]
Obviously, I don't know what the situation is now because I haven't been using them. [added 16th Feb 2004]
It is not AOL who set the internet standards, and they could benefit from understanding technology better. [added 11th Dec 2008]
This expression "Residential IP addresses" has no technical meaning, and businesses also use such "residential" addresses.
[added 11th Dec 2008]
For the first time noticed the dedication in Java in a Nutshell, 4th Edition.
This book is dedicated to all who teach peace and resist violence.
Admirable sentiment. No doubt the many, many people to whom that applies sleep a little bit sounder in their beds knowing their noble actions inspired a 5cm thick Java fucking reference manual.
I'm a little bit late picking this up but am saddened to read that Flossie, the prettiest sheep in the world, has moved on to a pasture where the grass is always lush and there are many, many McVitie's biscuits.
If you'll excuse me, I'm going to play GridRunner++ for a few minutes.
ping www.nedrichards.com
Unknown host www.nedrichards.com [added 10th Feb 2004]
A guy ran through the crossing in front of me this evening. He wasn't going especially fast, and then pulled up outside Sainsbury's. I hoofed it down the road and caught up with him.
Me: Red lights not apply to you, mate?
Him:
Me: Red lights not apply to you, mate?
Him: What?
Me: You just ran through the pedestrian crossing when the lights were red.
Him:
Me:
His surprise seemed genuine. He appeared not to have noticed the crossing, nor the fact that someone on the opposite side from me was on it at the time he drove through. If I hadn't picked up on his unchanged engine note, I would have started crossing too. Would he have noticed me, if I had not noticed him?
I'd expected a more aggressive reaction from him on being challenged, not a mild show of "oh silly me" surprise. I'd always assumed that the people running the crossing, jumping the lights or whatever, were largely motivated by selfishness and aggression. It didn't seem possible that people could fail to see a pair of traffic lights, but apparently they can. Perhaps the nation really is asleep at the wheel.
I'm not saying all problems would be solved by having cars banned from the road, but... [added 8th Feb 2004]
Ban cars ! [added 13th Feb 2004]
Since 2001 I have been working sporadically on a new parser written in Java that, instead of parsing well-formed or valid XML, parses HTML as it is found in the wild: nasty and brutish, though quite often far from short. TagSoup is designed for people who have to process this stuff using some semblance of a rational application design. By providing a SAX interface, it allows standard XML tools to be applied to even the worst HTML.
I would love to have the time to port this to C++ and give Arabica HTML eating capabilities.
An interface for SAX sources is even more obscure.
So I assume whatever you build will be for a specific implementation, such as an abstract class that you define, so that this HTML parser can send events to it, and the events look like SAX events on the other end of the conversation.
Fun project. [added 10th Feb 2004]
Porting something like TagSoup would mean writing something that implemented the SAX::basic_XMLReader interface. After that, you could drop it in and everything would just work.
There are examples of this kind of thing in CPAN, where there are Perl modules to present all kinds of things as SAX sources. There are probably Python examples too. [added 11th Feb 2004]
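To make the idea concrete, here is a rough sketch of an HTML-ish scanner pushing SAX-style events at a handler. Everything in it is an assumption made for illustration: `EventHandler`, `RecordingHandler` and `scanSoup` are invented names, not part of Arabica or TagSoup. A real port would implement `SAX::basic_XMLReader` instead, and would have to cope with attributes, entities and badly nested tags.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical handler interface, loosely modelled on SAX callbacks.
struct EventHandler {
    virtual ~EventHandler() {}
    virtual void startElement(const std::string& name) = 0;
    virtual void endElement(const std::string& name) = 0;
    virtual void characters(const std::string& text) = 0;
};

// Records events as strings so they can be inspected afterwards.
struct RecordingHandler : EventHandler {
    std::vector<std::string> events;
    void startElement(const std::string& name) { events.push_back("start:" + name); }
    void endElement(const std::string& name) { events.push_back("end:" + name); }
    void characters(const std::string& text) { events.push_back("text:" + text); }
};

// Very naive scanner: treats <name> and </name> as tags and ignores
// attributes. Elements still open at end of input are closed for you.
void scanSoup(const std::string& html, EventHandler& handler) {
    std::vector<std::string> open;  // stack of currently open element names
    std::size_t pos = 0;
    while (pos < html.size()) {
        if (html[pos] == '<') {
            std::size_t close = html.find('>', pos);
            if (close == std::string::npos) break;  // unterminated tag: stop
            std::string tag = html.substr(pos + 1, close - pos - 1);
            if (!tag.empty() && tag[0] == '/') {
                std::string name = tag.substr(1);
                // Only honour an end tag that matches the innermost open
                // element; stray end tags are silently dropped.
                if (!open.empty() && open.back() == name) {
                    handler.endElement(name);
                    open.pop_back();
                }
            } else if (!tag.empty()) {
                handler.startElement(tag);
                open.push_back(tag);
            }
            pos = close + 1;
        } else {
            std::size_t next = html.find('<', pos);
            if (next == std::string::npos) next = html.size();
            handler.characters(html.substr(pos, next - pos));
            pos = next;
        }
    }
    // Tag soup often omits end tags; close whatever is left, innermost first.
    while (!open.empty()) {
        handler.endElement(open.back());
        open.pop_back();
    }
}
```

Feeding it `<p>hi<b>there</p>` with a `RecordingHandler` yields balanced start and end events even though the `b` element is never closed in the input. The handler on the receiving end cannot tell the events came from soup rather than from a proper XML parser, which is the point of presenting the scanner as a SAX source.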
Here's a cool example:
Screenscraping HTML with TagSoup and XPath - http://www.hackdiary.com/archives/000029.html [added 11th Feb 2004]
From: Jez
To: Pete
Subject: Bettie Page Bio-pic Trailer
http://www.bettiepagedarkangel.com/
It's what broadband was invented for.
Jez

From: Pete
To: Jez
Subject: re: Bettie Page Bio-pic Trailer
There is a god ...

The original book of ancient-astronaut craziness in a cheesy mid-70s Corgi edition? For only one of your English Pounds? Extra amusement points for being on sale at the Birmingham Buddhist Centre? How could I resist?
Actually, if you see von Daniken on the TV or hear him on the radio, he doesn't sound like a wide-eyed crazyman at all. He seems at least as sensible as James Lovelock when he explains Gaia or as methodical as Fred Hoyle describing how he came round to the idea of panspermia. In the current frenzied political atmosphere, he certainly makes more sense than the explanation for going to war in Iraq. Footle around legendarytimes.com for more fringe-pseudo-science fun.
And before you ask - no I don't [added 5th Feb 2004]
| [Enormous version] |
http://www.looseend.org/photos/portmeirion_wide_big.jpg
The camera shows you the previous pic in the sequence on half the screen making it easier to line up. [added 3rd Feb 2004]