Tim Hatch


Introducing Traceback Tracker 29 Mar, 2009

I think that identifying errors in an automated way is important for lots of software projects: it reduces the effort required to find duplicate tickets or to respond to users when nobody else is around. Launchpad uses something for identifying similar bugs when you report one, but it appears to be just Bayesian, since it has to work with lots of different errors: some in the GUI, some from different runtimes, etc. When your problem space shrinks to just Python errors, which have a well-defined format, you can do smarter, more exact things. This is a project that Eli Carter and I are working on; feel free to join in on the Testing in Python list.

I’ve seen entirely too many IRC “conversations” that go like this:

[02:30 am]          * steve__ joined
[02:31 am]  <steve__> Anyone seen this, can you help me? http://somepastebin.com/blah-be-de-blah
[03:00 am]          * steve__ quit
[07:30 am] <otherguy> Morning.

And a lot that go like this:

[10:40 am]  <steve__> I'm getting http://somepastebin.com/blah-be-de-blah when I florb the blots.
[10:50 am] <otherguy> What database server are you using?
[11:00 am]  <steve__> Red Hat
[11:05 am] <otherguy> No, I mean your sql database server
[11:15 am]  <steve__> mysql.
[11:20 am] <otherguy> Are you using Trac >=0.11? We had a known issue with certain versions and mysql.
[11:25 am]  <steve__> I don't know, how can I check?
[11:30 am] <otherguy> Look in the footer or the about page.

What I’d like to add to the first one is a short delay followed by helpful solutions:

[02:33 am]          * identibot suggests checking tickets #123, #234, #345 while waiting
                      for a response, sometimes people are asleep.  Please stay around,
                      or file a ticket if it's not listed above.

And for the second, I’d like it to summarize the traceback, giving both the obvious information and the latent facts that we can derive by comparing against open source releases:

[10:40 am]          * identibot thinks steve__ is using Trac 0.10 with MySQL backend x.y.z,
                      and got UnicodeDecodeError, on POSIX with Python from rpms.

So, at the bare minimum, we need a bot that identifies those facts, so that nobody wastes time on follow-up questions whose answers can be determined automatically. This helps both us and our users, because they get a faster response. What would be really great is to index problems and solutions (say, tickets from a ticket system) and suggest solutions from that index when people aren’t around (most of the activity in #trac is during the US daytime).

The goals as they exist now are

  • Parse some tracebacks (DONE)
  • Make educated guesses at what’s in use (DONE)
  • Support most alternate formatting (right now it’s pretty targeted at what we get in Trac reports, but can be extended)
  • Fingerprint tracebacks, so you can find similar bugs based on code path (first stab, use filename:1,2,5;filename:20,31;TypeError and compute edit distance)
  • Respond to users in IRC and when reporting tickets in Trac

but please, join the discussion and let’s find out if there are other ways to use it. Here’s the source code and the release announcement.
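As a rough illustration of the fingerprinting goal above, here is a minimal sketch. It is not the project's actual code: the function names (fingerprint, edit_distance), the sample traceback with its paths, and the choice to keep only each file's basename are all assumptions made for the example. It collapses consecutive frames from the same file into filename:line,line groups, appends the exception name, and compares fingerprints with a plain Levenshtein edit distance.

```python
import re

# Matches the frame lines of a standard CPython traceback.
FRAME_RE = re.compile(r'^\s*File "(?P<path>[^"]+)", line (?P<line>\d+)',
                      re.MULTILINE)

# A made-up traceback used only for illustration.
SAMPLE = '''Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/trac/web/main.py", line 406, in dispatch_request
    dispatcher.dispatch(req)
  File "/usr/lib/python2.5/site-packages/trac/web/main.py", line 237, in dispatch
    resp = chosen_handler.process_request(req)
  File "/usr/lib/python2.5/site-packages/trac/ticket/web_ui.py", line 170, in process_request
    return self._process_ticket_request(req)
TypeError: unsupported operand type(s)
'''


def fingerprint(traceback_text):
    """Return a 'file:1,2;other:20;TypeError' style string for a traceback."""
    groups = []  # [filename, [line numbers]] pairs, in call order
    for match in FRAME_RE.finditer(traceback_text):
        # Assumption: only the basename is kept, so install prefixes
        # do not make otherwise identical code paths look different.
        filename = match.group('path').replace('\\', '/').rsplit('/', 1)[-1]
        lineno = match.group('line')
        if groups and groups[-1][0] == filename:
            groups[-1][1].append(lineno)
        else:
            groups.append([filename, [lineno]])
    # The exception name is whatever precedes the first colon on the last line.
    lines = [l for l in traceback_text.splitlines() if l.strip()]
    exc_name = lines[-1].split(':', 1)[0].strip() if lines else ''
    parts = ['%s:%s' % (name, ','.join(nums)) for name, nums in groups]
    parts.append(exc_name)
    return ';'.join(parts)


def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or lists)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, 1):
        current = [i]
        for j, item_b in enumerate(b, 1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


print(fingerprint(SAMPLE))
```

A small distance between two fingerprints then suggests the same code path. Splitting each fingerprint on ';' before comparing would make the distance count whole frame groups instead of characters, which may or may not work better in practice.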

More found knowledge 24 Mar, 2009

If you burned an ATmega168 bootloader using the Arduino software (in my case, version 13 Alpha) and it has a 10-second boot delay or flashes three times, you might have picked Arduino Mini instead of Diecimila.

Converting HTML to reST 19 Mar, 2009

I'm in the process of rewriting my weblog, and switching it all to reStructuredText. The best way, if you can ensure your input is valid XHTML, is to use xhtml2rest. Running random pages through tidy and then xhtml2rest has produced thoroughly reasonable documents that don't require much editing.

There's also aaronsw's original html2rst converter, and a slightly improved version, which with minor changes can deal with most basic features. A site called siafoo also provides a web form that behaves like the TurboGears version and additionally handles links in headings.

If you're interested in playing with siafoo further, I've got a quick script that will format the requests for you: siafoo-api.

Learning the hard way 12 Mar, 2009

Extruder Controller 2.0 board

This is a TQFP-package ATmega168, a microcontroller that is used in various boards these days and can run at up to 20MHz. I'm in the process of putting together some boards that Zach Hoeken of the RepRap Research Foundation designed, and spent Sunday evening beating my head against a wall.

Let this be a lesson to everyone: when you're using a multimeter to check whether any pins are bridged in your soldering job, and you find connections between nonadjacent pins with 0.5K-2K resistance, they might be connected inside the chip itself, so check a chip that isn't on a board yet. In my case the pins were the ones on the lower left in the photo, and I tried in vain to desolder and resolder them four times.

USBtinyISP tips 10 Mar, 2009

If you’re having trouble with the USBtinyISP programming an ATmega168...

avrdude: error: usbtiny_transmit: error sending control message: Protocol error

Just hotplug the USB side, not the IDC side.

avrdude: fileio: invalid operation=0

Make sure you write fuse bits inline; it just works better.

When PC6 isn’t working for input, check and reprogram the fuse bits to remove RSTDISBL.
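To make the RSTDISBL tip concrete, here is a small sketch for checking that bit in a high fuse byte you have read back. It assumes the ATmega48/88/168 datasheet layout, where RSTDISBL is bit 7 of the high fuse byte and a bit that reads 0 counts as programmed. The helper names are made up for the example, so check the bit position against the datasheet for your exact part.

```python
# Assumed layout: RSTDISBL is bit 7 of the high fuse byte (ATmega48/88/168).
# AVR fuses are active-low: a 0 bit means "programmed", i.e. the feature is on.
RSTDISBL_BIT = 7


def reset_pin_disabled(hfuse):
    """True if RSTDISBL is programmed, so PC6 is plain I/O instead of RESET."""
    return ((hfuse >> RSTDISBL_BIT) & 1) == 0


def without_rstdisbl(hfuse):
    """Return the fuse byte with RSTDISBL unprogrammed (bit set back to 1)."""
    return hfuse | (1 << RSTDISBL_BIT)


for value in (0xDF, 0x5F):  # example values only
    print(hex(value), reset_pin_disabled(value), hex(without_rstdisbl(value)))
```

The second helper only computes the byte to write back; the actual write still happens through your programmer.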

Had it with bzr 07 Mar, 2009

First, a little background. I’ve been using bzr for storing all my personal projects, and advocating it to others, for the past two years. I met some of the devs and liked the direction it was taking at that point (lots of tests, no smart server required unlike svn, and its sensible handling of redirects compared to anything else I’d seen).

It’s taken a turn for the worse recently, requiring developer-level effort to keep versions up to date, with foreign branch support (which is very important to me) not progressing to the point of being usable. About once a month, I hit some weird bug trying to do something quite normal, and burn half a day trying to get bzr (and its associated required plugins) up to date enough that I feel confident reporting a bug.

I used to dislike git because of its confusing error messages and my ability to get it wedged (usually while resolving conflicted merges), but I’ve actually had more trouble with recent versions of bzr (1.2+). My favorite bzr error, the one I got last night which changed my mind, was an error that got thrown while trying to display an error creating a new repo. I tried to update bzr-git (the likely culprit) and got the simple error “different rich-root support” without any sort of suggested fix.

Compare that mental stress with around 4 minutes of ‘make; sudo make install’ and there’s no question. As of today my repositories are either migrating to hg (when upstream merits it, for compatibility) or git.

Notes from SBPy last night 05 Mar, 2009

Python cheat sheet:

List of built-in functions is dir(__builtins__). Basically everything you access (stuff that’s imported, or defined, or used as a variable) is either a builtin, in globals() or in locals(). dir(object) gives you both the methods and variables defined on it.
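A quick sketch of those three places a name can live. It is written for a current Python 3 interpreter, where the builtins module is imported as builtins; the variable and function names are made up for the example.

```python
import builtins  # Python 3 spelling of the module behind __builtins__

builtin_names = dir(builtins)

module_level = 'defined at module level'


def lookup_demo():
    local_value = 'defined inside the function'
    # locals() only knows about names bound in this function.
    return sorted(locals())


print('len' in builtin_names)         # len is a builtin
print('module_level' in globals())    # module-level names live in globals()
print(lookup_demo())
print('upper' in dir('some string'))  # dir() lists an object's attributes
```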

For docs, check http://docs.python.org/, run help(func) or run pydoc mod.func (which will auto-import modules for you).

If the first thing in a module, class, or function is a string, it becomes the docstring automatically.
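For instance, in this small sketch (the names are invented for illustration) the leading string ends up in __doc__, which is what help() and pydoc display, while a function that starts with anything else has no docstring:

```python
def greet(name):
    """Return a greeting for name."""
    return 'Hello, %s' % name


class Greeter(object):
    """Holds nothing; exists only to show a class docstring."""


def undocumented():
    x = 1  # a comment is not a docstring
    return x


print(greet.__doc__)
print(Greeter.__doc__)
print(undocumented.__doc__)
```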

import os
from os import path
from cStringIO import StringIO as sio

Importable modules (on Linux) are named whatever.so, .pyo, .pyc, or .py. The .pyo and .pyc files are byte-compiled versions of the .py, which might be a tad smaller. Packages are just modules in directories, where the directory has an __init__.py in it. You can also load eggs, which are generally zip files created by setuptools.

lines = ['a', 'b']
'\n'.join(lines) # join is a method of strings, not lists, because you can join non-lists

def func():
    yield 'a'
    yield 'b'
'\n'.join(func())

2.6 has support for "context managers", which use the "with" keyword. This example automatically closes, even on Windows:

with open('blah', 'wb') as f:
    f.write(b'hi')
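A small sketch of what "automatically closes" means in practice: the file is closed when the block exits, even if the body raises. The temporary directory and the simulated error are just scaffolding for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'blah')

try:
    with open(path, 'wb') as f:
        f.write(b'hi')
        raise RuntimeError('simulated failure inside the block')
except RuntimeError:
    pass

# The with statement closed the file on the way out, despite the exception.
print(f.closed)

with open(path, 'rb') as check:
    written = check.read()
print(written)
```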

The testing modules that come with Python are doctest (just copy-paste from the interpreter into docstrings) and unittest. Once you need unittests in multiple files, or want to skip tests easily, you end up with suites and they get really complicated. That's where nose comes in: it finds and runs anything that looks like a test.
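As a minimal sketch of the doctest style, here is a standalone helper with its interpreter session pasted into the docstring. The function is a simplified stand-in written for this example, not code from the talk.

```python
import doctest


def pairwise_add(values):
    """Add each element to its right-hand neighbour.

    >>> pairwise_add([0, 1, 0])
    [1, 1]
    >>> pairwise_add([0, 1, 2, 1, 0])
    [1, 3, 3, 1]
    """
    return [a + b for a, b in zip(values, values[1:])]


if __name__ == '__main__':
    # Runs every docstring example in this module; silent when all pass.
    doctest.testmod()
```

Running the file directly, or with python -m doctest on the file, checks the examples, and nose can pick them up too with its --with-doctest option.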

For coverage, check out figleaf (figleaf <file.py>; figleaf2html), remembering that it concatenates runs (so delete .figleaf when you change the code).

The Pascal's triangle module

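class Triangle:
    """Rows of Pascal's triangle, indexable as Triangle()[row]."""

    def __init__(self, startrow=0):
        # Offset added to every requested index, so t[0] is row `startrow`.
        self.startrow = startrow

    def __getitem__(self, i):

        def pairwise_add(l):
            # Sum each pair of neighbours: [0, 1, 1, 0] -> [1, 2, 1].
            ret = []
            for j in range(len(l)-1):
                ret.append(l[j] + l[j+1])
            return ret

        # Start from the tip and build each following row from the previous
        # one; a negative row count just returns the tip.
        r = [1]
        for _ in range(i+self.startrow):
            r = pairwise_add([0] + r + [0])
        return r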

Its test module

import unittest
from dt import Triangle

def test_generator():
    for arg in [-1, 4, 10]:
        yield runner, arg

def runner(arg):
    assert isinstance(Triangle()[arg], list)

class PascalTestCase(unittest.TestCase):
    def setUp(self):
        self.t = Triangle()
    def test_tip(self):
        self.assertEqual([1], self.t[0])
    def test_secondrow(self):
        self.assertEqual([1, 1], self.t[1])
    def test_thirdrow(self):
        self.assertEqual([1, 2, 1], self.t[2])


if __name__ == '__main__':
    unittest.main()