Posted 5 months ago

How does biology explain the low numbers of women in computer science? Hint: it doesn’t.

It comes up a lot in discussions of women in computer science, women who write code, women in open source. Eventually, someone brings up the fact that women score slightly lower on math tests. Clearly, they claim, this biological inferiority must explain why there are fewer women in math heavy fields.

It sounds like a compelling reason, and it gets a lot of play. Except, you know what? It’s a lie.

I’m a mathematician. I’ve looked at those numbers, I’ve read some papers. The research into biologically-linked ability is fascinating, but it simply isn’t significant enough to explain the huge gender gap we see in the real world. I used to do this presentation on the back of a napkin for people who tried to spout this misconception to my face, and I finally put it online:

How does biology explain the low numbers of women in CS? Hint: it doesn’t.

(Source: daysofrage)

Posted 5 months ago

UNIX’ Russian Roulette

fooyeahcode:

sudo [ $[ $RANDOM % 6 ] == 0 ] && rm -rf / || echo “You live”

(Source: twitter.com)

Posted 5 months ago

Rick Santorum, bigot or hypocrite?

I so wish I could have been there to point out the irony in his stance. Here’s what I would say.

“Mr. Santorum, you claim it’s unfair for people call you a bigot when you’re simply representing the views of the Catholic church. Did you realize your inability to consider that the American Psychological Association could have scientifically sound studies demonstrating something contrary to your beliefs is just as short-sighted? In other words, do you realize your argument makes you a hypocrite?”

Sorry, you can’t have your cake and eat it too.

Posted 5 months ago
Posted 5 months ago

Source code repository server

My new Mac Mini has been useful as a workstation, but sometimes I need a server dedicated to distributing media files or storing the source code for my programming projects. I’ve set out to address latter issue first, and this post documents my progress.

Here are the requirements for the source code repository server:

  1. Long term persistance
    • VM still works when copied to another machine.
    • VM is backed up at least once per week and tested, either manually or automatically.
    • Each repository is backed up by cloning it to a physical machine at least daily.
    • All backups are verified with some kind of automated consistency check.
  2. Security
    • Repositories can only be accessed from other machines (physical or virtual) via a secure channel like SSH.
    • Repositories are stored in a specialized account managed only by the repository server administrator (me).
    • Physical machines in my local network can request a working copy of a repository.
    • Source control interactions with repositories are regulated by access control lists managed by the repository administrator.
  3. Extensibility
    • Centralized repository supports hooks for automated standardization of code formatting, preventing commits that break tests or build script, or various other quality assurance measures.
    • Supports executing build scripts on a regular basis for specific repositories and storing dated copies of those builds. It should be known exactly what code was used to make the build, perhaps using a hash for tracking repository state.

Long term persistance

I’ve often used VirtualBox to address any virtual machine needs, but eventually I got annoyed with configuring everything in the GUI. VirtualBox does have an API and ways to make this easier for those who want more consistent control, but I couldn’t yet justify delving that deep.

Fortunately somebody else must have felt the same because there’s a great utility called Vagrant for creating and managing many virtual machines. Underneath it uses VirtualBox. For now I’m using Vagrant to manage the source code repository VM on my Mac Mini. I haven’t tested whether all the long term persistance requirements will be satisfied yet. Not too worried, though, because everything I’ve read and tried with Vagrant makes this seem realistic. Of course I’ll come back to those requirements after addressing bigger concerns.

Security and Extensibility

These two requirement categories are lumped together because they both hinge on a critical decision: What source control management tool should I use?

A company called SourceGear recently made a new distributed version control system (DVCS) called Veracity. There’s an interesting, free book by the founder of SourceGear about using various DVCSs called Version Control By Example, and it discusses Veracity. I spent a fair amount of time experimenting with Veracity, and I really like where it’s going. It doesn’t currently meet my security requirements yet. I’d consider adding the necessary features to Veracity, but I want to get up and running with my repository server. Maybe someday Veracity will have those features, or I’ll feel up to hacking on it. Not yet though.

Git works well for me. I’ve been using it in my personal projects for the past couple years, but have found my work isn’t easily accessible from all my computers. This isn’t git’s fault because it’s easy to clone a repository from another machine. What I really need is a centralized server that’s up nearly all the time. Github is good for that, but I don’t necessarily want all my projects hosted publicly. Yes, they have paid plans, but they seem far more expensive and restrictive than I’d like. Hence the need for this project.

Supposing I use Git, how can I satisfy both the security and extensibility requirements? Git supports hooks, so the SCM part of the extensibility requirements is automatically addressed. The protocol support in Git is pretty good, and includes SSH. I could make a new user on the repository server itself and attempt to manage the file permissions correctly for each project, but that seems a tad painful.

Gitolite comes to the rescue. It uses concise access control lists to dictate who has what rights to which repositories. Transactions occur over SSH. Seems like just what I need.

Posted 7 months ago
Posted 7 months ago

Improving bus schedule accessibility

Can’t help but smile when people spend time and money making innovative improvements to the world around them.

Posted 7 months ago

Introducing Tumblr's New API

staff:

Along with a sharp new Engineering Blog and a beautifully organized new Developer Center. Enjoy!

Posted 7 months ago

A few weekends ago I went to a dance camp called Pinewoods for the first time. Periodically people performed casually for everyone else, usually by playing instruments, singing, or dancing. In fact, I did a small poi performance and ended up teaching a lot of people how to do it in my downtime.

One fellow got up and sang “I’m My Own Grandpa.” It was the first time I heard the song, but I liked it a lot. It tells a story from the perspective of a guy who simply married a woman and through a complex sequence of marriages and births he becomes his own grandfather. Check out the video. It draws out the relationships visually so you can at least try to follow it all.

The folk song got me wondering whether anyone has ever tried demonstrating these complex relationships in Prolog, a programming language for satisfying constraints in a rule-based system. Perhaps the thought hit me because the stereotypical example for introducing people to these kinds of languages is modeling family trees and relationships between particular people.

After a short while curiously searching I found this Stackoverflow post. Apparently a computer science book published in 1976 was inspired by the song and is now inspiring Prolog teachers to assign the problem for homework. I like when some of my seemingly unrelated interests cross paths like that.

Posted 7 months ago

Type-safe mathematical vectors

The Challenge

Program an implementation of vectors for all dimensions with the following in mind.

  1. Binary operations on vectors like addition don’t make sense when the vectors have different dimensions.
  2. People using your implementation shouldn’t have to write boilerplate code to accommodate how you choose to handle dimension mismatches.
  3. Rather than failing silently by handling input incorrectly, the implementation should alert the programmer as early as possible about their mistake.

Case Study: Python

Let’s give this a try in Python. To account for the arbitrary dimension requirement, we’ll have the constructor take a single parameter called components. The plan is to pass an array of numbers such as [-3.0, 8.24, 2] or [42].

class Vector:
    def __init__(self, components):
        self.components = components

    def __add__(self, v):
        pairs = zip(self.components, v.components)
        return [ sum(axis_pair) for axis_pair in pairs ]

In case you’re not familiar with zip it helps you easily match up elements from different arrays at the same index. It’s useful in situations like this where we’re trying to sum vector elements component-wise.

>>> zip([1, 2, 3], [4, 5, 6])
[(1, 4), (2, 5), (3, 6)]

Let’s tinker with this implementation a bit.

>>> a = Vector([-3.0, 8.24, 2])
>>> b = Vector([10, 9, 8])
>>> a + b
[7.0, 17.240000000000002, 10]

Not bad. What about this?

>>> c = Vector([3.14159])
>>> a + c
[0.14158999999999988]

Gasp! The implementation accepts adding vectors from different dimensions. That’s because the __add__ implementation relies on zip, which truncates all the input arrays to the length of the shortest one. Guess we better validate the input to __add__.

def __add__(self, v):
    if len(self.components) != len(v.components):
        raise ValueError
    pairs = zip(self.components, v.components)
    return [ sum(axis_pair) for axis_pair in pairs ]

Now there’s an error when we try adding vectors with different dimensions.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "vector.py", line 7, in __add__
    raise ValueError

We’re not getting wrong output anymore! The downside is we’ll have to depend on the programmers to check the dimensions themselves or handle the exception properly. Aside from that, we also have to raise an error on every vector operation where there’s a dimension constraint violation. Bugs like to creep in when there are maintenance constraints like this.

The core problem with the Python implementation above is all Vectors, regardless of their dimension, have the same type. We could have tried getting around that by making distinct Vector classes for several dimensions, but we’d probably end up with either too many or too few supported dimensions. It wouldn’t be fun maintaining those all those Vector implementations. Plus, using an array in the constructor won’t work unless the length of the array is validated against the dimension associated with the class. How could we dynamically build up constructors for each dimension vector type to accept the right number of arguments? I can imagine potential solutions in some dynamically typed languages with certain metaprogramming constructs. Maybe I’ll go into that further in a later post. Let’s move on for now.

Case Study: D

D deserves a brief introduction because it’s not quite as popular as Python. It’s a systems language resembling C++ and, at least syntactically, modern VM-based incarnations like Java and C#. The preprocessing engine that executes before compilation supports constructs far more expressive than #define or #ifdef. As a result it’s possible to do some interesting metaprogramming without losing the benefit of compile-time static checking or relying on potentially slower runtime operations.

Before we continue, here’s a quick note of attribution: Some of the following code was taken from the phoboslinalgebra project. Many thanks to Gareth Charnock for sharing his code on GitHub. Alright, let’s dig in, shall we?

A TypeTuple is a specific-length list where each entry corresponds to a particular type. For example, TypeTuple!(int, double, string) describes the type for lists containing three elements where the first is an int, the second is a double, and the third is a string.

An n-dimensional vector is represented by nothing more than a list of n values where each is an element of the vector space’s field. To represent an n-dimensional vector in D we could really use a TypeTuple!(T, T, ..., T) where T is the field type and there are n of those Ts. How can we tell D we want the type for 3D vectors of floats, 2D vectors of ints, or 1000D vectors of mySuperSpiffyNumericType? The answer is to recursively build a type definition for our TypeTuple!(T, T, ..., T).

import std.typetuple;

template TypeNuple(T, size_t n) {
  static if (n == 0) {
    alias TypeTuple!() TypeNuple;
  }
  else {
    alias TypeTuple!(T, TypeNuple!(T, n-1)) TypeNuple;
  }
}

The TypeNuple template uses the static if to dynamically generate code during the early stages of compilation. Notice that the implementation recursively builds up a type tuple of Ts until it has n elements. We’re using recursion to dynamically generate the types we need at compile time!

Now let’s create a struct that utilizes our recursively defined list of types.

struct Vector(Field, uint D) {
  alias TypeNuple!(Field,D) ArgList;

  Field[D] components;
  
  this(ArgList argList) {
    foreach (i, arg; argList) {
      components[i] = argList[i];
    }
  }
}

The alias call makes ArgList a tuple of D elements, each of which has type Field. Suppose we’re making a 3D vector of reals. In this context, ArgList has type (real, real, real) and an example instance could be (3.14159, 2.71828, 0). Notice how the constructor iteratively initializes the components attribute array with the values from the provided argList.

Now we’ll overload the binary operators + and - to take two Vector!(Field,D) values and produce a new Vector!(Field,D). For convenience we’ll alias that type as VecType. The following code goes inside the struct defined above.

  alias Vector!(Field,D) VecType;

  VecType opBinary(string op)(VecType vec) {
    if (op == "+" || op == "-") {
      VecType result;
      for (int i = 0; i < D; i++) {
        mixin("result.components[" ~ i.stringof ~ "] ="
              "this.components[" ~ i.stringof ~ "]" ~ op ~
              "vec.components[" ~ i.stringof ~ "];");
      }
      return result;
    }
  }

The mixin construct makes it easy to assign each element in the resulting vector to the sum of the corresponding elements in the two other vectors.

Let’s make a unit test that adds a 3D real vector to itself and checks the result:

unittest {
  alias Vector!(real,3U) VecR3;

  VecR3 v = VecR3(3.14159, 2.71828, 0);
  VecR3 v_doubled = v + v;

  assert(v_doubled.components[0] == 3.14159 + 3.14159);
  assert(v_doubled.components[1] == 2.71828 + 2.71828);
  assert(v_doubled.components[2] == 0 + 0);
}

When we run the unit test there’s no output. That tells us it’s working!

$ rdmd -unittest --main vector.d
$

If you’re skeptical that no news is good news, try changing the asserts so they come out to false. rdmd will cry tears of garbage.

What happens when we add two vectors with different field types or dimensions? You’ll get an error at compile time. Yeah, that’s right—kill those bugs before they have a fighting chance.

vector.d(56): Error: incompatible types for ((r2) + (r3)): 'Vector!(real,2u)' and 'Vector!(real,3u)'

I imagine there are other languages that give you the same expressiveness and safety, but haven’t tested the theory. What other languages can generate recursively defined types for you at compile time? Can they also easily support useful binary operations on those types to produce a new value of the same type?