leap, re, mi

I made this at the McGill Music Hack Day, way back in November of 2012.  I wanted to try using the classical solfege hand signs to control pitch with a Leap, with (faint!) hopes that this would prove to be a natural-seeming interface.  In the end, it wasn’t bad, but the Leap, while super rad, is not quite the best hardware for it.

I never learned the above signs, because I never learned to sing except by ear.  The principle is that you make the hand sign as you sing the note, and the extra muscle memory helps you learn things faster.  So for those of us who can’t/won’t sing, can we teach a Leap to recognize those signs?  Almost, as it turns out.

The Leap

The Leap is serious tech.  It tracks the position of every finger it can see, and the hand’s location, rotation, and radius.  I ended up using basically all of those attributes.  I used a Python machine learning package called Orange to train a Support Vector Machine, using, wait for it:

# One labelled training instance for the 'do' sign: finger count, hand
# sphere radius, hand rotation, palm position, and five fingertip positions.
frame_data = Orange.data.Instance(domain, [num_fingers, hand.sphere_radius,
    hand_pitch, hand_roll, hand_yaw,
    hand.palm_position.y, hand.palm_position.z, hand.palm_position.x,
    finger_data[0][0], finger_data[0][1], finger_data[0][2],
    finger_data[1][0], finger_data[1][1], finger_data[1][2],
    finger_data[2][0], finger_data[2][1], finger_data[2][2],
    finger_data[3][0], finger_data[3][1], finger_data[3][2],
    finger_data[4][0], finger_data[4][1], finger_data[4][2],
    'do'])

Ow ow ow.  So we’re tracking: the number of fingers, the radius of the hand sphere, the rotation of the hand, the position of the hand, and the positions of all five fingers.  Phew.  Lucky for us, the machine learning takes care of all that – though training the model can take some time.  I tried simpler models before adding the 15-part finger data, but things get way better with it.  I also suspect that increasing the weight given to the number of fingers and the hand position would make things better still.
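The hand_pitch / hand_roll / hand_yaw values and the finger_data tips all come straight from the Leap’s Python API.  Here’s a minimal sketch of how one frame might get flattened into that feature list (my reconstruction, not the hack’s actual code; it assumes a hand is in view and pads with zeros when fewer than five fingers are visible):

import Leap

def frame_to_features(frame, label):
    # Flatten one Leap frame into the 23 numbers fed to Orange above,
    # with the class label ('do', 're', ...) tacked on the end.
    hand = frame.hands[0]
    tips = [[f.tip_position.x, f.tip_position.y, f.tip_position.z]
            for f in hand.fingers][:5]
    while len(tips) < 5:
        tips.append([0.0, 0.0, 0.0])   # missing fingers zero-filled (my choice)
    features = [len(hand.fingers), hand.sphere_radius,
                hand.direction.pitch, hand.palm_normal.roll, hand.direction.yaw,
                hand.palm_position.y, hand.palm_position.z, hand.palm_position.x]
    for tip in tips:
        features.extend(tip)
    return features + [label]

# Usage: grab the latest frame from a connected controller.
controller = Leap.Controller()
row = frame_to_features(controller.frame(), 'do')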

However, even with that huge model, things are not ideal.  The Leap doesn’t do well at hand details:  it is more about fingers (access to the Leap’s raw data would probably solve all these problems though).  So I ended up tweaking the above gestures to make them easier to tell apart.  Specifically:

  • Do:  the same.
  • Re:  Stick thumb out.
  • Mi:  Splay three fingers out.
  • Fa:  Straighten the other four fingers.
  • So:  Stick thumb out.
  • La:  Splay all five fingers out.
  • Ti:  Fold thumb under middle finger.
  • Do+:  the same.

This made things roughly ten billion times better, as now we’re counting fingers:  no adjacent notes have the same number of fingers.  I also made a tool for training the SVM, so testing different hand positions was relatively simple.
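The training tool isn’t shown here, but at its core it just builds an Orange domain, collects labelled frames, and hands them to the SVM learner.  A rough sketch, assuming the Orange 2.x API of the time (the variable names, feature ordering, and training_frames are mine, reconstructed from the instance above rather than copied from the real tool):

import Orange

# 23 continuous features, in the same order as the instance above,
# plus a discrete class variable for the sign.
names = (['num_fingers', 'sphere_radius', 'pitch', 'roll', 'yaw',
          'palm_y', 'palm_z', 'palm_x'] +
         ['tip%d_%s' % (i, axis) for i in range(5) for axis in 'xyz'])
sign = Orange.feature.Discrete('sign', values=['do', 're', 'mi', 'fa',
                                               'so', 'la', 'ti', 'do+'])
domain = Orange.data.Domain([Orange.feature.Continuous(n) for n in names], sign)

table = Orange.data.Table(domain)
for values, label in training_frames:   # hypothetical (features, sign) pairs
    table.append(Orange.data.Instance(domain, values + [label]))

classify = Orange.classification.svm.SVMLearner()(table)
predicted = classify(table[0])          # gives back a sign name like 'do'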

In terms of making sound, the loop gives back a note name, which we convert to MIDI, and throw to ChucK over OSC.  The current ChucK implementation makes sine-wave bleeps – it would be nice to make it do more.  A gorgeous synth goes a long way to hiding a lousy interface, after all.
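The plumbing on that end is tiny.  A toy version, using python-osc as a stand-in for whatever OSC library the hack actually used (the /note address, port 6449, and the C-major mapping are all invented for the example; the ChucK patch just needs to listen for the same things):

from pythonosc import udp_client

# Solfege name -> MIDI note, pinned to the octave starting at middle C.
NOTES = {'do': 60, 're': 62, 'mi': 64, 'fa': 65,
         'so': 67, 'la': 69, 'ti': 71, 'do+': 72}

chuck = udp_client.SimpleUDPClient('127.0.0.1', 6449)

def play(sign):
    # Fire the recognized sign at ChucK as a MIDI note number over OSC.
    chuck.send_message('/note', NOTES[sign])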

The main improvements to make to the interface are robustness (maybe dropping iffy frames?) and moving things back towards the real hand signs.  It’d also be nice to sneak pitch-bends and vibrato in there somehow, but that would probably require a total rewrite.

For a hack day project, it’s not bad – there’s certainly lots of potential, and appetite for it, in this area, as the Leap, the Kinect 1 and 2, and the nascent Myo show.  Will we get actual free-gesture music apps that are more than novelties?  We’ll see – but it’ll sure be fun to try to build them.