Choirfish Tune Identification

(Last updated: 2003-12-16)

Project Members

Goals

To develop a program capable of indexing monophonic source sounds and recognising them when they are reproduced by a human (whistling). By monophonic we mean that only one note is played simultaneously, not that the tone contains only one frequency.

If at all possible, we would like to investigate the possibility of indexing music files (MP3) with polyphonic information.

Abstract

The ChoirFish application will record whistled input, perform fourier analysis to get pitch information, use that to generate a Parsons Code (see references), and search against a database to determine the most likely song. ChoirFish is a Windows GUI application implemented using MFC7.1. It's basic functionality mimics the TuneServer desribed in the article under references, although the implementation is all our own.

Introduction

One of the main problems you have to deal with is inaccuracy of the input data. People don't whistle a song exactly as it is. They can be out of key, out of tune, use the wrong rhythm, shift keys during whistling, etc. etc.

This means a method has to be found to reliably deal with these problems, while at the same time sufficiently narrowing down the possible result space.

Existing work that has been done in this area includes the so-called "Parsons codes", a system that looks only at transitions between notes, disregarding rhythm and melody altogether.

Design

Initially, we plan to implement a system using Parsons Codes and evaluate it's performance. When we have a working system, we will see if we can improve on it's design.

Designing the Data Layer

NOTE: Actually just getting it working proved much more work than anticipated. The article is very vague on some parts of the design and even more so the implementation, so we had to negotiate some troubles ourselves. Therefore we have not been able to make any real improvements to the approach. We will explore some possible avenues for improvements in our final report however.

Implementation

We will implement our solution in C++. We will use Microsoft Visual Studio .Net 2003 as our main development platform.

Experimentation

So far we have experimented with several whistling styles; differing in speed, clarity, etc. One of the first conclusions we have drawn is that the more discrete you whistle, the better the recognition; tremolos or pitch slides can aversely affect the performance, as it is very difficult to determine how many Us or Ds this would need to represent in our database, so we cannot really account for it. This is of course no real surprise.

We found the 1.03 frequency ratio for seperating notes by pitch to be a bit on the low side, it was very difficult to get an R in the Parsons Codes, especially in the lower frequencies. 1.05 seems to work a little better. One of the problems is that it's very difficult to stay at the same pitch when whistling a single note. We might be able to solve this by requiring an abrupt change: when the pitch of the note changes (but stays under the 1.03 ratio) we currently keep the first observed pitch to compare other changes with. If we however start using this new pitch, we could possibly prevent gradual decreasing or increasing frequencies from being seen as a new note.

Speed is not really an issue, because unless you go rediculously fast you aren't likely to drop under the 2 frame (~69ms in our application) minimum note length. You will usually get at least one frame silence between notes, unless you aren't really being silent yourself. There are a few factors that might argue against going slow even. For instance the above mentioned pitch change in long notes, and we have also observed that the attack of a whistled note is so strong compared to the sustained volume that if you try to sustain it long you are very likely to drop below 5% of the maximum recorded volume, which causes erronous Rs in the Parsons Codes.

Software Requirements

  • A C/C++ compiler targetting the Win32 platform, preferably Microsoft Visual C++ .Net 2003.
    NOTE: To compile the actual ChoirFish application itself you will need VC.NET 2003, because it uses MFC7.1. You could probably compile all the libraries and also our console test application ParsonTest with other compilers, but we don't guarantee anything!
  • Microsoft Windows NT4 with SP6a, Windows 2000, Windows XP or Windows Server 2003.
    Windows 95, 98(SE) and ME are not supported!
  • Microsoft XML Core Services 4.0
  • The Fastest Fourier Transform in the West

Hardware Requirements

  • A microphone

Workplan

  1. 24-09-03: Project Group and Preliminary Proposal
  2. 08-10-03: Project Proposal
  3. 05-11-03: Preliminary Project Presentation and Discussion
  4. 03-12-03: Final Presentations and Demos of the Project

Deliverables

Technical paper: [pdf]
Errata: The document is titled The Singing ChoirFish, which should've been The Whistling ChoirFish.

Final source code: [zip]
Contains the fftw3.dll and fftw3.lib files, as well as merge modules used for integrating MSXML4 with the Deployment Project.

ChoirFish v1.0 installation: [msi]
Contains everything you need to install and run ChoirFish. You'll need Windows Installer 2.0: Download Windows Installer 2.0 Redistributable for Windows NT4.0 and 2000

Songs recognised by ChoirFish

References

  • An Interface for Melody Input [ps] [pdf]
    This article discusses a tune recognition system using Parsons Code.