Chord Classification using Neural Networks
I'm currently working on classifying chords from audio using neural networks. This post gives an overview of the project, explains how it works, and will (soon) showcase the final product.
This is a work in progress, and will be updated regularly.
This project aims to:
- Classify chords being played into a microphone live.
- Classify audio from a video hosted online and see the chords as the video is played.
I'm going to use neural networks for this, mainly because I want to develop a working knowledge of deep learning.
To train a machine learning algorithm, I need training data, i.e. audio snippets labeled with the correct chord. I thought of two sources:
- Tape myself.
- Use YouTube play-along videos. These videos, intended for people to play along with, display the chords in real time, so I can read the displayed chord to label the audio correctly.
Here's an example of the type of video I'm referring to:
Using these two techniques, I obtained several hours of labeled audio.
Notes, and by extension chords, are directly related to the frequencies of a signal. Therefore, the features are based on the Fourier transform of the signal. I use a short-time Fourier transform of the incoming signal, keeping only frequencies that can be produced by a ukulele. A detailed description and the code are available in this IPython notebook.
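As a rough illustration of the feature extraction described above, the following sketch computes a short-time Fourier transform with `scipy.signal.stft` and keeps only a band of frequencies. The function name `stft_features`, the window length, and the band limits (250 Hz to 4000 Hz) are assumptions made for illustration, not the values used in the notebook.

```python
import numpy as np
from scipy import signal


def stft_features(audio, sample_rate, f_min=250.0, f_max=4000.0, window_size=4096):
    """Return (freqs, times, magnitudes) restricted to [f_min, f_max].

    f_min, f_max and window_size are illustrative assumptions.
    """
    freqs, times, spectrum = signal.stft(audio, fs=sample_rate, nperseg=window_size)
    # Boolean mask selecting the frequency bins inside the band of interest.
    band = (freqs >= f_min) & (freqs <= f_max)
    # Keep magnitudes only; phase is discarded in this sketch.
    return freqs[band], times, np.abs(spectrum[band, :])


# Usage: a synthetic 440 Hz tone (A4) standing in for a recording.
sample_rate = 22050
t = np.arange(sample_rate) / sample_rate  # one second of samples
tone = np.sin(2 * np.pi * 440.0 * t)
freqs, times, features = stft_features(tone, sample_rate)

# The strongest bin, averaged over time, should sit close to 440 Hz.
peak_freq = freqs[np.argmax(features.mean(axis=1))]
```

Each column of `features` is one time frame and can serve as the input vector for a classifier.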
First, I trained the classifier on chord-only recordings and used cross-validation to evaluate its performance; then I tested it on songs. I trained a support vector machine and several different neural network architectures.
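The cross-validation step can be sketched with scikit-learn's `cross_val_score`. The data below is synthetic (three made-up "chords" with random feature vectors) and the choice of five folds is an assumption; it only shows the mechanics, not the results from this project.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

# Synthetic stand-in data: 3 "chords", 40 frames each, 20 features per frame.
rng = np.random.default_rng(0)
n_classes, n_per_class, n_features = 3, 40, 20
centers = rng.normal(scale=5.0, size=(n_classes, n_features))
x = np.vstack([c + rng.normal(size=(n_per_class, n_features)) for c in centers])
y = np.repeat(np.arange(n_classes), n_per_class)

# 5-fold cross-validation: each fold is held out once
# while the remaining four folds are used for training.
scores = cross_val_score(LinearSVC(C=1), x, y, cv=5)
mean_accuracy = scores.mean()
```

`scores` holds one accuracy value per fold, so their spread gives a sense of how stable the classifier is across different splits.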
Once the data is split into training and test sets, the SVM can be trained and evaluated with scikit-learn:
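    import sklearn.svm
    from sklearn import metrics

    # Fit a linear SVM on the training set.
    svm = sklearn.svm.LinearSVC(C=1)
    svm.fit(x_train, y_train)

    # Report precision, recall and F1 on the held-out test set.
    print(metrics.classification_report(y_test, svm.predict(x_test)))
    """
                 precision    recall  f1-score   support
    avg / total       0.95      0.95      0.95      1392
    """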
The SVM performs quite well on chord-only recordings, with an average F1-score of 0.95 on the 1392 test samples. However, when tested on songs, performance was much poorer.
To normalize the signal:
    # Divides the signal by the absolute value of its highest peak.
    # If there is no peak (all 0), return False
    def normalize(signal, inplace = True):
        min_v = min(signal)
        max_v = max(signal)
        peak = max(abs(min_v), abs(max_v))
        if peak == 0:
            return False
        signal /= peak
        return signal
    import numpy as np
    import scipy.linalg

    # Divides the signal by the absolute value of its highest peak.
    # If there is no peak (all 0), return False
    # If inplace, returns None
    def normalize(signal, inplace = True):
        # For a 1-D array, the infinity norm is the largest absolute value.
        max_norm = scipy.linalg.norm(signal, np.inf)
        if max_norm == 0:
            return False
        else:
            if inplace:
                # Write the result back into the input array.
                np.divide(signal, float(max_norm), signal)
            else:
                return np.divide(signal, float(max_norm))
        return
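Both versions divide by the same peak: for a one-dimensional array, the infinity norm is the largest absolute value. A small check, using a made-up three-sample array in place of real audio:

```python
import numpy as np
import scipy.linalg

signal = np.array([0.5, -2.0, 1.0])

# Peak found by comparing the extremes, as in the first version.
peak = max(abs(signal.min()), abs(signal.max()))

# Peak found as the infinity norm, as in the second version.
max_norm = scipy.linalg.norm(signal, np.inf)

# Dividing produces a new array; `signal` itself is left unchanged here.
normalized = signal / max_norm
```

After normalization the largest absolute value in the array is 1, regardless of how loud the original recording was.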