
Monday 1 September 2014

The Sound of Happiness

Or fun with spectrograms. 

What does happiness sound like?

Humans are primarily visual animals. As such, when processing large volumes of sound data, we often turn them into visual representations. One of the most popular ways to do this is to create a spectrogram. 

Spect-: as in the frequency spectrum
-gram: a drawing


This is a spectrogram from a Raven demo file of a bird call. The waveform is at the top, where time is the x-axis and amplitude is the y-axis. In the bottom window amplitude is represented by color (darker is louder), time is the x-axis and frequency is the y-axis, with lower frequencies (elephant and blue whale calls) towards the bottom and higher frequencies (bird chirps, me when startled by cold water) at the top.

The standard way to teach this topic is to show how any given sound can be turned into an image using a complex set of mathematical equations (or by typing 'fft' into MATLAB). However, I thought it would be fun (for me at least) to look at it the other way around. What if we had an image and wanted to turn it into a sound file?
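For reference, the usual direction (sound in, picture out) can be sketched in a few lines. This is only a minimal illustration, not what Raven does internally: the test tone, the frame length and the variable names are all assumptions picked to keep the numbers tidy.

```matlab
% Minimal spectrogram sketch: slice a sound into frames and FFT each one.
% All names and parameter values here are illustrative assumptions.
fs = 8000;                          % sample rate in Hz
t = (0:fs-1)/fs;                    % one second of time stamps
x = sin(2*pi*1000*t);               % test sound: a steady 1 kHz tone

frame_len = 256;                    % samples per slice
n_frames = floor(length(x)/frame_len);
frames = reshape(x(1:frame_len*n_frames), frame_len, n_frames); % one slice per column

w = 0.5 - 0.5*cos(2*pi*(0:frame_len-1)'/frame_len); % Hann window to reduce leakage
X = fft(frames .* repmat(w, 1, n_frames));          % FFT of every column
S = abs(X(1:frame_len/2+1, :));     % keep magnitudes for 0..fs/2 only
f = (0:frame_len/2)'*fs/frame_len;  % frequency of each row in Hz

[~, peak_row] = max(S(:,1));        % loudest frequency bin in the first slice
peak_freq = f(peak_row);            % frequency of that bin in Hz
```

Each column of S is one vertical slice of the picture. Drawing S with something like imagesc, with the y-axis flipped so low frequencies sit at the bottom, gives the familiar spectrogram view.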

Say we have a silly drawing and we want to turn it into a spectrogram. How would we go about that?
Fortunately, dear reader, I happen to have such a silly drawing (artfully created using MS Paint).


  
Silly drawing.

Now, instead of an image, we are going to make an audio representation of the drawing, where time is on the x-axis and frequency is on the y-axis.


First we load this image into MATLAB and turn it into a matrix with darkness values for each cell.

 The image turned into a cell matrix
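As a rough sketch of that step, here is a tiny made-up image turned into darkness values. The 6x8 canvas and the variable names are assumptions for illustration. With a real file you would start from imread and keep a single color channel.

```matlab
% Sketch: turn an 8-bit grayscale image into a matrix of darkness values.
% A tiny synthetic image stands in for the drawing (assumption: white
% background = 255, black ink = 0).
img = uint8(255*ones(6, 8));        % 6x8 all-white canvas
img(2:3, 3:5) = 0;                  % paint a small black rectangle

brightness = double(img)/255;       % 0 = black, 1 = white
darkness = 1 - brightness;          % 0 = white paper, 1 = black ink

disp(darkness);                     % each cell is one pixel's darkness
```

Dividing by 255 by hand avoids needing any image toolbox for the conversion.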


Now we make a spectrum of each column (see below). A spectrum has both magnitude (volume) and phase information embedded in it. What is phase, you ask? Phase can be thought of as where on a sine wave our signal starts. So for each frequency amplitude (dark spot) we need to know two things: 1. What is the frequency? 2. Where (from 0 to 2pi) does the signal start?
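One way to picture magnitude and phase living together is as a single complex number per frequency. The sketch below uses made-up values (2.5 for loudness, pi/3 for phase, a 440 Hz tone) purely as an illustration.

```matlab
% Sketch: one frequency component stored as a single complex number.
% The values of A, phi, fs and f0 are arbitrary illustrative choices.
A = 2.5;                            % magnitude (how loud)
phi = pi/3;                         % phase in radians (where on the wave we start)
c = A*exp(1i*phi);                  % complex spectrum value holding both

mag_back = abs(c);                  % recovers the magnitude
phase_back = angle(c);              % recovers the phase

% Same frequency and loudness, different starting points on the wave
fs = 8000; f0 = 440;
t = (0:99)/fs;
wave_a = A*cos(2*pi*f0*t);          % phase 0: starts at the peak
wave_b = A*cos(2*pi*f0*t + phi);    % shifted by phi: starts partway down
```

Both waves would sound the same on their own. Phase only starts to matter once several components are added together.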





Spectrum (upper image) values for the slice of the happy face representing the left cheek and a bit of the squidgie hair. 

Dude! That spectrum looks weird! Why is it duplicated on the right and left halves? Good catch, kind sir/madam. It's duplicated because, simply put, the phase information needs a place to go, so that's where it goes. Put a little less simply, a real-valued sound has a spectrum whose negative-frequency half is the mirror image (the complex conjugate) of the positive half, so we need to duplicate the whole figure in the negative space to preserve (create) some phase information. For this process, I've gone ahead and assigned every positive value (where it's dark) a random phase from 0 to 2pi.


The image after we've added the negative amplitude and phase bits.
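For the curious, here is a small sketch of one textbook way to lay out the mirrored half so that the inverse transform comes out purely real. It is an illustration under assumed sizes and names, and not necessarily the same layout as my own code further down.

```matlab
% Sketch: build a conjugate-symmetric spectrum so that ifft returns a real signal.
% M, mag and the bin layout are illustrative assumptions.
M = 8;                              % number of positive-frequency bins
mag = [0 0 1 0 0 0.5 0 0]';         % magnitudes for bins 1..M (one image slice)
ph = 2*pi*rand(M, 1);               % random phase for each bin
pos = mag .* exp(1i*ph);            % positive-frequency half

% Layout expected by ifft for N = 2*M + 2 points:
% [DC; bins 1..M; Nyquist; conjugates of bins M..1]
X = [0; pos; 0; conj(flipud(pos))];
x = ifft(X);

max_imag = max(abs(imag(x)));       % only rounding error should remain
```

The DC and Nyquist bins are set to zero here because they have no mirror partner and must be real for the result to be real.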

Creepy, no?
OK, so where are we? We've got our image, and we've got the full spectrum (with amplitude and phase) for all our frequency values. We know how to create spectrums for each column of our image (hint: it involves typing fft into MATLAB).

Now we need to transform those frequency values into time values using, imaginatively, the inverse FFT (hint: ifft in MATLAB).

We do that for every column in our image and combine all the sound values and *poof* we have the sound of happiness. Or a silly picture that says "happy". 
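Put together, the column-by-column loop might look something like the sketch below. It runs on a tiny synthetic image (one horizontal line) rather than the happy face, skips the overlap step, and uses a textbook mirrored layout. All sizes and names are assumptions.

```matlab
% Sketch of the whole pipeline on a tiny synthetic "image".
% Assumptions: rows = frequency (top row = highest), columns = time,
% values = darkness in 0..1. Names and sizes are illustrative only.
fs = 8000;                               % sample rate in Hz
M = 64;                                  % image height = positive-frequency bins
n_cols = 10;                             % image width = time slices
img = zeros(M, n_cols);
img(M-15, :) = 1;                        % a horizontal line 16 rows up from the bottom

N = 2*M + 2;                             % samples produced per slice
y = zeros(N*n_cols, 1);
for c = 1:n_cols
    mag = flipud(img(:, c));             % flip so bin 1 is the lowest frequency
    pos = mag .* exp(1i*2*pi*rand(M, 1)); % attach a random phase to each bin
    X = [0; pos; 0; conj(flipud(pos))];  % conjugate-symmetric full spectrum
    slice = real(ifft(X));               % back to the time domain
    y((c-1)*N + (1:N)) = slice;          % append this slice to the sound
end
y = y / max(abs(y));                     % normalise to avoid clipping

line_freq = 16*fs/N;                     % frequency of the drawn line in Hz
% audiowrite('sketch.wav', y, fs);       % uncomment to listen (hypothetical file name)
```

Because every slice gets a fresh random phase, the wave jumps at each slice boundary. A more careful version would overlap and window the slices to smooth those joins.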


























Raven screenshot of the waveform and spectrogram view of the happy image. Note that the vertical lines are there because of a goof in the phase for each of the spectrums.

Thanks for hanging in there. This is what happiness sounds like!




Update 

Thought I would include the MATLAB code (warts and all) in case anyone wanted to play with it. I apologize in advance to anyone with a knowledge of DSP for the cringe-worthy mistakes.

_______________________________________________________________________________
close all; clear all; clc;
%% Read in happy image
Raw_pic = imread('C:\Users\Desktop\Pamguard Tutorial Figs\Happy', 'JPEG');
%% Turn it into a spectrogram
% Create the magnitude and add random phase
Happy_pic_mag=300.^-((im2double(Raw_pic(:,:,1)))-1); % Magnitude. Exponentially transforming to emphasize amplitude differences
Happy_pic_mag=Happy_pic_mag-min(min(Happy_pic_mag));
Happy_pic_phase=exp(rand(size(Happy_pic_mag))*2*pi*1i); % Random phase between 0 and 2*pi
% (I know, it's not quite right; corrections gladly accepted)

Happy_pic=Happy_pic_mag.*Happy_pic_phase; % Magnitude and phase

% Now, zero out the first and last rows (rather than just setting their
% phase), because a very smart and compassionate man said so.
%
Happy_pic(1,:)=0; Happy_pic(end,:)=0;

% Now we need to create the complex conjugate for the second half of the
% spectrogram

Happy_pic_conj=conj(Happy_pic);

Happy_spect=[flipud(Happy_pic);Happy_pic_conj];

image(abs(Happy_spect));
%% For each column in the image create the waveform

% Pick the sample parameters
fs=8000;
freqs_init=linspace(1,fs,size(Happy_spect,1));
freqs_final=[1:fs];
% I originally picked a sample rate of 40 kHz because human hearing is roughly
% 0-20 kHz, so the Nyquist frequency (fs/2) sits at the top of the human range.
% Later reduced to 8000 Hz because otherwise it was too squealy

dt=1/fs; % time resolution
df=round(freqs_init(2)-freqs_init(1)); % frequency resolution
T=1/df; % Duration of each spectrogram slice
N=T/dt; % Number of points in each spectrogram slice

% Pick an overlap value
ovpl=0.01;
ovlp_pts=floor(ovpl*N/2);

% Now, scroll through each column and create the waveform
yy=[];
for ii=1:size(Happy_spect,2);

    
    Xm_raw=Happy_spect(:,ii); % Raw spectrum
     Xm=Xm_raw;
     xn=ifft(Xm)*N*df;
    


    if ii==1
    yy_start=1;
    else    
    % create start and stop points
    yy_start=((ii-1)*N)-ovlp_pts+1;
    end
    
    yy_end=yy_start+N-1;
    
    
    yy(yy_start:yy_end)=real(xn)/1200000; % reduce the volume to remove clipping
    
end

plot(yy) % idiot check
% Write the WAV file

audiowrite('temp.wav',yy,fs); 





____________________________________________________________________________ 



