Archive for the ‘Uncategorized’ Category

A whole bunch of images

Wednesday, December 7th, 2011

The Victorianator Postmortem presentation

Thursday, November 3rd, 2011

Click here to download the pretty slideshow!

Production schedule

Saturday, October 8th, 2011

TBD

Screen sketches

Sunday, October 2nd, 2011

To be completed

Art, design and animation / Visuals

Sunday, October 2nd, 2011

To be completed

More audio debugging

Friday, June 17th, 2011

So here’s how I’ve been debugging the audio.  Blogging while waiting for the game to run on the device.

The first thing I did was to start installing ffmpeg.  This will allow me to convert the audio files; see http://howto-pages.org/ffmpeg/

The second was to remove all of the effects and to listen to the recording.  Thankfully, it was nice and crisp.  It was actually quite annoying hearing the weird sounds I made to vary the pitch.  Then there’s the sound of me unplugging the microphone…  All in all, very interesting!  And crisp!

Ok – let’s tackle the effects.  One by one.  First one is the volume (internally I call it augment.  Actually, programmers have a weird way of working in bubbles and inventing vocabulary rather than standardizing at times.  It’s annoying).  Volume works, except it doesn’t handle high volumes that well.  Imagine there are 2 digits to hold audio information, capable of representing the numbers -99 to 99.  What happens when you try to add 1 to 99?  Well, it wraps around and we get -99.  That added quite a bit of noise.  Let’s clamp the result so the sign can’t flip, and it sounds better.
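Here is a minimal Python sketch of that overflow, using the imaginary two-digit range from the paragraph above.  The names `wrap`, `wrapping_gain` and `clamped_gain` are made up for illustration and are not from the game, which presumably works on fixed-width integer samples in C/C++.

```python
LIMIT = 99  # the imaginary two-digit sample range: -99..99

def wrap(value):
    """Mimic fixed-width overflow: stepping past 99 comes back in at -99."""
    span = 2 * LIMIT + 1  # 199 representable values
    return (value + LIMIT) % span - LIMIT

def wrapping_gain(samples, gain):
    """Scale samples and let out-of-range results wrap around (the noisy case)."""
    return [wrap(round(s * gain)) for s in samples]

def clamped_gain(samples, gain):
    """Scale samples but pin out-of-range results to the limits, keeping the sign."""
    return [max(-LIMIT, min(LIMIT, round(s * gain))) for s in samples]

loud = [60, -60, 10]
print(wrapping_gain(loud, 2))  # loud samples flip sign
print(clamped_gain(loud, 2))   # loud samples stay at the limit
```

Clamping still distorts very loud input, since every peak flattens out at the limit, but a flattened peak is much less audible than a sample that jumps to the opposite sign.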

There’s this background noise.  It’s annoying.  Is it part of the augment effect or part of the glue that connects audio effects together?  One way to find out is to copy data directly from the source to the destination without changing the volume (only one effect is running so we can do this easily).  It seems to be the glue.  That’s in AQBuffers.h (.h so it would inline; in retrospect it should be in a .c so fp optimizations can be applied).

AQPlaybackFromEffect might be the culprit.  This thing takes an audio source (such as a file or other effect), applies an effect, and returns audio data.  (These can be chained.  If there’s a bug here, it will amplify).  I recall fixing this some time ago; let’s get to it then!  Maybe the last fix got undone or something else changed…

The bug used to be in “AQAudioData::acceptData”.  It’s no longer being used?!  A quick search in the project reveals that it’s being used by the recorder…  This is fishy; well let’s get rid of it.  Stale code makes this thing cryptic…  Thankfully, deleting… wait…  brain slowly catching up…  I should be in AQPlaybackFromEffect — AQAudioData does the recording and it works perfectly!  (Sigh – I haven’t been in this piece of code in quite a while)

AQAudioBuffer::pop … that should be the culprit.  That looks better.  This object is a bit tricky.  I reserve more memory than is needed and guarantee that each effect has an additional buffer to play in.  This is very effective in simplifying the effect code (it ensures that all the subsystems that take audio data as input get the same amount of data, even if subsystems — including effects — output variable amounts of audio data).

What I’m doing is checking to make sure that the input/output correspond.  The number of inputs should equal the number of outputs.  That’s correct.

Let’s look at the arithmetic.  When we run out of space, we loop back and write over old data.  This should not matter for the volume effect, so let’s see what this does.  When the number of read elements exceeds the buffer size (excluding the excess left for good measure), the data at the end that has not been processed yet is copied to the beginning of the buffer.

A bit of explanation: AQPlaybackFromEffect::pop returns 1024 elements of single channel audio.  There is a work area (a place where I store audio temporarily) that holds audio with effects applied, ready to pass down to other effects or the output.  When the work area is filled with audio, I start working from the beginning again (the assumption, whose validity I ensure, is that when I start back at the beginning the previous effect has already processed all of the data).  Generous buffer sizes ensure that this assumption is valid at all times.
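The work area described above can be sketched roughly as follows.  This is only an illustration under assumptions: the class name `WorkArea`, its fields, and the sizes (4 chunks plus 1 chunk of slack) are invented here, and the real AQ classes are C/C++ and may be organised differently.

```python
CHUNK = 1024  # samples returned per pop, as stated in the post

class WorkArea:
    """Hypothetical linear scratch buffer with spare room at the end."""

    def __init__(self, chunks=4, slack_chunks=1):
        self.limit = chunks * CHUNK                             # nominal size
        self.data = [0] * (self.limit + slack_chunks * CHUNK)   # plus slack
        self.read = 0    # next sample to hand out
        self.write = 0   # next free slot

    def push(self, samples):
        """Append produced samples; a producer may deliver a variable amount."""
        if self.write + len(samples) > len(self.data):
            raise OverflowError("work area too small for this producer")
        self.data[self.write:self.write + len(samples)] = samples
        self.write += len(samples)

    def pop(self):
        """Return exactly CHUNK samples, or None if not enough are buffered."""
        if self.write - self.read < CHUNK:
            return None
        out = self.data[self.read:self.read + CHUNK]
        self.read += CHUNK
        if self.read >= self.limit:
            # Past the nominal size: move the unprocessed tail to the front
            # and carry on from the beginning of the buffer.
            pending = self.data[self.read:self.write]
            self.data[:len(pending)] = pending
            self.read, self.write = 0, len(pending)
        return out
```

The slack at the end is what lets a producer overshoot the nominal size without overwriting anything the consumer has not read yet.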

First thing, short circuit AQPlaybackFromEffect::pop so it only gives back audio data…  And there is still this odd sound…  AQAudioBuffer::pop might not be the issue…  Rather, let’s play the game through without any effect layer again.  Just to be sure there is added noise.  Ok, who calls pop?  AQPlayer uses pop to write data to the speakers…  Maybe it was just a fluke there were no glitches on the first few runs…

Maybe not.  What else changed?  There was the echo effect I accidentally re-enabled.  (How?  A bit of history is in order.  Initially, the audio effects were to be one massive thing that I just used.  So everything was initially created for a single massive effect that would magically take in some audio and do work.  So there was no chaining, and the echo effect was left to run via this older code.)

By removing the echo effect, we end up with AQEffectNone doing the processing.  Essentially, this effect copies data without changing it.

I think we have a second definitive culprit: the echo effect!  (the augment effect now sounds better, but augmenting the effect too much might be a BAD idea – can’t conserve much audio data if everything is either -99 or 99…)

Why am I sure the echo effect is a culprit?  Elementary, my dear Watson.  When I had the echo effect re-enabled, I heard the echo on start of playback.  This is a bad thing.  And to understand why, I should delve into more details about the effect.  Recall that all effects have a generous number of buffers they write into?  The echo effect takes about 40 of these buffers.  Let’s step back a bit.

Imagine the volume effect is increasing the volume by about 10 percent.  Now, the echo effect receives data from the volume effect.  The echo looks into the history of previous volume-adjusted data and adds that to the current audio.  What if it’s the first time the thing is being run?  (I recall hearing an echo on start of playback.  I hear noise during playback.  Could it be the echo effect is referencing invalid data?)

Into AQEffectData – I’ll change all the references to audio data to a special value called null when the effect starts.  Any attempt to access data at null will crash the application.  I hope it crashes, it makes debugging so much easier…  It’s just a single line, and it should be there anyways.  It didn’t…

Wait, the delay_ms variable was… never used.  That seems to be there for some reason.  The best way to understand code is through its history – I think this variable’s story was that before the effect was put into the game it was tested using different variable names.  delay_ms made sense, but I assigned each effect a value of m_amount (from the single-effect-for-everything days).  EffectEcho was one of the first effects to use this.  Well, the useless variable was removed.

It’s this particular setting of a delay of 0.75 milliseconds that’s causing some audio oddities.  Let’s investigate, shall we?

A delay of 0.75 milliseconds….  For each second, we process 44100 audio samples.  0.75 milliseconds is 33 audio samples.  So we go back in time and see what was 33 samples ago.  Also, we work on chunks of 1024 audio samples, and one chunk lasts about 23.2 milliseconds.  My theory is that the echo effect will only work reliably starting at 24 milliseconds (a little more than one chunk).  Let’s test, shall we?

Exactly as expected!  At 24 milliseconds, the echo is barely noticeable (a jump from 24 to 0 ought not be noticeable).  While I’m at it, I’ll make sure to work out the upper limit of the echo effect.  There are 40 buffers to play with.  Or 40960 samples.   928 milliseconds.  That’s more than enough!  Honestly, I can’t hear the echo if it’s beyond 100ms.
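The delay arithmetic from the last few paragraphs, as a small Python sketch.  The sample rate, chunk size and buffer count come from the post; the helper names `ms_to_samples` and `samples_to_ms` are made up for illustration.

```python
SAMPLE_RATE = 44100   # samples per second
CHUNK = 1024          # samples processed per block
BUFFERS = 40          # buffers available to the echo effect

def ms_to_samples(ms):
    """Convert a delay in milliseconds to a whole number of samples."""
    return round(ms * SAMPLE_RATE / 1000)

def samples_to_ms(samples):
    """Convert a sample count back to milliseconds."""
    return samples * 1000 / SAMPLE_RATE

print(ms_to_samples(0.75))               # the problematic short delay, in samples
print(samples_to_ms(CHUNK))              # duration of one chunk, in ms
print(samples_to_ms(BUFFERS * CHUNK))    # longest delay the 40 buffers can hold, in ms
```

One chunk lasts a little over 23 ms, which is why delays of roughly 24 ms and up reach back past the chunk currently being processed.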

Yep, it’s another bug.  Fixed it.  The issue?  What if I have 40960 samples and I go back 40000 samples?  What if I’m working on writing sample 0?  That’s 0 – 40000, or -40000….  Add 40960 to get the right result (960); the existing code just ignored the case.  Doing the right thing fixed it.
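A minimal sketch of an echo over a circular history with the wrap-around handled.  It is an assumption-laden illustration: the function `echo`, its parameters and the `gain` value are hypothetical, and it only mixes in the dry signal from `delay` samples ago, which may not match what EffectEcho really does.

```python
def echo(samples, history, write_pos, delay, gain=0.5):
    """Mix each sample with the one written `delay` samples earlier.

    `history` is a fixed-size circular list, pre-filled with silence so the
    first pass never reads stale data. Returns (output, new_write_pos).
    """
    size = len(history)
    assert 0 < delay < size
    out = []
    for s in samples:
        # write_pos - delay goes negative near the start of the buffer,
        # e.g. 0 - 40000 = -40000. In C or C++ that index would read outside
        # the array (Python would silently count from the end), so the
        # modulo is kept explicit: -40000 wraps to 960 for a size of 40960.
        read_pos = (write_pos - delay) % size
        out.append(s + gain * history[read_pos])
        history[write_pos] = s
        write_pos = (write_pos + 1) % size
    return out, write_pos

history = [0.0] * 8                      # silence, not leftover memory
out, pos = echo([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], history, 0, delay=3)
print(out, pos)
```

Starting from a zeroed history also covers the "echo on start of playback" symptom: the first samples are mixed with silence instead of whatever happened to be in the buffers.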

Next is the time effect…  perfect – a bit glitchy on very fast speeds, but decent otherwise.

Pitch effect is last…  there’s the reason.  It’s like a hollow effect.  What can I do?!

Let’s try increasing the number of steps to 16.  This should greatly increase the accuracy (won’t run on iPod though!).  Slightly better.

Now to browse through the code.  There are 4 steps.  Each step first finds the fundamental frequency. Then if the segment is voiced it adjusts the pitch.

The fundamental frequency is the frequency that repeats the most over a segment.  Think of audio as a series of composed wave-forms.  These will repeat over time.  The one that repeats the most often is the fundamental (the fundamental is related to the length of a single cycle).  (Note – repetition is defined loosely here…)  The fundamental is also what we use for pitch detection (which works…  until I get confirmation of the opposite).
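The post does not say how the game finds the fundamental, so the following is not the game's algorithm.  It is a sketch of one common approach, autocorrelation, which looks for the shift at which a segment best lines up with itself.  The function name and the 80 to 500 Hz search range are assumptions chosen for illustration.

```python
import math

def fundamental_hz(samples, sample_rate, min_hz=80.0, max_hz=500.0):
    """Estimate the fundamental as the lag where the segment best matches
    a shifted copy of itself (plain, unnormalised autocorrelation)."""
    min_lag = int(sample_rate / max_hz)
    max_lag = int(sample_rate / min_hz)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

# A pure 200 Hz tone sampled at 8000 Hz repeats every 40 samples.
rate = 8000
tone = [math.sin(2 * math.pi * 200 * t / rate) for t in range(800)]
print(fundamental_hz(tone, rate))
```

Real voice is messier than a pure tone, so practical detectors usually add normalisation and a voiced/unvoiced check on top of something like this.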

Then the pitch is done…  Pitch starts by computing the FFT.  The FFT is an arduous process – recall that audio is (logically for our purposes) made up of repeating waves.  The FFT transforms a segment of repeating waves into a summation of the actual waves (sines and cosines).  Mathematically (aka, ideally) the returned data from the FFT can be identically transformed back into the series of audio samples.  Recall our issue with -99 and 99 with the volume.  Something similar happens with the data here.  Except in this case we use floats, which use scientific representation.  In plain English, 230 would be written as 23*10^1 = 23*10.  Or 23 times 10 to the power of 1.  Again, assume a limit of 2 digits for the number.  Also assume one digit for the exponent.  Notice that the range of values is very large: -99 times 10 to the power of 9 to 99 times 10 to the power of 9.  But 236 times 10 to the power of 8 is simplified to 23 times 10 to the power of 9.  Small values (-99 to 99) are perfectly represented without any gaps, but between -990 and 990 only every 10th number exists: -990, -980, etc.  As numbers become greater they can still be represented, but it is assumed that large numbers and small numbers normally won’t be mixed so it’s ok….  (Any CS person will complain that I’m oversimplifying things – that is true.)
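The toy number format above (two digits for the number, one digit for the exponent) can be played with in a few lines of Python.  `toy_float` is a made-up helper for this illustration; real floats are binary and round rather than simply dropping digits.

```python
def toy_float(value):
    """Store an integer as a two-digit number times 10**exponent (exponent 0..9).

    Digits that do not fit in the two-digit part are dropped, as in the
    post's example of 236 * 10**8 becoming 23 * 10**9.
    """
    sign = -1 if value < 0 else 1
    mantissa, exponent = abs(value), 0
    while mantissa > 99:
        mantissa //= 10   # drop the lowest digit...
        exponent += 1     # ...and remember the scale
    if exponent > 9:
        raise OverflowError("needs more than one exponent digit")
    return sign * mantissa * 10 ** exponent

print(toy_float(42))                  # small values survive untouched
print(toy_float(236))                 # only every 10th number exists here
print(toy_float(toy_float(9900) + 5)) # a small value added to a large one vanishes
```

The last line is the "large and small numbers mixed" case: the 5 is lost entirely once it is added to 9900.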

My first test is to see how much quality is lost from simply going into the spectral domain and back into the time domain.

The FFT works perfectly.  The audio sounds perfect.  (I removed all code that does the pitch and only left in the fft).
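The same kind of round-trip check can be written down in a few lines.  This sketch swaps in a naive DFT (slow, but the same transform an FFT computes) so it needs nothing beyond the Python standard library; the function names and the test signal are made up for illustration and say nothing about the FFT routine the game uses.

```python
import cmath
import math

def dft(signal):
    """Naive discrete Fourier transform: time domain to spectral domain."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n)]

def inverse_dft(spectrum):
    """Inverse transform: spectral domain back to (real) time-domain samples."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def round_trip_error(signal):
    """Largest difference between the input and its transformed-and-restored copy."""
    restored = inverse_dft(dft(signal))
    return max(abs(a - b) for a, b in zip(signal, restored))

# Two mixed waves, 64 samples long.
signal = [math.sin(2 * math.pi * 5 * t / 64) + 0.5 * math.cos(2 * math.pi * 12 * t / 64)
          for t in range(64)]
print(round_trip_error(signal))  # tiny, but not exactly zero because of float rounding
```

An error this small is far below anything audible, which fits the observation that the transform alone is not what degrades the sound.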

I recall PAF had to do some subsampling to get the fundamental frequency.  Changing that helped (but not enough)….

Let’s think a bit.  If the pitch (m_pitchFactor) is 1, then the FFT should not be changed (distortions should apply after the fact).  But that still won’t work – the voice is reconstructed from the peaks.

Ok, now that I’ve gotten a slice of thinking food (Pizza), let’s think about what “pitch change” entails.  It means taking the wave that is the voice and stretching or squishing it.  What do I mean?  A high pitch’s wave will “repeat” itself more frequently than a lower pitch’s wave.  Look at this application: http://www.falstad.com/fourier/  Note that the final wave is a composition of both the sines and cosines.  This should suffice with respect to our discussion.

Just sent an email to a former prof.  I’m trying to get a Cegep to teach Haskell.  Back to coding:

As I go about figuring out what’s going on: the array contains cosines and sines (as expected).  The pitch algorithm takes the initial pitch and attempts to translate it.  It is this translation that is adding noise.  From what I’ve researched, this is another invertible transform (you can go there and back without destroying any data).

Let’s see here, after it has computed the phases once it attempts to reuse the data.  What’s the quality of the original calculations?  Excellent I might add!  Well, excellent where there is no change in pitch.  What I have done is interpolate between the phases (the closer we get to normal voice, the more like the speaker it sounds).

I’m going to keep on looking at the code, but this is a fairly good boost in quality (partially luck, partially I’m figuring out how everything works).

At a word count exceeding 2012, I’m going to end this post.  Finally!  Poor reader, I pity you.

Unwanted Features

Thursday, February 3rd, 2011

These two days I have been debugging code, making it more stable so I can send it out safely.  Well, while debugging I found this very interesting behaviour….

Usually, I debug using The Charge of the Light Brigade for testing.  It’s the only poem that I’ve gone out of my way to annotate.  Well, when I try to use “The Jabberwocky” for testing, very interesting things happen.  First, the iPod touch dies.  Literally.

I call it dead when the home and top button don’t work anymore.  You press them, and the screen remains frozen.  Terminating the process from the debugger doesn’t help either.  The application is about as dead as… say, the device itself turned off with a dead battery?

The only fun part about this situation is that it gives me time to actually blog about the problem.  So far I have narrowed down the problem to a function called ‘discardAndResizeRecorders’.  This method finds the existing audio files and loads them.  Then it does a bit of clever mmapping…

The latest round of debugging reveals that the crash occurs when data is freed up.  I’ll keep track of the recording, it started at 17:13:20, and there is 300 seconds of buffer.  Ideally, this bug should be impossible to replicate.  Ideally, I wouldn’t have to restart the silly device, wait while it goes through several screens, and further wait until it records the poem to debug it.  But I guess that’s the nature of the application.  End: 17:14:58.

Elapsed time: 1:38, or 98 seconds.

The offending line is (drumroll!) close(m_file);

Yup, the line that closes the file.  munmap works flawlessly.  This should not happen…

Let’s dig deeper into the issue.  Look at that!  The iPod reboots itself even if I do nothing!  Awesome (read the heavy sarcasm)!  No, I do not want to register my iPod.  It’s not mine anyhow.

The file pointer had a value of 5.  And the lights just turned themselves off since I’m too motionless within the lab.  Well, programmers are sloth-like creatures.  Snail like?  Well, name a creature that doesn’t move much.

My paranoia tells me to set the filenames to 8.3 format.  That’s silly.  Crap… the debugger didn’t figure out the breakpoint was in a header file.  I must remember, Xcode only likes breakpoints when they exist in source files (except when debugging has resumed in the source file).  That wasted… some time.

I’ll have to put this one aside and come back to it tomorrow….