Archive for the ‘Victorianator Blog Archive’ Category

More audio debugging

Friday, June 17th, 2011

So here’s how I’ve been debugging the audio.  Blogging while waiting for the game to run on the device.

The first thing I did was to start installing ffmpeg.  This will allow me to convert the audio files.

The second was to remove all of the effects and to listen to the recording.  Thankfully, it was nice and crisp.  It was actually quite annoying hearing the weird sounds I made to vary the pitch.  Then there’s the sound of me unplugging the microphone…  All in all, very interesting!  And crisp!

Ok – let’s tackle the effects.  One by one.  The first is the volume (internally I call it “augment”.  Actually, programmers have a weird way of working in bubbles and inventing vocabulary rather than standardizing at times.  It’s annoying).  Volume works, except it doesn’t handle high volumes that well.  Imagine there are 2 digits to hold audio information, capable of representing the numbers -99 to 99.  What happens when you try to add 1 to 99?  Well, we get -99.  That added quite a bit of noise.  Clamp the values so the sign doesn’t flip, and it sounds better.
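
The fix above can be sketched like this (a minimal sketch with 16-bit samples; the post doesn’t show the real sample format or names, so `applyGain` and its types are my assumptions): widen before scaling, then clamp so the result saturates instead of wrapping around.

```cpp
#include <algorithm>
#include <cstdint>

// Apply a gain to a 16-bit sample, clamping instead of letting the
// result wrap around from +32767 to -32768 (the source of the noise).
int16_t applyGain(int16_t sample, float gain) {
    int32_t scaled = static_cast<int32_t>(sample * gain); // widen first
    scaled = std::clamp(scaled, -32768, 32767);           // saturate
    return static_cast<int16_t>(scaled);
}
```

Without the clamp, a plain cast of 30000 × 2 back to 16 bits would land deep in negative territory; saturating pins it at full scale, which sounds like mild distortion rather than loud noise.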

There’s this background noise.  It’s annoying.  Is it part of the augment effect or part of the glue that connects audio effects together?  One way to find out is to copy data directly from the source to the destination without changing the volume (only one effect is running, so we can do this easily).  It seems to be the glue.  That’s in AQBuffers.h (.h so it would inline; in retrospect it should be in a .c so floating-point optimizations can be applied).

AQPlaybackFromEffect might be the culprit.  This thing takes an audio source (such as a file or other effect), applies an effect, and returns audio data.  (These can be chained.  If there’s a bug here, it will amplify).  I recall fixing this some time ago; let’s get to it then!  Maybe the last fix got undone or something else changed…

The bug used to be in “AQAudioData::acceptData”.  It’s no longer being used?!  A quick search in the project reveals that it’s being used by the recorder…  This is fishy; well let’s get rid of it.  Stale code makes this thing cryptic…  Thankfully, deleting… wait…  brain slowly catching up…  I should be in AQPlaybackFromEffect — AQAudioData does the recording and it works perfectly!  (Sigh – I haven’t been in this piece of code in quite a while)

AQAudioBuffer::pop … that should be the culprit.  That looks better.  This object is a bit tricky.  I reserve more memory than is needed and guarantee that each effect has an additional buffer to play in.  This is very effective in simplifying the effect code (it ensures that all the subsystems that take audio data as input get the same amount of data, even if subsystems — including effects — output variable amounts of audio data).

What I’m doing is checking to make sure that the input/output correspond.  The number of inputs should equal the number of outputs.  That’s correct.

Let’s look at the arithmetic.  When we run out of space, we loop back and write over old data.  This should not matter for the volume effect, so let’s see what this does.  When the number of read elements exceeds the buffer size (excluding the excess left for good measure), the unprocessed data at the end is copied back to the beginning.

A bit of explanation: AQPlaybackFromEffect::pop returns 1024 elements of single-channel audio.  There is a work area (a place where I store audio temporarily) to store audio with effects applied, to pass down to other effects or the output.  When the work area is filled with audio, I start working from the beginning (the assumption, whose validity I ensure, is that when I start back at the beginning the previous effect has already processed all of the data).  Generous buffer sizes ensure that this assumption is valid at all times.
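
A rough sketch of that work-area scheme (names and layout are mine; the real AQAudioBuffer code isn’t shown in the post): when a chunk wouldn’t fit at the end, the unread tail is copied back to the front and writing resumes from there.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// A buffer larger than strictly needed holds 1024-sample chunks; the
// generous size guarantees that by the time we wrap, the downstream
// consumer has already processed everything before readPos.
struct WorkArea {
    std::vector<short> data;
    size_t writePos = 0;   // where the next chunk goes
    size_t readPos  = 0;   // first sample not yet consumed downstream

    explicit WorkArea(size_t capacity) : data(capacity) {}

    // Returns the offset at which `chunk` samples may be written,
    // copying the unread tail back to the start if we'd run past the end.
    size_t reserve(size_t chunk) {
        if (writePos + chunk > data.size()) {
            size_t tail = writePos - readPos;          // unread samples
            std::memmove(data.data(), data.data() + readPos,
                         tail * sizeof(short));
            writePos = tail;
            readPos  = 0;
        }
        size_t at = writePos;
        writePos += chunk;
        return at;
    }
};
```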

First thing, short circuit AQPlaybackFromEffect::pop so it only gives back audio data…  And there is still this odd sound…  AQAudioBuffer::pop might not be the issue…  Rather, let’s play the game through without any effect layer again.  Just to be sure there is added noise.  Ok, who calls pop?  AQPlayer uses pop to write data to the speakers…  Maybe it was just a fluke there were no glitches on the first few runs…

Maybe not.  What else changed?  There was the echo effect I accidentally re-enabled.  (How?  A bit of history is in order.  Initially, the audio effects were to be one massive thing that I just used.  So everything was initially created for a single massive effect that would magically take in some audio and do work.  There was no chaining, so the echo effect was left to be run via this older code.)

By removing the echo effect, we end up with AQEffectNone doing the processing.  Essentially, this effect copies data without changing it.

I think we have a second definitive culprit: the echo effect!  (The augment effect now sounds better, but augmenting too much might be a BAD idea – you can’t preserve much audio information if everything is either -99 or 99…)

Why am I sure the echo effect is a culprit?  Elementary, my dear Watson.  When I had the echo effect re-enabled, I heard the echo at the start of playback.  This is a bad thing.  To understand why, I should delve into more details about the effect.  Recall that all effects have a generous number of buffers they write into?  The echo effect takes about 40 of these buffers.  Let’s step back a bit.

Imagine the volume effect is increasing the volume by about 10 percent.  Now, the echo effect receives data from the volume effect.  The echo looks into the history of previous volume-adjusted data and adds that to the current audio.  What if it’s the first time the thing is being run?  (I recall hearing an echo at the start of playback.  I hear noise during playback.  Could it be the echo effect is referencing invalid data?)

Into AQEffectData – I’ll change all the references to audio data to a special value called null when the effect starts.  Any attempt to access data at null will crash the application.  I hope it crashes; it makes debugging so much easier…  It’s just a single line, and it should be there anyway.  It didn’t…

Wait, the delay_ms variable was… never used.  It seems to be there for some reason.  The best way to understand code is through its history – I think this variable’s story is that before the effect was put into the game, it was tested using different variable names.  delay_ms made sense, but I assigned each effect a value of m_amount (from the single-effect-for-everything days).  EffectEcho was one of the first effects to use this.  Well, the useless variable was removed.

It’s this particular setting of a delay of 0.75 milliseconds that’s causing some audio oddities.  Let’s investigate, shall we?

A delay of 0.75 milliseconds…  For each second, we process 44100 audio samples.  0.75 milliseconds is 33 audio samples.  So we go back in time and see what was there 33 samples ago.  Also, we work on chunks of 1024 audio samples.  My theory is that the echo effect will only work reliably starting at about 24 milliseconds – a bit over one 1024-sample chunk (1024 samples is about 23.2 ms).  Let’s test, shall we?
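
The millisecond-to-sample arithmetic above, as a tiny helper (hypothetical name, assuming the 44.1 kHz rate mentioned):

```cpp
#include <cmath>

// Convert an echo delay in milliseconds to a sample offset.
int delayToSamples(double delayMs, int sampleRate = 44100) {
    return static_cast<int>(std::round(delayMs / 1000.0 * sampleRate));
}
```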

Exactly as expected!  At 24 milliseconds, the echo is barely noticeable (a jump from 24 to 0 ought not be noticeable).  While I’m at it, I’ll make sure to work out the upper limit of the echo effect.  There are 40 buffers to play with.  Or 40960 samples.   928 milliseconds.  That’s more than enough!  Honestly, I can’t hear the echo if it’s beyond 100ms.

Yep, it’s another bug.  Fixed it.  The issue?  What if I have 40960 samples of history and I go back 40000 samples while writing sample 0?  That’s 0 - 40000 = -40000…  Add 40960 to get the right result (index 960); the existing code just ignored this case.  Doing the right thing fixed it.
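
The fix in sketch form (names are mine): wrap a negative history index back into the buffer instead of ignoring the case.

```cpp
// Index into a ring of echo history, wrapping negative offsets
// back into range - the case the old code ignored.
int echoIndex(int current, int delaySamples, int historySize) {
    int idx = current - delaySamples;
    if (idx < 0)
        idx += historySize;
    return idx;
}
```

For the example above: sample 0 with a 40000-sample delay in a 40960-sample history lands at index 960.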

Next is the time effect…  perfect – a bit glitchy on very fast speeds, but decent otherwise.

Pitch effect is last…  and there’s the real problem.  It produces a hollow sound.  What can I do?!

Let’s try increasing the number of steps to 16.  This should greatly increase the accuracy (won’t run on iPod though!).  Slightly better.

Now to browse through the code.  There are 4 steps.  Each step first finds the fundamental frequency. Then if the segment is voiced it adjusts the pitch.

The fundamental frequency is the frequency that repeats the most over a segment.  Think of audio as a series of composed wave-forms.  These will repeat over time.  The one that repeats the most often is the fundamental (the fundamental is related to the length of a single cycle).  (Note – repetition is defined loosely here…)  The fundamental is also what we use for pitch detection (which works… until I get confirmation of the opposite)

Then the pitch is done…  Pitch starts by computing the FFT.  The FFT is an arduous process – recall that audio is (logically, for our purposes) made up of repeating waves.  The FFT transforms a segment of repeating waves into a summation of the actual waves (sines and cosines).  Mathematically (i.e., ideally) the returned data from the FFT can be identically transformed back into the series of audio samples.

Recall our issue with -99 and 99 with the volume.  Something similar happens with the data here, except in this case we use floats, which use scientific notation.  In plain English, 230 would be written as 23*10^1, or 23 times 10 to the power of 1.  Again, assume a limit of 2 digits for the number, plus one digit for the exponent.  The range is very large: -99 times 10 to the power of 9 up to 99 times 10 to the power of 9.  But 236 times 10 to the power of 8 gets rounded to 24 times 10 to the power of 9.  Small values (-99 to 99) are perfectly represented without any gaps, but between -990 and 990 only every 10th number exists: -990, -980, etc.  As numbers become greater they can still be represented, but it is assumed that large numbers and small numbers normally won’t be mixed, so it’s ok…  (any CS person will complain that I’m oversimplifying things – that is true).
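
The same gaps exist in real floats, just at larger magnitudes – a one-liner makes the point (illustration only, nothing from the game):

```cpp
// Returns true when adding `x` to `big` is lost entirely to rounding:
// the gap between representable floats near `big` exceeds `x`.
bool additionLost(float big, float x) {
    float sum = big + x;   // force rounding to float precision
    return sum == big;
}
```

For example, 2^24 (16777216) is the last float that can hold every integer; adding 1 to it does nothing, which is exactly the kind of loss that shows up when large and small spectral values get mixed.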

My first test is to see how much quality is lost from simply going into the spectral domain and back into the time domain.

The FFT works perfectly.  The audio sounds perfect.  (I removed all code that does the pitch and only left in the fft).
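
That round-trip test can be illustrated with a naive DFT (the game presumably uses a real FFT; this toy version just demonstrates that forward-then-inverse reproduces the input, up to floating-point error):

```cpp
#include <complex>
#include <vector>

using cd = std::complex<double>;

// Naive O(n^2) discrete Fourier transform; `inverse` flips the sign
// of the exponent and divides by n, so dft(dft(x, false), true) == x
// (mathematically - floats introduce tiny rounding errors).
std::vector<cd> dft(const std::vector<cd>& in, bool inverse) {
    const double pi = 3.14159265358979323846;
    size_t n = in.size();
    std::vector<cd> out(n);
    double sign = inverse ? 1.0 : -1.0;
    for (size_t k = 0; k < n; ++k)
        for (size_t t = 0; t < n; ++t)
            out[k] += in[t] * std::polar(1.0, sign * 2.0 * pi * k * t / n);
    if (inverse)
        for (auto& v : out) v /= static_cast<double>(n);
    return out;
}
```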

I recall PAF had to do some subsampling to get the fundamental frequency.  Changing that helped (but not enough)….

Let’s think a bit.  If the pitch (m_pitchFactor) is 1, then the FFT should not be changed (distortions should apply after the fact).  But that still won’t work – the voice is reconstructed from the peaks.

Ok, now that I’ve gotten a slice of thinking food (pizza), let’s think about what “pitch change” entails.  It means taking the wave that is the voice and stretching or squishing it.  What do I mean?  A high pitch’s wave will “repeat” itself more frequently than a lower pitch’s wave.  Picture the component waves: the final wave is a composition of both the sines and cosines.  This should suffice with respect to our discussion.

Just sent an email to a former prof.  I’m trying to get a Cegep to teach Haskell.  Back to coding:

As I go about figuring out what’s going on: the array contains cosines and sines (as expected).  The pitch algorithm takes the initial pitch and attempts to translate it.  It is this translation that is adding noise.  From what I’ve researched, this should be another invertible transform (able to go and return without any destruction of data).

Let’s see here, after it has computed the phases once it attempts to reuse the data.  What’s the quality of the original calculations?  Excellent I might add!  Well, excellent where there is no change in pitch.  What I have done is interpolate between the phases (the closer we get to normal voice, the more like the speaker it sounds).

I’m going to keep on looking at the code, but this is a fairly good boost in quality (partially luck, partially I’m figuring out how everything works).

At a word count exceeding 2012, I’m going to end this post.  Finally!  Poor reader, I pity you.

The latest bug fix

Thursday, June 9th, 2011

Ominously silent I have been but busy coding I was.

I believe the latest bug fix is worth documenting.  I’ll try to cater to readers who don’t know programming, though they may require a bit more patience to decipher what I wrote.

First, about the bug itself.  It manifests itself semi-randomly.  Practically unpredictable.  It appears most often on the second or third attempt at gesturing a poem.  It happens more frequently when recording a poem.  The selected poem or level has no discernible impact on hitting the bug (I wasted a bit of time trying to find more patterns; at one point I was convinced it was levels, then that it was changing poems, etc.  But in retrospect that wasn’t the case).

The Victorianator is written in multiple variants of C (C, C++, Objective-C).  C is used for the critical parts, C++ in most places, and Objective-C glues everything together nicely.  Put another way, Objective-C is used to describe a high-level view of what’s happening and C goes into the fine details.  For example, one will say that audio effects must be applied (and with what parameters), the other will specify how the audio signal gets manipulated to achieve those effects.

The one issue I have with Objective-C is that often-times it does not distinguish between things explicitly – it is loosely typed.  In Objective-C, practically everything is seen as an object.  Messages are sent to objects, and you hope they do what you want them to do (the language does not enforce that a chair is really a chair and not some other object).  In C++, you typically have to be very specific.  And depending upon what is being done, one is more convenient than the other.

Both C++ and Objective-C deal with “objects”: things that can be queried and asked to do things.  C++ is very specific about the objects and Objective-C lets things slide.  Interestingly, when an object is created in C++ (things just materialize out of thin air on demand in programming languages… but they have to be destroyed eventually to free memory) the person writing the application must determine when the object will be destroyed (barring clever use of smart pointers and reference counting).  Objective-C is at heart reference-counted, meaning each place a given object is used (say a chair) a counter is incremented (1 person uses it, 2 people use it, etc.) and decremented once it is not used (once per place).  Once the counter hits 0 – no usage – the object gets destroyed.
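
The counting idea, stripped to its bones (this is the concept only, not Objective-C’s actual retain/release machinery):

```cpp
// Each retain bumps the count, each release drops it; when the count
// hits zero, the object destroys itself (flagged here for illustration).
class Counted {
public:
    void retain() { ++m_count; }
    // Returns true when the object would be destroyed.
    bool release() {
        if (--m_count == 0) { destroyed = true; return true; }
        return false;
    }
    int count() const { return m_count; }
    bool destroyed = false;
private:
    int m_count = 1;   // creation counts as the first use
};
```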

The point?  I’m getting to it!

The object that handles the gesturing is told where the gesture data is stored.  Then it uses this gesture data to present what is being drawn.  What if… every so often… the object queries for gesture data before it has been told where that data is?

There, that’s the base of the bug.  Let’s go into details, the gestures are C++ objects.  Also, the gesture screen is always running (it’s very silent except when it displays itself).  So, when it’s woken up it might have the location of the previous gestures.  The previous gestures might have been destroyed, so the data is now invalid.
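
The shape of the fix (all names here are hypothetical – the post doesn’t show the real classes): forget stale gesture pointers when the screen goes quiet, and never hand out data before it has been assigned for the current run.

```cpp
#include <cstddef>

// The always-running gesture screen keeps a pointer to gesture data
// owned elsewhere; that data may be destroyed between runs, so the
// pointer must be treated as invalid until freshly assigned.
struct GestureScreen {
    const void* gestureData = nullptr;
    bool assigned = false;

    void assign(const void* data) { gestureData = data; assigned = true; }
    void sleep() { gestureData = nullptr; assigned = false; }  // drop stale pointer

    // Returns null rather than a dangling pointer when woken early.
    const void* currentData() const { return assigned ? gestureData : nullptr; }
};
```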

Fixing it was simple – only use gesture data after it’s been assigned.  Literally – that’s what I told the computer.

Finding the bug is the adventure.  See, the bug happens only once in a while.  What do I do?  Launch the game within the simulator, try to find a pattern that will reveal the bug (it’s semi-random, not predictable, very annoying).  It was always the same bug, the name of the gesture returned was invalid.

First thing is to put a break-point on the line of code before the error that fetched the name of the gesture.  A break-point halts execution of the program at a given line of code and allows me to look at the values within variables.  It also allows me to follow queries to objects.  In this case, MotionsGL (handling the motions screen) was calling TemporalMultiMotion (an object that given a time will return what gesture the player has to do).  The break-point was in MotionsGL on the line where it called “nextGestureName” to figure out what the next gesture would be (helps the player when they know what’s coming).

After about 10 tries (letting the game run, perusing code or reading articles during each play through) I hit the invalid next gesture.  Within the TemporalMultiMotion I can also see that the gesture name is invalid.  That’s a good sign; I guess.  Also, all of the data relating to the next gesture was invalid.  The gesture should start in about 2 million seconds, or 24 days.  It should also last about 2 million seconds…  Definitely not right.

Could this data be something else?  A memory overrun is when an object writes to memory beyond its given space (each object is given a small space in the computer’s memory; it’s convention that they keep to themselves, but badly-behaved objects that wreck the data of other objects do occur – usually out of programmer error rather than malice).  So I examined the data as though it were of type “short” (a type of number); this is what I’d expect the data to look like if it were audio.  Looking at the data, it could be audio – maybe.  But it would be a very bad recording, with nothing at the beginning – and in that case it would be zero, not some random 4-digit numbers.

Within the debugger, it is quite easy to tell the system “I know it is supposed to be an object of type X, but show me what would I see if it were of type Y…”  Very convenient.  So let’s go on, and see if it’s character data.  If I’m lucky I’ll get something legible.  Yup, I got a series of characters that read “t Texture Mission.01.” (quick note, the gesture name is not stored as a series of characters but as an Objective-C object – so converting it to characters can give weird things).

Glancing at the debug output (many applications write out information, a log of sorts, about what it’s doing that the user never sees.  This log allows me to see that certain key events happened as they should) – I noticed the text “Lost Texture Mission.01.”…  That’s not good.

I doubt the logging stuff is corrupting the system.  Too many people use it, if it was corrupting things it would be fixed rather quickly.

Now that I have determined that TemporalMultiMotion is the source of the bad data – where does it get its data?  It receives it from MotionsGL while audio is being set up.

I put a break-point in that spot.  2 break-points total.  Why two?  I want to confirm that the data did not get corrupted between the time it is assigned and the time it is used.  This will help narrow down what happened: either MotionsGL is doing something very bad that is wrecking the data, or something is giving the wrong data to MotionsGL (which gets forwarded to TemporalMultiMotion).

On the first few runs, all the gesture data was correct; MotionsGL properly set up TemporalMultiMotion and got the correct data.  On the run where things died, MotionsGL attempted to get data before it gave the correct data.  The breakpoints were hit in the opposite order that they should have been.

Knowing this, I asked the debugger to check if audio was initialized.  It said no.  This confirmed that the data was not set up.

And that’s debugging.

Gallery section screens

Saturday, March 19th, 2011

Those are first drafts/placeholders till the robot is animated and composited


Wednesday, March 9th, 2011

A picture is worth a thousand words:

Scoring Screen, Code

Wednesday, February 16th, 2011

So, Mo posted a scoring screen.  I started integrating it.  All that’s left is to listen for screen touches and to make the screen listen to the in-game score instead of something hard-coded to 50%.  This was a fairly simple screen to put together since most of the elements were recycled.  Instead, I had fun turning the lights on the outside of the score-circle on and off:

The score is supposed to be 50%.  However, whenever I see needles I think of a continually jumping needle.  Sometimes it appears stable, sometimes it just jumps around.  Of course, always around the same spot.  So, here is grapher to the rescue:

First thing, I knew I wanted it to oscillate…  first thing that came to mind was sin.  So I started with sin(x).  Then, I added sin(5x).  That seemed too jittery, but sin(x)*cos(10x) did something interesting.  The jittery cos(10x) was attenuated by the sin(x).  But that seemed too predictable.  So I added a few more terms in an ad-hoc fashion…

What we get is the nice graph seen in the picture.  The resulting function repeats itself every 2π seconds.  The needle oscillates a bit before the desired score, a bit after, then it appears to get excited.  Eventually it reaches a point where it appears to stabilize, oscillating a bit before and after the desired score.
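
The starting point of that function, for the record (the extra ad-hoc terms aren’t reproduced here – this is just the sin(x)·cos(10x) seed described above):

```cpp
#include <cmath>

// Needle wobble as an offset around the target score: the slow sin(t)
// envelope attenuates the jittery cos(10t), and the whole thing
// repeats every 2*pi seconds.
double needleOffset(double t) {
    return std::sin(t) * std::cos(10.0 * t);
}
```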

Scoring screen

Tuesday, February 15th, 2011

Bug fixes

Tuesday, February 8th, 2011

The last crash bug.  Updating iOS fixed the problem.

Resources are coming in too slowly to actually finish the beta on time.  I’m not surprised.

Back to being my usual jaded self….

No Mic Detected error screen

Sunday, February 6th, 2011


Friday, February 4th, 2011

My computer died again,
it was supposed to be fixed.
I’m stuck on a Mac again and I’m scared.

I’m working on the game design doc and on the error messages sent by the website again.
I updated WordPress not so long ago and took care of all of this, but it seems to regress for a reason I can’t understand.
FTP on the Mac is a little different; I just got introduced to Cyberduck… it seems decent.
To be continued…

The GDD is still almost finished.
I need someone to read through it to make sure my syntax makes sense.

This week was supposed to be Beta Milestone.
I have no clue what happened with my team; they seemed to be unavailable for the usual weekly meeting.
I hope we will be able to meet the code freeze deadline.

Hey, where’s the iPhone?

Unwanted Features

Thursday, February 3rd, 2011

These two days I have been debugging code, making it more stable so I can send it out safely.  Well, while debugging I found this very interesting behaviour…

Usually, I debug using The Charge of the Light Brigade for testing.  It’s the only poem that I’ve gone out of my way to annotate.  Well, when I try to use “The Jabberwocky” for testing, very interesting things happen.  First, the iPod touch dies.  Literally.

I call it dead when the home and top button don’t work anymore.  You press them, and the screen remains frozen.  Terminating the process from the debugger doesn’t help either.  The application is about as dead as… say, the device itself turned off with a dead battery?

The only fun part about this situation is that it gives me time to actually blog about the problem.  So far I have narrowed down the problem to a function called `discardAndResizeRecorders`.  This method finds the existing audio files and loads them.  Then it does a bit of clever mmapping…

The latest round of debugging reveals that the crash occurs when data is freed up.  I’ll keep track of the recording, it started at 17:13:20, and there is 300 seconds of buffer.  Ideally, this bug should be impossible to replicate.  Ideally, I wouldn’t have to restart the silly device, wait while it goes through several screens, and further wait until it records the poem to debug it.  But I guess that’s the nature of the application.  End: 17:14:58.

Elapsed time: 1:38, or 98 seconds.

The offending line is (drumroll!) close(m_file);

Yup, the line that closes the file.  munmap works flawlessly.  This should not happen…
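
For reference, the unmap-then-close sequence that ought to be uneventful (a self-contained POSIX sketch with a made-up path; the game’s actual discardAndResizeRecorders is more involved):

```cpp
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Map a file, touch the mapping, then tear it down in the usual order:
// munmap the region first, close the descriptor second.  Both calls
// should simply succeed - which is what made the close() crash so odd.
bool mapAndClose(const char* path) {
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0) return false;
    if (ftruncate(fd, 4096) != 0) { close(fd); return false; }
    void* p = mmap(nullptr, 4096, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { close(fd); return false; }
    std::memcpy(p, "audio", 6);        // touch the mapping
    bool ok = munmap(p, 4096) == 0;    // unmap first...
    ok = close(fd) == 0 && ok;         // ...then close the descriptor
    return ok;
}
```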

Let’s dig deeper into the issue.  Look at that!  The iPod reboots itself even if I do nothing!  Awesome (read the heavy sarcasm)!  No, I do not want to register my iPod.  It’s not mine anyhow.

The file pointer had a value of 5.  And the lights just turned themselves off since I’m too motionless within the lab.  Well, programmers are sloth-like creatures.  Snail like?  Well, name a creature that doesn’t move much.

My paranoia tells me to set the filenames to 8.3 format.  That’s silly.  Crap… the debugger didn’t figure out that the breakpoint was in a header file.  I must remember: Xcode only likes breakpoints when they exist in source files (except once debugging has resumed in the source file).  That wasted… some time.

I’ll have to put this one aside and come back to it tomorrow….