How We Test
Richard Baguley, published on March 15, 2007
Here at Wirelessinfo.com we test cell phones using rigorous scientific methods. We use the same tools and techniques that the manufacturers use to analyze every aspect of the performance of a phone, from the images it shoots to how it captures and transmits audio.
Audio Testing
We evaluate the electroacoustic performance of the cell phones (meaning the quality of the sound produced and captured by the electrical components) objectively, using a test system from Listen, Inc. that is composed of the SoundCheck™ electroacoustic test system and a Brüel & Kjær Head and Torso Simulator (HATS) with a handset positioner. Skype provides the connection between the SoundCheck system and the cell phone network. Each of these components is described below:
SoundCheck
SoundCheck™, from Listen, Inc., is an electroacoustic measurement and analysis package widely used for testing telephone components and complete phone systems, both on the production line and in R&D applications. It is a PC and sound card based system that communicates with both analog and digital equipment using standard, non-proprietary interfaces.
The Brüel & Kjær hardware is controlled directly through SoundCheck, and Windows Multimedia is used to communicate with the sound card and the Skype application. Tests are fully programmable, so the sound signal, the analysis methods and the result output format can all be selected.
Head and Torso Simulator
Photo: The Head and Torso Simulator with a cell phone in position
Skype
In our tests, the phone calls are made between Skype and the wireless phone under test. Using Skype offers several advantages. First, it avoids the need for expensive base station emulators (these can typically cost over $100,000, and one would be needed for each of CDMA and GSM). Second, we test the complete system, phone and network, since this is what a consumer uses when making calls on a cell phone. An added advantage is that Skype can communicate with SoundCheck using the Windows Multimedia interface. Skype was selected because it claims a flat frequency response over the range 50 Hz to 8 kHz, and our own measurements showed it to be flat over the range 100 Hz to 8 kHz, more than sufficient to cover the bandwidth of the network (300 Hz to 3.4 kHz). Several test calls were evaluated on each phone and the results averaged to minimize the effects of any dropouts in the connection.
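The averaging of several test calls can be illustrated with a minimal sketch. It assumes each call yields a level in dB at a set of frequencies and that the levels are averaged arithmetically per frequency; the source does not say how the averaging is actually done, and the function name and sample values are hypothetical.

```python
from statistics import mean

def average_responses(calls):
    """Average per-frequency levels (dB) across several test calls.

    `calls` is a list of dicts mapping frequency in Hz to level in dB.
    Only frequencies present in every call are averaged.
    """
    common = set(calls[0])
    for call in calls[1:]:
        common &= set(call)
    return {freq: mean(call[freq] for call in calls) for freq in sorted(common)}

# Made-up measurements from three calls, for illustration only.
calls = [
    {300: -2.0, 1000: 0.0, 3400: -4.0},
    {300: -1.0, 1000: 1.0, 3400: -6.0},
    {300: -3.0, 1000: -1.0, 3400: -5.0},
]
averaged = average_responses(calls)
```

A dropout in one call shifts the averaged value less than it would shift a single measurement, which is the effect the averaging is meant to achieve.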
Tests
We measured SEND (how well the sound is transmitted), RECEIVE (how well you hear the person), and SIDETONE (how much of your own voice you hear through the receiver when you talk). Receive performance was measured on both the receiver (earpiece) and the hands-free speaker (speakerphone). For both send and receive we measured frequency response. This determines the tonality of the voice, for example whether it is too bright or has too much bass.
Test Configurations
Figure: Test configuration for measuring Send performance
The test signal is generated by the signal generator contained within SoundCheck. This signal is then played through the artificial mouth of the HATS, which simulates the effect of a person speaking. The mobile phone is held in a normal operating position by a handset positioner (see photo above). This ensures that the cellphone is held in the correct operating position for the test and that all phones are tested under exactly the same conditions. The signal is digitally transmitted over the network, where it passes through a cellphone network gateway and continues its journey via the internet to the softphone. The incoming signal to the softphone is measured and analyzed using SoundCheck.
The Send frequency response should be within the limits set in the TIA-810-B standard. These limits are marked on the graph so it can be clearly seen if the response falls outside them. If the response is above the limits at high frequencies, your voice will sound ‘tinny’ or bright, and if it is above the limits at low frequencies, your voice will sound ‘boomy’.
Figure: Test configuration for measuring Receive performance
The receive setup is the inverse of the send configuration. The signal generated by the SoundCheck software is transmitted via Skype to the internet, through the network gateway, and through the air to the cellphone. The cellphone is held in the correct test position against the ear of the HATS, which receives the signal and transmits it to SoundCheck for measurement and analysis.
The Receive frequency response should be within the limits set in the TIA-810-B standard. These limits are marked on the graph so it can be clearly seen if the response falls outside them. If the response is above the limits at high frequencies, voices will sound ‘tinny’ or bright, and if it is above the limits at low frequencies, voices will sound ‘boomy’.
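The limit check described for the send and receive responses can be sketched as a comparison against an upper and a lower mask. This is only an illustration: the mask values below are placeholders and are not the TIA-810-B limits, and the function name is hypothetical.

```python
def out_of_limits(response, upper, lower):
    """Return the frequencies (Hz) at which the response leaves the mask.

    All three arguments map frequency in Hz to level in dB.
    """
    violations = {}
    for freq, level in sorted(response.items()):
        if level > upper[freq]:
            violations[freq] = "above"
        elif level < lower[freq]:
            violations[freq] = "below"
    return violations

# Placeholder mask values for illustration only, NOT the TIA-810-B limits.
upper = {300: 4.0, 1000: 4.0, 3400: 4.0}
lower = {300: -8.0, 1000: -4.0, 3400: -8.0}

# Made-up response that is too strong at low frequencies ('boomy').
response = {300: 5.5, 1000: 0.5, 3400: -2.0}
violations = out_of_limits(response, upper, lower)
```

Here the response exceeds the upper placeholder limit only at 300 Hz, which corresponds to the ‘boomy’ case described above.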
Figure: Test configuration for measuring Sidetone performance
Sidetone is measured by sending the signal created by the signal generator to the artificial mouth of HATS. This is picked up by the cell phone microphone and some of it gets played back out the receiver. This received signal is ‘heard’ by the artificial ear in HATS and transmitted back to SoundCheck for analysis. The rest of the circuit (the gray portion) is necessary for the call to be made, but plays no part in the measurement itself.
Sidetone is how much of your own voice you hear through the receiver when you talk. This is designed by the manufacturer to be a specific level. Too little and you find yourself shouting because you can’t hear what you are saying, and too much and you find yourself talking too quietly to be heard clearly.
We measure the sidetone frequency response, but it is usually quite irregular, so we use the STMR (Sidetone Masking Rating, the overall loudness loss between mouth and ear) instead. The STMR should be about 18 dB ± 5 dB.
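The 18 dB ± 5 dB window can be expressed as a simple tolerance check. The sketch below assumes the STMR has already been computed by the measurement system; the function and constant names are hypothetical.

```python
STMR_TARGET_DB = 18.0     # target loudness loss quoted in the text
STMR_TOLERANCE_DB = 5.0   # the +/- 5 dB window quoted in the text

def stmr_deviation(stmr_db):
    """Absolute distance of a measured STMR from the 18 dB target."""
    return abs(stmr_db - STMR_TARGET_DB)

def stmr_in_range(stmr_db):
    """True when the measured STMR lies within 18 dB +/- 5 dB."""
    return stmr_deviation(stmr_db) <= STMR_TOLERANCE_DB
```

For example, a made-up reading of 21.5 dB is 3.5 dB from the target and inside the window, while a reading of 10 dB is 8 dB away and outside it.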
Producing the scores
To produce the score that you see alongside the graphs for send and receive frequency response, we measure the performance of the phone against the limits defined in the TIA-810-B standard, indicated by the red lines in the graph. The smoother the curve and the closer it is to an ideal line midway between the two limit lines, the higher the score. For sidetone, we measure how close the STMR is to the 18 dB target set in the standard. The closer the phone comes to 18 dB, the higher the score.
Image Testing
We use Imatest to examine the quality of the images captured by the cell phones. This professional image testing program can run a variety of tests on images to judge the color accuracy, noise level, chromatic aberration and other criteria. We rate the quality of the images on the following tests:

Resolution - Imatest analyzes a photo of a resolution chart, producing a measure called line widths per picture height. This indicates the number of parallel black and white lines that could fit into the image without the lines blurring together. We use this measure to calculate the resolution score.
Color - We take a photo of a GretagMacbeth color chart at 3000 lux (the equivalent of bright sunlight) and analyze the image in Imatest using the Colorcheck test. This determines how accurate (or, more likely, inaccurate) the captured colors are, producing a chart that shows the original color, the captured color, and the corrected color.
Imatest also produces a graph that shows the color error:

The circles on this chart represent the captured colors, and the squares show where they ideally should be. The shorter the line between a circle and its square, the smaller the color error and the better the image quality.
Noise - DigitalCameraInfo.com tests noise by increasing the ISO setting, but most cell phones don’t have an ISO setting. Instead, we test noise by taking a series of images of the color chart at descending light levels: starting at 3000 lux, then 1500, 500 and finally 60 lux (the equivalent of a poorly lit room). Each of the images is analyzed in Imatest using the Colorcheck test, and the noise for each of the channels (R, G, B and Y) is noted. We then plot this noise on a graph to determine the trend, and score based on the slope of the line.
Unlocked Standby to First Shot - For this test we want to see how long it takes to go from standby on the handset until we have captured a photo. As with most of our timed tests, we start with the handset either closed, where applicable, or on the home screen unlocked. We end the test when we have captured a photo. We do the test repeatedly until we get the lowest replicable score.
Shot to Shot Time - In this test we want to see how long it takes to capture a series of photos. When possible we use a camera's burst mode to do this test. Where burst mode is not available we take a series of five photos manually. If possible we turn off the auto-review feature. We also do the manual test in cases where a camera's burst mode takes photos at a significantly lower resolution. In these cases we score on the burst mode but provide the manual result for comparison. We do this test multiple times to ensure that we have a replicable result.
Shutter to Shot Time - This test attempts to judge how long it takes the phone to actually capture a photo once you've hit the shutter button. We start this test when we begin to press down on the shutter button and complete it when a photo is captured. We do the test repeatedly until we get a replicable result. For phones with an auto-focus camera, the time it takes to bring the scene into focus is included in our score, but we will often provide the time measured from the point at which the scene is in focus for comparison's sake.
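The noise trend described in the Noise test above can be sketched as a least-squares line fitted through the noise readings at each light level. The sketch assumes the fit is made against the lux values directly, which the source does not specify, and the noise readings are made up for illustration.

```python
def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator

# Light levels from the text; noise readings are hypothetical.
lux_levels = [3000, 1500, 500, 60]
noise_y = [1.0, 1.4, 2.1, 3.5]

slope = least_squares_slope(lux_levels, noise_y)
```

With these sample readings the slope is negative, because the noise rises as the light level falls. A flatter slope would indicate a camera whose noise grows more slowly in dim light.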
Battery Testing
Our battery tests are broken down into three categories: talk time, music and browsing. The specifics of each test are outlined below:
Talk time - We call the phones using Skype, then play back an audio book to simulate a conversation. All other settings are left on default.
Music - We load our standard test album (held in MP3 format for compatibility) onto the cell phones and set the phones to play this album repeatedly until they either stop playing or shut down.
Browsing - The phone browsers are pointed at a series of interlinked, automatically refreshing web pages. We also use a device (called the jabber) to automatically press a phone button once a minute to simulate someone browsing the web, hitting the keyboard to change web sites. The browsing experience is captured on video to note any problems with the browsing, in which case the test is re-run. We declare the test over when the phone stops loading pages, either due to a total shutdown or a radio disable from low battery power.
Signal Strength
All of our tests are carried out using the live networks of the major carriers. As such, signal strength is an issue: we have no control over the vagaries of the cell phone network. Our testing facility is located indoors, so we use a Wilson Electronics Dual-band SOHO Cellular/PCS Amplifier to boost the signal strength, with a large antenna on the roof. This provides a boost for all of the major networks. Although we cannot completely eliminate signal strength as a factor in our tests, this amplifier ensures that no network or phone is at a disadvantage in our testing due to poor signal strength.
How We Score
We use a different approach to scoring than most: instead of generating scores out of 5 or 10, our scores are open-ended. We design our tests and product analysis to generate figures (such as the time to complete an operation, the number of features a device supports or the figures produced by test programs such as Imatest or SoundCheck), then run an analysis on that figure that produces a score. This score is scaled to what we consider a range of bad to good, with a bad result getting low numbers (say, 1 or 2) and a good result getting high numbers (say, 9 or 10). But it doesn’t stop there: with our system, if a product comes along that is markedly better than the ones before it, it can earn more than 10.
Other web sites are left scrambling when this happens: they are forced either to give the new product their highest score, which hides how much better it is, or to rewrite their testing rules, which invalidates their old scores. Either way, they hope nobody notices.
Our system adapts: it can still produce meaningful scores as products improve, giving you a clearer picture of how products get better over time. We produce a score like this for every aspect of a product that we test, so each aspect gets its own score.
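One way to picture an open-ended scale is a straight line through two reference points, with no upper cap. The sketch below is an assumption made for illustration: the source does not say that the analysis is linear, and the reference values and function name are hypothetical.

```python
def open_ended_score(value, bad, good, low=1.0, high=10.0):
    """Map a measurement onto the line through (bad, low) and (good, high).

    The result is deliberately not clamped, so a measurement better than
    `good` scores above `high`.
    """
    return low + (value - bad) * (high - low) / (good - bad)

# Hypothetical timed test: 20 s is treated as bad, 2 s as good.
typical_bad = open_ended_score(20.0, bad=20.0, good=2.0)
typical_good = open_ended_score(2.0, bad=20.0, good=2.0)
standout = open_ended_score(1.0, bad=20.0, good=2.0)
```

With these reference points a 20-second result scores 1, a 2-second result scores 10, and a 1-second result scores 10.5, which shows how a markedly better product can exceed 10 without any change to the rules.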
We do many tests that involve timing how long an action takes. For example, we time how long it takes to make a call or to create a new text message. For all of these tests we start with the phone closed, if possible, or at the home screen unlocked. We begin timing when we start the procedure and stop timing when it is completed. We generally do five repetitions of the timed test and then use the average time to calculate our score. Below is a table showing all of our timing tests, the end point for each and notes where applicable.
| Test | End Point | Notes |
| --- | --- | --- |
| Dialing Speed | When we hit the send button | |
| Startup to Call | When we hit the send button | This test begins with the phone closed and off |
| Time to a new Email Message | When we have a new email message dialogue on screen | |
| Time to a new SMS Message | When we have a new SMS message dialogue on screen | |
| Adding Contacts | When we have hit the save button | We use five standard names (first & last) and phone numbers |
| Adding Calendar Items | When we have hit the save button | We use a lunch appointment the following day at 12pm with a reminder 15 minutes before |
| Adding ToDo/Task | When we have hit the save button | We use a task reminding us to get groceries due the following day |
| Adding Notes | When we have hit the save button | We use the following text for our note: lunch 12pm tomorrow |
| Accessing Music Software | When a song begins to play | We use the fastest method possible |
| Video Software Access | When a video begins to play | We use the fastest method possible |
We do two tests in our reviews to see how easy it is to type on a phone's keypad or keyboard. The first test is done holding the phone with two hands and, generally, using our two thumbs to type. The second test is done holding the phone in one hand and using a single thumb to type. When available we turn on features like predictive text and word completion. We do not take advantage of phones that will memorize commonly used phrases, however. The test begins when we start typing and ends after we have entered our standard piece of text:
The cute brown fox jumped over the lazy black puppy as he was eating his dinner.
We do each test five times and take the average time to calculate our words per minute score.
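The words-per-minute calculation can be sketched as follows. The sketch assumes words are counted by splitting the standard text on whitespace (16 words); the source does not state the counting convention, and the timings are made up.

```python
from statistics import mean

TEST_TEXT = ("The cute brown fox jumped over the lazy black puppy "
             "as he was eating his dinner.")

def words_per_minute(times_seconds, text=TEST_TEXT):
    """Words per minute from the average of several timed typing runs."""
    word_count = len(text.split())
    return word_count * 60.0 / mean(times_seconds)

# Five hypothetical runs, in seconds.
wpm = words_per_minute([25.0, 23.0, 24.0, 26.0, 22.0])
```

The five sample runs average 24 seconds, so 16 words in 24 seconds works out to 40 words per minute.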
How Our Overall Scores Are Created
We then take these scores and roll them together to create an overall score. We do this by assigning a percentage to every category (such as 15% for audio quality, 10% for battery life, etc.) that shows how much it contributes to the overall score, based on how important we think it is to the product.
Within each category, we assign a part of this percentage to each test. So, the individual test for the send frequency response of a cell phone used as a handset (when you hold it up to your ear) gets a percentage of 0.5%, and the distortion test gets 0.5%. Again, this depends on how important we think the test is to the overall use of the phone; we assign lower numbers to the speakerphone tests than to the handset tests, because people don’t use cell phones as speakerphones as much as they do as handsets.
So, using our example again, the total percentage for all of the individual tests in the audio category adds up to 15%, and all of the battery life tests add up to 10%. We then multiply each score by the percentage for that test, and add the results up for the category and for the device as a whole.
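The weighting step can be sketched as a sum of each score multiplied by its percentage. The 0.5% weights come from the example in the text; the scores themselves and the function name are hypothetical.

```python
def weighted_total(tests):
    """Sum of score * percentage for a list of (score, percent) pairs."""
    return sum(score * percent / 100.0 for score, percent in tests)

# Two audio tests, each weighted at 0.5% as in the example above.
# The scores 8.0 and 6.0 are made up for illustration.
audio_tests = [(8.0, 0.5), (6.0, 0.5)]
audio_contribution = weighted_total(audio_tests)
```

These two tests together contribute 0.07 to the total. Applying the same function to the tests of every category and summing the results gives the score for the device as a whole.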