Testing for Accuracy and Precision

Software testing has no boundaries at all. The discipline is so varied that systematic approaches are rare, given the range of material and the ever-changing tradeoffs. A few weeks ago, I came across a decent software testing article by a Microsoft engineer, published on Live Spaces. Unfortunately, it was followed by two spam comments; it was ironic to see such an assertive article ruined by a pair of ordinary Russian spammers.

I love machine learning and classification. My whole life is spent between two parameters: accuracy and precision. These are the common statistical values that tell you how successful your system is. If you have a search engine, one number tells you what percentage of the retrieved documents are really relevant, and the other tells you how much of all the available relevant documents your results cover; in standard information-retrieval terms, these are precision and recall.
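For concreteness, here is how those two numbers are usually computed over one query's results. This is my own minimal sketch, and the document IDs are invented:

    def precision_recall(retrieved: set, relevant: set) -> tuple:
        """Precision: fraction of retrieved documents that are relevant.
        Recall: fraction of relevant documents that were retrieved."""
        hits = retrieved & relevant  # true positives
        precision = len(hits) / len(retrieved) if retrieved else 0.0
        recall = len(hits) / len(relevant) if relevant else 0.0
        return precision, recall

    # Invented document IDs for a single query:
    retrieved = {"d1", "d2", "d3", "d4"}
    relevant = {"d2", "d4", "d7"}
    print(precision_recall(retrieved, relevant))  # (0.5, 0.666...)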

Surprisingly, a few days ago I was asked to break a machine learning system during a job interview, by coming up with possible failure cases. In my own philosophy, accuracy and precision are part of the system requirements; they relate to the quality of the overall product. But how are you going to collect the data to come up with these numbers? Imagine you are working on a search engine. Is it manageable to find n people and ask them manually whether they like the results? Will your sample (n people) reflect your user base? How costly will it be, and how objective? Is it really scalable? Is it possible for a human to read all of the documents on the Web and decide which ones really relate to his search phrase? These are a few introductory-level problems with the analysis of accuracy and precision.
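To make the "find n people" question concrete, here is a back-of-the-envelope sketch of what a human-judged sample buys you. The judge count and outcome are invented, and the normal-approximation interval is only trustworthy for reasonably large samples:

    import math

    def precision_estimate(judged_relevant: int, sample_size: int, z: float = 1.96):
        """Estimate precision from a human-judged sample, with a rough ~95%
        normal-approximation confidence interval."""
        p = judged_relevant / sample_size
        margin = z * math.sqrt(p * (1 - p) / sample_size)
        return p, margin

    # Hypothetical: 100 judgments were collected, 83 said "relevant".
    p, margin = precision_estimate(83, 100)
    print(f"precision ~ {p:.2f} +/- {margin:.2f}")  # ~0.83 +/- 0.07

Even this toy version shows the cost: every extra digit of confidence means many more human judgments, and none of it answers whether those n judges reflect your user base.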

Post-Processing and the Importance of Feedback

It may not be critical for you to release a product with a target accuracy and precision; the consumer market suits this model best. But that alone should not be translated into the "inessentiality of quality tracking". I am simply advising you to track quality after the release (similar to the ship-then-test method). Detect which results are exit links, provide instant feedback tools so users can refine their results, and so on. Use the acquired feedback to improve the existing system. Testing may not be done at release time; you may need to keep discussing and analyzing whether your product is performing well, report back to your development team, and push them toward scalable, user-oriented improvements.
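As one possible shape for that post-release tracking (the log format, field names, and thresholds here are my assumptions, not any particular system's): aggregate click logs and flag results that users abandon almost immediately.

    from collections import defaultdict

    # Hypothetical click-log records: (query, result_url, dwell_seconds)
    clicks = [
        ("dns udp", "http://a.example", 2),
        ("dns udp", "http://a.example", 3),
        ("dns udp", "http://b.example", 95),
    ]

    def flag_exit_links(log, min_clicks=2, max_avg_dwell=5.0):
        """Flag results users leave almost immediately -- a cheap, scalable
        proxy for 'this result was not what they wanted'."""
        dwell = defaultdict(list)
        for query, url, seconds in log:
            dwell[(query, url)].append(seconds)
        return [
            key for key, times in dwell.items()
            if len(times) >= min_clicks and sum(times) / len(times) <= max_avg_dwell
        ]

    print(flag_exit_links(clicks))  # [('dns udp', 'http://a.example')]

The appeal of this kind of signal is that it scales with your traffic instead of with the number of human judges you can hire.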

Addresses Not Found in High Traffic

My sister found herself a new downloading hobby, and I was not planning to be the hobby killer until everything became inaccessible for both of us. She has been downloading heavily lately; I'm not sure about the material, but it is a high load. Pages were loading more slowly on my side, as expected, and I'm not claiming to have a wide pipe, but the overall bottleneck was not just slower uploads and downloads.

UDP 53, what’s wrong there?

I started to recognize a pattern. My downloads were even slower because name resolution was failing miserably every time I tried; I was not even able to resolve domain names to IP addresses, so I had to check for myself what might be causing the problem. As a quick note: if your local DNS cache (managed by the operating system) doesn't have a record for the domain name you're trying to visit, you make a request to a nearby DNS server to return the associated IP. If the nearby server doesn't have that record either, it asks the root servers, and so on. Most of my readers know the story well. This communication happens on UDP port 53. UDP is a connectionless way to transmit data: unlike TCP, you don't spend time on a three-way handshake to establish a connection that both sides are aware of, but if your packets get lost, nobody is responsible. It's like playing a game, with tradeoffs similar to every engineering issue.
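To see how bare this exchange really is, here is a minimal sketch of a DNS query over UDP port 53; the header packing follows the DNS wire format, and I point it at OpenDNS's public resolver (208.67.222.222) since that is what I was using. It is an illustration of the wire exchange, not real resolver code:

    import socket
    import struct

    def build_query(domain: str, query_id: int = 0x1234) -> bytes:
        # Header: ID, flags (recursion desired), QDCOUNT=1, other counts 0.
        header = struct.pack(">HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
        # Question: QNAME as length-prefixed labels, QTYPE=A (1), QCLASS=IN (1).
        qname = b"".join(
            bytes([len(label)]) + label.encode() for label in domain.split(".")
        ) + b"\x00"
        return header + qname + struct.pack(">HH", 1, 1)

    def resolve(domain: str, server: str = "208.67.222.222", timeout: float = 2.0) -> bytes:
        # One datagram out, one datagram back: no handshake, no retransmission.
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)  # if either packet is lost, we simply time out
            sock.sendto(build_query(domain), (server, 53))
            response, _ = sock.recvfrom(512)  # classic DNS answers fit in 512 bytes
            return response

    try:
        print(len(resolve("example.com")), "bytes of answer")
    except socket.timeout:
        print("query timed out: the UDP packet or its answer was lost")

Notice that the whole exchange is a single round trip; that single fact is the latency argument below.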

I gently asked my sister to pause for a while, and I started receiving UDP answers back without timeouts; the resolving problem was fixed. But I still had to convince myself that UDP was the best option that could have been chosen. I understood that the essential parameter here is latency: we have to be as fast as possible. I wanted to step back and understand why it was designed this way, and my problem came with its own answer in milliseconds.

Why does DNS use UDP?

Reliability versus speed. Remember the rule: if you don't have the address, ask a nearby name server. Is it implicitly saying "Don't go too far"? Probably it is. You're not on a very reliable connection, and if your traffic load is very high, there will be a lot of congestion, long delays, and large jitter. My DNS requests most probably couldn't even make it to the name server. And since my ISP's name servers are not reliable, I was using OpenDNS. Translation: I was far, far away from the source.
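You can make the distance argument measurable. Below is a rough sketch that times a single UDP DNS query against two resolvers, reusing the same hand-built packet as above; "192.168.1.1" is only a placeholder for a LAN name server, while 208.67.222.222 is OpenDNS:

    import socket
    import struct
    import time

    def time_dns_query(server: str, domain: str = "example.com") -> float:
        """Time one UDP DNS query; return the round trip in ms, or -1 on loss."""
        # Header (ID, recursion-desired flag, QDCOUNT=1) + question (A record, IN class).
        query = struct.pack(">HHHHHH", 0xBEEF, 0x0100, 1, 0, 0, 0)
        query += b"".join(bytes([len(l)]) + l.encode() for l in domain.split(".")) + b"\x00"
        query += struct.pack(">HH", 1, 1)
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(2.0)
            start = time.perf_counter()
            sock.sendto(query, (server, 53))
            try:
                sock.recvfrom(512)
            except socket.timeout:
                return -1.0  # the datagram (or its answer) was lost
        return (time.perf_counter() - start) * 1000

    for name, server in [("local", "192.168.1.1"), ("OpenDNS", "208.67.222.222")]:
        print(name, round(time_dns_query(server), 1), "ms")

On a congested link, the far resolver is exactly where you should expect the timeouts to pile up.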

I fixed the issue. Even with the crazy downloading back on, my domains are resolving rapidly at the moment, and I'm extremely happy. If you're using OpenDNS at an office or on a LAN with more than 20 clients, do yourself a favor and set up a local name server today.