Corruption and Correction: How Data Can Change the World
Corruption and Correction: How Data Can Change the World
Kenneth Do and Nurdin Hossain
Thomas Jefferson High School for Science and Technology
It’s 11:30 PM in Schaerbeek, Belgium. Many have spent the day casting their votes for the next members of the Belgian federal government. Emmanuel Willems, an IT specialist employed by the Belgian government, is finishing his dinner when he gets a call from an election official: something odd happened at one of the polling stations, and they need Emmanuel to come to take a look. When Emmanuel arrives, the official explains that one of the candidates, Maria Vindevoghel, has received an impossibly high number of votes given the number of citizens in the region. Thus, they begin a vote recount — the election officials are forced to manually count each ballot cast that day, inserting card after card into the machines. After several tedious hours, they enter the last few ballots and finish the recount. To the officials’ shock, the tallies remain the same for almost all the candidates…all except for one — Maria Vindevoghel. This time, Maria has received a far more reasonable number of votes, but it’s a staggering 4096 fewer votes than the original electronic count [1].
What makes the number 4096 especially significant is not merely its magnitude, but another special property — it is an exact power of 2 (212, to be precise)! To better explain the importance of this number, we need to talk about data. Every day, humans generate roughly 2.5 quintillion bytes of data on the internet [5]. From writing emails to watching YouTube videos and texting friends, our world relies on creating and maintaining data. Data is any information that a computer processes and stores. At the most fundamental level, computers store this information in a collection of ones and zeros called bits, which are physically represented by high voltages (ones) and low voltages (zeros) controlled by microscopic electronic switches called transistors [3]. Bits are just numbers, and like the numbers we use in our daily lives, bits can be used to count. Whereas we normally count in base ten, with ten unique number choices (0-9), bits count in base two or binary, with two unique choices (0 and 1). Binary numbers can even be converted into base 10 numbers and represent any form of data depending on their sequence and size. All of the photos on our phones, all of our emails, and even all of the words on this page, are simply a giant combination of bits.
As a thought experiment, imagine taking a message of bits and “flipping” one of the bits. In other words, take any random 1 or 0 in the message and convert it to a 0 or a 1, respectively. What would happen? On a somewhat benign level, if a computer flips a bit in a message it is reading (maybe in a text or an email), the message could become slightly corrupted (for example, if we were to flip the 5th bit in the binary code for the letter ‘a’, 01100001, from a 0 to a 1, we would read the letter ‘i’ instead). However, this is just for text. If a bit were to flip in a more valuable number like, say, an election tally, the results would be much more disastrous. Going back to this example, with our new knowledge of bits, it should be easier to see why powers of 2 are so special — the maximum storage capacity of a sequence of bits is, by nature, a power of 2. If you have 2 bits, you can store 2 x 2 = 4 sequences of information, if you have 3, you can store 2 x 2 x 2 = 8 sequences, etc. Since the Belgian candidate’s discrepancy was 212 votes, it was concluded to have been caused by a bit flip in the 13th bit (likely caused by atmospheric rays colliding with the transistors in the machine).
Bit flips can have disastrous consequences for elections, drivers, and everyday computer users, but today, they are a rare occurrence. This relative infrequency is a result of new hardware and software that minimize the risk of such events occurring. From a hardware perspective, one of the most important techniques that has minimized bit flipping is modular redundancy, which describes how machines are built with extra components in case a specific component fails. For example, a computer might be constructed with three different processing units so that if one of them fails, the other two can override the faulty unit and correctly receive an input. Modular redundancy describes how many of one type of component are in a given machine [6]. The scenario described above involving three components is an example of triple modular redundancy. For machines that demand low margins of error, higher degrees of redundancy can be used. The Space Shuttle, for instance, employs a five-way redundancy such that, even if two of the components fail, the other three can still mask that fault and carry out the proper instructions. Modular redundancy, by creating a system in which failure is tolerated and actively addressed, has been able to limit the destructive impacts of bit flips [6].
Software solutions are equally as important in protecting against data corruption. Error correction code, or ECC, is code that adds redundancy, or data-less bits, to data to allow for error detection and correction [4]. One of the earliest error-correction codes was the Hamming Code, created by Richard W. Hamming out of frustration with the error-prone computers of his time— “In a digital computer, on the other hand, a single failure usually means the complete failure, in the sense that if it is detected no more computing can be done until the failure is located and corrected, while if it escapes detection then it invalidates all subsequent operations of the machine” [4]. To get a sense of the direction Hamming was going in while formulating the Hamming Code, imagine a 16-bit (4x4) sequence of 1s and 0s that contains some kind of message. Hamming wanted to create an error-robust encoding with the least amount of redundancy possible — “Systematic codes may be defined as codes in which each code symbol has exactly n binary digits, where m digits are associated with the information while k = n - m digits are used for error detection and correction… This serves to measure the efficiency of the code…”. Abiding by this dogma, one of the first prototypes of the Hamming Code aimed to replace the 16-bit sequence with a 15-bit sequence and replace the last bit, dubbed the parity bit, with a 1 or a 0 depending on if the sum of the 15 previous bits was even or odd. Then, if 1 bit were to flip (assuming it's not the last one) and we summed the first 15 bits, its parity (whether the value is even or odd) would not match with the parity bit. The natural extension of this, then, was to not just have one parity bit for the entire sequence of bits, but multiple bits, each dedicated to a specific chunk of the message. If we think about the 16-bit message (including the parity bits) as a 4x4 array rather than a sequence, we can assign parity bits to specific overlapping chunks (rows, columns, blocks, etc.) of the array. Through this, we’ll be able to not only detect errors, but even find their exact location and fix them. With just a few 1s and 0s placed in clever positions, we’re able to not just improve the reliability of data transmissions, but bring life to our data, allowing it to scrutinize itself with no human intervention.
What makes error correction so fascinating is its autonomy — there’s no need to manually compare the original message with the received message. Additionally, there are many cases where it is impossible to manually compare both, like when NASA beamed a picture of the Mona Lisa to the Lunar Reconnaissance Orbiter, a spacecraft that’s been orbiting the moon since 2009 [2].
As can be seen above, the picture received by the Orbiter was discernible but grainy due to cosmic radiation, with many bits missing or flipped. Using a more modern error correction algorithm called Reed Solomon Coding (this ECC is used in modern-day CDs and DVDs), the Orbiter restored the grainy image to its original beauty. More importantly, it corrected the image on its own, without any prior knowledge of what the Mona Lisa looked like.
Events like cosmic radiation and hardware malfunction are, unfortunately, out of our control. Recognizing this, however, the pioneers of the field moved away from the idea of pure error prevention and mixed in error correction. Scientists across many decades have worked to improve not only the hardware to make it harder for bits to flip, but also software to detect and correct errors. Without these advancements, you wouldn’t be able to trust the data you put on your hard drive or even on the cloud! Implementing redundancy and error correction methods have been crucial in limiting the damage of data corruption and protecting us in an increasingly digital world.
References
[1] Adler, S., & McEwen, A. (Hosts). (2019, May 8). Bit Flips [Audio podcast episode]. In Radiolab. WNYCStudios. https://www.wnycstudios.org/podcasts/radiolab/articles/bit-flip
[2] Dunbar, B. (2013, June 6). NASA beams Mona Lisa to lunar reconnaissance orbiter at the Moon. NASA. Retrieved December 10, 2021, from https://www.nasa.gov/mission_pages/LRO/news/mona-lisa.html.
[3] Data. (n.d.). TechTerms.com. Retrieved December 9, 2021, from https://techterms.com/definition/data
[4] Hamming, R. W. (1950). Error detecting and error correcting codes. Bell System Technical Journal, 29(2), 147–160. https://doi.org/10.1002/j.1538-7305.1950.tb00463.x
[5] Marr, B. (2019, September 5). How much data do we create every day? the mind-blowing stats everyone should read. Forbes. Retrieved December 10, 2021, from https://www.forbes.com/sites/bernardmarr/2018/05/21/how-much-data-do-we-create-every-day-the-mind-blowing-stats-everyone-should-read/?sh=24f4463160ba.
[6] Sengupta, A. (2021, November 19). What are bit flips and how are spacecraft protected from them? ScienceABC. Retrieved December 9, 2021, from https://www.scienceabc.com/innovation/what-are-bit-flips-and-how-are-spacecraft-protected-from-them.html