Feb 10

Feb 10 A Point Picking Probability Problem

A Point Picking Probability Problem

Edwin Lu Thomas Jefferson High School for Science and Technology

This article was originally published in the 2021 print publication of the Teknos Science Journal.

When it comes to STEM research, math is an anomaly. The lack of application paired with the definiteness of each statement differs greatly from scientific research, which is a free flowing state between theory crafting and experimentation that nearly always exists in the real world. Why do we do math research then? When I asked myself this question, the kid in my 5th grade class asked the teacher, “Why do we even learn math?” pops up, as if to denounce everything I’ve learned so far. However, while math does not directly have a purpose in our daily lives, it creates the building blocks for nearly everything. When Pythagoras discovered the Pythagorean theorem, he did not immediately think of countless real world applications of it. And thus, we cannot immediately dismiss anything in this world as useless, as its true form can only be shown with time. Every science incorporates math in some form, whether it be the geometry behind chemical bonds, or the physics behind our universe, a concept will never remain a concept.

In my junior year of high school, I helped write problems and proctor for a middle school math competition. One problem that was proposed seemed very tame at first - “Given a randomly chosen point inside a unit square and a randomly chosen vertex of said square, calculate the probability that the distance from the chosen vertex to the point is shorter than the average of the distances from each vertex to the point.” - however we quickly realized that not only was this problem not fit for middle school students, none of us could solve it either. Several months later, during course selection season, I decided that this problem would characterize my research pursuit in the Computer Systems Lab. Computers provide the perfect replacement for actual numerical solutions, as they can easily model the exact probability through something called Monte Carlo simulations. For example, let us consider the point (0.5, 0.25) and the vertex (0, 0) of our unit square. The average distance to each vertex would be (.559 + .559 + .901 + .901)/4 = .730, while the distance to (0, 0) would be .559, which would be less than .730 and therefore satisfy the condition. In this way, we can accurately and efficiently estimate the answer to our problem by sampling a large amount of randomly chosen points using computer

Now, what do we do from here? We’ve chosen our several thousand or million points and used a computer to calculate our estimated probability (for reference, the calculated probability is around .4365578). Now we must ask ourselves how accurate this result really is. We can easily see how randomly choosing points can give us a variation of answers in the same way that if we flip a coin 1000 times it will not give us exactly 500 heads and 500 tails. Thus, we can discuss the uncertainty behind these Monte Carlo simulations. As stated previously, while random sampling is an effective method of producing a numerical answer to our problem, it suffers for the exact reason it is useful: randomness. As discussed by Roy and Gupta (2021), for sample sizes smaller than 1000 points, we can be hard-pressed to 95 percent prediction interval, however over 10,000 points can allow us to be relatively confident with our result. Due to the simplicity of the original problem, choosing even 100,000 points does not even take a minute to run, and over the course of a dozen or so minutes, we can easily calculate the probability for several million points.

However, the problem does not stop there. What if we consider instead a pentagon? Or a hexagon? For different regular polygons, we must calculate the probability in a different manner. For example, for a pentagon, we can split it into 5 equal area triangles, and randomly choose points from each triangle. This method of splitting up a polygon into equal area parts and then performing Monte Carlo simulations is known as “jittered sampling” (Pausinger et al., 2018). This method of jittered sampling allows us to calculate the probability for any regular polygon, and also gives us a different uncertainty than the standard Monte Carlo simulations. As discussed in Pausinger et al. (2018), the standard Monte Carlo simulations gives us a discrepancy of .0694, while our jittered sampling method gives us a discrepancy of .0638 for a partition from (0, 0) to (1, 1) and .0555 for a partition from (0, 0.5) to (1, 0.5). For an ideal distribution of random points, a discrepancy of 0 is allotted, and as such, a lower discrepancy value gives us a closer approximation. While these discrepancy values apply to only picking points and not calculating distances with them, we can also extend the calculations easily to our problem (S. Steinerberger, personal communication, Jan. 29, 2021).

In addition to using different polygons, there are many other interesting extensions we can solve. Using higher dimension hypercubes rather than a square, creating a random polygon using the Poisson point process are some of the more interesting ones. The Poisson point process is a randomly chosen set of points on a mathematical space. By limiting this mathematical space to a plane, we can create a 2-D convex hull from the Poisson point process (Formanov & Khamdamov, 2021). A convex hull of a set of points is essentially the smallest polygon that contains all the points. With this randomly generated convex hull, we can attempt to solve the original problem in the same way we substituted the square for a pentagon or hexagon.

Besides the Poisson point process, another interesting extension we can explore is using non-Euclidean geometry, such as taxicab geometry or elliptic geometry. Euclidean geometry is the standard geometry that everyone knows and loves, however, there are many other types of geometry. Elliptic geometry, also known as Riemannian geometry, also provides a very interesting twist on our common knowledge of geometry. Riemannian geometry is named after the mathematician Bernhard Riemann, and was the foundation for Einstein’s theory of special relativity (Siegfried, 2018). Riemann presented a paper titled “On the Hypotheses which Lie at the Foundations of Geometry”, challenging the original axioms behind Euclidean geometry, a paper which helped pave the way in physics decades later.

Now again, we must ask ourselves what the purpose of our original question is. This purpose lies in something called the Floyd-Warshall algorithm. An example of the Floyd-Warshall algorithm can be seen in the game Sim City. As Adam Conrad (2018) discusses, in Sim City, it is not sufficient to have one hospital, or one police station, for a massive city. As such, the Floyd-Warshall algorithm can help optimally place hospitals such that the distance required for anyone in the city to receive help is minimized. It is in this sense that this problem can be useful, and while hospitals and police stations are not located in a square-like formation, it can provide us useful data to ensure that buildings are placed properly and effectively.