Ten interesting algorithms every Software Engineer should know

Arun Kumar Dave
10 min read · Dec 24, 2020

#1 Bloom Filter

A Bloom Filter is a data structure designed to tell, rapidly and memory-efficiently, whether an element is present in a set. It is based on a probabilistic mechanism where false-positive retrieval results are possible, but false negatives are not. In other words, a query returns either “possibly in set” or “definitely not in set”. Elements can only be added, not removed; the more items added, the larger the probability of false positives. Bloom filters are both fast and space-efficient.
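
As a rough illustration, here is a minimal sketch of the idea in Python. It derives several bit positions per item from a single SHA-256 digest; the array size, hash count, and digest-slicing trick are illustrative choices, not canonical parameters (production filters size these from the expected item count and target error rate):

```python
import hashlib

class BloomFilter:
    def __init__(self, size=1024, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [False] * size

    def _positions(self, item):
        # Carve num_hashes 4-byte chunks out of one digest to get bit positions
        digest = hashlib.sha256(item.encode()).digest()
        for i in range(self.num_hashes):
            chunk = digest[4 * i:4 * i + 4]
            yield int.from_bytes(chunk, "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # False means "definitely not in set"; True means "possibly in set"
        return all(self.bits[pos] for pos in self._positions(item))
```

A returned False is authoritative; a returned True may be a false positive, exactly as described above.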

#2 PageRank

PageRank is named after Google co-founder Larry Page and is used to rank websites in Google’s search results. It counts the number and quality of links to a page to estimate how important the page is, on the underlying assumption that important pages are more likely to receive links from other pages.

PageRank is defined in the original Google paper as follows (a code sketch of this formula appears after the definitions):

PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))

where,

  • we assume that a page A has pages T1 to Tn which point to it (i.e., are citations).
  • d is a damping factor which can be set between 0 and 1. It is usually set to 0.85.
  • C(A) is defined as the number of links going out of page A.
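
A minimal iterative sketch of this formula in Python, assuming a small hypothetical link graph where every page has at least one outgoing link (note that in this formulation the ranks sum to the number of pages rather than to 1):

```python
def pagerank(links, d=0.85, iterations=50):
    # links: dict mapping each page to the list of pages it links to
    pages = list(links)
    pr = {p: 1.0 for p in pages}
    for _ in range(iterations):
        pr = {
            # PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over pages T linking to A
            page: (1 - d) + d * sum(pr[t] / len(links[t])
                                    for t in pages if page in links[t])
            for page in pages
        }
    return pr

# A tiny hypothetical web of three pages
print(pagerank({"A": ["B"], "B": ["A", "C"], "C": ["A"]}))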

#3 Hilbert Curve

A Hilbert curve is a space-filling curve that maps a one-dimensional index onto points in a 2D grid while preserving locality: points that are close along the curve stay close in the plane. The standard algorithm is a recursive procedure involving only simple integer operations; it is elegant, short, considerably easier to implement than previous recursive and nonrecursive algorithms, and can be implemented efficiently in any programming language that has integer operations and allows recursion. Location-based databases, used extensively by apps like Google Maps, Uber, and Swiggy, rely on data structures and algorithms that enable spatial queries, such as the quad tree and the Hilbert curve.
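
As an illustration, here is the well-known iterative index-to-coordinate conversion for a curve filling an n × n grid (a Python translation of the version popularized on Wikipedia; n must be a power of two):

```python
def d2xy(n, d):
    # Convert distance d along the Hilbert curve into (x, y) in an n x n grid
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate/flip the quadrant so sub-curves join up correctly
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

# The first few points of the curve on a 4 x 4 grid; each step moves one cell
print([d2xy(4, d) for d in range(8)])
```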

#4 HyperLogLog

HyperLogLog is an algorithm for the count-distinct problem, approximating the number of distinct elements in a multiset; it was introduced by Philippe Flajolet and his co-authors and is used at scale by companies such as Facebook. Calculating the exact cardinality of a multiset requires an amount of memory proportional to the cardinality, which is impractical for very large data sets. Probabilistic cardinality estimators, such as the HyperLogLog algorithm, use significantly less memory than this, at the cost of obtaining only an approximation of the cardinality. The HyperLogLog algorithm is able to estimate cardinalities greater than 10⁹ with a typical accuracy (standard error) of 2%, using 1.5 kB of memory.

Suppose you have a large data set of elements with duplicate entries chosen from a set of cardinality n and you want to find n, the number of distinct elements in the set. For example, you would like to compute the number of distinct people who visited Facebook in a given week, where every person logs in multiple times a day. This would result in a large data set with many duplicates. Obvious approaches, such as sorting the elements or simply maintaining the set of unique elements seen, are impractical because they are either too computationally intensive or demand too much memory.
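
A bare-bones sketch of the estimator in Python, using 64-bit hashes and 2^b registers (real implementations such as Redis’s add corrections for small and very large ranges, which are omitted here):

```python
import hashlib

def hyperloglog(items, b=10):
    # m = 2^b registers; each stores the max "rank" seen for its substream
    m = 1 << b
    registers = [0] * m
    for item in items:
        h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
        idx = h >> (64 - b)                    # first b bits pick a register
        w = h & ((1 << (64 - b)) - 1)          # remaining 64 - b bits
        rank = (64 - b) - w.bit_length() + 1   # position of the leftmost 1-bit
        registers[idx] = max(registers[idx], rank)
    alpha = 0.7213 / (1 + 1.079 / m)           # bias correction, valid for m >= 128
    return alpha * m * m / sum(2.0 ** -r for r in registers)

print(round(hyperloglog(range(100_000))))  # typically within a few percent of 100000
```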

#5 Collaborative Filtering

Collaborative filtering filters information by using the interactions and data collected by the system from other users. It’s based on the idea that people who agreed in their evaluation of certain items are likely to agree again in the future. The concept is simple: when we want to find a new movie to watch we’ll often ask our friends for recommendations. Naturally, we have greater trust in the recommendations from friends who share tastes similar to our own.

Most collaborative filtering systems apply the so-called similarity index-based technique. In the neighborhood-based approach, a number of users are selected based on their similarity to the active user. Inference for the active user is made by calculating a weighted average of the ratings of the selected users. Collaborative-filtering systems focus on the relationship between users and items. The similarity of items is determined by the similarity of the ratings of those items by the users who have rated both items.

This is, at its core, how services like Netflix recommend content to their users.
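
A toy sketch of the neighborhood-based approach in Python, with users represented as hypothetical item-to-rating dicts (real systems use more robust similarity measures and heavy optimization):

```python
import math

def cosine_similarity(a, b):
    # a and b are dicts mapping item -> rating; compare on co-rated items only
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def predict_rating(active, others, item):
    # Weighted average of the neighbours' ratings for `item`,
    # weighted by each neighbour's similarity to the active user
    weighted = total = 0.0
    for user in others:
        if item in user:
            sim = cosine_similarity(active, user)
            weighted += sim * user[item]
            total += abs(sim)
    return weighted / total if total else None

active = {"Heat": 5, "Alien": 4}
others = [{"Heat": 5, "Alien": 5, "Up": 2},
          {"Heat": 1, "Alien": 2, "Up": 5}]
print(predict_rating(active, others, "Up"))  # a blend weighted by similarity
```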

#6 Monte Carlo Tree Search

Monte Carlo Tree Search (MCTS) is a search technique in the field of Artificial Intelligence (AI). It is a probabilistic, heuristic-driven search algorithm that combines classic tree search with ideas from reinforcement learning.

In tree search, there is always the possibility that the current best action is not actually the optimal one. In such cases, the MCTS algorithm becomes useful, as it continues to periodically evaluate other alternatives during the learning phase by executing them, instead of the currently perceived optimal strategy. This is known as the “exploration-exploitation trade-off”: exploit the actions and strategies found to be best so far, but keep exploring the space of alternative decisions to find out whether any could replace the current best.
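
This trade-off is usually implemented with the UCT formula (UCB1 applied to trees): a node’s score balances its observed win rate against how rarely it has been visited. A minimal sketch:

```python
import math

def uct_score(wins, visits, parent_visits, c=math.sqrt(2)):
    # Exploitation (average reward) + exploration (bonus for rarely tried nodes)
    if visits == 0:
        return float("inf")  # unvisited children are always tried first
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)
```

During the selection phase, MCTS repeatedly descends to the child with the highest score; larger values of c favor exploration, smaller values favor exploitation.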

#7 Kadane’s Algorithm

The idea behind Kadane’s algorithm is to scan the array once, tracking the maximum sum of a subarray ending at the current position (max_ending_here is used for this) and the best sum seen anywhere so far (max_so_far is used for this). Whenever max_ending_here exceeds max_so_far, max_so_far is updated.

Kadane’s algorithm works by maintaining the start position of a subarray and repeatedly looking at the next element in the array, deciding to either (as sketched below):

  • extend the subarray by appending that element, or
  • discard the subarray and start a new one at that element.
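
A compact Python version of this, using the two variable names from above:

```python
def max_subarray_sum(arr):
    max_so_far = max_ending_here = arr[0]
    for x in arr[1:]:
        # Either extend the current subarray with x, or start fresh at x
        max_ending_here = max(x, max_ending_here + x)
        max_so_far = max(max_so_far, max_ending_here)
    return max_so_far

print(max_subarray_sum([-2, 1, -3, 4, -1, 2, 1, -5, 4]))  # 6, from [4, -1, 2, 1]
```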

#8 Consistent Hashing

Load balancing is a key concept in system design. One of the popular ways to balance load in a system is to use consistent hashing. Consistent Hashing allows requests to be mapped into hash buckets while allowing the system to add and remove nodes flexibly, so as to maintain a good load factor on each machine. The naive way to hash objects is to map them directly to a key space and transfer each key’s load to the mapped machine; a system using this policy is likely to suffer when nodes are added or removed, because most keys then get remapped.

Consistent Hashing instead maps the servers themselves onto the key space and assigns each request (mapped to a hash bucket, called load) to the next server clockwise on the ring. Servers can then store the relevant request data while the system retains flexibility and scalability.
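
A minimal sketch of a hash ring in Python, with virtual nodes (replicas) to spread load more evenly; the hash function and replica count here are arbitrary illustrative choices:

```python
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas   # virtual nodes per server, for smoother load
        self.ring = []             # sorted list of (hash, server) points
        for node in nodes:
            self.add(node)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node):
        for i in range(self.replicas):
            bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node):
        self.ring = [(h, n) for h, n in self.ring if n != node]

    def get(self, key):
        # The next point clockwise from the key's hash owns the key
        idx = bisect.bisect(self.ring, (self._hash(key), "")) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["server-a", "server-b", "server-c"])
print(ring.get("user:42"))  # whichever server's arc contains this key's hash
ring.remove("server-b")     # only keys on server-b's arcs move elsewhere
```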

#9 Diffie Hellman — Curve25519

The Diffie-Hellman algorithm is used to establish a shared secret for secret communication while exchanging data over a public network; in the elliptic-curve variant, the two parties derive the shared secret from points on an elliptic curve.

WhatsApp uses Curve25519, a state-of-the-art Diffie-Hellman function, which computes a user’s 32-byte public key. Given one user’s 32-byte secret key and another user’s 32-byte public key, Curve25519 computes a 32-byte secret shared by the two users. This secret can then be used to authenticate and encrypt messages between them.

The protocol uses compressed elliptic-curve points (only X coordinates), which allows efficient use of the Montgomery ladder for ECDH, using only XZ coordinates. Curve25519 is constructed to avoid many potential implementation pitfalls: by design it is resistant to timing attacks, and it accepts any 32-byte string as a valid public key, without requiring validation that a given point belongs to the curve or is generated by the base point.
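
A short sketch of an X25519 key agreement using Python’s third-party cryptography package; this illustrates the primitive itself, not WhatsApp’s full Signal-protocol handshake:

```python
# Requires the `cryptography` package (pip install cryptography)
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey

# Each party generates a 32-byte secret key and derives a 32-byte public key
alice_private = X25519PrivateKey.generate()
bob_private = X25519PrivateKey.generate()

# Each side combines its own secret key with the other's public key...
alice_shared = alice_private.exchange(bob_private.public_key())
bob_shared = bob_private.exchange(alice_private.public_key())

# ...and both arrive at the same 32-byte shared secret
assert alice_shared == bob_shared
```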

#10 Elo Rating

The Elo rating algorithm is a widely used rating method for ranking players in many competitive games; it has also been used to rate people in dating apps like Tinder, OkCupid, and Bumble.

FaceMash was Facebook’s predecessor, developed by Mark Zuckerberg in his second year at Harvard. If you’ve seen The Social Network, you might remember the scene where Zuckerberg asks Eduardo Saverin for the equation used to rank chess players, except this time it was for rating the ‘attractiveness’ of female Harvard students.

The movie claims that the Elo equations (although written incorrectly in the film) were used in the original FaceMash website. Two students were shown side by side and users could vote on who was more attractive. Here, Rᴬ can be interpreted as student A’s Elo rating, Rᴮ as student B’s Elo rating, Eᴬ as the probability that student A is more attractive than student B, and Eᴮ as the probability that student B is more attractive than student A.

Applying the same principle in Tinder, the difference in the ratings between two users serves as a predictor of the outcome of a swipe. Two users with equal ratings who are compared against each other are expected to score an equal number of wins. A user whose rating is 100 points greater than their opponent’s is expected to score 64%; if the difference is 200 points, then the expected score for the stronger player is 76%.
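
In formula form, Eᴬ = 1 / (1 + 10^((Rᴮ − Rᴬ)/400)), and after each result a rating moves by K times the surprise. A minimal Python sketch (K = 32 is a common choice, not a universal constant):

```python
def expected_score(r_a, r_b):
    # Probability that A beats B under the Elo model
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def update_rating(rating, expected, actual, k=32):
    # actual is 1 for a win, 0.5 for a draw, 0 for a loss
    return rating + k * (actual - expected)

print(round(expected_score(1500, 1400), 2))  # 0.64: a 100-point edge
print(round(expected_score(1500, 1300), 2))  # 0.76: a 200-point edge
```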

Many of the references here come from various articles, publications, and videos by Gaurav Sen.
