Blockchain Anomaly Detection with Machine Learning and Deep Learning Algorithms
As DLTians (which is what we who work at DLT Labs™ call ourselves), we treat blockchain as a core concern in our products, where it is used for data storage. Everything stored this way can be interpreted in terms of transactions. Blockchain relies on its hash-chaining technique to maintain the integrity of those transactions.
Any tampering can be caught because each block stores the hash of the previous block, which is what forms the chain. Altering a record changes its block's hash and breaks the link to the next block, so a successful forgery would require recomputing the hashes of every subsequent block in the chain.
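To make the hash-chain idea concrete, here is a minimal sketch in Python. It is a toy model, not how any particular blockchain platform is implemented: the block layout, the helper names (`block_hash`, `build_chain`, `first_invalid_block`) and the sample records are all assumptions made for illustration.

```python
import hashlib
import json


def block_hash(index, data, prev_hash):
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "data": data, "prev_hash": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(records):
    """Build a toy chain; each block stores the hash of the block before it."""
    chain = []
    prev_hash = "0" * 64  # placeholder "previous hash" for the first block
    for index, data in enumerate(records):
        current = block_hash(index, data, prev_hash)
        chain.append(
            {"index": index, "data": data, "prev_hash": prev_hash, "hash": current}
        )
        prev_hash = current
    return chain


def first_invalid_block(chain):
    """Return the index of the first block that fails verification, or None."""
    prev_hash = "0" * 64
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return block["index"]  # link to the previous block is broken
        recomputed = block_hash(block["index"], block["data"], block["prev_hash"])
        if recomputed != block["hash"]:
            return block["index"]  # contents no longer match the stored hash
        prev_hash = block["hash"]
    return None


chain = build_chain(["A pays B 5", "B pays C 2", "C pays A 1"])
untouched = first_invalid_block(chain)  # None: the chain verifies

chain[1]["data"] = "B pays C 200"       # tamper with a stored record
tampered = first_invalid_block(chain)   # 1: block 1 no longer matches its hash
```

Even if the attacker also recomputes the hash of block 1, verification then fails at block 2, because block 2 still stores the old hash of block 1. This is why a forgery has to rewrite every later block.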
There are two important aspects of security handled by blockchain:
- It ensures that the secure transactions completed by a participant cannot be tampered with; neither en route while being added to the chain nor after being added.
- It also ensures that every transaction allowed to be written conforms to the rules predefined by blockchain that are either programmed into the platform or added as smart contracts.
We rely on the security of blockchain for the above reasons. But there is another aspect of security that is also important i.e., reducing the possibility of fraudulent transactions occurring by enhancing its fraud detection.
As blockchain’s primary function is making secured transactions between participants, it becomes extremely important to strengthen anomaly detection systems to preserve its essence and effectiveness.
There may be a handful of such technologies that contribute to making blockchain-based transactions more secure. But let me share one such technology which shows a lot of promise in this regard. Allow me to introduce you to machine learning and deep learning.
Machine/Deep Learning Spam Detection
In our last post about Machine Learning and Deep Learning, we talked about some of their valuable contributions and use cases (if you have not read it, I would recommend you read that first).
One specific use case that we covered, and one I want to bring to light again, is ‘Email Spam Classification’. Let us look at this use case briefly, and then link it to the fraud detection concern we were discussing earlier.
Spam Detection Use Case
Email spam is becoming one of the many common issues faced by anyone holding a valid email ID. This is an issue I experienced recently. Some days ago, I received an email stating the possibility of earning some extra income by starting an online business. This would be achieved with the help of the company that emailed me.
This mail could very well be genuine. But looking at it more closely, notice that my email provider automatically classified it as spam. How does this work?
I am sure most people who know about its workings can appreciate its power. How can someone tell me that this email is spam? How can someone — without manually investigating — filter the email and separate it from the inbox and other important sections?
This is a popular use case of Machine Learning. A model is built and trained on example emails so that it can classify whether a new mail is spam or not. This is not possible without data, and at present data is a valuable resource.
Gone are the days when it was exceedingly difficult to collect user data. In fact, almost all the top tech giants are giant not only because they have awesome products but also because they have an enormous collection of their users’ data. They then pull this out and hand it to their data science team or sell it to other companies to increase their revenue.
Spam Classification/Filtering is an extremely popular use case for analyzing such data.
While Machine Learning and Deep Learning are achieving new milestones, a basic and common spam filtering technique, known as the Naïve Bayes algorithm, is still used to separate spam emails from legitimate ones.
So if this one basic approach is enough to filter out your spam email from not-spam email, just imagine how effective it will be as an additional security layer in the context of blockchain storage techniques!
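To give a feel for how a Naïve Bayes spam filter works, here is a small sketch written from scratch in Python. It is not what any email provider actually runs: the six training messages, the labels and the function names are made up for illustration, and real filters train on far larger datasets with more careful text processing.

```python
import math
from collections import Counter

# Tiny hand-made training set (hypothetical, for illustration only).
TRAIN = [
    ("earn extra income online business today", "spam"),
    ("win money now claim your free prize", "spam"),
    ("free income offer start earning now", "spam"),
    ("meeting agenda for the project review", "ham"),
    ("please review the attached project report", "ham"),
    ("lunch tomorrow after the team meeting", "ham"),
]


def train(examples):
    """Count words per class and messages per class."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, class_counts, vocab


def classify(text, model):
    """Pick the class with the highest log-posterior, using add-one smoothing."""
    word_counts, class_counts, vocab = model
    total_msgs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        # Log prior: how common this class is overall.
        score = math.log(class_counts[label] / total_msgs)
        total_words = sum(word_counts[label].values())
        for word in text.split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Log likelihood of the word given the class (Laplace smoothing).
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAIN)
offer = classify("start an online business and earn extra income", model)
work = classify("project meeting moved to tomorrow", model)
print(offer, work)  # spam ham
```

The "naïve" part is the assumption that words occur independently of one another given the class, which is why the per-word log probabilities can simply be added up.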
Fraud Detection Use Case
Just like spam filtering, fraud detection is an especially important use case in data science (particularly in the Machine Learning and Deep Learning domain). It involves finding patterns in data to prevent fraudulent transactions from taking place. Let me give a quick example of how this would look in a real-world application:
Mr. X loves to invest in the stock market but keeps a fixed budget of up to $1k per month. One day, there is an attempt made to transfer $15k from Mr. X’s account. This transaction is highly suspicious as this sudden rise in transactional value is normally unexpected from him, when taking into consideration his usual spending behavior.
As a result, there is a high probability that this is a fraudulent transaction. It will likely trigger the fraud detection algorithms at most banks, which will then halt further processing of the transaction.
At this point, the bank will call Mr. X to verify/confirm his identity to ensure that this transaction is indeed being requested by him.
The same applies when someone tries to modify a processed block in a blockchain. The only difference is that the fraudulent activity here is the tampering of a transaction and its corresponding hashes on the blockchain.
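The Mr. X example above can be sketched with one of the simplest anomaly checks there is: comparing a new amount with the account's usual spending. This is only an illustration, not how banks actually implement fraud detection. The spending history, the function name `is_suspicious` and the threshold of three standard deviations are all assumptions.

```python
from statistics import mean, stdev


def is_suspicious(history, amount, threshold=3.0):
    """Flag an amount that lies more than `threshold` standard deviations
    above the account's historical mean."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return amount > mu  # no variation in history: anything higher stands out
    z_score = (amount - mu) / sigma
    return z_score > threshold


# Hypothetical monthly investment amounts for Mr. X, all within his $1k budget.
history = [900, 950, 1000, 850, 980, 920, 1000, 940]

typical = is_suspicious(history, 990)      # a normal month
transfer = is_suspicious(history, 15000)   # the $15k transfer from the example
print(typical, transfer)  # False True
```

Production systems look at many more signals than the amount alone (time, location, recipient, device and so on), but the principle of measuring how far a transaction deviates from learned behavior is the same.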
Advantages of combining Machine Learning and Blockchain
There are several areas where machine learning can strengthen blockchain-based transactions:
- Security: We already know that blockchain relies on hashing and encryption, and how vital it is to maintain a high level of security in blockchain-based transactions. ML algorithms can help detect potential fraud and security breaches.
- Mining: Mining, in terms of blockchain, is the process of adding transactions to a public distributed ledger. In proof-of-work systems, the miner must find a block hash that meets a difficulty target, which protects the integrity of the blockchain and prevents forgery. Finding such a hash requires a lot of computational power, because the miner has to try many values of a number called the nonce until the resulting hash qualifies.
- As an illustration of what ML can do for energy use, Google DeepMind trained an AI system that reduced the energy used for cooling Google's data centers by up to 40%. In a similar way, ML could help mining hardware be utilized more efficiently.
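To show why mining is computationally expensive, here is a minimal proof-of-work sketch in Python. It assumes a simplified Bitcoin-style scheme in which the miner keeps changing a nonce until the SHA-256 hash of the block data starts with a given number of zeros. The block contents, the function name `mine` and the difficulty value are hypothetical.

```python
import hashlib


def mine(block_data, difficulty=3):
    """Try nonces until the SHA-256 hex digest starts with `difficulty` zeros."""
    target_prefix = "0" * difficulty
    nonce = 0
    while True:
        candidate = f"{block_data}{nonce}".encode("utf-8")
        digest = hashlib.sha256(candidate).hexdigest()
        if digest.startswith(target_prefix):
            return nonce, digest
        nonce += 1  # this guess failed, try the next number


nonce, digest = mine("block 42: A pays B 5", difficulty=3)
print(nonce, digest)
```

Each extra leading zero in the target multiplies the expected number of attempts by 16, which is where the heavy energy use of proof-of-work mining comes from. Checking a solution, on the other hand, takes a single hash.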
Machine Learning and Deep Learning Algorithms for Blockchain Fraud detection
Let us look at some of the major algorithms that Machine Learning and Deep Learning can offer to the blockchain industry:
1. Support Vector Machine (SVM)
SVM is a popular and efficient method for the classification of data. It is often used for detecting anomalies and finding relationships between products and customer loyalty.
As ML has evolved, variants of this traditional algorithm have been developed for new settings. One of them is SecureSVM, published in the research paper titled “Privacy-Preserving Support Vector Machine Training Over Blockchain-Based Encrypted IoT Data in Smart Cities”, which proposes a privacy-preserving way of training an SVM on encrypted data shared over a blockchain.
2. K-Means Clustering
K-Means Clustering is a popular clustering technique: it groups samples so that each one belongs to the cluster whose center (centroid) is nearest to it. Here, K denotes the number of clusters we wish to split our dataset into.
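Here is a small from-scratch sketch of K-Means in Python to show the two steps it repeats: assign each point to its nearest centroid, then move each centroid to the mean of its points. The data is invented. Each point is meant to stand for an address described by two made-up features (say, average transaction value and transactions per day), and starting from the first K points is a simplification that real libraries replace with smarter initialization.

```python
import math


def kmeans(points, k, iterations=10):
    """Plain K-Means on a list of equal-length numeric tuples."""
    centroids = list(points[:k])  # naive deterministic start: first k points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return centroids, assignments


# Hypothetical addresses: (average transaction value, transactions per day).
points = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (0.9, 1.1), (8.8, 9.2), (9.1, 8.9)]
centroids, assignments = kmeans(points, k=2)
print(assignments)  # [0, 1, 0, 0, 1, 1]
```

For anomaly detection, a point that sits far from every centroid, or a very small cluster that is far from the rest, is a natural candidate for closer inspection.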
3. Bagging and Boosting Algorithms
Bagging and boosting algorithms leverage the power of ensemble learning (combining several weak models to create a stronger one). Random Forest is a bagging method built from many decision trees, while Gradient Boosting and AdaBoost are boosting methods. Such algorithms can help us group blockchain addresses by their behavior.
This makes it easier to spot a blockchain address that falls into a group associated with fraudulent transactions, or one whose type is yet to be identified.
- GAN: GAN stands for Generative Adversarial Network. It is used to generate synthetic data that closely resembles the real data fed to the model.
It uses the concept of a generator and a discriminator: the generator produces the synthetic data, and the discriminator decides whether a given sample is real or generated. This is an iterative process in which the generator keeps producing data while the discriminator accepts or rejects it, so both improve over time. GAN can also be used in the field of anomaly detection.
- LSTM Neural Network: LSTM stands for Long Short-Term Memory and is an advancement over the Recurrent Neural Network (popularly known as RNN). An RNN is a deep learning architecture that remembers previous data while processing newer data to produce better predictions. The major advantage LSTM offers over a plain RNN is that its gated memory cell, which includes a forget gate, mitigates the vanishing-gradient problem. The gates help it keep only important data in its memory while discarding less important data.
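To illustrate what the forget gate does, here is a single step of an LSTM cell written in plain Python with scalar values. It follows the standard LSTM equations, but everything else is a simplification for illustration: real networks use vectors and learned weight matrices, and the weights below are hand-picked (hypothetical) so that the effect of the forget gate is easy to see.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, w):
    """One step of a scalar LSTM cell.
    `w` maps each gate name to (input weight, recurrent weight, bias)."""

    def gate(name, activation):
        wx, wh, b = w[name]
        return activation(wx * x + wh * h_prev + b)

    f = gate("forget", sigmoid)        # how much of the old cell state to keep
    i = gate("input", sigmoid)         # how much new information to write
    g = gate("candidate", math.tanh)   # the new candidate information
    o = gate("output", sigmoid)        # how much of the cell state to expose

    c = f * c_prev + i * g             # updated cell state (long-term memory)
    h = o * math.tanh(c)               # updated hidden state (output)
    return h, c


# Hand-picked weights: input gate almost closed, output gate almost open.
base = {"input": (0.0, 0.0, -10.0), "candidate": (1.0, 0.0, 0.0), "output": (0.0, 0.0, 10.0)}
keep = {**base, "forget": (0.0, 0.0, 10.0)}    # forget gate close to 1
drop = {**base, "forget": (0.0, 0.0, -10.0)}   # forget gate close to 0

_, c_keep = lstm_step(x=0.5, h_prev=0.0, c_prev=2.0, w=keep)
_, c_drop = lstm_step(x=0.5, h_prev=0.0, c_prev=2.0, w=drop)
print(c_keep, c_drop)  # the first stays near 2.0, the second falls near 0.0
```

In a trained network these gate values are computed from the data at every step, which is how the model learns what to remember and what to discard over a long sequence of transactions.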
LSTM and GAN can be used together to create a robust anomaly detection technique for blockchain-based transactions.
Blockchain in the News
- By the year 2025, the healthcare industry is expected to see a major transformation as Blockchain, Machine Learning, and IoT join hands.
- The blockchain industry will be worth nearly $57 billion by 2025.
While there exist many anomaly detection techniques for detecting fraudulent transactions in Machine Learning and Deep Learning, we have only looked at a handful of them here. All these algorithms play a significant role in the banking and financial sector. Knowing these algorithms also helps in creating more robust ones. By building upon them, blockchain can be made more resilient to future threats or anomalies.
I hope this article has helped you better understand how Machine Learning and Deep Learning algorithms can work in tandem with blockchain transactions. Thank you for reading! Happy Learning…
DLT Labs is a trademark of DLT Global, Inc.
Author — Shubh Saxena, DLT Labs™
About the Author: Shubh is a Software Engineer. He is associated with the core development of Identity Provider at DLT Labs. He has interests in Blockchain and Machine Learning.