Preventing Web Scraping


When we have a full proxy between the Internet and our LAN we can do almost anything, even protect our servers ;-) This is what a WAF does: it protects against web application vulnerabilities, web scraping and DoS attacks. This time I want to write about web scraping, a technique for automatically downloading a whole website in order to track competitors' prices, harvest email addresses and directory listings for leads and marketing information, search competitors' websites for images, financial information or other product data, or copy the site for phishing attacks.

There are many tools for extracting data from websites, either to clone them or to analyse them, from simple ones like cURL or Wget to more advanced ones like HTTrack. For instance, two summers ago I used the Social-Engineer Toolkit (SET) in a talk called “Innovation, yes but with Security” to build a PoC of a phishing attack in which I cloned the Gmail and elpais.com websites.


Although few companies are worried about this threat so far, they are becoming more and more aware of the need to protect their public data for competitive reasons. Next, we are going to see some web scraping mitigation techniques to protect our websites.

Bot detection

With this method, the web scraping prevention system applies several checks to detect bots. For instance, a rapid surfing check counts how many different URLs the client has loaded and unloaded from the application within a defined period. Another check ensures that the client accepts cookies and processes JavaScript. A further check can use JavaScript again to determine whether the client behaves like a human being or a bot.

Bot detection configuration in BIG-IP ASM
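
To make the rapid surfing check more concrete, here is a minimal sketch in Python of a sliding-window counter. It is not how BIG-IP ASM implements the check; the class name, the thresholds and the idea of passing the timestamp in by hand are my own assumptions for illustration only.

```python
from collections import defaultdict, deque


class RapidSurfingDetector:
    """Hypothetical helper: flags clients that load too many distinct URLs
    within a sliding time window."""

    def __init__(self, max_distinct_urls=20, window_seconds=10.0):
        # Both thresholds are illustrative defaults, not vendor values.
        self.max_distinct_urls = max_distinct_urls
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client id -> deque of (timestamp, url)

    def record(self, client_id, url, now):
        """Record one page load and return True if the client looks like a bot."""
        hits = self._hits[client_id]
        hits.append((now, url))
        # Drop page loads that have fallen out of the sliding window.
        while hits and now - hits[0][0] > self.window_seconds:
            hits.popleft()
        distinct_urls = {u for _, u in hits}
        return len(distinct_urls) > self.max_distinct_urls
```

In a real deployment `now` would come from something like `time.monotonic()` and `client_id` from a session cookie or source address. A client that reloads the same page many times is not flagged by this sketch, because only distinct URLs are counted.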
 
Session Anomaly detection

This method detects clients that open a large number of new sessions. One check counts the rate of new sessions per second, and another detects a spike in the number of new sessions. The method can also use an IP reputation database to detect malicious IP addresses, which is another indicator that can trigger a violation.

Session Anomaly detection configuration in BIG-IP ASM
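
The two session checks described above (rate and spike) can be sketched in a few lines of Python. Again, this is only an illustration under my own assumptions: the function name, the default thresholds and the idea of comparing the current second against the average of the previous ones are hypothetical, not taken from the ASM configuration.

```python
from statistics import mean


def session_anomalies(new_sessions_per_second, max_rate=10,
                      spike_factor=3.0, min_baseline=1.0):
    """Return the set of checks violated by the most recent sample.

    new_sessions_per_second: non-empty list of per-second counts of new
    sessions opened by one client, oldest first; the last entry is the
    current second. All thresholds are illustrative assumptions.
    """
    *history, current = new_sessions_per_second
    violations = set()
    # Rate check: too many new sessions in the current second.
    if current > max_rate:
        violations.add("rate")
    # Spike check: current second is far above the recent average.
    if history:
        baseline = max(mean(history), min_baseline)
        if current > spike_factor * baseline:
            violations.add("spike")
    return violations
```

For example, a client that usually opens about two sessions per second and suddenly opens eight would trip only the spike check with these defaults, while a client that steadily opens nine and then eleven would trip only the rate check.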
 
Fingerprinting

This method collects browser attributes to detect malicious users. Some of these attributes are browser APIs (such as the JavaScript APIs the browser supports), expressions, localization information from the browser, installed fonts, screen parameters, time and plugins.

Fingerprinting configuration in BIG-IP ASM
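
As a rough idea of what can be done with those attributes once they are collected, the following Python sketch reduces them to a single identifier by hashing a canonical form. The attribute names and the choice of SHA-256 over sorted JSON are my own assumptions; a real product most likely collects the values with JavaScript in the browser and may combine them differently.

```python
import hashlib
import json


def fingerprint(attributes):
    """Return a stable hex digest for a dict of browser attributes.

    The attribute names used by callers are hypothetical; any
    JSON-serializable dict works.
    """
    # Sorting the keys makes the digest independent of collection order.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


browser_a = {"language": "es-ES", "screen": "1920x1080", "fonts": ["Arial", "Verdana"]}
browser_b = {"fonts": ["Arial", "Verdana"], "screen": "1920x1080", "language": "es-ES"}
same_browser = fingerprint(browser_a) == fingerprint(browser_b)
```

Here `same_browser` is true because both dictionaries hold the same values. A client that keeps the same fingerprint while changing its cookies or IP address could then be treated as a single visitor.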
 
Web scraping was an unknown concept to me a year ago, but preventing it is possible today, and it's already a reality for many organizations that are worried about their public information.

Regards, my friends. Drop me a line with the first thing that comes to mind!!!
