Fully-automated web scraping is very convenient: once your scraping or automation robot is programmed and launched, there is nothing left for you to do. Unfortunately, it comes with a few downsides, especially for projects that target password-protected websites or applications.
The first downside is password handling. If your scraping project targets a website containing high-value data, you may want to avoid storing your password on a remote server with security outside your control.
Detectability is also a concern. Automating access to a website is sometimes frowned upon, even when legitimate. We are sometimes called in to scrape our client’s own data from an application that doesn’t provide a comprehensive backup function. As if the missing backup weren’t enough to lock you into a product that turned out to be inferior, the application provider may even attempt to prevent automatic data extraction by recognizing (and banning) activity patterns that don’t look “human-like”.
A third issue with automated scraping or web automation arises when your process is not fully programmable. Say there is an old and unwieldy web application that requires a lot of copy-pasting and other tedious activities to perform an essential function. While it would be possible to automate 90% of the functionality and save as much manpower, a human decision may still be required at some point, so a fully autonomous program cannot take over.
Enter custom browser extensions, or add-ons. Browser extensions are small programs that attach to your web browser to perform a wide variety of functions. Third-party toolbars, once very popular, are one example. The famous Adblock Plus ad blocker is another. Being tied into your web browser gives add-ons particular powers:
- they can access any website,
- they can use your authenticated sessions without having to know your password and
- they can be either autonomous (in the background) or interactive.
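To make this concrete, here is a minimal Manifest V3 skeleton for such an add-on. The extension name, host pattern, and script file are placeholders for illustration; a real project would substitute its own target site:

```json
{
  "manifest_version": 3,
  "name": "Internal Data Exporter",
  "version": "1.0",
  "host_permissions": ["https://app.example.com/*"],
  "content_scripts": [
    {
      "matches": ["https://app.example.com/*"],
      "js": ["exporter.js"]
    }
  ]
}
```

Because the content script runs inside the user’s already-authenticated browser session, the extension never needs to see or store a password.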
Because they are driven by user actions, interactive add-ons sidestep detectability issues entirely. Instead of replacing human users with robots, they free them from their most repetitive and tedious tasks, letting them focus on the decisions they need to make.
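As a sketch of what such an interactive content script might look like, the snippet below converts extracted table rows into CSV only when the user clicks a button, so every page visit is still a human action. The table selector and button wiring are hypothetical; only the pure CSV helper is shown in full:

```javascript
// Pure helper: turn an array of row arrays into CSV text,
// escaping embedded double quotes per the usual CSV convention.
function rowsToCsv(rows) {
  return rows
    .map(row => row.map(cell => `"${String(cell).replace(/"/g, '""')}"`).join(','))
    .join('\n');
}

// In the extension's content script (hypothetical wiring), the helper
// would be fed from the live DOM of the page the user is viewing, e.g.:
//
//   const rows = [...document.querySelectorAll('table#results tr')]
//     .map(tr => [...tr.cells].map(td => td.textContent.trim()));
//
// A button injected into the page would hand the CSV to the user
// on click, e.g. via a Blob and URL.createObjectURL, so extraction
// only ever happens when a human asks for it.
```

The key design point is that the add-on piggybacks on whatever the user is already looking at; it never navigates or authenticates on its own.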
Every browser add-on we’ve built and deployed has been a great success, either because it enabled a job that couldn’t be done otherwise or because it saved hundreds or thousands of expensive man-hours. Custom browser extensions come with downsides of their own, however.
All other things being equal, building a browser add-on is more expensive than a regular web scraping robot; the additional expense is usually between 50% and 200%. Add-ons also require more effort on your part, since you at least have to install them and log in to your target sites. Once you’ve built an add-on, you are also constrained to using the browser it was made for, although cross-browser extensions are a possibility.
Apart from those caveats, custom browser add-ons are a very handy tool in the web scraping toolbox, to be considered whenever the job involves passwords or requires human interaction.