ERap's KB - CrowLeer, the fast and reliable CLI web crawler with focus on pages download

I recently decided to release a personal project of mine on GitHub. The name is CrowLeer and you can find it here.

Over the last year I worked for a customer who needed software capable of extracting particular data from pages on a number of public websites. I was ready to write the code to recognize and store that data, but I couldn't find an existing crawler that fit my needs. Existing crawlers come in all shapes:

Some offer a lot of very useful SEO data but can't download pages
Others have a download feature but lack the granular control needed to avoid downloading or following a great number of irrelevant pages
The ones that can download and offer proper control over the flow of the crawl lack reliability or a proper way to integrate with other software

I ended up using one of the previously mentioned "unreliable" ones (with loads of ad-hoc middleware) and called it a day, but months later I decided to create my own as a personal project.

CrowLeer was created with simplicity, control and interfaceability in mind. You can find all the details on the GitHub page linked at the top of the article. I plan to greatly expand its features, but I already find it much more functional than many of the competitors I've worked with.
