Commit History
@master
git clone https://simonvolpert.com/fediscover/
-
Add skipped URL count in the post-crawl report
Simon Volpert
1 year, 6 months ago
-
Add blacklisting
Simon Volpert
1 year, 6 months ago
-
Remove duplicate matched URLs from pages when scraping
Simon Volpert
1 year, 6 months ago
-
Automatically crawl the next URL if the profile queue is empty
Simon Volpert
1 year, 6 months ago
-
Add following/follower URLs to queue when crawling a profile URL
Simon Volpert
1 year, 6 months ago
-
Refactor URL loading code
Simon Volpert
1 year, 6 months ago
-
Refactor URL crawling
Simon Volpert
1 year, 6 months ago
-
Move page scraping code into its own function
Simon Volpert
1 year, 6 months ago
-
Add some code comments
Simon Volpert
1 year, 6 months ago
-
Add "dry-run" and "random" options
Simon Volpert
1 year, 6 months ago
-
Normalize profile URLs passed on the command line before processing
Simon Volpert
1 year, 6 months ago
-
Put print-to-standard-error code into its own function
Simon Volpert
1 year, 6 months ago
-
Move repeating URL caching code into a function
Simon Volpert
1 year, 6 months ago
-
Move newly processed link count to the appropriate session state containers
Simon Volpert
1 year, 6 months ago
-
Print all unimportant messages to standard error
Simon Volpert
1 year, 6 months ago
-
Reword follow page regex to satisfy the linter
Simon Volpert
1 year, 6 months ago
-
Fix use-before-declaring bug
Simon Volpert
1 year, 6 months ago
-
Process profile URLs passed to "crawl" correctly
Simon Volpert
1 year, 6 months ago
-
Insert additional profile pages in the front of the crawlable URL list
Simon Volpert
1 year, 6 months ago
-
Enforce crawlable URL uniqueness
Simon Volpert
1 year, 6 months ago
-
Add a README and a LICENSE
Simon Volpert
1 year, 6 months ago
-
Add some code comments
Simon Volpert
1 year, 6 months ago
-
Add timeout to page load
Simon Volpert
1 year, 6 months ago
-
Locate all the pages of the user's following/followers URL
Simon Volpert
1 year, 6 months ago
-
Include followers in crawling URLs
Simon Volpert
1 year, 6 months ago
-
Write a trailing newline into cache files
Simon Volpert
1 year, 6 months ago
-
Lint the code
Simon Volpert
1 year, 6 months ago
-
Extract profile URLs and following links from downloaded page
Simon Volpert
1 year, 6 months ago
-
Add web page loader and cacher
Simon Volpert
1 year, 6 months ago
-
Select a random entry from the unprocessed user list to show
Simon Volpert
1 year, 6 months ago