I love aria2, but I’m building a web scraper/crawler and I need to download hundreds of thousands of files. aria2 locks up around the 20,000-file mark. Is there another download manager that can handle this, or a more recent fork of aria2?

I believe I have a workaround: use the API to check how many files are in the queue and keep sleeping until there are fewer than 1,000. I’m not sure this is the most effective approach, though, since it significantly slows down the download pipeline.
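A minimal sketch of that throttle, assuming aria2 was started with `--enable-rpc` and is listening on its default RPC port, 6800. `aria2.getGlobalStat` and `aria2.addUri` are aria2 RPC methods; the helper names (`rpc_call`, `queued_count`, `feed`) and the `high_water` threshold are made up for illustration. Instead of polling before every file, it asks once how much room is left, adds that many URLs, and only then polls again, which should keep the pipe from stalling as much.

```python
import json
import time
import urllib.request

RPC_URL = "http://localhost:6800/jsonrpc"  # assumption: default aria2 RPC endpoint


def rpc_call(method, params=None, url=RPC_URL, token=None):
    """Send one JSON-RPC 2.0 request to aria2 and return its "result" field."""
    params = list(params or [])
    if token:
        # Only needed when aria2 runs with --rpc-secret.
        params.insert(0, f"token:{token}")
    payload = json.dumps(
        {"jsonrpc": "2.0", "id": "feeder", "method": method, "params": params}
    ).encode()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["result"]


def queued_count(call=rpc_call):
    """Active plus waiting downloads; aria2 reports both counts as strings."""
    stat = call("aria2.getGlobalStat")
    return int(stat["numActive"]) + int(stat["numWaiting"])


def feed(urls, call=rpc_call, high_water=1000, poll_seconds=2.0, sleep=time.sleep):
    """Add URLs to aria2 while keeping at most `high_water` downloads queued."""
    added = 0
    budget = 0  # how many URLs may be added before the next poll
    for url in urls:
        while budget <= 0:
            budget = high_water - queued_count(call)
            if budget <= 0:
                sleep(poll_seconds)  # queue is full; wait for aria2 to drain it
        call("aria2.addUri", [[url]])  # one download per URL
        budget -= 1
        added += 1
    return added
```

Calling `feed(url_iterable)` blocks until every URL has been handed to aria2. The `call` and `sleep` parameters exist only so the loop can be exercised without a running aria2.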

The issue seems to lie with connections timing out in aria2, which causes the downloads to get locked up until they are manually cleared. I have my timeout set to 10 seconds, but that doesn’t seem to matter. I’ve considered running a scheduled task to clean them up, but was going to give downloading with Python a try first.
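For the Python route, a rough standard-library sketch might look like the following. It uses a fixed-size thread pool so the number of open connections stays bounded, and passes a timeout to every request so a dead connection fails instead of hanging. The names `fetch` and `fetch_all`, the `.part` suffix, and the worker count are illustrative choices. Note that the `timeout` given to `urllib.request.urlopen` applies to individual blocking socket operations, not to the download as a whole, so a very slow but still active transfer can take longer than 10 seconds.

```python
import concurrent.futures
import shutil
import urllib.request
from pathlib import Path


def fetch(url, dest, timeout=10):
    """Download one URL to dest; return (url, None) on success or (url, error)."""
    dest = Path(dest)
    tmp = dest.with_name(dest.name + ".part")  # write here, rename when complete
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp, open(tmp, "wb") as out:
            shutil.copyfileobj(resp, out)
        tmp.replace(dest)
        return url, None
    except Exception as exc:  # report any failure to the caller instead of hanging
        tmp.unlink(missing_ok=True)
        return url, exc


def fetch_all(jobs, workers=16, timeout=10):
    """Run (url, dest) jobs on a bounded thread pool and return the failures."""
    failures = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda job: fetch(job[0], job[1], timeout), jobs)
        for url, err in results:
            if err is not None:
                failures.append((url, err))
    return failures
```

The returned list of `(url, error)` pairs can be fed back in as a retry pass, which replaces the manual cleanup step.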

Any suggestions would be appreciated.

2 points

Pretty sure that’s not a limitation of aria2. What system are you using? There are multiple options:

  1. Just build a queue
  2. Check if aria2 locks up because of system limitations. Max open files, max concurrent connections (there will be a practical limit no matter what you do)
  3. Build a queue with a cluster to overcome such limitations
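For option 2, one quick check on a Unix-like system is the open file descriptor limit, since every connection and every output file uses one. The sketch below uses Python's `resource` module, which is not available on Windows. It reports the limits of the Python process itself; an aria2 process started from the same shell or by the same script inherits them, but one started as a separate service may have different limits. The function names and the `reserve` value are assumptions for illustration.

```python
import resource  # Unix-only standard library module


def fd_limits():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def enough_descriptors(planned_connections, reserve=64):
    """True if the soft limit covers the planned connections plus a reserve.

    `reserve` is a guess at the descriptors needed for output files, logs, etc.
    """
    soft, _hard = fd_limits()
    if soft == resource.RLIM_INFINITY:
        return True
    return soft >= planned_connections + reserve


if __name__ == "__main__":
    soft, hard = fd_limits()
    print(f"open files: soft={soft} hard={hard}")
    print("enough for 1000 connections:", enough_descriptors(1000))
```

If the soft limit turns out to be low (1024 is a common default), raising it with `ulimit -n` in the shell that launches aria2 is one thing to try before building anything more elaborate.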
1 point

Interesting. Source on the 20,000 file limit? It could just be that you need to increase the number of allowed file descriptors on your OS.
