Hello. I want to know how to write an extractor that gets the m3u8 file from an embedded video. I am trying to build a scraper to get anime episodes from websites like gogoanime or 9anime. The only thing I have been able to scrape so far is the embed link for each episode. Any idea how I can do this?
I build a lot of tools like that, and the first thing I do is open the developer tools in my browser and watch the network traffic. When you find the resource you're after (here, the request for the .m3u8 file), scroll back and see which requests led to that URL. From those requests, work out which parameters in the original static HTML document and its resources are used to construct the URL. That might require reversing some JavaScript, but that's rare. After that you'll have a pretty good idea of how to get from the original URL to the video resource. Beware of cookies set by the responses; they might be needed for the subsequent requests. For building my tools I use Perl, or sometimes just Bash or a Greasemonkey userscript, to fetch and parse the URLs and construct the desired output.
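To make that concrete, here is a rough Python sketch of the simplest case, where the playlist URL already appears somewhere in the embed page's HTML or inline JavaScript. Everything named here (find_m3u8_urls, make_opener, fetch, extract, the User-Agent string) is made up for illustration, and the assumption that the embed host wants the episode page as Referer is just a common pattern, not something I checked for gogoanime or 9anime. If the URL is built by obfuscated JavaScript or comes from a separate API call, this won't be enough and you'll have to replay the requests you saw in the network tab.

```python
import re
import urllib.request
from http.cookiejar import CookieJar

# Matches absolute http(s) URLs that contain ".m3u8", including any query string.
M3U8_RE = re.compile(r"""https?://[^\s"'<>\\]+?\.m3u8[^\s"'<>\\]*""")


def find_m3u8_urls(text):
    """Return the unique .m3u8 URLs found in text, in order of appearance."""
    # URLs embedded in JSON are often written with escaped slashes (https:\/\/...).
    text = text.replace("\\/", "/")
    urls = []
    for url in M3U8_RE.findall(text):
        if url not in urls:
            urls.append(url)
    return urls


def make_opener():
    """Build an opener that keeps cookies between requests."""
    jar = CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def fetch(opener, url, referer=None):
    """GET a URL and return the body as text, sending an optional Referer."""
    headers = {"User-Agent": "Mozilla/5.0"}  # assumption: a browser-like UA is accepted
    if referer:
        headers["Referer"] = referer
    request = urllib.request.Request(url, headers=headers)
    with opener.open(request, timeout=15) as response:
        return response.read().decode("utf-8", errors="replace")


def extract(page_url, embed_url):
    """Visit the episode page first (to collect cookies), then scan the embed page."""
    opener = make_opener()
    fetch(opener, page_url)
    embed_html = fetch(opener, embed_url, referer=page_url)
    return find_m3u8_urls(embed_html)
```

You would call extract() with the episode page URL and the embed link you already scraped. An empty list means the URL is not in the page source, so go back to the network tab and find the request that actually returns it.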
Try to learn from the source code of similar projects.
yt-dlp has extractors for lots of different sites. You could look for sites similar to yours and see how their extractors work.
https://github.com/yt-dlp/yt-dlp/tree/master/yt_dlp/extractor
Alternatively, Animescraper has extractors for these sites, but it was last updated 6 years ago.
https://github.com/jQwotos/animeDownloaders/tree/master
https://github.com/jQwotos/anime_scrapers/tree/63b415fcaaa685f03b54fe6ee294c13178736637/scrapers