Benefits of using scrapy over requests/selenium

posted 1 year ago by IceSea@lemmy.world in python@programming.dev

When I’m writing web scrapers I mostly just switch between Selenium (because the website is too “fancy” and definitely needs a browser) and plain requests calls (both in conjunction with bs4).

But when reading about scrapers, scrapy is often the first mentioned Python package. What am I missing out on if I’m not using it?

Wats0ns@programming.dev

2 points

1 year ago

The huge feature of scrapy is its pipelining system: you scrape a page, pass it to the filtering part, then to the deduplication part, then to the DB, and so on.

Hugely useful when you’re both scraping and extracting data. If you’re only fetching raw pages, it’s less useful.
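
To make the filter, deduplicate, store flow concrete, here is a rough sketch, not a full project. It assumes the `process_item(item, spider)` method and the `DropItem` exception that Scrapy documents for item pipelines. The class names, the "missing title" rule, the in-memory storage, and the `run_stages` driver are all made up for illustration; in a real project Scrapy itself calls the stages for you.

```python
try:
    from scrapy.exceptions import DropItem
except ImportError:
    # Fallback so the sketch still runs where Scrapy is not installed.
    class DropItem(Exception):
        """Stand-in for scrapy.exceptions.DropItem."""


class FilterPipeline:
    """Drop items that lack a title (hypothetical filtering rule)."""

    def process_item(self, item, spider):
        if not item.get("title"):
            raise DropItem("missing title")
        return item


class DedupPipeline:
    """Drop items whose URL was already seen during this run."""

    def __init__(self):
        self.seen_urls = set()

    def process_item(self, item, spider):
        if item["url"] in self.seen_urls:
            raise DropItem("duplicate url")
        self.seen_urls.add(item["url"])
        return item


class StoragePipeline:
    """Collect items in a list; a real project would write to a database."""

    def __init__(self):
        self.rows = []

    def process_item(self, item, spider):
        self.rows.append(dict(item))
        return item


def run_stages(items, stages, spider=None):
    """Hypothetical driver imitating how items pass through the stages in order."""
    kept = []
    for item in items:
        try:
            for stage in stages:
                item = stage.process_item(item, spider)
        except DropItem:
            continue  # a dropped item skips all later stages
        kept.append(item)
    return kept


scraped = [
    {"url": "https://example.com/a", "title": "A"},
    {"url": "https://example.com/a", "title": "A"},  # duplicate URL
    {"url": "https://example.com/b", "title": ""},   # no title
]
storage = StoragePipeline()
kept = run_stages(scraped, [FilterPipeline(), DedupPipeline(), storage])
```

Each stage only knows about its own job, so you can reorder, remove, or add stages without touching the spider code.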

qwertyasdef@programming.dev

1 point

1 year ago

Oh shit that sounds useful. I just did a project where I implemented a custom stream class to chain together calls to requests and beautifulsoup.

Wats0ns@programming.dev

2 points

1 year ago

Yep, try scrapy. It also handles the concurrency of your pipeline items for you, configuration for every part,…
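
For a rough idea of what that configuration looks like, here is a sketch of a project `settings.py`. The setting names are ones Scrapy documents; the `myproject.pipelines.*` paths and all the numbers are placeholders, not recommendations.

```python
# settings.py (sketch; module paths and values are hypothetical)

# Pipeline stages and their order: lower numbers run first.
ITEM_PIPELINES = {
    "myproject.pipelines.FilterPipeline": 100,
    "myproject.pipelines.DedupPipeline": 200,
    "myproject.pipelines.StoragePipeline": 300,
}

# Concurrency knobs handled by Scrapy instead of hand-written threading code.
CONCURRENT_REQUESTS = 16             # requests in flight overall
CONCURRENT_REQUESTS_PER_DOMAIN = 8   # requests in flight per domain
CONCURRENT_ITEMS = 100               # items processed in parallel in the pipelines

# Politeness settings.
DOWNLOAD_DELAY = 0.5                 # seconds between requests to the same site
AUTOTHROTTLE_ENABLED = True          # adapt the delay to server response times

# The order the stages would run in, derived from the priorities above.
pipeline_order = sorted(ITEM_PIPELINES, key=ITEM_PIPELINES.get)
```

With plain requests you would typically wire up thread pools and rate limiting yourself; here it is a handful of settings.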
