Scrapy Masterclass: Python Web Scraping and Data Pipelines is a course published by Udemy. Work on 7 real-world web scraping projects using Scrapy, Splash, and Selenium, and build data pipelines locally and on AWS.
Everyone tells you what to do with the data you already have. But how do you get that data in the first place? Most discussions of data engineering and data science today focus on how to analyze and process datasets to extract useful information from them. However, they all assume that those datasets are already available to you and have been gathered in some way. They spend very little time showing you how to get your hands on the dataset yourself!
This course fills this gap. It is all about walking you through the process of extracting the data you care about from websites and building powerful web scraping pipelines. True, there are plenty of datasets available right now that you can use for free or for a fee. But what if those datasets are out of date? What if they don't meet your specific needs? It is best to know how to build your own dataset from scratch, no matter how unstructured the data source is.
Scrapy is a Python web scraping framework. Thousands of companies and professionals use it to collect data and build datasets, which they then sell or use in their own projects. You can become one of those professionals today, or even build your own business around data collection. Data scientists and data engineers are among the highest-paid people in the industry, yet they cannot do anything without enough data to work with.
Beyond extraction, this course also covers the other two stages of ETL (Transform and Load). Using Scrapy pipelines, we'll see how to store our data in SQL and NoSQL databases, Elasticsearch clusters, event brokers like Kafka, object storage like S3, and message queues like AWS SQS. Even if you know nothing about web scraping or data collection and all of this sounds new to you, you've come to the right place.
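To make the Transform and Load idea concrete, here is a minimal sketch of a Scrapy item pipeline that cleans each item and writes it to a local SQLite database. It is not taken from the course: the class name `SQLitePipeline`, the `quotes` table, and the `author`/`text` fields are assumptions chosen for illustration. It uses the classic `open_spider` / `process_item` / `close_spider` hooks, which Scrapy calls on any plain Python class registered as a pipeline.

```python
import sqlite3


class SQLitePipeline:
    """Store each scraped item as a row in a local SQLite database."""

    def __init__(self, db_path="quotes.db"):
        # db_path is a hypothetical default; ":memory:" works for quick experiments.
        self.db_path = db_path
        self.conn = None

    def open_spider(self, spider):
        # Called once when the spider starts: open the connection, create the table.
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS quotes (author TEXT, text TEXT)"
        )

    def process_item(self, item, spider):
        # Transform: trim stray whitespace. Load: insert the cleaned row.
        author = item["author"].strip()
        text = item["text"].strip()
        self.conn.execute(
            "INSERT INTO quotes (author, text) VALUES (?, ?)", (author, text)
        )
        self.conn.commit()
        return item  # pass the item on to any later pipeline

    def close_spider(self, spider):
        # Called once when the spider finishes.
        self.conn.close()
```

In a Scrapy project, a class like this would typically be enabled through the `ITEM_PIPELINES` setting, for example `{"myproject.pipelines.SQLitePipeline": 300}`, where `myproject.pipelines` is a hypothetical module path. Swapping SQLite for MySQL, MongoDB, or S3 would keep the same three hooks and change only the client calls inside them.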
What you will learn in the Scrapy masterclass: Python web scraping and data pipelines course:
- Extract data from the most difficult websites using Scrapy
- Build ETL pipelines and store data in CSV, JSON, MySQL, MongoDB, and S3.
- Avoid getting banned and get past bot protection techniques.
- Use the power of Selenium browser automation to scrape any website.
- Deploy your Scrapy bots in local and AWS environments.
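As a rough illustration of the "avoid getting banned" point above, one common starting point is to make the crawler slower and more polite through Scrapy's built-in settings. The fragment below is a sketch under that assumption, not the course's own configuration; the values and the `USER_AGENT` string are placeholders to tune per site.

```python
# settings.py (fragment): throttling and retry options built into Scrapy.
# All values below are illustrative assumptions, not recommendations.

ROBOTSTXT_OBEY = True  # respect the site's robots.txt rules

# Identify the crawler; the contact URL is a hypothetical placeholder.
USER_AGENT = "my-scraper (+https://example.com/contact)"

# Wait between requests to the same site, with random jitter.
DOWNLOAD_DELAY = 1.0
RANDOMIZE_DOWNLOAD_DELAY = True
CONCURRENT_REQUESTS_PER_DOMAIN = 4

# Let the AutoThrottle extension adapt the delay to server response times.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1.0
AUTOTHROTTLE_MAX_DELAY = 30.0
AUTOTHROTTLE_TARGET_CONCURRENCY = 2.0

# Retry transient failures, including HTTP 429 (Too Many Requests).
RETRY_ENABLED = True
RETRY_TIMES = 3
RETRY_HTTP_CODES = [429, 500, 502, 503, 504]
```

Settings like these reduce the chance of tripping rate limits; sites with heavier bot protection usually need more, such as the browser automation mentioned in the list above.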
Who is this course suitable for:
- Anyone who wants to automate data collection from websites (web scraping) using Scrapy.
- Anyone who wants to build a business around data collection and web scraping.
- Data engineers, data scientists, and ML engineers who want to master web scraping for their data collection needs.
- Developers, DevOps engineers, or IT professionals who want to change careers to data engineering.
- Python programmers who want to learn more about Scrapy or web scraping in general.
- Publisher: Udemy
- Lecturer: Ahmed Elfakharany
- Language: English
- Training level: introductory to advanced
- Number of lectures: 40
- Training duration: 5 hours and 44 minutes
Scrapy masterclass Course topics on 12/2022
Scrapy masterclass Course prerequisites
- Some Python background
- All projects run on Python 3.10, so it needs to be installed
- Familiarity with Linux is recommended but not strictly required
- Familiarity with the HTTP protocol and HTML