Scraping with Scrapy: Part 3

This is the third part of the series Scraping with Scrapy.

In this post I will cover how to use Selenium with Scrapy, and how to change the template that gets loaded when a new Scrapy project is created. You may want to read part 1 and part 2 of this series first.

Let’s start with how to use Selenium with Scrapy. 

First download the Selenium standalone server jar, then cd to the directory where it is located. Start it using

java -jar selenium-server-jarfilename.jar

How do you scrape when you can't fetch the data directly from the source, but instead need to load the page, click somewhere, scroll down, etc.? Selenium to the rescue.

The idea is to open the URL using Selenium so that you can fetch what Scrapy can't see on its own. You need to add a few lines to your spider to get the page loaded through Selenium.

To see all the functions and attributes an object provides, you can use

dir(object_name)
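For instance, `dir()` works on any Python object, so you can call it on your Selenium driver (`dir(driver)`) to discover what it can do. A quick illustration on a built-in string:

```python
# dir() returns a list of the attribute and method names an object exposes.
# The same call works on a Selenium webdriver instance, e.g. dir(driver).
methods = dir("hello")
print("upper" in methods)  # → True: str has an .upper() method
print("lower" in methods)  # → True
```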

I will be posting some tips and tricks related to XPath in other posts.

Now let me tell you how to change the templates that get loaded when you create a new project in Scrapy.

First let’s install open-as-administrator, to easily edit files that require sudo permissions on Linux.

sudo add-apt-repository ppa:noobslab/apps
sudo apt-get update
sudo apt-get install open-as-administrator
nautilus -q

Then find Scrapy’s dist location; it will be inside your Python installation’s site-packages (or dist-packages) directory.
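If you'd rather not hunt for it manually, one portable way (an aside, not from the original steps) is to ask Python where a module lives; the same helper works for Scrapy on your machine:

```python
# Print the directory a package is installed in. Run it with "scrapy"
# on your machine; the templates folder sits inside that directory.
import importlib.util
import os


def dist_location(module_name):
    """Return the package directory for module_name, or None if not installed."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        return None
    return os.path.dirname(spec.origin)


# Stdlib package shown as a safe example; try dist_location("scrapy")
print(dist_location("json"))
```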
There you will find a templates folder. Open it, go to project, then to the module folder inside it. Here you can see all the template files.

Your template is here too. Right click on it, choose open as administrator, and edit it the way you want it to be loaded.

#code, #parsers, #parsing, #python, #scrapy, #selenium, #tutorial, #web-crawling