Python - Autologin Webpage using web crawler

With your own hands, you can do anything

Recently my naughty kids' school grades have dropped because they watch too much TV, so I needed a way to restrict the TV without affecting the elderly at home. After some thought, I decided to use crawler techniques to log in to the managed network switch automatically: limit the speed of the set-top box when the children are home from school, and release the limit while they are at school.


Children these days are young but very smart. They can turn on the TV by themselves and switch channels to find their favorite programs. In the past, I used the brute-force method of unplugging the network cable to stop them watching TV, but I often forgot to plug it back in, and then the elderly could not watch TV either.

After some thought, I realized that the IPTV set-top box is connected to a Netgear managed switch. If we limit the speed of the IPTV port on the switch, the TV can no longer be watched.

When it comes to crawlers, Python is the first thing that comes to mind. After some googling, I decided to use Selenium with Firefox / Chrome to implement the crawler.

What is Selenium

Github-Selenium

Selenium is a testing tool for web applications. It drives a real browser to run tests, just as a real user would. It supports IE (7, 8, 9, 10, 11), Mozilla Firefox, Safari, Google Chrome, Opera, HtmlUnit, PhantomJS, Android (requires Selendroid or Appium), iOS (requires ios-driver or Appium), etc.

Selenium supports C#, JavaScript, Java, Python and Ruby. It uses WebDriver to control the browser for web testing.

In crawlers, Selenium is mainly used to handle pages whose content is rendered by JavaScript.

What is WebDriver

WebDriver is a programming interface for interacting with a browser. It can open or close the browser, send mouse clicks, simulate keyboard input, and so on.

The W3C defines the WebDriver specification. The most popular WebDriver implementation today is the open-source Selenium WebDriver.

WebDriver consists of several modules:

  1. Language bindings, which support multiple programming languages.
  2. An automation framework that provides functions such as finding, clicking and typing into web page elements, which reduces duplicated code.
  3. A JSON wire protocol: the middle layer between the automation framework and the browser driver, which provides cross-platform, cross-language capability.
  4. A browser driver, through which the browser is controlled.
  5. The browser, which renders the web pages.
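The middle-layer protocol is plain JSON over HTTP. As a rough illustration, the sketch below builds (without sending anything) the HTTP method, path and JSON body for four commands from the W3C WebDriver specification: New Session, Navigate To, Find Element and Element Click. The helper name webdriver_command is hypothetical; it is not part of Selenium, whose Python bindings build these requests for you.

```python
import json


def webdriver_command(action, session_id=None, **params):
    """Build (HTTP method, path, JSON body) for a few W3C WebDriver commands.

    Illustration only: this helper is hypothetical and sends no request.
    """
    if action == "new_session":
        body = {"capabilities": {"alwaysMatch": {"browserName": params["browser"]}}}
        return "POST", "/session", json.dumps(body)
    if action == "navigate":
        return "POST", f"/session/{session_id}/url", json.dumps({"url": params["url"]})
    if action == "find_element":
        # "using" is a locator strategy such as "css selector", "link text" or "xpath"
        body = {"using": params["using"], "value": params["value"]}
        return "POST", f"/session/{session_id}/element", json.dumps(body)
    if action == "click":
        path = f"/session/{session_id}/element/{params['element_id']}/click"
        return "POST", path, json.dumps({})
    raise ValueError(f"unsupported action: {action}")


if __name__ == "__main__":
    print(webdriver_command("find_element", "abc123", using="link text", value="Apply"))
```

A browser driver such as geckodriver or chromedriver listens on a local HTTP port and answers requests of this shape; the language binding only has to produce them.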

Installation

Selenium

:~$ sudo apt-get install python3 python3-pip
:~$ pip3 install selenium

WebDriver

  • ChromeDriver:
    :~$ sudo apt-get install chromium-driver
  • Firefox: download geckodriver from GitHub (mozilla/geckodriver releases).
    :~$ wget https://github.com/mozilla/geckodriver/releases/download/v0.26.0/geckodriver-v0.26.0-linux64.tar.gz
    :~$ tar -xvf geckodriver-v0.26.0-linux64.tar.gz
    :~$ sudo mv geckodriver /usr/local/bin/
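To confirm that a driver actually ended up on PATH, Python's shutil.which can look it up the same way the shell would. The final script later in this article uses whereis for this; the sketch below is a portable alternative, assuming the drivers are named geckodriver and chromedriver. The helper name detect_browser is our own.

```python
import shutil

# Driver executable -> browser name, matching the final script's -b option
DRIVERS = {"geckodriver": "firefox", "chromedriver": "chrome"}


def detect_browser(drivers=DRIVERS):
    """Return the browser whose driver is found first on PATH, or "" if none is."""
    for executable, browser in drivers.items():
        if shutil.which(executable) is not None:
            return browser
    return ""


if __name__ == "__main__":
    print(detect_browser() or "no WebDriver found on PATH")
```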

How to use Selenium

Here we write the scripts in Python.

Visit the webpage

Let’s start with the famous Hello World.

#!/usr/bin/env python3
# coding=utf-8

import time
from selenium import webdriver

print("Initialize ChromeDriver and open Chrome")
driver = webdriver.Chrome()
print("Open webpage shixuen.com")
driver.get("https://www.shixuen.com")
time.sleep(5)
print("Close Chrome")
# quit() ends the session and stops chromedriver; close() only closes the window
driver.quit()

print("Initialize geckodriver and open Firefox")
driver = webdriver.Firefox()
driver.get("https://www.shixuen.com")
time.sleep(5)
driver.quit()
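  1. Declare the browser. Selenium supports multiple browsers; create the driver that matches the browser you installed:
    from selenium import webdriver

    driver = webdriver.Chrome()    # Chrome, through ChromeDriver
    driver = webdriver.Firefox()   # or Firefox, through geckodriver
  2. Open a URL with driver.get("https://www.shixuen.com").
  3. Close the browser with driver.quit(). driver.close() only closes the current window, while driver.quit() also ends the WebDriver session and stops the driver process.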
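If an exception happens between opening and closing the browser, the browser and its driver process keep running. One way to avoid that is to tie quit() to a with block. The sketch below is a hypothetical helper built on contextlib.contextmanager; it works with any factory that returns an object with a quit() method, such as webdriver.Firefox.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(factory, *args, **kwargs):
    """Start a driver with factory(*args, **kwargs) and always quit it afterwards."""
    driver = factory(*args, **kwargs)
    try:
        yield driver
    finally:
        # Runs on normal exit and when the body raises, so no browser is left behind
        driver.quit()


# Usage with Selenium (requires a browser and its driver):
#   from selenium import webdriver
#   with managed_driver(webdriver.Firefox) as driver:
#       driver.get("https://www.shixuen.com")
```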

Simulate a mouse click

Let's add a new feature: after opening https://www.shixuen.com, click the link to the article VIM Plugin - YouCompleteMe.

Look at the code first.

#!/usr/bin/env python3
# coding=utf-8

import time
from selenium import webdriver
from selenium.webdriver.common.by import By

print("Initialize ChromeDriver and open Chrome")
driver = webdriver.Chrome()
print("Open shixuen.com")
driver.get("https://www.shixuen.com")
print("Search the link text")
# The link text must match exactly
article = driver.find_element(By.LINK_TEXT, "VIM Plugin - YouCompleteMe")
print("Click the link")
article.click()
time.sleep(5)
print("Close Browser")
driver.quit()

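The key line is driver.find_element(By.LINK_TEXT, "VIM Plugin - YouCompleteMe"). It searches for a link whose text is VIM Plugin - YouCompleteMe and returns that element when found; if there is no such link, it raises NoSuchElementException. Selenium 3 also offered the shorthand driver.find_element_by_link_text("VIM Plugin - YouCompleteMe"), but the find_element_by_* helpers were removed in Selenium 4.3.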

Find a specific element

Example web node:
<a id="btn_apply" class="btn_class">Apply</a>

  • By ID: driver.find_element(By.ID, "btn_apply")
  • By link text: driver.find_element(By.LINK_TEXT, "Apply")
  • By class name: driver.find_element(By.CLASS_NAME, "btn_class")
  • By XPath: driver.find_element(By.XPATH, "//a[@id='btn_apply' and @class='btn_class']")
      • /: search from the root node
      • //: search all descendant nodes, at any depth
      • ./: search the child nodes of the current node

(By is imported from selenium.webdriver.common.by.)
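The ./ and // rules can be tried without a browser. Python's standard xml.etree.ElementTree understands a small subset of XPath, enough to show the difference between ./ and .//. The page below is made up for the example and written as well-formed XML; Selenium uses the browser's own XPath engine, which supports full XPath 1.0 including and, whereas ElementTree needs chained predicates and only accepts relative paths on an element.

```python
import xml.etree.ElementTree as ET

# A made-up page, written as well-formed XML so ElementTree can parse it
PAGE = """<html>
  <body>
    <div id="menu">
      <a id="btn_apply" class="btn_class">Apply</a>
      <a id="logout" class="btn_class">Logout</a>
    </div>
    <a id="help">Help</a>
  </body>
</html>"""

root = ET.fromstring(PAGE)  # the <html> element

# .// searches every descendant of the current node, at any depth
all_links = root.findall(".//a")

# ./ only looks at the direct children of the current node
body = root.find("./body")
direct_links = body.findall("./a")

# ElementTree has no "and", so the two attribute predicates are chained
apply_btn = root.find(".//a[@id='btn_apply'][@class='btn_class']")

print([a.get("id") for a in all_links])     # ['btn_apply', 'logout', 'help']
print([a.get("id") for a in direct_links])  # ['help']
print(apply_btn.text)                       # Apply
```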

Click the web element

Call article.click() to simulate a mouse click on the element.


Finding an element's HTML code

  1. Open the page in a browser.
  2. Press [F12] to open the web developer tools.
  3. Click the element-picker button in the upper-left corner of the tools, then click the element on the page; the tools jump to its HTML code.

Log in to and configure the Netgear managed switch

Now for the main topic of this article: logging in to and configuring the Netgear managed switch. Again, code first.

#!/usr/bin/env python3
# coding=utf-8

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.select import Select

# Option index in the rate drop-down lists
_gs105e_rate_limit = {
    "unlimit": 1,
    "limit": 3
}
gs105e_conf = {
    "url": "http://192.168.1.2",
    "password": "0123456789",
    "rate": _gs105e_rate_limit["limit"],
    "port": "port2"    # name of the port's checkbox
}

def browser(driver):
    print("Open the switch web page")
    driver.get(gs105e_conf["url"])

    print("Input password")
    passwd_input = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, "password")))
    passwd_input.send_keys(gs105e_conf["password"])

    print("Click login")
    btn_login = driver.find_element(By.ID, "loginBtn")
    btn_login.click()

    print("Click menu <QoS>")
    menu_qos = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, "QoS")))
    menu_qos.click()

    print("Click submenu <Rate Limit>")
    menu_qos_ratelimit = driver.find_element(By.ID, "QoS_RateLimit")
    menu_qos_ratelimit.click()

    print("Wait for the iframe to load")
    time.sleep(4)

    print("Switch to the iframe")
    iframe = driver.find_element(By.XPATH, "//iframe[@id='maincontent']")
    driver.switch_to.frame(iframe)

    print("Click the <checkbox> of the port")
    WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.NAME, gs105e_conf["port"]))).click()

    print("Select ingress rate")
    btn_select = driver.find_element(By.NAME, "IngressRate")
    Select(btn_select).select_by_index(gs105e_conf["rate"])

    print("Select egress rate")
    btn_select = driver.find_element(By.NAME, "EgressRate")
    Select(btn_select).select_by_index(gs105e_conf["rate"])

    print("Back to the main page")
    driver.switch_to.default_content()

    print("Click button <Apply>")
    btn_apply = driver.find_element(By.ID, "btn_Apply")
    btn_apply.click()

    print("Click <logout>")
    btn_logout = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, "logout")))
    btn_logout.click()

def main():
    driver = None
    try:
        driver = webdriver.Firefox()
        browser(driver)
    except Exception as e:
        print("error in script!", e)
    finally:
        # driver stays None if the browser could not be started
        if driver is not None:
            print("Close browser!")
            driver.quit()

if __name__ == "__main__":
    main()

The code above logs in to the Netgear managed switch and limits the speed of the TV port automatically. Each step is spelled out clearly; it is mostly a matter of calling a few more functions and moving between pages.

Now let us analyze the code:

  1. Open the switch's web page with driver.get("url").
  2. Enter the password and click the login button. Here we use WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, "password"))), which waits up to 10 seconds for an element whose ID is password to appear. If the element shows up in time it is returned; otherwise a TimeoutException is raised.
    Note that By, EC and WebDriverWait must be imported before use. By.ID locates elements by ID; likewise there are By.NAME, By.XPATH, By.CLASS_NAME, By.LINK_TEXT, and so on.
    Why use it? If the page has not yet loaded the element whose ID is password, a plain find_element call raises an error immediately. Without an explicit wait, we would have to sleep for a fixed time first:
      print("Wait for the login page to load")
      time.sleep(10)
      passwd_input = driver.find_element(By.ID, "password")
    A fixed sleep either wastes time or is still too short on a slow page, while WebDriverWait returns as soon as the element appears.
    Then simulate typing the characters with passwd_input.send_keys(gs105e_conf["password"]).
  3. Click the menus in order: QoS --> Rate Limit.
  4. The rate limit settings are loaded in an iframe, so the driver must first switch from the current page into the iframe with driver.switch_to.frame(iframe).
  5. Tick the port's checkbox, then change the ingress and egress rates. Both are drop-down lists, so we use the Select class from selenium.webdriver.support.select. Select(btn_select).select_by_index(3) selects the fourth option in the list, because the first option has index 0.
  6. Switch from the iframe back to the main page with driver.switch_to.default_content() and click the Apply button.
  7. Finally, click the logout button.
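Step 2 relies on an explicit wait. Conceptually, WebDriverWait(driver, 10).until(condition) calls the condition repeatedly (every 0.5 s by default) until it returns a truthy value, ignoring NoSuchElementException in the meantime, and raises TimeoutException once the time is up. The simplified sketch below reproduces that loop in plain Python; it is not Selenium's implementation, wait_until is a hypothetical name, and LookupError and TimeoutError stand in for Selenium's exception types.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5, ignored=(LookupError,)):
    """Call condition() until it returns something truthy or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            value = condition()
            if value:
                return value
        except ignored:
            pass  # e.g. the element is not in the page yet; try again
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)
```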

Final code

The previous code must run in a graphical session because it opens a browser window; on a server without a graphical interface it fails. So here we pass the --headless option, which stops the browser from opening a window and lets the script run in a terminal.
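If the same script should open a visible window on a desktop but run headless from cron or over SSH, the choice can be made at run time. The sketch assumes a Linux X11 or Wayland desktop, where a graphical session normally exports DISPLAY or WAYLAND_DISPLAY; should_run_headless is a hypothetical helper, and Windows and macOS are simply assumed to have a display.

```python
import os
import sys


def should_run_headless(environ=None, platform=None):
    """Guess whether the browser must run headless because no display is available."""
    environ = os.environ if environ is None else environ
    platform = sys.platform if platform is None else platform
    if not platform.startswith(("linux", "freebsd")):
        return False  # assumption: Windows/macOS sessions always have a display
    # cron jobs and plain SSH sessions usually have neither variable set
    return not (environ.get("DISPLAY") or environ.get("WAYLAND_DISPLAY"))
```

With Selenium, the result would simply gate the existing option, e.g. if should_run_headless(): firefox_options.add_argument("--headless").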

Below is the final code. It tightens the previous code and adds command-line options to limit or release the speed of a port on the Netgear managed switch.

#!/usr/bin/env python3
# coding=utf-8

#####################################################
# > File Name: automanagetv.py
# > Author: haven200
# > Mail: [email protected]
# > Created Time: Saturday, November 16, 2019 AM10:14:30 HKT
#####################################################

import time, sys, getopt, os
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.select import Select


# Option index of the rate drop-downs, and checkbox name of each set-top box port
_gs105e_dict = {
    "unlimit": 1,
    "limit": 3,
    "huawei": "port2",
    "phicomm": "port5"
}
gs105e_conf = {
    "url": "http://192.168.1.2",
    "password": "0123456789",
    "rate": _gs105e_dict["unlimit"],
    "port": _gs105e_dict["huawei"],
    "browser": ""
}

def browser(driver):
    print("open gs105 webpage")
    driver.get(gs105e_conf["url"])
    print("login")
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "password"))
    ).send_keys(gs105e_conf["password"])
    driver.find_element(By.ID, "loginBtn").click()
    print("goto QoS page")
    WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, "QoS"))).click()
    driver.find_element(By.ID, "QoS_RateLimit").click()
    time.sleep(4)  # wait for the iframe to load
    driver.switch_to.frame(driver.find_element(By.XPATH, "//iframe[@id='maincontent']"))
    print("modify the rate")
    WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.NAME, gs105e_conf["port"]))).click()
    Select(driver.find_element(By.NAME, "IngressRate")).select_by_index(gs105e_conf["rate"])
    Select(driver.find_element(By.NAME, "EgressRate")).select_by_index(gs105e_conf["rate"])
    print("click Apply button")
    driver.switch_to.default_content()
    driver.find_element(By.ID, "btn_Apply").click()
    WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, "logout"))).click()

def main():
    driver = None
    try:
        if gs105e_conf["browser"] == "firefox":
            print("Open [geckodriver]")
            firefox_options = webdriver.FirefoxOptions()
            firefox_options.add_argument("--headless")
            # Firefox has no user-agent command-line switch; override it with a preference
            firefox_options.set_preference("general.useragent.override",
                "Mozilla/5.0 (X11; Linux i686; rv:67.0) Gecko/20100101 Firefox/67.0")
            driver = webdriver.Firefox(options=firefox_options)
        elif gs105e_conf["browser"] == "chrome":
            print("Open [chromedriver]")
            chrome_options = webdriver.ChromeOptions()
            chrome_options.add_argument("--headless")
            chrome_options.add_argument("--disable-gpu")
            chrome_options.add_argument("--user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3831.6 Safari/537.36")
            driver = webdriver.Chrome(options=chrome_options)
        else:
            show_help()  # no usable browser: print usage and exit

        browser(driver)
    except Exception as e:  # SystemExit from show_help() is deliberately not caught
        print("error in script!", e)
    finally:
        if driver is not None:
            print("Close browser!")
            driver.quit()

def show_help():
    print("automanagetv.py -r <unlimit|limit> -p [huawei|phicomm] -b [firefox|chrome]")
    print("made by [email protected], version 0.0.1\n")
    print("  -r, --rate     limit:   limit the network speed to 1Mb/s")
    print("                 unlimit: release network speed limit")
    print("  -p, --port     huawei:  Huawei Set_Top_Box")
    print("                 phicomm: phicomm_n1 Set_Top_Box")
    print("  -b, --browser  firefox: use firefox")
    print("                 chrome:  use chrome")
    sys.exit()

if __name__ == "__main__":

    if len(sys.argv) == 1: show_help()

    # Pick a default browser from the installed drivers
    if len(os.popen("whereis geckodriver | awk '{print $2}'").read()) > 5:
        gs105e_conf["browser"] = "firefox"
    elif len(os.popen("whereis chromedriver | awk '{print $2}'").read()) > 5:
        gs105e_conf["browser"] = "chrome"

    try:
        opts, args = getopt.getopt(sys.argv[1:], "hr:p:b:", ["rate=", "port=", "browser="])
    except getopt.GetoptError:
        show_help()

    for opt, arg in opts:
        if opt == '-h':
            show_help()
        elif opt in ("-r", "--rate"):
            if arg == "limit" or arg == "unlimit": gs105e_conf["rate"] = _gs105e_dict[arg]
        elif opt in ("-p", "--port"):
            if arg == "huawei" or arg == "phicomm": gs105e_conf["port"] = _gs105e_dict[arg]
        elif opt in ("-b", "--browser"):
            if arg == "firefox" or arg == "chrome": gs105e_conf["browser"] = arg
    main()
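One weakness of the getopt loop is that an unknown value is silently ignored: -p tv, for example, just leaves the port at its default. The standard argparse module can reject such values and print usage automatically. The sketch below reproduces the same options and mappings as _gs105e_dict; parse_args is a hypothetical replacement that returns the values the script stores in gs105e_conf, and it is not wired into the Selenium part here.

```python
import argparse

# Same values as _gs105e_dict: option index of the rate drop-downs,
# and the checkbox name of each set-top box's switch port
RATES = {"unlimit": 1, "limit": 3}
PORTS = {"huawei": "port2", "phicomm": "port5"}


def parse_args(argv=None):
    parser = argparse.ArgumentParser(
        prog="automanagetv.py",
        description="Limit or release the speed of a set-top box port on the GS105E switch.",
    )
    parser.add_argument("-r", "--rate", choices=sorted(RATES), required=True)
    parser.add_argument("-p", "--port", choices=sorted(PORTS), default="huawei")
    parser.add_argument("-b", "--browser", choices=["firefox", "chrome"])
    args = parser.parse_args(argv)  # exits with a usage message on bad input
    return {"rate": RATES[args.rate], "port": PORTS[args.port], "browser": args.browser or ""}
```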

How to use the script:

# limit the speed of the TV (Huawei set-top box on port2)
:~$ python3 automanagetv.py -r limit -p huawei
# restore the speed of the TV
:~$ python3 automanagetv.py -r unlimit -p huawei

Finally, we use cron to run the script on a schedule.

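:~$ crontab -e
# cron's default PATH is /usr/bin:/bin; add /usr/local/bin so Selenium can find geckodriver
PATH=/usr/local/bin:/usr/bin:/bin
00 12 * * 1-5 /etc/init.d/automanagetv.py -r limit -p huawei
00 13 * * 1-5 /etc/init.d/automanagetv.py -r unlimit -p huawei
00 19 * * 1-5 /etc/init.d/automanagetv.py -r limit -p huawei
40 20 * * 1-5 /etc/init.d/automanagetv.py -r unlimit -p huawei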

From Monday to Friday, the speed is limited from 12:00 to 13:00 at noon and from 19:00 to 20:40 in the evening.
On Saturday and Sunday, the TV belongs to the children.
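Cron only acts at the moment each line fires: if the machine is off at 12:00, the limit is never applied that day. One possible refinement is to compute the desired state from the same schedule and run the script more often (for example every 10 minutes) with whichever option the schedule calls for. tv_should_be_limited and LIMIT_WINDOWS are hypothetical names; the windows are copied from the crontab above.

```python
from datetime import datetime, time

# (start, end) windows from the crontab above; they apply Monday to Friday only
LIMIT_WINDOWS = [(time(12, 0), time(13, 0)), (time(19, 0), time(20, 40))]


def tv_should_be_limited(now):
    """Return True if the TV port should be rate-limited at the given moment."""
    if now.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return False
    current = now.time()
    return any(start <= current < end for start, end in LIMIT_WINDOWS)


if __name__ == "__main__":
    state = "limit" if tv_should_be_limited(datetime.now()) else "unlimit"
    print(f"automanagetv.py -r {state} -p huawei")
```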

