Project #0
What are GZIP and ZIP?
How do they differ?
Do they use the same compression algorithm?
What other compression algorithms (content encodings) can your web browser decompress?
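For Project #0, this short sketch compresses the same bytes three ways: as a raw DEFLATE stream, as a GZIP file, and as a one-member ZIP archive. It uses only the Python standard library, and the sample payload is arbitrary. Both containers record compression method 8, which is DEFLATE. So in the common case they share the algorithm and differ in packaging. GZIP wraps a single stream in a small header and trailer. ZIP is a multi-file archive with per-file headers and a central directory, and it can also store members with other methods (for example stored, bzip2, or LZMA).

```python
import gzip
import io
import zipfile
import zlib

data = b"Hello, compression! " * 200  # arbitrary sample payload

# Raw DEFLATE stream with no container (negative wbits = no header/trailer).
co = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
raw = co.compress(data) + co.flush()

# GZIP: one DEFLATE stream plus a minimal 10-byte header and an
# 8-byte trailer (CRC-32 and uncompressed size).
gz = gzip.compress(data, compresslevel=9)

# ZIP: an archive; each member has its own local header, plus a central directory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("sample.txt", data)
zp = buf.getvalue()

# gzip stores the method in byte 2; a ZIP local header stores it at offset 8-9.
print("gzip magic:", gz[:2], " method byte:", gz[2])
print("zip magic: ", zp[:4], " method field:", int.from_bytes(zp[8:10], "little"))
print("sizes: original", len(data), "| deflate", len(raw),
      "| gzip", len(gz), "| zip", len(zp))
```

Both method values should print as 8, the DEFLATE method code.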
Project #1
Use a web browser to download a web page.
Modify (clean up) the code below.
Use it to download and decompress
web pages. Display the first 60 characters/bytes.
Find and test 10 other web pages.
Are they compressed or uncompressed?
What is the difference in file sizes?
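One way to approach these questions is to separate the check for gzip data from the download itself. The helper below takes the bytes returned by `response.read()`, checks for the GZIP magic number, and reports both the transferred size and the decompressed size. `summarize` is a hypothetical name, and the page is assumed to be UTF-8. The helper also shows why "characters/bytes" matters. Slicing bytes can cut a multi-byte UTF-8 character in half, while slicing the decoded string counts characters.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream

def summarize(raw: bytes, n: int = 60) -> dict:
    """Classify a downloaded body and report sizes plus a short preview."""
    compressed = raw[:2] == GZIP_MAGIC
    body = gzip.decompress(raw) if compressed else raw
    text = body.decode("utf-8", errors="replace")  # assumes a UTF-8 page
    return {
        "compressed": compressed,
        "transferred_bytes": len(raw),
        "decompressed_bytes": len(body),
        "first_bytes": body[:n],   # n bytes; may split a multi-byte character
        "first_chars": text[:n],   # n characters
    }

# Offline demo with a synthetic page; substitute the bytes from response.read().
page = ("<p>caf\u00e9</p>\n" * 100).encode("utf-8")
for label, raw in (("plain", page), ("gzip", gzip.compress(page))):
    info = summarize(raw, n=12)
    print(label, info["compressed"], info["transferred_bytes"],
          info["decompressed_bytes"], info["first_bytes"], repr(info["first_chars"]))
```

With `n=12`, the byte preview stops at `</p>` while the character preview also includes the newline, because `é` takes two bytes in UTF-8.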
Project #2
Wireshark is an open source program for monitoring
network traffic.
Download and install Wireshark.
Use Wireshark to monitor network traffic while requesting
and downloading a web page.
Display some of the packets.
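One caveat: over HTTPS (as with `https://python.org` in the sample code), Wireshark shows only encrypted TLS records. The request and response are easier to read on a plain-HTTP site such as the commented-out `http://httpforever.com`. The sketch below follows the approach of the Real Python sockets course listed under Links. It sends a minimal HTTP/1.1 request over a raw TCP socket so the exact bytes can be matched against the captured packets. The function names, the headers, and the 10-second timeout are assumptions.

```python
import socket

def build_get(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request as raw bytes."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Accept-Encoding: gzip",   # ask the server for a compressed body
        "Connection: close",       # server closes the socket when finished
        "",
        "",                        # blank line ends the header section
    ]
    return "\r\n".join(lines).encode("ascii")

def fetch_raw(host: str, path: str = "/", port: int = 80) -> bytes:
    """Send the request on a plain TCP socket and return the unparsed
    response (status line, headers, and body)."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_get(host, path))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:          # empty read: server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

Start a capture (the display filter `http` hides unrelated traffic), then call `fetch_raw("httpforever.com")` and compare the bytes with the captured packets. The returned body is not parsed, so it may still be chunked or gzip-compressed.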
Project #3
Test creating compressed and uncompressed versions of a web page.
- create a web page (HTML file) containing several paragraphs, etc.
- using the code below, read the HTML file
- write the file's data to a compressed and an uncompressed HTML file
- open each file in a web browser to verify the page
- display the file sizes (compressed vs uncompressed)
See Project: Create a Simple Website
for information on creating web pages.
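Here is a minimal sketch of the Project #3 read and write steps. It assumes the hand-made page is saved as `page.html`. The function name `write_copies` and the output file names are hypothetical. In `'wb'` mode, `gzip.open` writes the same bytes in gzip format, so the two file sizes can be compared directly.

```python
import gzip
from pathlib import Path

def write_copies(src: Path, out_dir: Path) -> dict:
    """Write an uncompressed and a gzip-compressed copy of `src` into
    `out_dir`; return {file name: size in bytes}."""
    html = src.read_bytes()                   # read the page as raw bytes
    out_dir.mkdir(parents=True, exist_ok=True)
    plain = out_dir / "page_plain.html"       # hypothetical output names
    packed = out_dir / "page_plain.html.gz"
    plain.write_bytes(html)                   # uncompressed copy
    with gzip.open(packed, "wb") as f:        # gzip-compressed copy
        f.write(html)
    return {p.name: p.stat().st_size for p in (plain, packed)}
```

For example, `write_copies(Path("page.html"), Path("out"))` writes both copies into `out/` and returns their sizes. Browsers differ in whether they render a `.gz` file opened straight from disk. A web server would normally send the compressed copy with a `Content-Encoding: gzip` header instead.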
Links
Wireshark (home)
Real Python - Programming Sockets in Python
(The first 2 or 3 videos are free to watch.)
GZIP Compression: How to Enable for Faster Web Pages
Sample Code
#!/usr/bin/python3
# ===================================================================
# Based on: realpython.com/courses/programming-sockets/
# Real Python - Programming Sockets in Python
#
# FYI: the GZIP magic number is
#   b'\x1f\x8b' (bytes), 0x1f8b (hex), or 8075 (decimal).
#   (It indicates the data was compressed using GZIP.)
# ===================================================================
from urllib import request
import gzip
# ---- compressed and uncompressed web page
url = 'https://python.org'
#url = 'http://www.tomshodgepodge.com/programming-projects'
#url = 'http://httpforever.com'
# ---- download a web page
# urllib does not ask for compression by default, so request gzip
# explicitly; otherwise most servers send the page uncompressed.
req = request.Request(url, headers={'Accept-Encoding': 'gzip'})
with request.urlopen(req) as response:
    html = response.read()

print(f'HTML {len(html)} bytes')
print(f'HTML data type {type(html)}')
print('HTML first 50 bytes')
print(f'[:50] {html[:50]}')

# ---- compressed or uncompressed?
if html[:2] == b'\x1f\x8b':
    print()
    print('---- web page is compressed')
    print()
    dc_html = gzip.decompress(html)
    print(f'decompressed {len(dc_html)} bytes')
    print(f'decompressed data type {type(dc_html)}')
    x1 = dc_html[:50]
    ##print(f'[:50] {len(x1)} bytes')
    ##print(f'[:50] data type {type(x1)}')
    print(f'{x1}')
    print()
    decode_dc_html = dc_html.decode('utf-8')
    print(f'decode("utf-8") {len(decode_dc_html)} characters')
    print(f'decode("utf-8") data type {type(decode_dc_html)}')
    x2 = decode_dc_html[:50]
    ##print(f'[:50] {len(x2)} characters')
    ##print(f'[:50] data type {type(x2)}')
    print(f'{x2}')
    print()
else:
    print()
    print('---- web page is not compressed')
    print()
    print(f'[:50] {html[:50]}')
    # a byte slice can split a multi-byte character, so replace bad bytes
    print(f"[:50] {html[:50].decode('utf-8', errors='replace')}")
    print()
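Checking the magic number works for gzip, but the server also states how it encoded the body in the `Content-Encoding` response header. Below is a sketch of that header-driven alternative, using the standard library only. `decode_body` and `fetch` are hypothetical names. Many browsers also accept Brotli (`br`), but the Python standard library has no Brotli module, so this version rejects `br` rather than guessing. It also handles only a single encoding, not a list such as `gzip, br`.

```python
import gzip
import zlib
from urllib import request

def decode_body(raw: bytes, content_encoding) -> bytes:
    """Undo the HTTP Content-Encoding the server reported.

    `content_encoding` is response.headers.get("Content-Encoding"),
    or None when the header is absent (the body is then unencoded).
    """
    enc = (content_encoding or "identity").strip().lower()
    if enc == "identity":
        return raw
    if enc in ("gzip", "x-gzip"):
        return gzip.decompress(raw)
    if enc == "deflate":
        try:
            return zlib.decompress(raw)  # zlib-wrapped DEFLATE stream
        except zlib.error:
            # some servers send raw DEFLATE without the zlib wrapper
            return zlib.decompress(raw, -zlib.MAX_WBITS)
    raise ValueError(f"unsupported Content-Encoding: {content_encoding!r}")

def fetch(url: str) -> bytes:
    """Download `url`, advertising only encodings decode_body can undo."""
    req = request.Request(url, headers={"Accept-Encoding": "gzip, deflate"})
    with request.urlopen(req) as response:
        return decode_body(response.read(), response.headers.get("Content-Encoding"))
```

For example, `fetch('https://python.org')[:60]` would give the first 60 bytes of the decoded page.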