Quite recently I faced a small challenge when I wanted to download some video files from a webpage.
Normally, when there are many videos to download, I use Firefox with the DownThemAll! extension.
Generally speaking, it works fine when you are using Windows or Linux with a GUI. This time, as there were more than 800 videos to download (don't ask for the details 😉), I decided to run the download from my Linux server running in the cloud (it really doesn't matter where).
My first attempt with plain wget failed with 403 Forbidden:
root@ubuntu:~# wget https://s3-us-west-1.amazonaws.com/bucket-name/video.mp4
--2018-09-07 23:44:36-- https://s3-us-west-1.amazonaws.com/bucket-name/video.mp4
Resolving s3-us-west-1.amazonaws.com (s3-us-west-1.amazonaws.com)... 52.219.24.57
Connecting to s3-us-west-1.amazonaws.com (s3-us-west-1.amazonaws.com)|52.219.24.57|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2018-09-07 23:44:37 ERROR 403: Forbidden.
The solution was to pass a few extra switches to wget so that the request looks like it comes from a browser. Note the quotes: the file name contains a space, and the user-agent and header values contain parentheses and semicolons that the shell would otherwise interpret.
wget -O "New filename.mp4" \
  --referer=http://www.google.com \
  --user-agent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6" \
  --header="Accept:text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5" \
  --header="Accept-Language: en-us,en;q=0.5" \
  --header="Accept-Encoding: gzip,deflate" \
  --header="Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7" \
  --header="Keep-Alive: 300" \
  -dnv https://s3-us-west-1.amazonaws.com/bucket-name/video.mp4
Let me explain the new wget switches:
-O "New filename.mp4" - saves the download under the given name instead of the default video.mp4
--referer=http://www.google.com - sends a Referer header. Useful for retrieving documents from servers that assume they are always being accessed by interactive web browsers
--user-agent=agent-string - makes wget identify itself as a browser, in our case an old Firefox on Windows
--header - adds an extra request header; here we send the Accept, Accept-Language, Accept-Encoding, Accept-Charset and Keep-Alive headers a browser would send
-dnv - short for -d -nv: -d prints debug output, including the request and response headers shown below, while -nv turns off wget's regular verbose messages
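Since the goal was to fetch several hundred files, the same switches can be wrapped in a small loop. The following is only a minimal sketch under a few assumptions: the function name fetch_all and the list file (one URL per line) are hypothetical, each file is saved under the last path component of its URL, and -d is dropped so the output stays short. Which of these headers the server actually requires is not something the single download above establishes, so all of them are repeated.

```shell
# fetch_all LIST: download every URL listed in LIST (one URL per line),
# sending the same browser-like headers as the single wget command above.
fetch_all() {
    # The extra test keeps a last line that lacks a trailing newline.
    while IFS= read -r url || [ -n "$url" ]; do
        # Skip blank lines in the list.
        [ -n "$url" ] || continue
        # Name the local file after the last path component of the URL.
        out=${url##*/}
        wget -O "$out" \
            --referer=http://www.google.com \
            --user-agent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6" \
            --header="Accept:text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5" \
            --header="Accept-Language: en-us,en;q=0.5" \
            --header="Accept-Encoding: gzip,deflate" \
            --header="Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7" \
            --header="Keep-Alive: 300" \
            -nv "$url" || echo "failed: $url" >&2
    done < "$1"
}
```

With the function loaded into the shell, running fetch_all urls.txt would work through the list and report failed URLs on stderr without stopping. wget can also read the list itself with -i urls.txt; in that case -O has to be dropped, because with several URLs it would write all downloads into one file.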
Finally we receive HTTP 200 OK, which means the download works:
setting --no (verbose) to 0
DEBUG output created by Wget 1.17.1 on linux-gnu.
Reading HSTS entries from /root/.wget-hsts
URI encoding = 'ANSI_X3.4-1968'
converted 'https://s3-us-west-1.amazonaws.com/bucket-name/video.mp4' (ANSI_X3.4-1968) -> 'https://s3-us-west-1.amazonaws.com/bucket-name/video.mp4' (UTF-8)
Caching s3-us-west-1.amazonaws.com => 54.231.235.25
Created socket 4.
Releasing 0x0000559b1541fd60 (new refcount 1).
Initiating SSL handshake.
Handshake successful; connected socket 4 to SSL handle 0x0000559b154200f0
certificate:
subject: CN=*.s3-us-west-1.amazonaws.com,O=Amazon.com Inc.,L=Seattle,ST=Washington,C=US
issuer: CN=DigiCert Baltimore CA-2 G2,OU=www.digicert.com,O=DigiCert Inc,C=US
X509 certificate successfully verified and matches host s3-us-west-1.amazonaws.com
---request begin---
GET /bucket-name/video.mp4 HTTP/1.1
Referer: http://www.google.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6
Accept:text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Encoding: gzip,deflate
Host: s3-us-west-1.amazonaws.com
Connection: Keep-Alive
Accept-Language: en-us,en;q=0.5
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
---request end---
---response begin---
HTTP/1.1 200 OK
x-amz-id-2: dGFs1NEVkw2xOzfkbIm1PWAR9zAYsXGOAUuqn5xz/LzgsKpGaNaTfv6HcKy4sfRDO8BSn0vcwt4=
x-amz-request-id: AA528E7DF42B458C
Date: Fri, 07 Sep 2018 15:58:36 GMT
Last-Modified: Thu, 30 Aug 2018 23:37:06 GMT
ETag: e18629b490b0253b379f8ddae566a438
Accept-Ranges: bytes
Content-Type: video/mp4
Content-Length: 942460456
Server: AmazonS3
---response end---
Registered socket 4 for persistent reuse.
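With hundreds of files, reading each debug log by eye is not practical. The sketch below shows one possible way to check a download automatically, assuming the debug output was saved to a file (for example with wget's -o option). The function names log_status, log_length and verify_download are hypothetical helpers, not part of wget; they rely only on the ---response begin--- marker and the Content-Length header visible in the output above.

```shell
# Print the status code of the last response recorded in a wget debug log.
log_status() {
    awk '/^---response begin---/ { getline; code = $2 } END { print code }' "$1"
}

# Print the Content-Length of the last response recorded in the log.
# A trailing carriage return, if present, is stripped first.
log_length() {
    awk '{ sub(/\r$/, "") } /^Content-Length:/ { len = $2 } END { print len }' "$1"
}

# verify_download LOG FILE: succeed only if the log shows status 200
# and FILE has exactly the size the server advertised.
verify_download() {
    log=$1
    file=$2
    [ "$(log_status "$log")" = "200" ] || return 1
    actual=$(wc -c < "$file" | tr -d ' ')
    [ "$actual" = "$(log_length "$log")" ]
}
```

A call such as verify_download wget.log "New filename.mp4" returns success only when both checks pass, so it can be used to decide whether a file needs to be downloaded again.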