How to download files from AWS S3 bucket?

Sep 7, 2018 16:01 · 387 words · 2 minute read Object Storage S3 wget Simple Storage Service AWS

Quite recently I faced a small challenge when I wanted to download some video files from a webpage.

Normally, when there are several videos to download, I use Firefox with the DownThemAll! extension.

Generally speaking, it works fine when you are using Windows or Linux with a GUI. This time, as there were more than 800 videos to download (don’t ask for the details 😉), I decided to run the download from my Linux server running in the cloud (it really doesn’t matter where). A plain wget, however, was rejected with 403 Forbidden:

root@ubuntu:~ wget https://s3-us-west-1.amazonaws.com/bucket-name/video.mp4
--2018-09-07 23:44:36--  https://s3-us-west-1.amazonaws.com/bucket-name/video.mp4
Resolving s3-us-west-1.amazonaws.com (s3-us-west-1.amazonaws.com)... 52.219.24.57
Connecting to s3-us-west-1.amazonaws.com (s3-us-west-1.amazonaws.com)|52.219.24.57|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2018-09-07 23:44:37 ERROR 403: Forbidden.

The solution is to pass some extra flags to wget so that the request looks like it comes from a browser.

wget -O "New filename.mp4" \
  --referer="http://www.google.com" \
  --user-agent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6" \
  --header="Accept:text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5" \
  --header="Accept-Language: en-us,en;q=0.5" \
  --header="Accept-Encoding: gzip,deflate" \
  --header="Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7" \
  --header="Keep-Alive: 300" \
  -dnv https://s3-us-west-1.amazonaws.com/bucket-name/video.mp4

Let me explain the new switches in wget:

* -O "New filename.mp4" - saves the download under the given name instead of the default video.mp4
* --referer=http://www.google.com - sends a Referer header; useful for retrieving documents with server-side processing that assume they are always being retrieved by interactive web browsers
* --user-agent=agent-string - we fake the browser, in our case Firefox
* --header - adds extra HTTP request headers that tell the server which content types, languages, encodings and charsets we accept
* -dnv - short for -d -nv: -d prints debug output, including the request and response headers, and -nv (no verbose) turns off the regular verbose output
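Since the goal was more than 800 files, the same flags can be wrapped in a loop. The following is only a minimal sketch, not the exact script used here: it assumes a hypothetical urls.txt with one URL per line, and the WGET variable and the download_all function are names introduced purely for illustration.

```shell
# Sketch: download every URL listed in a text file, sending the same
# browser-like headers as the single-file command.
# Assumptions: urls.txt, WGET and download_all are illustrative names.

WGET="${WGET:-wget}"   # override for a dry run, e.g. WGET=echo
UA="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6"

download_all() {
  list="$1"
  # Read the list line by line; the extra test keeps a last line
  # that has no trailing newline.
  while IFS= read -r url || [ -n "$url" ]; do
    # Skip blank lines
    [ -z "$url" ] && continue
    "$WGET" \
      --referer="http://www.google.com" \
      --user-agent="$UA" \
      --header="Accept-Language: en-us,en;q=0.5" \
      --header="Accept-Encoding: gzip,deflate" \
      -nv \
      "$url" || echo "FAILED: $url" >&2
  done < "$list"
}

# Usage (each file keeps its remote name because -O is not set):
#   download_all urls.txt
```

wget can also read the list by itself with -i urls.txt; the loop is used here only because it makes it easy to report each failed URL separately.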

Finally we receive an HTTP 200 code, which means it works.

setting --no (verbose) to 0 
DEBUG output created by Wget 1.17.1 on linux-gnu.
Reading HSTS entries from /root/.wget-hsts
URI encoding = 'ANSI_X3.4-1968'
converted 'https://s3-us-west-1.amazonaws.com/bucket-name/video.mp4' (ANSI_X3.4-1968) -> 'https://s3-us-west-1.amazonaws.com/bucket-name/video.mp4' (UTF-8)
Caching s3-us-west-1.amazonaws.com => 54.231.235.25
Created socket 4.
Releasing 0x0000559b1541fd60 (new refcount 1).
Initiating SSL handshake.
Handshake successful; connected socket 4 to SSL handle 0x0000559b154200f0
certificate:
  subject: CN=*.s3-us-west-1.amazonaws.com,O=Amazon.com Inc.,L=Seattle,ST=Washington,C=US
  issuer:  CN=DigiCert Baltimore CA-2 G2,OU=www.digicert.com,O=DigiCert Inc,C=US
  X509 certificate successfully verified and matches host s3-us-west-1.amazonaws.com
---request begin---
  GET /bucket-name/video.mp4 HTTP/1.1
  Referer: http://www.google.com
  User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6
  Accept:text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
  Accept-Encoding: gzip,deflate
  Host: s3-us-west-1.amazonaws.com
  Connection: Keep-Alive
  Accept-Language: en-us,en;q=0.5
  Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
  Keep-Alive: 300
---request end---
---response begin---
  HTTP/1.1 200 OK
  x-amz-id-2: dGFs1NEVkw2xOzfkbIm1PWAR9zAYsXGOAUuqn5xz/LzgsKpGaNaTfv6HcKy4sfRDO8BSn0vcwt4=
  x-amz-request-id: AA528E7DF42B458C
  Date: Fri, 07 Sep 2018 15:58:36 GMT
  Last-Modified: Thu, 30 Aug 2018 23:37:06 GMT
  ETag: e18629b490b0253b379f8ddae566a438
  Accept-Ranges: bytes
  Content-Type: video/mp4
  Content-Length: 942460456
  Server: AmazonS3
  ---response end---
  Registered socket 4 for persistent reuse.
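
With files this large it can be worth confirming that the whole file arrived. As a rough sketch, the size on disk can be compared with the Content-Length value from the response headers; check_size is a hypothetical helper name, not something wget provides.

```shell
# Sketch: compare a downloaded file's size with the Content-Length
# reported by the server.
# Assumption: check_size is an illustrative helper name.

check_size() {
  file="$1"
  expected="$2"
  # wc -c counts bytes; tr removes the padding some wc versions add
  actual=$(wc -c < "$file" | tr -d '[:space:]')
  if [ "$actual" -eq "$expected" ]; then
    echo "OK: $file is $actual bytes"
  else
    echo "MISMATCH: $file is $actual bytes, expected $expected" >&2
    return 1
  fi
}

# Usage, with the Content-Length from the log above:
#   check_size "New filename.mp4" 942460456
```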