# Instagram Scraper
instagram-scraper is a command-line application written in Python that scrapes and downloads an Instagram user's photos and videos. Use responsibly.
## Install
To install instagram-scraper:
```
$ pip install instagram-scraper
```
To update instagram-scraper:
```
$ pip install instagram-scraper --upgrade
```
Alternatively, you can clone the project and install it with the command below. Make sure you first `cd` into the `instagram-scraper-master` folder.
```
$ python setup.py install
```
## Usage
To scrape a user’s media:
```
$ instagram-scraper <username> -u <your username> -p <your password>
```
NOTE: To scrape a private user’s media you must be an approved follower.
By default, downloaded media will be placed in `<current working directory>/<username>`.
Providing a username and password is optional; if they are not supplied, the scraper runs as a guest. Note that in this case all private users' media will be unavailable, as will all users' stories and high-resolution profile pictures.
To scrape a hashtag for media:
```
$ instagram-scraper <hashtag without #> --tag
```
It may be useful to specify the `--maximum <#>` argument to limit the total number of items to scrape when scraping by hashtag.
To specify multiple users, pass a delimited list of users:
```
$ instagram-scraper username1,username2,username3
```
You can also supply a file containing a list of usernames:
```
$ instagram-scraper -f ig_users.txt
```
```
# ig_users.txt

username1
username2
username3

# and so on...
```
The usernames may be separated by newlines, commas, semicolons, or whitespace.
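The scraper's own file parser is not shown here, but the delimiter rule above can be sketched in a few lines of Python. The function name `parse_usernames` is hypothetical, purely for illustration:

```python
import re

def parse_usernames(text):
    """Split a user-list file on newlines, commas, semicolons, or
    whitespace, skipping blank lines and '#' comment lines."""
    names = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Any run of commas, semicolons, or whitespace acts as a delimiter.
        names.extend(t for t in re.split(r"[,;\s]+", line) if t)
    return names

sample = "# ig_users.txt\nusername1\nusername2, username3; username4"
print(parse_usernames(sample))
# ['username1', 'username2', 'username3', 'username4']
```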
You can also supply a file containing a list of location ids:
```
$ instagram-scraper --tag <your_tag_here> --include-location --filter_location_file my_locations.txt
```
```
# my_locations.txt

[some_region1]
location_id1
location_id2

[some_region2]
location_id3
location_id4

# and so on...
```
The resulting directory structure will be:
```
your_tag
├── some_region1
│   └── images_here
└── some_region2
    └── images_here
```
The location ids may only be separated by newlines and spaces.
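The file format above can be read as a minimal INI-like layout: `[region]` headers followed by ids separated by newlines or spaces. As a sketch only (the function name `parse_location_file` and the region names are illustrative, not part of the scraper's API):

```python
def parse_location_file(text):
    """Group location ids under their [region] headers. Ids may be
    separated by newlines or spaces; '#' lines are comments."""
    regions = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            regions[current] = []
        elif current is not None:
            regions[current].extend(line.split())
    return regions

sample = """# my_locations.txt
[some_region1]
location_id1 location_id2
[some_region2]
location_id3
location_id4"""
print(parse_location_file(sample))
# {'some_region1': ['location_id1', 'location_id2'],
#  'some_region2': ['location_id3', 'location_id4']}
```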
## OPTIONS
```
--help -h               Show help message and exit.

--login-user -u         Instagram login user.

--login-pass -p         Instagram login password.

--followings-input      Use profiles followed by login-user as input.

--followings-output     Output profiles from --followings-input to file.

--filename -f           Path to a file containing a list of users to scrape.

--destination -d        Specify the download destination. By default, media
                        will be downloaded to
                        <current working directory>/<username>.

--retain-username -n    Creates a username subdirectory when the destination
                        flag is set.

--media-types -t        Specify media types to scrape. Enter as space-separated
                        values. Valid values are image, video, story
                        (story-image & story-video), broadcast, or none.
                        Stories require a --login-user and --login-pass to be
                        defined.

--latest                Scrape only new media since the last scrape. Uses the
                        last modified time of the latest media item in the
                        destination directory to compare.

--latest-stamps         Specify a file to save the timestamps of the latest
                        media scraped by user. Works like --latest, except the
                        file specified by --latest-stamps stores the last
                        modified time instead of using the timestamps of media
                        items in the destination directory. This allows the
                        destination directories to be emptied while still
                        maintaining history.

--cookiejar             File in which to store cookies so that they can be
                        reused between runs.

--quiet -q              Be quiet while scraping.

--maximum -m            Maximum number of items to scrape.

--media-metadata        Saves the media metadata associated with the user's
                        posts to <destination>/<username>.json. Can be combined
                        with --media-types none to fetch only the metadata
                        without downloading the media.

--include-location      Includes location metadata when saving media metadata.
                        Implicitly includes --media-metadata.

--profile-metadata      Saves the user profile metadata to
                        <destination>/<username>.json.

--proxies               Enable use of proxies; pass valid JSON with http
                        and/or https urls.
                        Example: '{"http": "http://<ip>:<port>",
                                  "https": "https://<ip>:<port>"}'

--comments              Saves the comment metadata associated with the posts
                        to <destination>/<username>.json. Implicitly includes
                        --media-metadata.

--interactive -i        Enables interactive login challenge solving. Has two
                        modes: SMS and Email.

--retry-forever         Retry download attempts endlessly when errors are
                        received.

--tag                   Scrapes the specified hashtag for media.

--filter                Scrapes the specified hashtag within a user's media.

--filter_location       Filter scrape queries by command-line location id(s).

--filter_location_file  Provide location ids by file to filter queries.

--location              Scrapes the specified Instagram location-id for media.

--search-location       Search for a location by name. Useful for determining
                        the location-id of a specific place.

--template -T           Customize and format each file's name.
                        Default: {urlname}
                        Options:
                          {username}:  Scraped user
                          {shortcode}: Post shortcode (profile_pic and story
                                       are empty)
                          {urlname}:   Original file name from url.
                          {mediatype}: The type of media being downloaded.
                          {datetime}:  Date and time of upload.
                                       (Format: 20180101 01h01m01s)
                          {date}:      Date of upload. (Format: 20180101)
                          {year}:      Year of upload. (Format: 2018)
                          {month}:     Month of upload. (Format: 01-12)
                          {day}:       Day of upload. (Format: 01-31)
                          {h}:         Hour of upload. (Format: 00-23h)
                          {m}:         Minute of upload. (Format: 00-59m)
                          {s}:         Second of upload. (Format: 00-59s)
                        If the template is invalid, it will revert to the
                        default. Does not work with --tag and --location.
```
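The `--template` placeholders behave like Python `str.format` fields. The sketch below is not the scraper's actual implementation; the function `render_template` and its sample values are hypothetical, and only illustrate how the documented fields and the invalid-template fallback could work:

```python
from datetime import datetime

def render_template(template, username, shortcode, urlname, mediatype, uploaded):
    """Fill the {placeholder} fields documented for --template.
    Illustrative only; the real scraper derives these values itself."""
    try:
        return template.format(
            username=username,
            shortcode=shortcode,
            urlname=urlname,
            mediatype=mediatype,
            datetime=uploaded.strftime("%Y%m%d %Hh%Mm%Ss"),
            date=uploaded.strftime("%Y%m%d"),
            year=uploaded.strftime("%Y"),
            month=uploaded.strftime("%m"),
            day=uploaded.strftime("%d"),
            h=uploaded.strftime("%Hh"),
            m=uploaded.strftime("%Mm"),
            s=uploaded.strftime("%Ss"),
        )
    except (KeyError, IndexError):
        # An invalid template reverts to the default, {urlname}.
        return urlname

ts = datetime(2018, 1, 1, 1, 1, 1)
print(render_template("{username}_{date}_{shortcode}",
                      "alice", "Bq0XyzA", "photo1.jpg", "image", ts))
# alice_20180101_Bq0XyzA
```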