On macOS, they're available through the Homebrew package manager by running "brew install jpegoptim", etc. The advantage of using command-line utilities is that they can easily be scripted, automated and customized to your specific requirements, as opposed to a GUI-driven application that will always require user interaction to complete.

Obtaining sample images, processing some JSON

Normally you'll run these commands on a batch of images from various sources, but to show how these procedures work and how effective they are at saving space, I decided to get a random sample of images from the internet that will hopefully give us an idea of how well they perform. It's a great little text-processing exercise, so I thought I'd share the steps below.

I did an image search on the word "random" on DuckDuckGo, then downloaded the first few hundred images to have a sample library of completely random images. Opening developer tools in Chrome shows that DuckDuckGo fetches every 100 results using XHR (a web request downloading extra information in the background). The data is returned in an easy-to-process JSON-style format that we can extract data from with a few easy commands. Read more about JSON here: What is JSON and how to use it.

DuckDuckGo search results in JSON in Google Chrome Dev Tools

After appending the content of each of these responses to a file named urls.txt (double click, copy/paste), we can run a one-liner shell command to extract the URLs out of them. We could go all sophisticated and parse JSON, but for now, simply replacing double quotes with new lines, then searching for "https…jpg" and removing thumbnails (they all come from the same host) should be good enough. The "tr" command replaces quotes with new lines, the first grep keeps lines starting with https and ending in jpg, the second grep removes any lines containing the thumbnail host, and sort -u sorts the whole list alphabetically and removes duplicates. All of these are standard command-line utilities included in macOS and in every Linux distribution.

Now we need to download all of them using a simple wget command. Wget is a Linux utility that you can feed a list of URLs into using the "-i filename" parameter, and it will request and download each one much like a browser would. The "-w 10" option adds a 10-second wait after each request to avoid overloading anyone's server:

$ wget -i jpegurls.txt -w 10

Some of the files we ended up with have mangled extensions because wget saves files whose names already exist with numeric .1, .2, .3 suffixes to avoid overwriting them (called "clobber"). I removed them by running rm *.jpg.*
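The extraction step described above (tr, two greps, sort -u) can be sketched roughly as follows. This is a minimal sketch and not necessarily the exact one-liner used for this post: the function name extract_jpeg_urls and the THUMB_HOST value are placeholders made up for illustration, so substitute the host that actually serves the thumbnails in your own results.

```shell
# Hypothetical placeholder: replace with the host that serves the thumbnails.
THUMB_HOST="thumbs.example.invalid"

extract_jpeg_urls() {
    # $1: file holding the pasted JSON responses (urls.txt in this post)
    tr '"' '\n' < "$1" |            # put every quoted string on its own line
        grep -E '^https.*jpg$' |    # keep lines that start with https and end in jpg
        grep -vF "$THUMB_HOST" |    # drop thumbnail URLs (fixed-string match)
        sort -u                     # sort alphabetically and remove duplicates
}

# Usage: extract_jpeg_urls urls.txt > jpegurls.txt
```

Wrapping the pipeline in a function is only a convenience for reuse; pasted as a single line with urls.txt in place of "$1", it behaves the same way.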