I've been scraping a bunch of websites lately and got bored with using File.write to store cached versions of the pages. Because I'm still developing the script, I don't want it to hit the real website every time. A simple way to fix that is the vcr gem. While it was made primarily for testing, you can also use it for this kind of task.
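For context, hand-rolled File.write caching tends to look something like the sketch below. The helper name cached_fetch, the cache directory, and the hashed file name are all hypothetical, picked only for illustration.

```ruby
require 'fileutils'
require 'digest/sha1'

# Hypothetical helper: return the cached body for `key` if it is already on
# disk; otherwise run the block, write its result to disk, and return it.
def cached_fetch(key, dir: 'cache')
  FileUtils.mkdir_p(dir)
  # Hash the key so URLs and other odd strings become safe file names
  path = File.join(dir, "#{Digest::SHA1.hexdigest(key)}.txt")
  return File.read(path) if File.exist?(path)

  body = yield
  File.write(path, body)
  body
end
```

It works, but every fetch has to be wrapped by hand and only the response body is saved. That is the boilerplate vcr takes over.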
First you need some kind of configuration file that loads before your actual script. I have it in config/vcr.rb:
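require 'vcr'

VCR.configure do |c|
  # Recorded HTTP interactions are stored here, one YAML file per cassette
  c.cassette_library_dir = 'cassettes'
  # Intercept HTTP requests through WebMock (the webmock gem must be installed)
  c.hook_into :webmock
  # Outside a cassette block, let requests go to the network as usual
  c.allow_http_connections_when_no_cassette = true
end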
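One thing to keep in mind while scraping is that a cassette, once recorded, is replayed indefinitely by default, so the cached pages can go stale. A possible variation of the config, assuming you want cassettes refreshed after a while, is to set default cassette options. The one-week interval here is an arbitrary choice for illustration, not something the setup above requires.

```ruby
require 'vcr'

VCR.configure do |c|
  c.cassette_library_dir = 'cassettes'
  c.hook_into :webmock
  c.allow_http_connections_when_no_cassette = true
  c.default_cassette_options = {
    # :once (VCR's default mode) records when no cassette file exists yet
    # and replays it afterwards
    record: :once,
    # Assumption for this sketch: re-record cassettes older than one week
    # (the value is in seconds)
    re_record_interval: 7 * 24 * 60 * 60
  }
end
```

Deleting a file from the cassettes directory also works if you just want a single page fetched again.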
Then I have a Shared module with a cache method, which I include in any class that needs this functionality:
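module Shared
  # Records (or replays) every HTTP request made inside the block
  # under the cassette called `name`
  def cache(name)
    VCR.use_cassette(name) do
      yield
    end
  end
end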
And now you can use this magic to have the website you're scraping cached instantly:
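require 'json'
require 'open-uri'

def github_for(user)
  cache "gh-#{user}" do
    # URI.open: since Ruby 3.0, plain Kernel#open no longer opens URLs
    response = URI.open("https://api.github.com/users/#{user}").read
    JSON.parse(response)
  end
end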