Introduction

Hello! If you have been using public APIs that rate-limit your requests, this blog post might help you get around those restrictions by masking your requests with free proxies. Warning: this approach uses free proxies, so please use a more reliable proxy service or a different retry strategy before using it in a production environment. That said, I hope this helps if you are being blocked based on your location and need a reliable way to get past rate limits.

I was using the Yahoo Finance APIs to fetch trending stock and crypto information for a side project that ran every 5 minutes, but as soon as I started, Yahoo began blocking my requests as a stopgap to prevent scripts and rogue bots from sending mass requests to its APIs. Yahoo is a great source of financial data and is pretty reliable for a free service, but it deprecated its public API in 2017. There are tons of alternatives out there, but each uses a freemium model that charges you based on the number of requests you make.


Yahoo does, however, still use its APIs internally to serve Yahoo Finance pages and other services, and there are plenty of libraries that let you pull historical stock and crypto data at no cost. Here is an example from the yfinance docs:

import yfinance as yf

msft = yf.Ticker("MSFT")

# get stock info
msft.info

# get historical market data
hist = msft.history(period="max")

# show actions (dividends, splits)
msft.actions

# show dividends
msft.dividends

# show splits
msft.splits

This works for a while, but Yahoo rate-limits all requests based on your IP address and region to prevent abuse of its servers, so once you hit the limit it responds with 401/403 errors. Here's a quick sample that sends out repeated requests and gets blocked by the public API.

import requests
import pandas as pd
from io import StringIO

base_url = "https://query1.finance.yahoo.com/v7/finance/download"
user_agent_headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36"
}
# Daily candles for the last 5 years
params = {"interval": "1d", "events": "history", "range": "5y"}

btc_symbol = "BTC-USD"

# Keep hitting the endpoint until Yahoo starts rejecting requests
for i in range(1000):
    response = requests.get(
        f"{base_url}/{btc_symbol}", params=params, headers=user_agent_headers
    )
    print(response.status_code)
    print(response.text)

    # A blocked request returns a JSON error body instead of CSV,
    # so stop before pandas tries to parse it
    if response.status_code != 200:
        break

    btc_data = StringIO(response.text)
    btc = pd.read_csv(btc_data, index_col="Date").dropna()
    btc["Date"] = btc.index

200
Date,Open,High,Low,Close,Adj Close,Volume
2016-11-16,711.166992,747.614990,709.039001,744.197998,744.197998,141294000
2016-11-17,744.875977,755.645020,739.510986,740.976990,740.976990,108579000
2016-11-18,740.705017,752.882019,736.890015,751.585022,751.585022,87363104
...
2021-11-14,64455.371094,65495.179688,63647.808594,65466.839844,65466.839844,25122092191
2021-11-15,65521.289063,66281.570313,63548.144531,63557.871094,63557.871094,30558763548
2021-11-16,63190.082031,63190.082031,60605.054688,61369.347656,61369.347656,37291376640

401
{
  "finance": {
    "error": {
      "code": "Unauthorized",
      "description": "Invalid cookie"
    }
  }
}

This is kind of a weird error for rate limiting, but there are a couple of ways around it.

  1. Run a proxy service that masks all requests made from your host or script. (This is fairly heavy, since every other service running on your machine or host will have its requests proxied too, which, depending on the proxy's reliability, could increase response times for all apps and services.)

  2. Use a cached session to make requests. (This might not be the best option if you need fresh information, like recent quotes, on every request.)

  3. Use a different proxy only for the requests to this service. (This was the better option for me, since I needed to make frequent requests but didn't want to run a proxy service that affected other services.)
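For option 2, the staleness concern can be softened by letting cached responses expire. Below is a minimal in-memory sketch, not any library's API: `TTLCache` and `get_or_fetch` are hypothetical names, and the 300-second TTL is only an example matching a job that runs every 5 minutes. Libraries such as requests-cache provide session-level caching with expiry if you want something more complete.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Tiny in-memory cache whose entries expire after `ttl_seconds`.

    Calls inside the TTL window are served from memory instead of hitting
    the API; anything older is fetched again.
    """

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]  # still fresh, no request made
        value = fetch()  # missing or expired: fetch and remember when
        self._entries[key] = (now, value)
        return value
```

The trade-off is the one noted in option 2: anything inside the TTL window is up to `ttl_seconds` old, so a short TTL keeps quotes fresh while still cutting down on repeated requests.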

The yfinance library supports passing a proxy server when downloading information from Yahoo Finance. Example from their docs:

import yfinance as yf

msft = yf.Ticker("MSFT")

msft.history(..., proxy="PROXY_SERVER")
msft.get_actions(proxy="PROXY_SERVER")
msft.get_dividends(proxy="PROXY_SERVER")
msft.get_splits(proxy="PROXY_SERVER")
msft.get_balance_sheet(proxy="PROXY_SERVER")
msft.get_cashflow(proxy="PROXY_SERVER")
msft.option_chain(..., proxy="PROXY_SERVER")
...

This is super neat, but I wanted a list of free proxy servers that I could use for requests, with a fallback to another one if a proxy was down. The proxy-list service lets you fetch free proxy server information and use it to proxy your requests to Yahoo Finance. It's also possible to scrape this info yourself, but getting good-quality, reliable proxy servers might be a better option for scraping-related tasks.

❯ curl -X GET https://www.proxy-list.download/api/v1/get\?type\=http\&anon\=elite\&country\=US
209.97.150.167:3128
157.230.233.189:3000
191.96.42.80:8080
191.96.42.80:3128
137.184.73.52:80
50.233.42.98:51696
170.106.120.202:59394
68.183.59.38:8080
23.107.176.80:32180
138.68.60.8:8080
198.199.86.11:8080
198.199.86.11:3128
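The API returns plain text with one `host:port` per line, so it's worth filtering out blank or malformed lines before picking a proxy. The sketch below assumes the list only contains IPv4 addresses (as in the output above) and drops anything else. `parse_proxy_list`, `fetch_proxies`, and the `fetch_text` callable are hypothetical helpers; `fetch_proxies` tries several proxy-list sources in order and falls back to the next one if a source is down or returns nothing usable.

```python
import re
from typing import Callable, List, Optional, Sequence

# Matches "IPv4:port" lines such as "209.97.150.167:3128" (assumes IPv4 only)
PROXY_LINE = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}:\d{1,5}$")


def parse_proxy_list(text: str) -> List[str]:
    """Keep only well-formed "host:port" lines from a plain-text proxy list."""
    proxies = []
    for line in text.splitlines():  # also handles "\r\n" line endings
        line = line.strip()
        if PROXY_LINE.match(line):
            proxies.append(line)
    return proxies


def fetch_proxies(
    source_urls: Sequence[str],
    fetch_text: Callable[[str], Optional[str]],
) -> List[str]:
    """Try each proxy-list source in order and return the first non-empty list.

    `fetch_text` is a hypothetical callable returning the response body,
    or None if the source is down.
    """
    for url in source_urls:
        text = fetch_text(url)
        if text:
            proxies = parse_proxy_list(text)
            if proxies:
                return proxies
    return []  # every source failed or returned nothing usable
```

In practice `fetch_text` could wrap `requests.get(url, timeout=10)` and return `response.text` only on a 200; passing it in as a plain function keeps the fallback logic testable without network access.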

Awesome! Now we can use this list in a client to make requests to Yahoo Finance. Let's set up the utils needed for the client.

proxy.py

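import random
import requests
from typing import Optional

PROXY_LIST_URL = "https://www.proxy-list.download/api/v1/get?type=http&anon=elite&country=US"


def get_proxy_server() -> Optional[str]:
    """Return a random "host:port" from the free proxy list, or None."""
    try:
        response = requests.get(PROXY_LIST_URL, timeout=10)
    except requests.RequestException:
        return None  # proxy-list service unreachable
    if response.status_code == 200:
        # One "host:port" per line; skip blank lines so random.choice never gets ""
        proxies = [line.strip() for line in response.text.splitlines() if line.strip()]
        if proxies:
            return random.choice(proxies)
    return None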
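Free proxies go offline often, so `random.choice` can easily hand back a dead one. A cheap pre-check is to see whether the proxy's port accepts a TCP connection at all. `is_proxy_reachable` is a hypothetical helper using only the standard library; an open port doesn't guarantee the proxy will actually tunnel HTTPS traffic to Yahoo, so the client still needs its retry logic.

```python
import socket


def is_proxy_reachable(proxy: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to "host:port" opens within `timeout`.

    This only shows the port is open; it does not prove the proxy will
    successfully forward requests.
    """
    host, _, port = proxy.rpartition(":")
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts,
        # ValueError covers a missing or non-numeric port
        return False
```

You could filter the list before choosing, e.g. `[p for p in proxies if is_proxy_reachable(p)]`, at the cost of up to `timeout` seconds per unresponsive proxy.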

exceptions.py

class YahooFinanceException(Exception):
    def __init__(self, message, ticker):
        super().__init__(message)
        self.ticker = ticker

yahoo.py

import yfinance as yf
import requests
import pandas as pd
from io import StringIO
from pprint import pprint
from typing import Optional
from datetime import datetime
from dags.utils.constants import YAHOO_FINANCE_BASE_URL
from dags.utils.proxy import get_proxy_server
from dags.utils.exceptions import YahooFinanceException
from dags.models.stock_history import StockHistory


class YahooClient:
    proxy_server: Optional[str]

    def __init__(self):
        self.proxy_server = get_proxy_server()

    def reset_proxy_server(self):
        self.proxy_server = get_proxy_server()

    def history(self, ticker, crypto=False) -> Optional[pd.DataFrame]:
        params = {"interval": "1d", "events": "history", "range": "5y"}
        user_agent_headers = {
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36"
        }
        norm_ticker = f"{ticker}-USD" if crypto else ticker

        try:
            proxies = (
                {
                    "http": f"http://{self.proxy_server}",
                    "https": f"http://{self.proxy_server}",
                }
                if self.proxy_server
                else None
            )
            pprint(f"proxies: {proxies}")

            # YAHOO_FINANCE_BASE_URL = "https://query1.finance.yahoo.com/v7/finance/download"
            response = requests.get(
                f"{YAHOO_FINANCE_BASE_URL}/{norm_ticker}",
                params=params,
                headers=user_agent_headers,
                proxies=proxies,
                # Fail fast on dead proxies instead of waiting on the OS TCP timeout
                timeout=30,
            )

            # Retry different proxy server if blocked with 401/403
            if response.status_code != 200:
                raise YahooFinanceException(message="Non valid response", ticker=ticker)

            data = StringIO(response.text)
            results = pd.read_csv(data, index_col="Date").dropna()
            results["Date"] = results.index

            return results

        except YahooFinanceException as yfe:
            print("### Retrying fetch for historical data")
            self.reset_proxy_server()
            return self.history(yfe.ticker, crypto)

        except Exception as e:
            print(e)
            print("### Retrying fetch for historical data")
            self.reset_proxy_server()
            return self.history(ticker, crypto)

    @staticmethod
    def recent_quote(ticker, crypto=False) -> Optional[StockHistory]:
        norm_ticker = f"{ticker}-USD" if crypto else ticker
        quote = yf.Ticker(norm_ticker).info
        regular_market_price = quote.get("regularMarketPrice")
        previous_close = quote.get("regularMarketPreviousClose")
        last = regular_market_price if regular_market_price else previous_close

        if quote:
            return StockHistory(
                ticker=ticker,
                quote_date=datetime.utcnow(),
                last=last,
                open=quote.get("open"),
                high=quote.get("regularMarketDayHigh"),
                low=quote.get("regularMarketDayLow"),
                volume=quote.get("regularMarketVolume"),
            )
        return None

The Yahoo client will now try different proxies when making requests to Yahoo Finance 🎉

Sample log from a script using the proxied client:

[2021-11-16 03:10:32,489] INFO - HTTPSConnectionPool(host='query1.finance.yahoo.com', port=443): Max retries exceeded with url: /v7/finance/download/ADA-USD?interval=1d&events=history&range=5y (Caused by ProxyError('Cannot connect to proxy.', ConnectionResetError(104, 'Connection reset by peer')))
[2021-11-16 03:10:32,489] INFO - ### Retrying fetch for historical data
[2021-11-16 03:10:32,547] INFO - ("proxies: {'http': 'http://191.96.42.80:3128', 'https': "
 "'http://191.96.42.80:3128'}")
[2021-11-16 03:10:35,066] INFO - HTTPSConnectionPool(host='query1.finance.yahoo.com', port=443): Max retries exceeded with url: /v7/finance/download/ADA-USD?interval=1d&events=history&range=5y (Caused by ProxyError('Cannot connect to proxy.', OSError('Tunnel connection failed: 400 Bad Request')))
[2021-11-16 03:10:35,066] INFO - ### Retrying fetch for historical data
[2021-11-16 03:10:35,123] INFO - ("proxies: {'http': 'http://50.233.42.98:51696', 'https': "
 "'http://50.233.42.98:51696'}")
[2021-11-16 03:12:45,603] INFO - HTTPSConnectionPool(host='query1.finance.yahoo.com', port=443): Max retries exceeded with url: /v7/finance/download/ADA-USD?interval=1d&events=history&range=5y (Caused by ProxyError('Cannot connect to proxy.', NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f171992b130>: Failed to establish a new connection: [Errno 110] Connection timed out')))
[2021-11-16 03:12:45,603] INFO - ### Retrying fetch for historical data
[2021-11-16 03:12:45,663] INFO - ("proxies: {'http': 'http://157.230.233.189:3000', 'https': "
 "'http://157.230.233.189:3000'}")
[2021-11-16 03:13:07,408] INFO - ### Writing 4 records for ticker: ADA ###
[2021-11-16 03:13:07,416] INFO - ### Write complete for ticker: ADA, records added: 4 ###

This will retry every time a request fails, with no upper limit, so it might be worth adding a maximum retry count, rotating through the proxies round-robin, or keeping a set of trusted proxies. I've been running this for the past 2+ weeks and it seems to work well for what I needed, but feel free to use a different variation for your client.
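Here is one way to bound the retries and rotate proxies round-robin instead of recursing indefinitely. It is a sketch, not the client above: `fetch_with_proxy_rotation` and `ProxyExhaustedError` are hypothetical names, and `fetch` is any callable that takes a proxy (`host:port`, or `None` for a direct connection) and raises when a request fails or is blocked, much like `history` raising `YahooFinanceException` on a non-200 response.

```python
import itertools
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


class ProxyExhaustedError(Exception):
    """Raised when every attempt within the retry budget has failed."""


def fetch_with_proxy_rotation(
    fetch: Callable[[Optional[str]], T],
    proxies: Iterable[str],
    max_retries: int = 5,
) -> T:
    """Call `fetch(proxy)` with proxies taken round-robin, at most `max_retries` times."""
    # cycle() wraps around the list; on an empty list next() yields the
    # default None, i.e. a direct connection
    proxy_cycle = itertools.cycle(list(proxies))
    last_error: Optional[Exception] = None
    for attempt in range(1, max_retries + 1):
        proxy = next(proxy_cycle, None)
        try:
            return fetch(proxy)
        except Exception as error:  # broad on purpose, like the original client
            last_error = error
            print(f"### Attempt {attempt} via {proxy} failed: {error}")
    raise ProxyExhaustedError(f"All {max_retries} attempts failed") from last_error
```

Giving up after `max_retries` attempts turns an endless retry loop into an error your scheduler can report, and the proxy list can be refreshed (for example with `get_proxy_server`'s source) before the next run.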

Conclusion

I hope this was helpful in getting past IP-based rate limiting on public API services and masking your requests from different parts of the world! That said, it might be better to use a dedicated API service for gathering data, since scraping the web generally isn't as reliable, and dedicated services offer better-documented API endpoints for production environments.
