By Rohit Bhatia, Mollie Bates, Google Chrome Security
There are various threats a user faces when browsing the web. Users can be tricked into sharing sensitive information, such as their passwords, with a deceptive or fake website, an attack known as phishing. They can also be led to install malicious software, called malware, which can collect personal data or hold it for ransom. Google Chrome, henceforth called Chrome, enables its users to protect themselves from such threats on the Internet. When Chrome users browse the web with Safe Browsing protection enabled, Chrome uses the Safe Browsing service from Google to identify and ward off various threats.
Safe Browsing works in different ways depending on the user’s preferences. In the most common case, Chrome uses the privacy-conscious Update API (Application Programming Interface) of the Safe Browsing service. This API was developed with users’ privacy in mind and ensures that Google gets as little information about the user’s browsing history as possible. If the user has opted in to “Enhanced protection” (covered in a previous post) or “Make searches and browsing better”, Chrome shares limited additional data with Safe Browsing, solely to further improve user protection.
This post describes how Chrome implements the Update API, with appropriate references to the technical implementation and details on the privacy-conscious aspects of the Update API. This should be helpful for users to understand how Safe Browsing protects them and for interested developers to review and understand the implementation. We will cover the APIs used for Enhanced Protection users in a future post.
Threats on the Internet
When a user navigates to a web page on the Internet, their browser retrieves objects hosted on the Internet. These objects include the structure of the web page (HTML), the style (CSS), dynamic behavior in the browser (JavaScript), images, downloads initiated by the navigation and other web pages embedded in the main web page. These objects, also called resources, have a web address, which is called their URL (Uniform Resource Locator). Additionally, URLs may redirect to other URLs when loaded. Each of these URLs can potentially host threats such as phishing sites, malware, unwanted downloads, malicious software, unfair billing practices, and more. Chrome with Safe Browsing checks all URLs, including redirects and included resources, to identify such threats and protect users.
Safe Browsing lists
Safe Browsing provides a list for every threat it protects users against on the Internet. A complete catalog of lists used in Chrome can be found by visiting chrome://safe-browsing/#tab-db-manager on desktop platforms.
A list does not contain unsafe URLs in their entirety; it would be prohibitively expensive to store them all in a device’s limited memory. Instead, it maps each URL, which can be very long, through a cryptographic hash function (SHA-256) to a fixed-size string that is, for practical purposes, unique to that URL. This fixed-size string, called a hash, allows a list to be stored efficiently in limited memory. The Update API only handles URLs in the form of hashes, and is also called the hash-based API in this post.
Further, a list doesn’t store the hashes in their entirety either, as even that would be too memory intensive. Instead, except in the case where data is not shared with Google and the list is small, it contains prefixes of the hashes. We refer to the original hash as a full hash, and a hash prefix as a partial hash.
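To make the distinction between a full hash and a partial hash concrete, here is a minimal Python sketch. The helper names `full_hash` and `hash_prefix` are ours, chosen for illustration; they are not Chrome identifiers. The input is assumed to be an already canonicalized host/path expression, and the 4-byte default reflects the typical prefix length rather than a fixed rule.

```python
import hashlib


def full_hash(expression: str) -> bytes:
    """Return the 32-byte SHA-256 digest (the "full hash") of an expression."""
    return hashlib.sha256(expression.encode("utf-8")).digest()


def hash_prefix(digest: bytes, length: int = 4) -> bytes:
    """Return the leading bytes of a full hash (the "partial hash")."""
    return digest[:length]


digest = full_hash("evil.example.com/")
prefix = hash_prefix(digest)
print(len(digest), len(prefix))   # 32 4
print(digest.startswith(prefix))  # True
```

Storing 4 bytes instead of 32 per entry is what keeps the local lists small, at the cost of occasional prefix collisions that have to be resolved against full hashes.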
A list is updated according to the Request Frequency section of the Update API. These updates take place approximately every 30 minutes, following the minimum wait duration specified by the server in the list update response. Chrome also enters a back-off mode if an update request fails.
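As a rough illustration of back-off, the sketch below computes a wait time after consecutive failed requests. It assumes the formula we understand the public Request Frequency documentation to describe, MIN((2^(N-1) * 15 minutes) * (RAND + 1), 24 hours); the constants, the function name `backoff_delay` and the exponent clamp are assumptions for illustration, not Chrome's actual code.

```python
import random
from datetime import timedelta
from typing import Optional

BASE_BACKOFF = timedelta(minutes=15)  # assumed base interval
MAX_BACKOFF = timedelta(hours=24)     # assumed upper bound


def backoff_delay(consecutive_failures: int,
                  rand: Optional[float] = None) -> timedelta:
    """Wait time after N consecutive failures (N starts at 1)."""
    if consecutive_failures < 1:
        raise ValueError("consecutive_failures must be >= 1")
    if rand is None:
        rand = random.random()  # jitter in [0, 1), re-drawn on every failure
    # Clamp the exponent so the intermediate timedelta cannot overflow.
    exponent = min(consecutive_failures - 1, 16)
    delay = BASE_BACKOFF * (2 ** exponent) * (rand + 1)
    return min(delay, MAX_BACKOFF)


print(backoff_delay(1, rand=0.0))  # 0:15:00
print(backoff_delay(2, rand=0.5))  # 0:45:00
```

The random factor spreads retries from many clients over time, so a server recovering from an outage is not hit by all of them at once.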
For those interested in browsing relevant source code, see here:
Source code
- GetListInfos() contains all the lists along with their associated threat types, the platforms they are used on, and their filenames on disk.
- HashPrefixMap shows how the lists are stored and maintained. Prefixes are grouped by size and concatenated together to allow fast binary-search-based lookups.
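The storage idea described for HashPrefixMap, grouping prefixes by size and concatenating them so they can be binary searched, can be sketched in Python as follows. This is a simplified illustration and not Chrome's implementation; `build_prefix_map` and `has_matching_prefix` are hypothetical names, and the 4-byte prefix `deadbeef` is made up to show a second size group.

```python
def build_prefix_map(prefixes):
    """Group prefixes by length; store each group as one sorted byte string."""
    grouped = {}
    for prefix in prefixes:
        grouped.setdefault(len(prefix), set()).add(prefix)
    return {size: b"".join(sorted(group)) for size, group in grouped.items()}


def has_matching_prefix(prefix_map, full_hash):
    """Binary search each size group for the leading bytes of full_hash."""
    for size, blob in prefix_map.items():
        target = full_hash[:size]
        lo, hi = 0, len(blob) // size
        while lo < hi:
            mid = (lo + hi) // 2
            # Entries are fixed-width, so entry i lives at blob[i*size:(i+1)*size].
            if blob[mid * size:(mid + 1) * size] < target:
                lo = mid + 1
            else:
                hi = mid
        if blob[lo * size:(lo + 1) * size] == target:
            return True
    return False


prefix_map = build_prefix_map([
    bytes.fromhex("bb90"), bytes.fromhex("036b"),
    bytes.fromhex("1a02"), bytes.fromhex("bac8"),
    bytes.fromhex("deadbeef"),  # hypothetical 4-byte prefix
])
candidate = bytes.fromhex("1a02") + bytes(30)  # stand-in for a 32-byte full hash
print(has_matching_prefix(prefix_map, candidate))  # True
```

Because every entry in a group has the same width, a flat byte string is enough; no per-entry objects or pointers are needed, which keeps memory use low.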
How hash-based URL lookups are performed
As an example of a Safe Browsing list, let’s say we have one for malware that contains partial hashes of URLs known to host malware. These partial hashes are generally 4 bytes long, but for illustrative purposes we show only 2 bytes.
['036b', '1a02', 'bac8', 'bb90']
When Chrome needs to check the reputation of a resource with the Update API, e.g. when you navigate to a URL, it does not share the raw URL (or any part of it) with Safe Browsing to perform lookups. Instead, Chrome uses full hashes of the URL (and some combinations) to look up the partial hashes in the locally maintained Safe Browsing list. Chrome only sends these matched partial hashes to the Safe Browsing service. This ensures that Chrome provides these protections while respecting user privacy. This hash-based lookup happens in three steps in Chrome:
Step 1: Generate URL combinations and full hashes
When Google blocks URLs hosting potentially unsafe resources by placing them on a Safe Browsing list, the malicious actor can host the resource at another URL. A malicious actor can cycle through different subdomains to generate new URLs. Safe Browsing uses host suffixes to identify malicious domains that host malware in their subdomains. Similarly, malicious actors can also cycle through various subpaths to generate new URLs. So Safe Browsing also uses path prefixes to identify sites that host malware on different subpaths. This prevents malicious actors from cycling through subdomains or paths for new malicious URLs, enabling robust and efficient threat identification.
To incorporate these host suffixes and path prefixes, Chrome first calculates the full hashes of the URL and some patterns derived from the URL. Following the Safe Browsing API’s URLs and Hashing specifications, Chrome calculates the full hashes of URL combinations by following these steps:
- First, Chrome converts the URL to a canonical format, as defined in the specification.
- Then Chrome generates up to 5 host suffixes/variants for the URL.
- Then Chrome generates up to 6 path prefixes/variants for the URL.
- Chrome then generates the full hash for each of the resulting host suffix and path prefix combinations, of which there can be up to 30.
Source code
- V4LocalDatabaseManager::CheckBrowseURL is an example that performs a hash-based lookup.
- V4ProtocolManagerUtil::UrlToFullHashes creates the various URL combinations for a URL and calculates their full hashes.
Example
For example, say a user tries to visit https://evil.example.com/blah#frag. The canonical URL is https://evil.example.com/blah. The host suffixes to try are evil.example.com and example.com. The path prefixes are / and /blah. The four combined URL combinations are evil.example.com/, evil.example.com/blah, example.com/ and example.com/blah.
url_combinations = ["evil.example.com/", "evil.example.com/blah", "example.com/", "example.com/blah"]
full_hashes = ['1a02…28', 'bb90…9f', '7a9e…67', 'bac8…fa']
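The combination step can be sketched in Python as below. This is a simplified illustration under stated assumptions, not Chrome's UrlToFullHashes: canonicalization is reduced to dropping the fragment and lowercasing the host, IP-address hosts are not special-cased, and the function names are ours. Real SHA-256 digests will not match the illustrative hash values shown in this post.

```python
import hashlib
from urllib.parse import urlsplit


def host_suffixes(host):
    """Exact host plus up to four suffixes taken from its last five labels."""
    labels = host.split(".")
    suffixes = [host]
    # Stop before the bare top-level label; never repeat the exact host.
    for i in range(max(1, len(labels) - 5), len(labels) - 1):
        suffixes.append(".".join(labels[i:]))
    return suffixes


def path_prefixes(path, query=""):
    """Exact path (with and without query) plus up to four paths from the root."""
    prefixes = [f"{path}?{query}"] if query else []
    prefixes.append(path)
    components = [c for c in path.split("/") if c]
    if not path.endswith("/"):
        components = components[:-1]  # the last component is not a directory
    current, from_root = "/", ["/"]
    for component in components:
        current += component + "/"
        from_root.append(current)
    for candidate in from_root[:4]:
        if candidate not in prefixes:
            prefixes.append(candidate)
    return prefixes


def url_combinations(url):
    parts = urlsplit(url)  # the fragment is parsed separately and dropped
    host = (parts.hostname or "").lower()
    paths = path_prefixes(parts.path or "/", parts.query)
    return [h + p for h in host_suffixes(host) for p in paths]


combos = url_combinations("https://evil.example.com/blah#frag")
print(sorted(combos))
# ['evil.example.com/', 'evil.example.com/blah', 'example.com/', 'example.com/blah']
full_hashes = {c: hashlib.sha256(c.encode("utf-8")).digest() for c in combos}
```

With at most five host suffixes and six path prefixes, the number of expressions to hash stays bounded at 30 regardless of how long the URL is.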
Step 2: Search partial hashes in local lists
Chrome then checks the full hashes of the URL combinations against the locally maintained Safe Browsing lists. These lists, which contain partial hashes, do not provide a conclusive malicious verdict, but can quickly identify whether the URL is considered non-malicious. If the full hash of the URL does not match any of the partial hashes from the local lists, the URL is considered safe and Chrome continues to load it. This happens for more than 99% of the checked URLs.
Source code
- V4LocalDatabaseManager::GetPrefixMatches gets the matching partial hashes for the full hashes of the URL and its combinations.
Example
Chrome finds that three full hashes, 1a02…28, bb90…9f and bac8…fa, match local partial hashes. We note that this is for demonstration purposes and a match here is rare.
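The local lookup in this example can be sketched in Python using the illustrative values from this post. The hashes are kept as the abbreviated hex strings shown above (ellipsis included) rather than real digests, and `matching_prefixes` is a hypothetical helper name.

```python
from bisect import bisect_left

# Illustrative 2-byte partial hashes from the local list, kept sorted.
local_prefixes = ['036b', '1a02', 'bac8', 'bb90']
# Abbreviated full hashes of the four URL combinations.
full_hashes = ['1a02…28', 'bb90…9f', '7a9e…67', 'bac8…fa']


def matching_prefixes(full_hashes, sorted_prefixes, prefix_chars=4):
    """Return the local prefixes that some full hash starts with."""
    matches = []
    for full_hash in full_hashes:
        prefix = full_hash[:prefix_chars]  # 4 hex characters = 2 bytes
        i = bisect_left(sorted_prefixes, prefix)
        if i < len(sorted_prefixes) and sorted_prefixes[i] == prefix:
            matches.append(prefix)
    return matches


print(matching_prefixes(full_hashes, local_prefixes))
# ['1a02', 'bb90', 'bac8']
```

An empty result means no combination matched and the URL can load without any network request to Safe Browsing.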
Step 3: Get matching full hashes
Next, Chrome sends only the matching partial hashes (not the full URL, any part of the URL, or even their full hashes) to the Safe Browsing service’s fullHashes.find method. In response, it receives the full hashes of all malicious URLs whose full hash begins with one of the partial hashes sent by Chrome. Chrome checks the fetched full hashes against the full hashes it generated for the URL combinations. If a match is found, Chrome flags the URL with the threat types, and their severities, derived from the matched full hashes.
Source code
- V4GetHashProtocolManager::GetFullHashes performs lookups for the full hashes of the matched partial hashes.
Example
Chrome sends the matched partial hashes 1a02, bb90 and bac8 to retrieve the full hashes. The server returns the full hashes that match these partial hashes: 1a02…28, bb90…ce and bac8…01. Chrome detects that one of the returned full hashes, 1a02…28, matches the full hash of a URL combination being checked, and identifies the URL as hosting malware.
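The final comparison in this example can be sketched in Python as follows, again with the abbreviated illustrative hashes from this post. The sketch assumes the URL combinations and their full hashes correspond by position, as listed in Step 1; it performs no network request and omits the threat types and other metadata that a real response would carry. `confirmed_matches` is a hypothetical helper name.

```python
url_combinations = ["evil.example.com/", "evil.example.com/blah",
                    "example.com/", "example.com/blah"]
# Full hashes computed locally, in the same order as url_combinations.
local_full_hashes = ['1a02…28', 'bb90…9f', '7a9e…67', 'bac8…fa']
# Full hashes returned by the server for prefixes 1a02, bb90 and bac8.
server_full_hashes = ['1a02…28', 'bb90…ce', 'bac8…01']


def confirmed_matches(combinations, local_hashes, server_hashes):
    """Return (combination, full hash) pairs that the server confirmed."""
    confirmed = set(server_hashes)
    return [(combo, full_hash)
            for combo, full_hash in zip(combinations, local_hashes)
            if full_hash in confirmed]


print(confirmed_matches(url_combinations, local_full_hashes, server_full_hashes))
# [('evil.example.com/', '1a02…28')]
```

The prefixes bb90 and bac8 matched locally but their full hashes differ from the server's, so those two combinations are prefix collisions and not confirmed threats.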
Conclusion
Safe Browsing protects Chrome users from various malicious threats on the Internet. While Chrome provides these protections, it faces challenges such as memory capacity limitations, network bandwidth usage, and a dynamic threat landscape. Chrome is also mindful of users’ privacy choices and shares little data with Google.
In a follow-up post, we’ll cover the more advanced protections Chrome provides to its users who have signed up for “Enhanced Protection”.