Cache and Web cache basics

Cache introduction

Caching is a technique that speeds up access to data by using an intermediate device called a cache. The cache stores previously accessed data so that subsequent requests for the same data can be served faster.
Caching can be implemented at different levels: CPU, hard disk, Web server, proxy server and browser.
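The idea can be sketched in a few lines of Python. This is only an illustration: `read_record` is a hypothetical stand-in for any slow data source, and the standard library decorator `functools.lru_cache` plays the role of the cache.

```python
from functools import lru_cache

# Counts how often the slow backing store is really accessed.
backing_store_reads = 0


@lru_cache(maxsize=128)
def read_record(key: str) -> str:
    """Hypothetical slow lookup; the decorator keeps recent results in memory."""
    global backing_store_reads
    backing_store_reads += 1
    return f"value-for-{key}"


first = read_record("user:1")   # miss: goes to the backing store
second = read_record("user:1")  # hit: answered from the cache
info = read_record.cache_info()
print(first == second, backing_store_reads, info.hits, info.misses)
# prints: True 1 1 1
```

The second call never reaches the backing store, which is the whole point of a cache.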

CPU

A CPU cache is a cache used by the Central Processing Unit of a computer to reduce the average time to access data from the main memory. The cache sits between the CPU and the main memory. It is smaller and faster than the main memory and stores copies of the data from frequently used main memory locations.

GPU caches work in much the same way as CPU caches.
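As a rough illustration of why a small cache in front of main memory helps, here is a toy simulation. It assumes a direct-mapped layout (each address maps to exactly one slot), which is only one of several designs used in real hardware. The class name and the sizes are made up for the example.

```python
class DirectMappedCache:
    """Toy direct-mapped cache: each address maps to exactly one slot."""

    def __init__(self, main_memory: list, num_slots: int = 4) -> None:
        self.main_memory = main_memory
        self.num_slots = num_slots
        self.slots = {}  # slot index -> (address, value)
        self.hits = 0
        self.misses = 0

    def read(self, address: int) -> int:
        index = address % self.num_slots
        entry = self.slots.get(index)
        if entry is not None and entry[0] == address:
            self.hits += 1
            return entry[1]
        # Miss: fetch from the slower main memory and keep a copy.
        self.misses += 1
        value = self.main_memory[address]
        self.slots[index] = (address, value)
        return value


memory = [n * 10 for n in range(16)]
cpu_cache = DirectMappedCache(memory)
# A loop that keeps touching the same few addresses.
for address in [0, 1, 2, 0, 1, 2, 0, 1, 2]:
    cpu_cache.read(address)
print(cpu_cache.hits, cpu_cache.misses)
# prints: 6 3
```

Only the first access to each address goes to main memory. The repeated accesses are served from the cache.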

Hard disk

Hard disk cache

A hard disk cache is a part of the main memory that holds disk data. When the operating system requests the same data a second time, it fetches it from the hard disk cache. This is much faster than reading the data from the hard disk every time.

Hardware disk cache

A hardware solution is to put additional memory on the hard disk controller. This is usually called a disk buffer rather than a cache.

Web cache

The goals of Web caching are to deliver Web pages quickly and to reduce network traffic to the Web server. Web caching can be done at three levels: the Web server, the proxy server and the browser.

Web server

The Web server hosts the content of one or more Websites (for example Web pages, documents, videos or a shop). Web server caching can be done at different levels. I will explain WordPress Web page caching, CMS database caching and PHP caching.

WordPress Web pages

WordPress stores its Web pages in parts. When a Web page is requested, WordPress must combine various parts: the header, the body content, the footer, the sidebar, etc. Additionally, the post data is retrieved from the database. From all these elements, the complete Web page is created and returned to the browser.

Web page caching saves the generated Web pages as static pages so that subsequent requests can be served more quickly.
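A minimal sketch of page caching, written in Python rather than in WordPress's PHP. The functions `render_page`, `serve_page` and `invalidate` are hypothetical names for this example and not part of any WordPress API.

```python
render_count = 0
page_cache = {}  # slug -> finished HTML


def render_page(slug: str) -> str:
    """Assemble a page from its parts (stand-in for the real CMS work)."""
    global render_count
    render_count += 1
    header = "<header>My blog</header>"
    body = f"<main>Content of {slug}</main>"
    footer = "<footer>Footer</footer>"
    return header + body + footer


def serve_page(slug: str) -> str:
    """Return the static copy if there is one, otherwise build and store it."""
    if slug not in page_cache:
        page_cache[slug] = render_page(slug)
    return page_cache[slug]


def invalidate(slug: str) -> None:
    """Drop the static copy, for example after the post was edited."""
    page_cache.pop(slug, None)


serve_page("hello-world")
serve_page("hello-world")
print(render_count)  # prints: 1
invalidate("hello-world")
serve_page("hello-world")
print(render_count)  # prints: 2
```

The page is assembled only when no static copy exists. After an edit the copy has to be invalidated, otherwise visitors keep seeing the old version.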

CMS database caching

Result set caching stores the results of a database query. Every time a Web page generates a query, the application checks whether the results are already cached. If so, it pulls them from the cache.
Instead of asking the database to return the same posts 1,000 times, the result of this query can be cached.
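The following sketch shows result set caching with Python's built-in `sqlite3` module and an in-memory database. The table, its contents and the helper `cached_query` are assumptions made for the example. A real CMS would use its own database layer and would also clear the cache when posts change.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
connection.executemany(
    "INSERT INTO posts (title) VALUES (?)",
    [("First post",), ("Second post",)],
)

query_cache = {}  # (sql, params) -> list of result rows
database_queries = 0


def cached_query(sql: str, params: tuple = ()) -> list:
    """Run the query only if its result set is not cached yet."""
    global database_queries
    key = (sql, params)
    if key not in query_cache:
        database_queries += 1
        query_cache[key] = connection.execute(sql, params).fetchall()
    return query_cache[key]


for _ in range(1000):
    rows = cached_query("SELECT id, title FROM posts ORDER BY id")
print(database_queries, rows)
# prints: 1 [(1, 'First post'), (2, 'Second post')]
```

The same query is requested 1,000 times, but the database is asked only once.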

PHP cache

PHP OPcache improves PHP performance by storing precompiled script bytecode in shared memory. This saves PHP from loading and parsing the scripts on each request.
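The principle can be imitated in Python, which also compiles source text to bytecode before running it. This is an analogy only and not how OPcache is implemented. The helper `run_script` and the cache keyed by a hash of the source are assumptions for the example.

```python
import hashlib

bytecode_cache = {}  # hash of the source text -> compiled code object
compile_count = 0


def run_script(source: str, variables: dict) -> dict:
    """Compile the script once, then reuse the compiled code on later runs."""
    global compile_count
    key = hashlib.sha256(source.encode("utf-8")).hexdigest()
    code = bytecode_cache.get(key)
    if code is None:
        compile_count += 1
        code = compile(source, "<script>", "exec")  # parse and compile
        bytecode_cache[key] = code
    namespace = dict(variables)
    exec(code, namespace)  # execute the already compiled code
    return namespace


script = "total = price * quantity"
first_run = run_script(script, {"price": 3, "quantity": 4})
second_run = run_script(script, {"price": 5, "quantity": 2})
print(first_run["total"], second_run["total"], compile_count)
# prints: 12 10 1
```

The script runs twice with different data, but it is parsed and compiled only once.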

Proxy server

The proxy server acts as an intermediary for requests from browsers seeking resources from a Web server. Because the browser talks to the proxy, it has no direct access to data on the Web server, which adds an additional layer of security. By answering some requests itself, the proxy also reduces network traffic to and from the Web server, which takes load off it.
Proxy caching is one of the tasks of a proxy server.

Examples

NGINX
Apache
WinGate

Forward proxy cache

A forward proxy acts as an intermediary that lets its associated clients contact any Web server.
A forward proxy cache is a public, shared cache and is often operated by an Internet Service Provider (ISP).

The proxy cache server must be close to its users in order to deliver the information quickly.

Proxy caching stores copies of frequently accessed Web pages and serves this information to users.
Forward proxy caches save bandwidth and speed up the delivery of Websites.
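A shared proxy cache can be sketched as follows. The class `ProxyCache`, the fixed time-to-live and the fake origin server are assumptions for this example. Real proxies such as Squid decide freshness from the HTTP headers of each response.

```python
from typing import Callable


class ProxyCache:
    """Shared cache keyed by URL; entries expire after ttl_seconds."""

    def __init__(self, fetch_from_origin: Callable[[str], str],
                 ttl_seconds: float, clock: Callable[[], float]) -> None:
        self.fetch_from_origin = fetch_from_origin
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self.entries = {}  # url -> (time stored, body)

    def get(self, url: str) -> str:
        now = self.clock()
        entry = self.entries.get(url)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]  # still valid: no traffic to the origin
        body = self.fetch_from_origin(url)
        self.entries[url] = (now, body)
        return body


origin_requests = []


def fake_origin(url: str) -> str:
    """Stand-in for the real Web server; records every request it receives."""
    origin_requests.append(url)
    return f"<html>{url}</html>"


current_time = [0.0]  # simulated clock, so the example needs no waiting
proxy = ProxyCache(fake_origin, ttl_seconds=60, clock=lambda: current_time[0])

proxy.get("http://example.com/")  # client A: fetched from the origin
proxy.get("http://example.com/")  # client B: served from the proxy cache
current_time[0] = 120.0           # two minutes later the copy has expired
proxy.get("http://example.com/")  # fetched from the origin again
print(len(origin_requests))
# prints: 2
```

Three client requests cause only two requests to the origin. This is where the bandwidth saving comes from.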

Proxy caches aren’t part of the client or the origin Web server – they are out on the Internet – so requests have to be routed to them somehow. One way to do this is to use the browser’s proxy setting to manually tell it what proxy to use. Another way is using interception. Interception proxies have Web requests redirected to them by the underlying network itself, so that browsers don’t need to be configured for them.

Example: Squid

Reverse proxy cache

A reverse proxy acts as an intermediary that lets any client contact its associated Web servers.
Reverse proxies are set up by the system administrator for security, load balancing and Web acceleration (caching, compression).

Reverse proxy caches sit in front of the Web server and act in much the same way as forward proxy caches, except that they serve only specific Web pages.

Reverse proxy caches are also known as “surrogate caches” or “gateway caches”. They usually satisfy a considerable number of Web site requests, which reduces the load on the origin Web server.

Content Delivery Networks

Content delivery networks (CDNs) are reverse proxies distributed throughout the Internet that deliver specific Web sites to users. The goal of a CDN is to serve content to users with high availability and high performance. This is done by delivering content from the CDN server closest to the user.
Multiple users are served by multiple CDN servers at the same time, which relieves the Web server hosting the Web sites. CDNs are typically deployed by Webmasters to make their Web sites more scalable, more reliable and better performing.
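"Closest" can be illustrated with a small sketch that picks the edge server with the shortest great-circle distance to the user. The server names and coordinates are invented for the example. Real CDNs usually route by DNS or anycast and by measured network latency rather than by pure geographic distance.

```python
from math import asin, cos, radians, sin, sqrt

# Hypothetical edge locations as (latitude, longitude) in degrees.
EDGE_SERVERS = {
    "frankfurt": (50.11, 8.68),
    "new-york": (40.71, -74.01),
    "singapore": (1.35, 103.82),
}


def distance_km(a: tuple, b: tuple) -> float:
    """Great-circle distance between two points (haversine formula)."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(h))


def closest_edge(user_location: tuple) -> str:
    """Name of the edge server nearest to the user."""
    return min(EDGE_SERVERS,
               key=lambda name: distance_km(user_location, EDGE_SERVERS[name]))


print(closest_edge((48.86, 2.35)))    # user in Paris, prints: frankfurt
print(closest_edge((35.68, 139.69)))  # user in Tokyo, prints: singapore
```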

Examples

Cloudflare
Fastly
CloudFront
MaxCDN
Akamai

Proxy comparison

                       Forward proxy       Reverse proxy      CDN

Set up by:             ISP                 Sys Admin          Webmaster

Web page delivering:   any, as demanded    only specific      only specific

Clients:               only associated     any                any

Geographic location:   specific            near the server    multiple

Browser

The browser cache is also called full page cache, static cache or HTTP cache.
Browser caching concerns static content such as HTML, CSS, JS, image (including favicon) and video files. It improves performance when you retrieve information from the Internet. The browser uses the hard disk as a cache to store Web pages.

The first time you ask for a Web page, your browser renders it and also stores a copy on your hard disk. The next time you request this Web page, your browser checks whether the Web page on the Internet is newer than the cached copy. If the date is the same, your browser uses the copy on your hard disk instead of downloading the page from the Internet again.
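The date check can be simulated without any network access. In HTTP it corresponds to a conditional request: the browser sends the date of its copy, and the server answers "304 Not Modified" if nothing is newer. The function `origin_response` and the sample dates are assumptions for this sketch.

```python
from datetime import datetime, timezone


def origin_response(if_modified_since, last_modified, body):
    """Answer a conditional request: (304, None) if the copy is current."""
    if if_modified_since is not None and last_modified <= if_modified_since:
        return 304, None
    return 200, body


cached_date = datetime(2024, 1, 1, tzinfo=timezone.utc)
cached_body = "<html>old</html>"

# Case 1: the page on the server has not changed since it was cached.
status, body = origin_response(cached_date, cached_date, "<html>old</html>")
page_unchanged = cached_body if status == 304 else body
print(status, page_unchanged)  # prints: 304 <html>old</html>

# Case 2: the page was modified after the cached copy was stored.
newer_date = datetime(2024, 2, 1, tzinfo=timezone.utc)
status_changed, body_changed = origin_response(
    cached_date, newer_date, "<html>new</html>")
page_changed = cached_body if status_changed == 304 else body_changed
print(status_changed, page_changed)  # prints: 200 <html>new</html>
```

In the first case only a short status answer crosses the network and the page itself comes from the hard disk.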

Some technical details

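Cache controls (examples: “ETag”, “Expires”, “max-age”) determine how the browser reuses a stored copy. While a copy is still fresh according to “Expires” or “max-age”, the browser uses it without contacting the Web server. Once the copy has expired, the browser asks the Web server whether the content has changed since it was last downloaded, for example by sending the stored “ETag”.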
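How “max-age” and “ETag” work together can be sketched as follows. The data class and both helper functions are hypothetical and greatly simplified. Real browsers follow the full HTTP caching rules.

```python
from dataclasses import dataclass


@dataclass
class CachedResponse:
    body: str
    etag: str
    max_age: int      # seconds, as given by "Cache-Control: max-age"
    stored_at: float  # time at which the response was stored


def is_fresh(entry: CachedResponse, now: float) -> bool:
    """A fresh copy may be used without asking the server."""
    return now - entry.stored_at < entry.max_age


def revalidate(entry: CachedResponse, current_etag: str,
               current_body: str, now: float) -> CachedResponse:
    """Simulate a request carrying the stored ETag (If-None-Match)."""
    if current_etag == entry.etag:
        # 304 Not Modified: keep the body, restart the freshness period.
        return CachedResponse(entry.body, entry.etag, entry.max_age, now)
    # 200 OK: the content changed, store the new version.
    return CachedResponse(current_body, current_etag, entry.max_age, now)


entry = CachedResponse("<html>v1</html>", "etag-v1", max_age=60, stored_at=0.0)
print(is_fresh(entry, 30.0))  # prints: True
print(is_fresh(entry, 90.0))  # prints: False

unchanged = revalidate(entry, "etag-v1", "<html>v1</html>", now=90.0)
changed = revalidate(entry, "etag-v2", "<html>v2</html>", now=90.0)
print(unchanged.body, changed.body)
# prints: <html>v1</html> <html>v2</html>
```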

The less often a resource changes (images, PDFs, etc.), the longer you should cache it. If it never changes, cache it for as long as you can (example: a year)!
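This rule of thumb could be turned into a small helper that picks a “Cache-Control” header by file type. The lifetimes below are assumptions chosen for illustration and not recommendations from a standard. Adjust them to how often your own resources change.

```python
from pathlib import PurePosixPath

ONE_YEAR = 365 * 24 * 60 * 60  # 31536000 seconds
ONE_WEEK = 7 * 24 * 60 * 60    # 604800 seconds

# Assumed lifetimes: the less a type changes, the longer it is cached.
MAX_AGE_BY_SUFFIX = {
    ".html": 300,
    ".css": ONE_WEEK,
    ".js": ONE_WEEK,
    ".png": ONE_YEAR,
    ".jpg": ONE_YEAR,
    ".pdf": ONE_YEAR,
}


def cache_control_for(path: str) -> str:
    """Return a Cache-Control header value for the given resource path."""
    suffix = PurePosixPath(path).suffix.lower()
    max_age = MAX_AGE_BY_SUFFIX.get(suffix)
    if max_age is None:
        # Unknown type: store it, but always check with the server first.
        return "no-cache"
    return f"public, max-age={max_age}"


print(cache_control_for("/images/logo.png"))  # prints: public, max-age=31536000
print(cache_control_for("/index.html"))       # prints: public, max-age=300
print(cache_control_for("/api/status"))       # prints: no-cache
```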

Phoenix IT MOS – blog
