Controlling Certificate Lifetime and Revocation
In a Public Key Infrastructure (PKI), an issued certificate sometimes has to be invalidated before it expires. There are three techniques for this, each with its own advantages and disadvantages: CRLs, OCSP, and Short Lifetime Certificates. This article compares these techniques and illustrates them using the architecture of our product SCEPman.
Certificate Revocation Lists (CRLs)
For a long time, CRLs were the de-facto standard for PKIs. A Certification Authority (CA) regularly issues a list of certificate revocations, each with
- the certificate’s serial number,
- the time of revocation, and
- the revocation reason.
The list itself has a creation and expiration date and has a digital signature, most often from the CA. It is published as a file on one or more CRL Distribution Points (CDPs). This used to be an LDAP URL, but nowadays it is often only HTTP. CRLs of Root CAs that only issue Sub CA certificates typically have a validity of 6 to 12 months. Common validity periods for CRLs of Sub CAs are 1 or 2 weeks.
A system usually does not download a CRL on every certificate check. Instead, it relies on cached CRLs downloaded on earlier checks. It downloads a new CRL only shortly before the old one expires. This is because CRLs can become quite large – CRLs of public CAs may contain many certificates and grow to multiple MB in size.
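The caching behavior described above can be sketched as follows. This is a minimal illustration, not the logic of any particular client: the `Crl` and `CrlCache` types, the `fetch_from_cdp` function, and the one-day refresh margin are all hypothetical, and a real client would parse a signed CRL file and verify its signature.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


@dataclass(frozen=True)
class Crl:
    """Simplified stand-in for a parsed CRL (hypothetical type)."""
    this_update: datetime
    next_update: datetime
    revoked_serials: frozenset


class CrlCache:
    """Keeps one CRL and downloads a new one only shortly before expiry."""

    def __init__(self, fetch: Callable[[], Crl], refresh_margin: timedelta):
        self._fetch = fetch                    # hypothetical CDP downloader
        self._refresh_margin = refresh_margin  # assumed client setting
        self._crl: Optional[Crl] = None
        self.downloads = 0

    def get(self, now: datetime) -> Crl:
        needs_refresh = (
            self._crl is None
            or now >= self._crl.next_update - self._refresh_margin
        )
        if needs_refresh:
            self._crl = self._fetch()
            self.downloads += 1
        return self._crl

    def is_revoked(self, serial: int, now: datetime) -> bool:
        return serial in self.get(now).revoked_serials


issued = datetime(2024, 1, 1, tzinfo=timezone.utc)


def fetch_from_cdp() -> Crl:
    # Stand-in for an HTTP download; returns a CRL valid for two weeks.
    return Crl(issued, issued + timedelta(days=14), frozenset({0x1234}))


cache = CrlCache(fetch_from_cdp, refresh_margin=timedelta(days=1))
print(cache.is_revoked(0x1234, issued))                      # triggers the first download
print(cache.is_revoked(0x9999, issued + timedelta(days=5)))  # answered from the cache
print(cache.downloads)
```

In this sketch, all checks within the first 13 days are answered from the cached copy, which is why a single multi-megabyte download can serve many certificate checks.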
This has the additional advantage that systems can check certificate validity even during CDP outages. Depending on their settings, browsers, email clients, NACs, and so on treat certificates as invalid if they cannot check their revocation status. Otherwise, attackers could use stolen and revoked certificates just by interrupting the connection to the CDP.
On the downside, revocations arrive at participating systems only with some latency. Assume an admin revokes a certificate immediately after the CA issues a CRL with two weeks validity. Then systems relying on this CRL may use the certificate for another two weeks after revocation.
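The arithmetic behind this worst case is simple enough to write down. The following sketch assumes that a client refreshes its CRL only at expiry and that the revocation happens after the CRL was issued; the function name and the sample dates are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def accepted_after_revocation(revoked_at: datetime,
                              crl_issued_at: datetime,
                              crl_validity: timedelta) -> timedelta:
    """Upper bound on how long a client that holds this CRL and refreshes
    it only at expiry keeps accepting the revoked certificate."""
    crl_expires_at = crl_issued_at + crl_validity
    return max(crl_expires_at - revoked_at, timedelta(0))


crl_issued_at = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
revoked_at = crl_issued_at + timedelta(minutes=1)  # revoked right after publication

window = accepted_after_revocation(revoked_at, crl_issued_at, timedelta(weeks=2))
print(window)
```

With a two-week CRL and a revocation one minute after publication, the window is one minute short of 14 days.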
A modern solution to ensure availability of the CDPs is a Content Delivery Network (CDN), for example, based on Azure Blob Storage. As a best practice for Microsoft CAs, a Scheduled Task regularly issues a new CRL and uploads it to Azure Blob Storage.
SCEPman has a stateless architecture: it has no database of its own for common operations. This has many advantages. It does not require backups. Multiple SCEPman instances run in parallel without any configuration, which enables automatic scale-out during performance peaks. Statelessness is therefore very good for cloud apps, but a classic CRL is not possible without a database that stores the list of revoked certificates. SCEPman 2.4 dynamically generates a CRL for each request on the CDP. The CRL contains only manually revoked certificates, though, analogously to a classic PKI.
Luckily, there is a better alternative for a cloud PKI:
Online Certificate Status Protocol (OCSP)
In 1999, OCSP was developed for high security applications for which the latency of CRLs was not acceptable. Instead of keeping a list of revoked certificates, a system can request the current status of a specific certificate from a so-called OCSP Responder. Thus, verifying the validity of a certificate in real time requires only a comparably small HTTP request. For a revoked certificate, the part of the OCSP response detailing its status carries the same information as a CRL entry: serial number, time of revocation, and revocation reason. Hence, there is no difference to CRLs here.
Why neither CDP nor OCSP uses HTTPS
CRL requests to a CDP as well as OCSP requests commonly do not use TLS, i.e. HTTPS, but plain HTTP. Otherwise, a chicken-and-egg problem could occur, as the certificates used for the TLS connection require validation as well. Of course, the creators of CDPs and OCSP have taken this into account. CRLs as well as OCSP responses have cryptographic signatures from their CA or an accredited authority. This ensures authenticity of the revocation information even over insecure channels like HTTP.
The low latency of revocation information in OCSP comes at a price. If the OCSP Responders of a CA are down, it is no longer possible to check the validity of certificates, which usually means that all issued certificates become unusable. Therefore, OCSP Responders should be designed for high availability.
In detail, there are some important differences between implementations. For example, Microsoft’s OCSP servers use the CRL as their revocation database. This means that when an OCSP request comes in, the OCSP Responder searches for the certificate within the CRL and answers accordingly. Revoking a certificate at the CA does not automatically issue a new CRL, so the OCSP Responder will still claim that the certificate is valid as long as the CRL does. Using OCSP therefore does not automatically give real-time revocation.
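The difference between a CRL-backed responder and one that reads live revocation data can be shown with a toy model. Everything here is hypothetical (the `Ca` class and both status functions are illustrations, not Microsoft's or anyone else's implementation), and real OCSP also knows a third status, "unknown", which the sketch omits.

```python
from dataclasses import dataclass, field


@dataclass
class Ca:
    """Toy CA: a revocation is recorded immediately, but reaches the
    CRL only when the next CRL is published."""
    revoked: set = field(default_factory=set)
    published_crl: frozenset = frozenset()

    def revoke(self, serial: int) -> None:
        self.revoked.add(serial)

    def publish_crl(self) -> None:
        self.published_crl = frozenset(self.revoked)


def crl_backed_ocsp_status(ca: Ca, serial: int) -> str:
    # Responder that only looks at the last published CRL.
    return "revoked" if serial in ca.published_crl else "good"


def live_ocsp_status(ca: Ca, serial: int) -> str:
    # Responder that looks at the current revocation data.
    return "revoked" if serial in ca.revoked else "good"


ca = Ca()
ca.revoke(42)
before_publication = (crl_backed_ocsp_status(ca, 42), live_ocsp_status(ca, 42))
ca.publish_crl()
after_publication = (crl_backed_ocsp_status(ca, 42), live_ocsp_status(ca, 42))
print(before_publication, after_publication)
```

Until `publish_crl` runs, the CRL-backed responder still answers "good" for the revoked serial number, which is exactly the latency that OCSP was supposed to remove.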
SCEPman uses OCSP to control certificate validity. The moment an OCSP request comes in, SCEPman searches for the corresponding object, a device or user, in Azure AD (AAD) or the JAMF database and checks whether it meets the configured requirements. For example, if a computer object in AAD is disabled or deleted, its certificate becomes invalid immediately. This way, our users can revoke certificates without latency and even without the tedious certificate management required in traditional PKIs.
As pointed out in the last section, SCEPman is stateless – we achieved this by using AAD and JAMF for certificate information instead of a separate database. This allows an easy high-availability installation of SCEPman, even with geo-redundancy. Microsoft guarantees 99.95% uptime for their Azure App Services, so even with a single instance, the VPN Gateway or WiFi NAC might fail more often than the OCSP Responder. While the high availability necessary for OCSP can be a blocker for on-premises PKIs, it is no problem when using SCEPman.
Even with SCEPman, some certificates are not linked to a directory object, or an administrator wants to revoke them independently of the directory object’s state. Examples are server certificates and certificate enrollment with Mosyle (unless you link these certificates to AAD objects). For these special cases, SCEPman uses a database, specifically an Azure Storage Account with Table Storage. SCEPman requires only read access to the database for OCSP. Table Storage is non-relational and three-times redundant even in the cheapest SKU, so replications and backups are not necessary. SCEPman uses the database as a source of revocation information in parallel to a possible MDM directory. If needed, SCEPman may still store issued certificates in the database to allow easy manual revocation in addition to automatic revocation.
The following diagram illustrates how OCSP works in a SCEPman setup on three distributed App Service instances:
When a client (which in this case need not be an end-user device, but may be a RADIUS server like RADIUS-as-a-Service) wants to check whether a certificate is still valid, it uses the DNS-based Azure Traffic Manager to find a healthy and nearby SCEPman instance and then sends the OCSP request to that instance. The chosen SCEPman instance queries both Azure Storage and, in this case, Intune in parallel to check whether the device object linked to the certificate is there and in a good state. The OCSP response reflects these results. A single Azure Storage instance suffices, as it is always redundant. The level of redundancy depends on the selected SKU. Both Azure Key Vault and MEM/Intune are redundant without extra configuration. The SCEPman instances do not communicate with each other, and clients do not establish ongoing sessions with SCEPman instances for OCSP requests, so no cookies or the like are involved. Therefore, SCEPman instances can be added or removed as required.
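The parallel lookup described above can be illustrated with a short sketch. It is not SCEPman's code: the function `ocsp_status`, the two lookup callables, and the sample data are hypothetical stand-ins for the Table Storage query and the directory query. The sketch also treats a missing directory object as invalid, which matches the deleted-device example but ignores certificates that are deliberately not linked to a directory object.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def ocsp_status(serial: str,
                is_manually_revoked: Callable[[str], bool],
                is_directory_object_valid: Callable[[str], bool]) -> str:
    """Ask both sources in parallel; the certificate is good only if it
    was not manually revoked and its directory object is in a good state."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        manual = pool.submit(is_manually_revoked, serial)
        directory = pool.submit(is_directory_object_valid, serial)
        if manual.result() or not directory.result():
            return "revoked"
    return "good"


# Hypothetical sample data standing in for Table Storage and the directory.
manually_revoked = {"0A01"}
directory_state = {"0A01": "enabled", "0B02": "enabled", "0C03": "disabled"}


def lookup_manual(serial: str) -> bool:
    return serial in manually_revoked


def lookup_directory(serial: str) -> bool:
    return directory_state.get(serial) == "enabled"


for serial in ("0A01", "0B02", "0C03", "0D04"):
    print(serial, ocsp_status(serial, lookup_manual, lookup_directory))
```

Because the function keeps no state between calls, any number of such responders can answer requests side by side, which is the property that makes adding or removing instances unproblematic.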
Short Lifetime Certificates
In some applications, it is an advantage to have no revocation check at all. Examples are systems without any network connection, which therefore cannot contact a CDP or OCSP Responder. In other cases, checking the validity requires interaction with the certificate holder, so only the certificate holder can initiate the check, but not the CA. Attackers might still compromise these certificates, and PKIs must limit the damage by invalidating them.
These are cases for Short Lifetime Certificates: Certificates with a short validity period, only days or hours. It is not necessary to revoke these certificates, as a compromised certificate becomes invalid after a short time – often faster than one revoked on a CRL.
The simpler architecture has additional advantages: It requires no planning or operations for CDP and OCSP Responder. Certificate usage does not require network access. Management of these certificates can be cut down because of their low value.
A pitfall is that a PKI should not manage these certificates. Because of their short validity, a CA issues many more of them than long-lived certificates. If Short Lifetime Certificates are stored in a database nevertheless, it quickly grows to a size for which it was not designed. Microsoft’s CA therefore offers a setting in certificate templates to not store these certificates at all. Note that you have to set the compatibility level of the certificate template to Windows Server 2008 R2 or newer, which is not the default.
Short Lifetime Certificates are unsuited for WiFi and VPN client certificates. Connecting to the network requires a valid certificate, and getting a new certificate requires a network connection. This is not a problem if the client renews its certificate long enough before expiration. Microsoft Intune and JAMF in conjunction with SCEPman do this automatically. A traditional Microsoft CA with auto enrollment also supports this without user interaction. With Short Lifetime Certificates, however, a machine slips into this vicious circle of missing network connection and missing certificate when its user is on vacation for a week and the device is turned off during this time.
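A rough calculation shows why the vacation week is a problem. The sketch assumes that renewal starts once a fixed percentage of the lifetime remains and that the device is switched off just before that point, which is the worst case. The 20 percent threshold, the sample lifetimes, and the function names are illustrative assumptions, not recommendations.

```python
from datetime import timedelta


def worst_case_offline_tolerance(validity: timedelta,
                                 renewal_threshold_percent: int) -> timedelta:
    """Longest absence that is always survivable: the device goes offline
    just before the renewal window opens, so only the threshold share of
    the lifetime is left."""
    return validity * renewal_threshold_percent / 100


def survives_absence(validity: timedelta,
                     renewal_threshold_percent: int,
                     absence: timedelta) -> bool:
    return absence <= worst_case_offline_tolerance(validity, renewal_threshold_percent)


vacation = timedelta(weeks=1)
short_lived = survives_absence(timedelta(days=3), 20, vacation)
long_lived = survives_absence(timedelta(days=365), 20, vacation)
print(worst_case_offline_tolerance(timedelta(days=3), 20), short_lived)
print(worst_case_offline_tolerance(timedelta(days=365), 20), long_lived)
```

Under these assumptions, a three-day certificate tolerates less than a day offline in the worst case, while a one-year certificate tolerates more than two months.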
Let’s Encrypt issues TLS certificates with shorter lifetimes than is common for other CAs. For servers, this is no issue, as they are always connected and use automated certificate issuance.
For the same reasons, SCEPman recommends Short Lifetime Certificates also only for servers. Starting with SCEPman v1.7, customers can configure the certificate validity per endpoint. Thus, they can use OCSP for client certificates with longer validity, while automatically issuing and renewing Short Lifetime Domain Controller certificates. Furthermore, we recommend short lifetimes for additional systems supplied with certificates via the static SCEP endpoint.
Summary
Each revocation method has advantages and disadvantages, so the choice depends on the context. For a traditional on-premises infrastructure PKI, CRLs are a good choice, because they are easy to implement. For a modern cloud PKI like SCEPman, OCSP is better because it allows revocations in real time and a cloud-friendly stateless implementation. Short Lifetime Certificates have more specialized use cases and can be mixed with CRLs or OCSP within a single PKI. Then, the choice depends on the type of certificate. The following table summarizes some of the most important aspects of the three techniques:
Aspect | CRL | OCSP | Short Lifetime |
---|---|---|---|
Latency | Until next CRL update, typically 1-2 weeks at most | 0-3 minutes in good implementations | Until natural expiration, typically 1-14 days |
Required Availability | Low | High | None |
Architectural Complexity | Medium (DB required) | Medium to high if a DB is used, otherwise low | None |
Revocation Reasons | All | All | None |
Temporary Revocations | Yes, but often impractical because of the latency | Yes | No |
For Public CAs, Apple and Mozilla require CRLs, because OCSP introduces a privacy concern: Each OCSP request tells the OCSP provider which domain is visited from which IP address. This is not a problem for private CAs like SCEPman, because the customer is the OCSP provider and also manages the clients to which SCEPman enrolls certificates, so it gains no additional information. Aaron Gable from Let’s Encrypt has summarized this and explained how Let’s Encrypt deals with it.
The Special Case of AAD CBA
A special case is certificate-based authentication (CBA) for Azure AD, which recently entered General Availability. Microsoft has seemingly not used one of the common crypto libraries, all of which support OCSP as well as CRLs, and instead rewrote the cryptographic routines from scratch and only implemented CRL support. Additionally, AAD does not read the CDP from the certificate, but uses one that is separately configured in the Azure Portal. This results in special requirements for certificate revocation in this use case.
This is a case where you can configure SCEPman’s CDP in AAD. When AAD checks revocation, it will use the CRL. Other systems use the OCSP responder with its more up-to-date revocation information and better performance.