An article to understand the hidden cryptography in the Web

Preface

When you build a website's login feature, how do you keep the password secure in transit and in storage?

I believe many front-end and back-end developers have been asked a similar question in interviews.

Back when I knew nothing about cryptography, my answer was simply: "MD5 encryption."

Little did I know that cryptography is used throughout the seven-layer network model, and in web development, far more widely than I had imagined.

1. What is cryptography?

Cryptography underpins many security applications. Modern cryptography aims to create mechanisms that protect information by applying mathematical principles and computer science. Cryptanalysis, in contrast, aims to break such mechanisms in order to gain unauthorized access to information.

Cryptography has three key attributes:

  • Confidentiality: preventing unauthorized parties from accessing information (in other words, ensuring that only authorized persons can access restricted data).
  • Integrity: protecting information from being tampered with.
  • Authenticity: identifying the owner of the information.

For example, personal medical data:

  • Confidentiality: personal medical data must be kept secret, which means that only doctors or medical staff can access it.
  • Integrity: the data must also be protected from tampering, because altered data may lead to a wrong diagnosis or treatment and put the patient's health at risk.
  • Authenticity: patient data should be linked to an identified individual, and the patient needs to know who the operator (doctor) is.

In this article, we will get started with four basic cryptographic techniques: encryption, hashing, encoding, and obfuscation.

The figures in this article have been redrawn for easier understanding.

The outline and main content are quoted from: How Secure Are Encryption, Hashing, Encoding and Obfuscation?[1]

2. What is encryption?

Encryption definition: The process of transforming data in a manner that guarantees confidentiality.

For this purpose, encryption requires a secret, which in cryptographic terms is called the "key".

An encryption key, like any other cryptographic key, should have the following properties:

  • To protect confidentiality, the value of the key should be difficult to guess.
  • It should be used in a single context only and not reused across contexts (analogous to scope in JS). Key reuse is a security risk: if the key is compromised, the impact is greater because it "unlocks" more sensitive data.

2.1 Classification of encryption: symmetric and asymmetric

Encryption is divided into two categories: symmetric and asymmetric

Symmetric encryption:

Uses: file system encryption, Wi-Fi Protected Access (WPA), database encryption (such as credit card details)

Asymmetric encryption:

Uses: TLS, VPN, SSH

The main difference is the number of keys required:

  • In a symmetric encryption algorithm, a single key is used both to encrypt and to decrypt data. Only parties authorized to access the data should hold this single shared key.
  • In an asymmetric encryption algorithm, two keys are used: a public key and a private key. As the names suggest, the private key must be kept secret, while anyone may know the public key.
    • Encryption uses the public key, while decryption requires the private key.
    • Anyone should be able to send us encrypted data, but only we can decrypt and read it.

  1. Asymmetric encryption is typically used when communicating over an insecure channel, to securely establish a shared key between the two parties.
  2. With this shared key, both parties switch to symmetric encryption.
  3. Symmetric encryption is faster and better suited to processing large amounts of data.
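The three steps above can be sketched with Node.js's built-in crypto module. This is only an illustration under assumptions: RSA key wrapping and AES-256-GCM are choices made for the example, the function names encryptMessage and decryptMessage are hypothetical, and real protocols such as TLS negotiate the shared key in their own way.

```javascript
const {
  generateKeyPairSync, publicEncrypt, privateDecrypt,
  randomBytes, createCipheriv, createDecipheriv,
} = require("node:crypto");

// Receiver: generate an RSA key pair. The public key can be shared with anyone.
const { publicKey, privateKey } = generateKeyPairSync("rsa", { modulusLength: 2048 });

// Sender: create a random 256-bit symmetric key and encrypt ("wrap") it
// with the receiver's public key so it can cross an insecure channel.
const sharedKey = randomBytes(32);
const wrappedKey = publicEncrypt(publicKey, sharedKey);

// Receiver: only the private key can recover the shared key.
const receivedKey = privateDecrypt(privateKey, wrappedKey);

// From here on, both sides use fast symmetric encryption (AES-256-GCM).
function encryptMessage(key, plaintext) {
  const iv = randomBytes(12); // must be unique per message for a given key
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { iv, ciphertext, tag: cipher.getAuthTag() };
}

function decryptMessage(key, { iv, ciphertext, tag }) {
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag); // final() throws if the data was tampered with
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}

const sealed = encryptMessage(sharedKey, "hello over an insecure channel");
console.log(decryptMessage(receivedKey, sealed));
```

The slow asymmetric operation runs once, on 32 bytes; every later message only pays for the symmetric cipher.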

Encryption algorithms recognized by the cryptographic community are public:

  • Some companies use proprietary or "military-grade" encryption that is "private" and based on "complex" algorithms, but this is not how encryption works.
  • All encryption algorithms widely used and recognized by the cryptographic community are public, because they rest on mathematical problems that can only be solved with the key or with enormous computing power.
  • Public algorithms have been widely adopted and have proved their worth.

3. What is a hash?

Hash algorithm definition: A one-way cryptographic algorithm (its output cannot be reversed) that converts information of any length into a fixed-length string.

An encryption algorithm is reversible (with the key) and provides confidentiality (some newer encryption algorithms also provide authenticity), whereas a hash algorithm is irreversible and provides integrity: it proves that a particular piece of data has not been modified.

The premise of a hash algorithm is simple: given an input of any length, output a sequence of bytes of a fixed length. In most cases this sequence of bytes is unique to the input and gives no indication of what the input was. Put another way:

  1. It is infeasible to determine the original data from the output of the hash algorithm alone.
  2. Given some arbitrary data, you can hash it and compare the output with a known hash to verify whether the data matches the original input, without ever viewing the original data.
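Both points can be tried out with a minimal sketch using Node.js's built-in crypto module. SHA-256 is chosen here only as an example algorithm, and the helper names sha256Hex and matchesHash are hypothetical.

```javascript
const { createHash } = require("node:crypto");

// Hash any string or Buffer and return the digest as hexadecimal text.
function sha256Hex(data) {
  return createHash("sha256").update(data).digest("hex");
}

const shortDigest = sha256Hex("abc");
const longDigest = sha256Hex("a".repeat(1_000_000));

// Inputs of 3 bytes and 1,000,000 bytes both yield 64 hex characters (32 bytes).
console.log(shortDigest.length, longDigest.length);

// Check a candidate against a known digest without seeing the original data.
function matchesHash(candidate, expectedHex) {
  return sha256Hex(candidate) === expectedHex;
}

console.log(matchesHash("abc", shortDigest)); // true
console.log(matchesHash("abd", shortDigest)); // false
```

Changing a single character of the input produces a completely different digest, which is why the comparison is a reliable integrity check.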

To illustrate this point, imagine that a strong hashing algorithm works by placing each unique input in its own bucket. When we want to check whether two inputs are the same, we can simply check whether they are in the same bucket.

The storage unit of a hash file is called a bucket.

3.1 Example 1: Resource download

Websites that offer file downloads usually publish the hash value of each file so that users can verify the integrity of their downloaded copies.

For example, on Debian's image download service you will find additional files such as SHA256SUMS, which contains the hash output (in this case, from the SHA-256 algorithm) of each file available for download.

  • After downloading the file, you pass it to the chosen hash algorithm to compute a hash value.
  • You then compare that hash value with the one listed in the checksum file to verify that they match.

In the terminal, openssl can be used to hash a file:

$ openssl sha256 /Users/hiro/Downloads/asymmetry.png
SHA256(/Users/hiro/Downloads/asymmetry.png) = 7c264efc9ea7d0431e7281286949ec4c558205f690c0df601ff98d59fc3f4f64

Hashing the same file with the same hash algorithm always gives the same value, so the value can be used to verify that a copy is identical to the original.
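The download check described above could be automated roughly as follows. This is a sketch that assumes the checksum file uses the common "digest, whitespace, file name" line layout; parseChecksums and verifyDownload are hypothetical helper names, not part of any tool mentioned here.

```javascript
const { createHash } = require("node:crypto");
const { readFileSync } = require("node:fs");

// Parse lines of the assumed form "<64 hex chars>  <file name>".
function parseChecksums(text) {
  const sums = new Map();
  for (const line of text.split("\n")) {
    const match = line.trim().match(/^([0-9a-f]{64})\s+\*?(.+)$/i);
    if (match) sums.set(match[2], match[1].toLowerCase());
  }
  return sums;
}

// Hash the downloaded file and compare it with the published digest.
function verifyDownload(filePath, fileName, checksumText) {
  const expected = parseChecksums(checksumText).get(fileName);
  if (!expected) return false; // the file is not listed at all
  const actual = createHash("sha256").update(readFileSync(filePath)).digest("hex");
  return actual === expected;
}
```

readFileSync loads the whole file into memory, which is fine for a sketch; for multi-gigabyte images, feeding a read stream into the hash object would be the more practical choice.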

With a strong hash algorithm, it is almost impossible for two different inputs to produce the same output.

Conversely, because the range of possible outputs is limited, two different pieces of data can produce the same value after hashing. This is called a hash collision.

If two different inputs end up in the same bucket, a collision has occurred. This is known to happen with MD5 and SHA-1. It is problematic because we cannot tell which of the colliding values matches the original input.

A strong hash algorithm creates a new bucket for almost every unique input.

3.2 Example 2: Website login

In web development, hash algorithms are most often used for website login:

Most websites hash the password before storing the login data.

  • This prevents anyone who steals the database from recovering what you originally typed.
  • The next time you log in, the web application hashes the password you enter and compares this hash with the stored one.
  • If the hashes match, the web application can be confident that you know the password, even though the actual password is never stored in the web application.
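The register and login flow above can be sketched as follows. Note the substitution: instead of a bare hash, this example uses a salted, deliberately slow function (scrypt from Node.js's crypto module), in line with the appendix. The in-memory users map and the register and login names are hypothetical stand-ins for a real database and API.

```javascript
const { randomBytes, scryptSync, timingSafeEqual } = require("node:crypto");

// Hypothetical stand-in for a database table of accounts.
const users = new Map();

// Registration: store only a random salt and the derived hash, never the password.
function register(username, password) {
  const salt = randomBytes(16);
  const hash = scryptSync(password, salt, 32);
  users.set(username, { salt, hash });
}

// Login: hash the submitted password with the stored salt and compare.
function login(username, password) {
  const record = users.get(username);
  if (!record) return false;
  const candidate = scryptSync(password, record.salt, 32);
  return timingSafeEqual(candidate, record.hash); // constant-time comparison
}

register("alice", "correct horse battery staple");
console.log(login("alice", "correct horse battery staple")); // true
console.log(login("alice", "wrong guess"));                  // false
```

Because each account gets its own salt, two users with the same password end up with different stored hashes.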

[Figure: registration flow]

[Figure: login flow]

An interesting aspect of hash algorithms is that, regardless of the length of the input data, the output of the hash always has the same length.

In theory, this means collisions are always possible, although the probability is very small.

Encoding works the opposite way.

4. What is encoding?

Encoding definition: The process of converting data from one form to another. It has nothing to do with encryption.

It guarantees none of the three cryptographic attributes of confidentiality, integrity and authenticity, because:

  • It does not involve any secret and is completely reversible.
  • Its output is usually proportional in size to the input, and is always unique to that input.
  • Encoding methods are public and are commonly used for data processing.
  • Encoding is never suitable for security-related operations.

4.1 URL encoding

Also known as percent-encoding, this is the encoding method for Uniform Resource Locators (URLs). The rules for a URL address state:

  • Common digits and letters can be used directly, as can a further set of special characters (/ , : @ and so on).
  • All remaining characters must be encoded in the %xx form.

It has become a standard, and basically every programming language provides this encoding, for example:

  • JS: encodeURI, encodeURIComponent
  • PHP: urlencode, urldecode, etc.

The encoding is very simple: take the byte's ASCII code in hexadecimal and put a % in front of it. For example, the space character has ASCII code 32, which is 20 in hexadecimal, so its URL-encoded form is %20.

# Source text: The quick brown fox jumps over the lazy dog
# After encoding: %54%68%65%20%71%75%69%63%6b%20%62%72%6f%77%6e%20%66%6f%78%20%6a%75%6d%70%73%20%6f%76%65%72%20%74%68%65%20%6c%61%7a%79%20%64%6f%67
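The sample above encodes every character, letters included. A small sketch of that rule next to JavaScript's built-in functions, which leave unreserved characters untouched, might look like this; percentEncodeAll is a hypothetical helper written for this illustration.

```javascript
// Percent-encode every byte of the UTF-8 text: "%" followed by two hex digits.
function percentEncodeAll(text) {
  return Array.from(Buffer.from(text, "utf8"))
    .map((byte) => "%" + byte.toString(16).padStart(2, "0"))
    .join("");
}

console.log(percentEncodeAll("The quick")); // %54%68%65%20%71%75%69%63%6b

// The built-in only encodes characters that are not allowed as-is.
console.log(encodeURIComponent("a b&c=d")); // a%20b%26c%3Dd

// Decoding needs no key: anyone can reverse it.
console.log(decodeURIComponent("%54%68%65%20%71%75%69%63%6b")); // The quick
```

The round trip without any secret is exactly why encoding provides no confidentiality.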

4.2 HTML entity encoding

In HTML, data needs to be HTML-encoded to conform to the required HTML character format. The same escaping also helps prevent XSS attacks.
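As a minimal sketch of HTML entity encoding, the function below replaces the five characters most commonly escaped before inserting text into a page. The name escapeHtml and the entity table are assumptions for this example; production code would normally rely on the escaping built into its template engine or framework.

```javascript
// Characters with special meaning in HTML and their entity replacements.
const HTML_ENTITIES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Replace each special character in a single pass, so "&" is never escaped twice.
function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ENTITIES[ch]);
}

console.log(escapeHtml('<img src=x onerror="alert(1)">'));
// &lt;img src=x onerror=&quot;alert(1)&quot;&gt;
```

After escaping, the browser displays the markup as text instead of interpreting it, which is how escaping blunts injected script.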

4.3 Base64/32/16 encoding

Base64, Base32 and Base16 encode 8-bit bytes by regrouping their bits into units of 6, 5 and 4 bits respectively.

The numbers 16, 32 and 64 indicate how many distinct characters are used for the encoding.

Base64 is often used to represent, transmit and store binary data in situations that normally handle text, including email via MIME and storing complex data in XML.

Coding principle:

  1. Base64 encoding converts three 8-bit bytes into four 6-bit groups.
  2. Two 0 bits are then added in front of each 6-bit group to form an 8-bit byte.
  3. Six binary digits can represent 2 to the 6th power, that is 64, different values (0 to 63), which is why 64 characters are used.
    • The 64 encoding characters are A-Z, a-z, 0-9, + and /. The = sign is not an encoding character but a padding character.

In the Base64 mapping table, indices 0-25 map to A-Z, 26-51 to a-z, 52-61 to 0-9, 62 to + and 63 to /.

For example:

Quoted from: An article thoroughly understands the principle of Base64 encoding [2]

  • Step 1: "M", "a" and "n" have the ASCII code values 77, 97 and 110, which correspond to the binary values 01001101, 01100001 and 01101110. As shown in the second and third rows of the figure, they are joined into a 24-bit binary string.
  • Step 2: As shown in the red box, the 24 bits are divided into four groups of 6 binary bits.
  • Step 3: Two 0 bits are added in front of each group, extending the total to 32 bits. The four bytes are now 00010011, 00010110, 00000101 and 00101110, and the corresponding values (Base64 indices) are 19, 22, 5 and 46.
  • Step 4: Looking these values up in the Base64 table gives T, W, F and u. Therefore "Man" becomes TWFu after Base64 encoding.
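The four steps can be reproduced in a few lines. This sketch handles exactly one 3-byte group and ignores padding; ALPHABET and base64Indices are names made up for this illustration, and Node's built-in Buffer conversion is shown alongside for comparison.

```javascript
// The 64 encoding characters, in index order 0..63.
const ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Steps 1-3: join three bytes into 24 bits, then cut them into four 6-bit values.
function base64Indices(threeBytes) {
  const [a, b, c] = threeBytes;
  const bits = (a << 16) | (b << 8) | c;
  return [(bits >> 18) & 63, (bits >> 12) & 63, (bits >> 6) & 63, bits & 63];
}

const indices = base64Indices(Buffer.from("Man", "ascii"));
console.log(indices); // [ 19, 22, 5, 46 ]

// Step 4: look each index up in the alphabet.
const encoded = indices.map((i) => ALPHABET[i]).join("");
console.log(encoded); // TWFu

// The built-in conversion gives the same result, and decoding needs no secret.
console.log(Buffer.from("Man").toString("base64"));      // TWFu
console.log(Buffer.from("TWFu", "base64").toString());   // Man
```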

The example above shows that encoding is only a means of data processing and provides no protection for the encoded data.

5. What is obfuscation?

Obfuscation definition: The process of converting human-readable strings into strings that are difficult to understand.

  • Unlike encryption, obfuscation does not involve a key.
  • Like encoding, obfuscation does not guarantee any security, although it is sometimes mistakenly used as an encryption method.

Although it cannot guarantee confidentiality, obfuscation has other applications:

  • It is used to deter tampering and to protect intellectual property.
  • App source code is usually obfuscated before packaging.
    • Because the code resides on the user's device, it can be extracted from there. Obfuscated code is hard to read, which hinders reverse engineering and helps protect intellectual property.
    • In turn, this makes it harder to tamper with the code and redistribute it for malicious use.

However, there are many tools that help deobfuscate application code. That is another topic...

5.1 Example 1: JavaScript obfuscation

JavaScript source code:

function hello(name) {console.log('Hello, '+ name);}
hello('New user');

After obfuscation:

var _0xa1cc=["\x48\x65\x6C\x6C\x6F\x2C\x20","\x6C\x6F\x67","\x4E\x65\x77\x20\x75\x73\x65\x72"]; function hello(_0x2cc8x2){console[_0xa1cc[1]](_0xa1cc[0]+ _0x2cc8x2)}hello(_0xa1cc[2])
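To see why this offers no confidentiality, note that the \xNN sequences in the obfuscated array are ordinary hexadecimal string escapes. The sketch below prints the array and decodes such escapes from source text without running it; decodeHexEscapes is a hypothetical helper, far simpler than real deobfuscation tools.

```javascript
// The string table from the obfuscated sample, written exactly as it appears there.
const table = [
  "\x48\x65\x6C\x6C\x6F\x2C\x20",
  "\x6C\x6F\x67",
  "\x4E\x65\x77\x20\x75\x73\x65\x72",
];
console.log(table); // [ 'Hello, ', 'log', 'New user' ]

// Decode \xNN escapes found in source text that you would rather not execute.
function decodeHexEscapes(source) {
  return source.replace(/\\x([0-9A-Fa-f]{2})/g, (_, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

// String.raw keeps the backslashes, so this is the literal source text.
console.log(decodeHexEscapes(String.raw`console["\x6C\x6F\x67"]`)); // console["log"]
```

No key is involved at any point, so anyone holding the obfuscated file can recover the readable strings.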

Summary

Comparing the four techniques (encryption, hashing, encoding and obfuscation) against the three attributes of confidentiality, integrity and authenticity:

  • Although encryption exists to ensure the confidentiality of data, some modern encryption algorithms also use other strategies to ensure data integrity (sometimes through embedded hash algorithms) and authenticity.
  • Hashing alone only guarantees integrity, but integrity comparisons can also be used to establish authenticity, as in hash-based message authentication codes (HMAC) and some Transport Layer Security (TLS) methods.
  • Encoding was used to mean encryption in the past and still carries that meaning outside the technical field, but in programming it is only a data processing mechanism and has never provided any security.
  • Obfuscation can raise the bar for attackers, but it can never guarantee the confidentiality of data. A determined adversary will eventually get around it. As with encoding, never treat obfuscation as a reliable security control.
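The HMAC mentioned above combines a hash function with a secret key, so that a matching tag shows the message is both unmodified and produced by someone holding the key. Here is a minimal sketch with Node.js's crypto module; the sign and verify names and the sample key are assumptions made for this illustration.

```javascript
const { createHmac, timingSafeEqual } = require("node:crypto");

// Compute an HMAC-SHA256 tag over the message with a shared secret key.
function sign(secretKey, message) {
  return createHmac("sha256", secretKey).update(message).digest();
}

// Recompute the tag and compare in constant time.
function verify(secretKey, message, tag) {
  const expected = sign(secretKey, message);
  return tag.length === expected.length && timingSafeEqual(tag, expected);
}

const key = "demo-shared-secret"; // hypothetical key, for illustration only
const tag = sign(key, "amount=100&to=alice");

console.log(verify(key, "amount=100&to=alice", tag));     // true
console.log(verify(key, "amount=900&to=mallory", tag));   // false: message changed
console.log(verify("wrong-key", "amount=100&to=alice", tag)); // false: wrong key
```

A plain hash would catch the changed message only if the attacker could not also replace the hash; the key is what ties the tag to its sender.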

Appendix: Hash function

Commonly used hash functions:

  • MD5: a widely used cryptographic hash function that produces a 128-bit (16-byte) hash value, used to check that information is transmitted completely and consistently. Widespread, but outdated.
  • SHA-256/SHA-512, used "with salt". In Bitcoin, the blockchain uses SHA-256 as its underlying cryptographic hash function.
    • The Secure Hash Algorithm (SHA) is a family of cryptographic hash functions.
    • The SHA family has five algorithms: SHA-1, SHA-224, SHA-256, SHA-384 and SHA-512.
    • They are U.S. government standards, and the last four are collectively called SHA-2.

  • bcrypt: the bcrypt algorithm is deliberately slow.
    • There is a common saying in cryptography: the slower the algorithm, the more secure. The more computation the algorithm requires, the higher the cost of cracking it.
    • It uses two values, salt and cost, to slow down hashing, so its hashing time (on the order of hundreds of ms) far exceeds that of MD5 (about 1 ms).
    • For a computer, bcrypt is very slow to compute, but for a user the delay is hardly noticeable.
    • bcrypt is one-way, and with salt and cost applied, the chance of being cracked by a rainbow table attack is greatly reduced and cracking becomes much harder.
    • Compared with MD5 and similar methods, it is more secure and still easy to use.
  • Prefer a well-designed key stretching algorithm, such as PBKDF2, bcrypt or scrypt.

Postscript & Reference

  • How Secure Are Encryption, Hashing, Encoding and Obfuscation?[3]
  • Encryption and coding with the brains in CTF[4]
  • Storage of hashed files-'bucket'[5]

So, how do you keep the password secure in transit and in storage?

Let's break it down next time!

You can also visit my GitHub blog to get the source files of all articles:

Front-end persuasion guide: https://github.com/roger-hiro/BlogFN

Reference

[1] How Secure Are Encryption, Hashing, Encoding and Obfuscation?: https://auth0.com/blog/how-secure-are-encryption-hashing-encoding-and-obfuscation/#What-is-Encoding-

[2] An article thoroughly understands the principle of Base64 encoding: https://blog.csdn.net/wo541075754/article/details/81734770

[3] How Secure Are Encryption, Hashing, Encoding and Obfuscation?: https://auth0.com/blog/how-secure-are-encryption-hashing-encoding-and-obfuscation/#What-is-Encoding-

[4] Encoding and encryption in CTF: https://www.cnblogs.com/godoforange/articles/10850493.html

[5] Storage of hash files-'bucket': https://blog.csdn.net/Dearye_1/article/details/78492021

Reference: https://cloud.tencent.com/developer/article/1539550 An article to understand the hidden cryptography in the Web-Cloud + Community-Tencent Cloud