Clean Your JavaScript: Get Rid of the Regular Expressions — Part 2

DANIEL SSEJJEMBA
Published in Stackademic
5 min read · Nov 20, 2023

Photo by Tim Gouw on Unsplash

In this part of the series, I’ll look at regular expressions, commonly known as Regex, and try to convince you that having them anywhere in your codebase is a bad idea. For brevity, I won’t explain how Regex works or where it is applied; the goal is to persuade you to refactor it out of your code.

What are Regular Expressions?

The MDN Docs define regular expressions as patterns used to match character combinations in strings. What this really means is that they are a special syntax that once learned can be used to search for matches of particular patterns or character combinations in a string.

As you might expect, this is useful, especially for validating form input. Validation is one of the most security-sensitive operations on the web: in most cases, we validate an input because we want to restrict access or solve a closely related problem. Regular expressions show up in credit card validation, email validation, password validation, search filters, and so on.

What is the problem with Regular Expressions?

At this point you might be asking yourself, “What’s the issue with using them, then? They seem useful.” And you wouldn’t be far from the truth. The problem is that the complexity of regular expression syntax makes it very hard to understand at a glance what a pattern is meant to achieve.

I have written code professionally for close to 8 years now, and I have never worked with an engineer who could tell at a glance what a Regex was trying to achieve. This makes code reviews extremely painful, since I need to check that the pattern meets the requirements fully and correctly. It also makes the code fragile and prone to typos that can lead to catastrophic business losses.

Regex is confusing to understand

Poorly written regular expressions can be exploited in some cases, leading to security vulnerabilities. For example, certain patterns can be susceptible to ReDoS (Regular Expression Denial of Service) attacks, where an attacker provides a specially crafted input that causes the system to consume an excessive amount of CPU or memory.
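To make the ReDoS risk concrete, here is a minimal sketch. The pattern `/^(a+)+$/` is a widely used textbook example of catastrophic backtracking, not one taken from the study cited in this article, and `isOnlyLetterA` is a hypothetical helper name. On a backtracking engine, a long run of `a` followed by one non-matching character can force the pattern to try a huge number of ways of splitting the run, so the sketch never runs the pattern against the hostile input.

```javascript
// Textbook example of a pattern prone to catastrophic backtracking.
// It is only run on a short, harmless input below.
const riskyPattern = /^(a+)+$/;

// Hypothetical replacement: a single pass over the string, so the work
// grows linearly with the input length.
function isOnlyLetterA(input) {
  if (input.length === 0) return false;
  for (const character of input) {
    if (character !== "a") return false;
  }
  return true;
}

// An input shaped like an attack: a long run of "a" plus a trailing "!".
const hostileInput = "a".repeat(50000) + "!";

console.log(riskyPattern.test("aaaa"));   // true (short input, safe to run)
console.log(isOnlyLetterA("aaaa"));       // true
console.log(isOnlyLetterA(hostileInput)); // false, after one linear scan
```

The loop gives the same answer as the pattern for these inputs, but its cost cannot be inflated by a carefully crafted string.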

Regular expressions can be inefficient for certain tasks. They can lead to performance issues, particularly when dealing with very large texts or extremely complex patterns.

Debugging regular expressions can be challenging. When a regular expression doesn’t work as expected, it can be difficult to pinpoint exactly where the problem lies, especially in more complex patterns.

Regular expressions can quickly become complex and hard to read, especially for those who are not familiar with their syntax. Complex patterns can be difficult to understand and maintain, leading to a steep learning curve for beginners.

For example, check out this link here to find the full-on Regex for email validation. You’ll quickly lose interest in trying to understand how it works, and you’ll either look for a different solution or trust that it works and copy-paste it into your codebase. That will result in a myriad of problems if, somewhere down the line, someone modifies a character in the pattern.

Generally speaking, Regex implementations also differ between programming languages, can become very slow as patterns grow more complex, and cannot handle tasks such as parsing nested structures, for example XML and HTML.
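Nested structures are a good illustration of that last point. The sketch below checks that brackets are balanced to any depth, which requires remembering what is currently open. The function name `hasBalancedBrackets` and the choice of brackets are assumptions made for illustration; real XML or HTML parsing needs a proper parser, not this helper.

```javascript
// Maps each closing bracket to the opening bracket it must match.
const OPENING_FOR = new Map([
  [")", "("],
  ["]", "["],
  ["}", "{"],
]);
const OPENING_BRACKETS = new Set(OPENING_FOR.values());

// Hypothetical helper: returns true when every bracket is closed in the
// right order. The stack remembers what is still open at each depth.
function hasBalancedBrackets(text) {
  const openStack = [];
  for (const character of text) {
    if (OPENING_BRACKETS.has(character)) {
      openStack.push(character);
    } else if (OPENING_FOR.has(character)) {
      // A closing bracket must match the most recently opened one.
      if (openStack.pop() !== OPENING_FOR.get(character)) return false;
    }
  }
  return openStack.length === 0;
}

console.log(hasBalancedBrackets("(a[b]{c})")); // true
console.log(hasBalancedBrackets("(a[b)]"));    // false
```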

An interesting study here states that:

10% of the Node.js-based web services they examined are vulnerable to ReDoS. In this already harsh scenario, the authors find that only 38% of the developers that they surveyed knew about the existence of ReDoS attacks.

How do I replace Regular Expressions?

Actually, the answer is quite easy and obvious: string methods. They are one of the things every programmer learns early on, before they even know Regex exists. Depending on what you are trying to achieve, you might end up combining them with Array methods.
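As a rough sketch of that idea, here is a password check written with string and Array methods instead of a look-ahead pattern such as `/^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,}$/`. The rules (at least 8 characters, one digit, one uppercase and one lowercase letter) and all function names are assumptions chosen for illustration, not requirements from any particular project.

```javascript
const DIGITS = "0123456789";

function isDigit(character) {
  return DIGITS.includes(character);
}

// A character counts as uppercase if lowercasing changes it.
function isUppercaseLetter(character) {
  return character !== character.toLowerCase();
}

// A character counts as lowercase if uppercasing changes it.
function isLowercaseLetter(character) {
  return character !== character.toUpperCase();
}

// Hypothetical rules: 8+ characters with a digit, an uppercase letter
// and a lowercase letter.
function isStrongEnoughPassword(password) {
  const characters = Array.from(password);
  return (
    characters.length >= 8 &&
    characters.some(isDigit) &&
    characters.some(isUppercaseLetter) &&
    characters.some(isLowercaseLetter)
  );
}

console.log(isStrongEnoughPassword("Passw0rd")); // true
console.log(isStrongEnoughPassword("password")); // false
```

Each rule reads as a named sentence, so a reviewer can check the requirements one line at a time.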

When you need to do some complex parsing, consider treating the parser as a separate module in your codebase and writing it as a custom parser built from small, readable functions. Code like this is easier to read out loud, which makes it easier to maintain.
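A minimal sketch of such a module might look like the following. It uses deliberately simplified email rules that are my own assumption and far looser than the full email grammar: no whitespace, exactly one `@`, a non-empty local part, and a domain of at least two non-empty, dot-separated labels. All function names are hypothetical.

```javascript
// Returns [localPart, domain] when there is exactly one "@", else null.
function splitAtSingleAtSign(address) {
  const parts = address.split("@");
  return parts.length === 2 ? parts : null;
}

// A domain needs at least two labels, and no label may be empty.
function isValidDomain(domain) {
  const labels = domain.split(".");
  return labels.length >= 2 && labels.every((label) => label.length > 0);
}

function containsWhitespace(text) {
  return Array.from(text).some((character) => character.trim() === "");
}

// Entry point of the hypothetical module: returns the parsed parts,
// or null when the address breaks one of the simplified rules.
function parseEmailAddress(address) {
  if (containsWhitespace(address)) return null;
  const parts = splitAtSingleAtSign(address);
  if (parts === null) return null;
  const [localPart, domain] = parts;
  if (localPart.length === 0 || !isValidDomain(domain)) return null;
  return { localPart, domain };
}

console.log(parseEmailAddress("ada@example.com")); // { localPart: 'ada', domain: 'example.com' }
console.log(parseEmailAddress("ada@example"));     // null
```

In a real codebase this would live in its own file and expose only `parseEmailAddress`, so each rule can be tested and changed without touching a dense pattern.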

Even in cases where you feel the Regex solution is a quick “one-liner”, avoid using it at all costs. It is tempting, especially for junior developers, to jump at the quick solution or to see less code as more elegant, but the truth is far from that. Elegant code is well-written, maintainable, and scalable code.

Conclusion

In this part of the series, I have attempted to convince you that using regular expressions in your codebase is a code smell that should call for a high-priority refactor. While Regex might seem like a quick and powerful solution for string manipulation and pattern matching, its inherent complexity, its potential for security vulnerabilities like ReDoS attacks, and its performance issues, especially with large texts or complex patterns, make it a less desirable option in many scenarios.

Therefore, the next time you reach for Regex as a solution, consider whether simpler methods could achieve the same result with greater clarity. This approach will not only make your code more accessible to your peers but also safeguard it against subtle bugs and performance issues, ultimately leading to a more robust and sustainable codebase.

Read More

  1. Clean Your Javascript: Transform conditional statements — Part 1
