A better Swift find/replace interface for NSRegularExpression
Regular expressions are a powerful tool in any programmer's toolbox, but the interface for using them varies greatly across languages. Swift, for example, unfortunately depends on NSRegularExpression from Objective-C. While NSRegularExpression is a powerful API, it is confusing and often overcomplicated for simple regex tasks such as find/replace with capture groups.
Inspired by sed/vim
As a programmer I became accustomed to the simple s/<pattern>/<replace>/g method of performing a find/replace, found in many common tools on any UNIX-based system. My goal was to bring the same simplicity and flexibility to Swift.
The code (Swift 4)
import Foundation

extension String {
    /// Replaces every match of `pattern`. `collector` receives the full match at index 0, followed by the captured groups.
    func replace(_ pattern: String, options: NSRegularExpression.Options = [], collector: ([String]) -> String) -> String {
        guard let regex = try? NSRegularExpression(pattern: pattern, options: options) else { return self }
        let matches = regex.matches(in: self, options: NSRegularExpression.MatchingOptions(rawValue: 0), range: NSMakeRange(0, (self as NSString).length))
        guard matches.count > 0 else { return self }
        var splitStart = startIndex
        return matches.map { (match) -> (String, [String]) in
            let range = Range(match.range, in: self)!
            let split = String(self[splitStart ..< range.lowerBound])
            splitStart = range.upperBound
            return (split, (0 ..< match.numberOfRanges)
                .compactMap { Range(match.range(at: $0), in: self) }
                .map { String(self[$0]) }
            )
        }.reduce("") { "\($0)\($1.0)\(collector($1.1))" } + self[Range(matches.last!.range, in: self)!.upperBound ..< endIndex]
    }

    /// Convenience overload for replacements that ignore the capture groups.
    func replace(_ regexPattern: String, options: NSRegularExpression.Options = [], collector: @escaping () -> String) -> String {
        return replace(regexPattern, options: options) { (_: [String]) in collector() }
    }
}
This is an extension to the String type. As a result, you can call it on any string in your package. Here is an example:
> "I like apples".replace("apples") { "oranges" }
I like oranges
This is just a simple word-for-word replace. Where this gets more powerful is when you want to use capture groups. Let's try to swap two words:
> "apples oranges apples oranges".replace("(\\w*) (\\w*)\\s*") { "\($0[2]) \($0[1]) " }
oranges apples oranges apples
This is functionally equivalent to %s/\v(\w*) (\w*)\s*/\2 \1 /g in Vim (the \v switches on "very magic" mode, so the parentheses group without needing backslashes).
What is happening here is that the closure collector gets called for every match of the pattern. The closure's argument is an array of the capture group results: $0[0] contains the full pattern match, $0[1] contains the first capture group, and $0[n] addresses the rest of the capture groups. Pretty cool, right?
Breaking it down
Let's look a little closer at what the code is doing.
Validate the input
guard let regex = try? NSRegularExpression(pattern: pattern, options: options) else { return self }
let matches = regex.matches(in: self, options: NSRegularExpression.MatchingOptions(rawValue: 0), range: NSMakeRange(0, (self as NSString).length))
The first part validates our input and ensures that our pattern is valid. If the regex pattern fails to compile (syntax error) or matches nothing, we want to safely return the source string. One important thing to note is that we have to convert to an NSString to get the length for NSRegularExpression, since Swift and Objective-C count characters differently: NSRange offsets are measured in UTF-16 code units, while Swift treats a sequence like \r\n as a single Character. Once we are sure we have at least one match, we can move on.
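To make the length difference concrete, here is a small sketch comparing the two counts. The sample strings are my own, not from the extension.

```swift
import Foundation

// Sample strings chosen for illustration only.
let crlf = "\r\n"
let thumbsUp = "👍"

// Swift counts user-perceived characters (grapheme clusters).
print(crlf.count)      // 1
print(thumbsUp.count)  // 1

// NSString counts UTF-16 code units, which is what NSRange expects.
print(NSString(string: crlf).length)      // 2
print(NSString(string: thumbsUp).length)  // 2

// NSRange can also be built directly from Swift string indices.
let fullRange = NSRange(crlf.startIndex ..< crlf.endIndex, in: crlf)
print(fullRange.length)  // 2
```

Passing `count` instead of the UTF-16 length would make the search range too short for strings like these.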
Split by pattern
var splitStart = startIndex
return matches.map { (match) -> (String, [String]) in
let range = Range(match.range, in: self)!
let split = String(self[splitStart ..< range.lowerBound])
splitStart = range.upperBound
...
You can think of this next part like a split function. A split will normally take a string and break it up into an array on some delimiter. For example, "A,List,Of,Words" split by , would become ["A", "List", "Of", "Words"]. But what we need to do is split by a pattern. In order to do that, we keep track of the start of the next split in splitStart and chop our string up into "splits" (pieces of the string that match nothing) while also keeping track of the matches. This part returns a tuple in the form (String, [String]), where the first part of the tuple is the unmatched split and the second part is an array holding the match and its capture groups.
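The same bookkeeping can be sketched on its own as a plain "split by pattern". The sample text and the `[,;]` pattern here are made up for the illustration, and this version collects only the unmatched pieces.

```swift
import Foundation

// Illustrative input and pattern (not part of the extension).
let text = "A,List;Of,Words"
let regex = try! NSRegularExpression(pattern: "[,;]")  // pattern is known to be valid
let matches = regex.matches(in: text, options: [], range: NSRange(text.startIndex..., in: text))

var splitStart = text.startIndex
var splits: [String] = []
for match in matches {
    guard let range = Range(match.range, in: text) else { continue }
    // Everything between the previous match and this one is an unmatched "split".
    splits.append(String(text[splitStart ..< range.lowerBound]))
    splitStart = range.upperBound
}
// The tail after the last match.
splits.append(String(text[splitStart...]))

print(splits)  // ["A", "List", "Of", "Words"]
```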
Convert NSRange matches into Swift Range matches
return (split, (0 ..< match.numberOfRanges)
.compactMap { Range(match.range(at: $0), in: self) }
.map { String(self[$0]) }
)...
Now that we have our split string, we need to convert NSRegularExpression's match ranges to substrings. This code returns the final tuple and constructs its second part by iterating over the match's range indices, mapping each NSRange to a Swift Range, and then mapping each range to a substring.
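One edge case is worth seeing in isolation. When an optional capture group does not take part in a match, its NSRange has the location NSNotFound and the conversion to a Swift Range returns nil. With compactMap that group is dropped and the later groups shift down by one index. The sketch below uses a made-up pattern to show this. The "padded" variant is only one possible alternative, not what the extension does.

```swift
import Foundation

// Hypothetical pattern: group 1 is optional and does not participate for "ab".
let text = "ab"
let regex = try! NSRegularExpression(pattern: "(x)?(a)(b)")  // pattern is known to be valid
let match = regex.firstMatch(in: text, options: [], range: NSRange(text.startIndex..., in: text))!

// Same approach as the extension: nil ranges are dropped.
let compacted = (0 ..< match.numberOfRanges)
    .compactMap { Range(match.range(at: $0), in: text) }
    .map { String(text[$0]) }

// Alternative sketch: keep one slot per group, using "" for unmatched groups.
let padded = (0 ..< match.numberOfRanges).map { index -> String in
    guard let range = Range(match.range(at: index), in: text) else { return "" }
    return String(text[range])
}

print(compacted)  // ["ab", "a", "b"]
print(padded)     // ["ab", "", "a", "b"]
```

If your patterns use optional groups, it may be safer to check the array's count in the closure than to assume a fixed index.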
Reduce to the final output
.reduce("") { "\($0)\($1.0)\(collector($1.1))" } + self[Range(matches.last!.range, in: self)!.upperBound ..< endIndex]
The last line simply takes the mapped array of tuples and reduces it to the final string. Reduce takes the accumulated string $0, appends the preceding split $1.0, and then appends the replacement that the caller produces from the match and its capture groups by calling collector($1.1). Finally, it appends the remainder of the source string after the last match (if there is any).
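Here is the reduce step with hand-written tuples, so you can see the stitching without any regex involved. The pieces, the tail and the uppercasing collector are invented for the example.

```swift
// Hand-written (split, groups) tuples standing in for the mapped matches.
let pieces: [(String, [String])] = [
    ("I like ", ["apples"]),
    (" and ", ["pears"]),
]
// Stands in for the text after the last match.
let tail = "!"
// Hypothetical collector: uppercase the full match.
let collector: ([String]) -> String = { $0[0].uppercased() }

// Accumulate: result so far + unmatched split + replacement for the match.
let result = pieces.reduce("") { "\($0)\($1.0)\(collector($1.1))" } + tail

print(result)  // I like APPLES and PEARS!
```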
Taking it further
This extension has other advantages: you are not limited to working with strings, and because you build the replacement in the closure, you can add more complex logic to rewrite your input. Let's increment numbers:
> "Nums: 1 2 3 4 5 6 7 8 9 10".replace("(\\d+)") { String(Int($0[1])! + 1) }
Nums: 2 3 4 5 6 7 8 9 10 11
By converting the matched number into an Int, we can perform other operations on it. Another example could be rewriting a timestamp:
> "Log date: 1413937910 ....".replace("(\\d+)") { Date(timeIntervalSince1970: TimeInterval($0[1])!).description }
Log date: 2014-10-22 00:31:50 +0000 ....
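The examples above force-unwrap the conversions, which will crash if a match cannot be converted (for example, a digit run too large for Int). A more defensive collector might look like the sketch below. The name increment is my own, and it falls back to the original match text when the conversion fails.

```swift
// Hypothetical collector: increments the first capture group if it is a usable Int,
// otherwise returns the full match unchanged.
let increment: ([String]) -> String = { groups in
    guard groups.count > 1,
          let number = Int(groups[1]),
          number < Int.max else {
        return groups.first ?? ""
    }
    return String(number + 1)
}

print(increment(["41", "41"]))  // 42

// Too large for Int, so the original text is kept.
let huge = "99999999999999999999"
print(increment([huge, huge]))  // 99999999999999999999

// It could then be passed to the extension like this:
// "Nums: 1 2 3".replace("(\\d+)", collector: increment)
```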
Download
You can download SwiftReplace from GitHub!
Other info
The reason replace is overloaded is to handle the possibility that you don't want to deal with capture groups. Otherwise you would have to write _ in at the start of every closure you pass to replace. This just cleans up the interface a little bit.