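Regular expressions (regex) match and parse text. The regex language is a powerful shorthand for describing patterns. PowerShell uses regular expressions in several ways. Sometimes it is easy to forget that these commands are using regex because it is so tightly integrated. You may already be using some of these commands without even realizing it.
Image from XKCD.com, slightly altered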
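Scope of this article
Teaching the regex syntax and language is beyond the scope of this article. I will cover just enough to stay focused on PowerShell. My regex examples will be intentionally very basic.
Regex quick start
You can use normal numbers and characters in your patterns for exact matches. This works when you know exactly what needs to match. Sometimes you need a pattern where any digit or letter should make the match valid. Here are some basic patterns that I may use in these examples.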
\d digit [0-9]
\w alpha numeric [a-zA-Z0-9_]
\s whitespace character
. any character except newline
() sub-expression
\ escape the next character
So a pattern of \d\d\d-\d\d-\d\d\d\d will match a USA social security number. Three digits, then a dash, two digits, then a dash and then 4 digits. There are better and more compact ways to represent that same pattern. But this will work for our examples today.
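As a quick illustration of the "more compact ways" mentioned above, here is a minimal sketch using regex quantifiers. The variable names and the sample string are made up for this example.

```powershell
# Both patterns describe the same shape: 3 digits, a dash, 2 digits, a dash, 4 digits.
$longPattern    = '\d\d\d-\d\d-\d\d\d\d'
$compactPattern = '\d{3}-\d{2}-\d{4}'   # {n} means "repeat the previous token n times"

$sample = 'my ssn is 123-45-6789'       # hypothetical sample text

$sample -match $longPattern      # True
$sample -match $compactPattern   # True
```

The rest of the examples keep the long form because it is easier to read at a glance.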
Regex resources
Here are some regex resources to help you find the right pattern for your task.
Interactive regex calculators:
Documentation and training:
Select-String
This cmdlet is great for searching files or strings for a text pattern.
Get-ChildItem -Path $logFolder | Select-String -Pattern 'Error'
This example searches all the files in the $logFolder for lines that have the word Error. The pattern parameter is a regular expression and in this case, the word Error is valid regex. It will find any line that has the word error in it.
Get-ChildItem -Path $logFolder |
Select-String -Pattern '\d\d\d-\d\d-\d\d\d\d'
This one searches text documents for numbers that look like a US Social Security number.
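Select-String also accepts strings from the pipeline, which makes it easy to experiment without any files. The sketch below uses made-up log lines in place of $logFolder and shows that the match ignores case unless you add -CaseSensitive.

```powershell
# Hypothetical log lines standing in for real files.
$logLines = @(
    'INFO: service started'
    'Error: disk is full'
    'INFO: retrying'
    'error: retry failed'
)

# Case-insensitive by default: both the 'Error' and the 'error' lines are returned.
$found = $logLines | Select-String -Pattern 'Error'
$found.Line

# -CaseSensitive narrows this to the line that contains 'Error' exactly.
$exact = $logLines | Select-String -Pattern 'Error' -CaseSensitive
$exact.Line
```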
-match
The -match operator takes a regular expression and returns $true if the pattern matches.
PS> $message = 'there is an error with your file'
PS> $message -match 'error'
True
PS> '123-45-6789' -match '\d\d\d-\d\d-\d\d\d\d'
True
If you apply -match to an array, you get a list of all the items that match the pattern.
PS> $data = @(
"General text without meaning"
"my ssn is 123-45-6789"
"some other string"
"another SSN 123-12-1234"
)
PS> $data -match '\d\d\d-\d\d-\d\d\d\d'
my ssn is 123-45-6789
another SSN 123-12-1234
Variations
-imatch makes it explicit that the operation is case-insensitive (the default).
-cmatch makes the operation case-sensitive.
-notmatch returns true when there is no match.
The i and c variants are available for all comparison operators.
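Here is a small sketch of how those variants behave on the same input. The sample text is made up for illustration.

```powershell
$text = 'There is an Error'      # hypothetical sample text

$text -match    'error'     # True:  -match ignores case by default
$text -imatch   'error'     # True:  same behavior, stated explicitly
$text -cmatch   'error'     # False: case-sensitive, and the text contains 'Error'
$text -notmatch 'warning'   # True:  the pattern is not found
```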
-like
The -like operator is like -match except it does not use regex. It uses a simpler wildcard pattern where ? is any single character and * is any number of unknown characters.
$message -like '*error*'
One important difference is that the -like operator expects an exact match unless you include the wildcards. So if you are looking for a pattern within a larger string, you will need to add the wildcards on both ends: '*error*'
Sometimes all you need is a basic wildcard and that is where -like comes in.
This operator has -ilike, -clike, and -notlike variants.
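A short sketch of the exact-match behavior and the two wildcards. The file names are made up for the example.

```powershell
$message = 'there is an error with your file'

$message -like 'error'      # False: without wildcards the whole string must match
$message -like '*error*'    # True:  * covers the text on both sides

# ? stands for exactly one character.
'file1.txt'  -like 'file?.txt'   # True
'file10.txt' -like 'file?.txt'   # False: '10' is two characters
```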
string.Contains()
If all you want to do is test to see if your string has a substring, you can use the string.Contains($substring) approach.
PS> $message = 'there is an error with your file'
PS> $message.contains('error')
True
string.Contains() is case-sensitive. For this targeted scenario, it will perform faster than the other operators.
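If you want a plain substring test that ignores case, one option (a sketch, not something the article prescribes) is String.IndexOf() with a StringComparison value. That overload is available in both Windows PowerShell and PowerShell 6+.

```powershell
$message = 'there is an error with your file'

# Contains() is case-sensitive, so the upper-case search fails.
$message.Contains('ERROR')    # False

# IndexOf() returns -1 when the substring is not found.
$found = $message.IndexOf('ERROR', [System.StringComparison]::OrdinalIgnoreCase) -ge 0
$found                        # True
```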
-replace
The -replace operator uses regex for its pattern matching.
PS> $message = "Hi, my name is Dave."
PS> $message -replace 'Dave','Kevin'
Hi, my name is Kevin.
PS> $message = "My SSN is 123-45-6789."
PS> $message -replace '\d\d\d-\d\d-\d\d\d\d', '###-##-####'
My SSN is ###-##-####.
The other variants of this operator are -creplace and -ireplace.
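Because -replace is regex based, the replacement string can refer back to a sub-expression with $1, $2, and so on. The sketch below keeps the last four digits of the number. Note the single quotes around the replacement so that PowerShell does not try to expand $1 as a variable.

```powershell
$message = 'My SSN is 123-45-6789.'

# The parentheses capture the last four digits; $1 puts them back.
$masked = $message -replace '\d\d\d-\d\d-(\d\d\d\d)', '###-##-$1'
$masked   # My SSN is ###-##-6789.
```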
string.replace()
The .NET String.Replace($pattern,$replacement) function does not use regex. I mention this because it performs faster than -replace.
PS> $message = "Hi, my name is Dave."
PS> $message.replace('Dave','Kevin')
Hi, my name is Kevin.
This one is also case-sensitive. In fact, all the string functions are case-sensitive.
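The difference matters most when the text you are replacing contains regex syntax. A minimal sketch with a made-up version string:

```powershell
$version = '1.2.3'   # hypothetical value

# String.Replace() treats the period as a literal character.
$version.Replace('.', '-')     # 1-2-3

# -replace treats the period as "any character", so every character is replaced.
$version -replace '.', '-'     # -----

# Escaping the period makes -replace behave like the literal version.
$version -replace '\.', '-'    # 1-2-3
```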
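-split
This operator is often overlooked as one that uses regex. We often split on simple patterns that happen to be valid regex, so we never even notice.
PS> 'CA,TX,NE' -split ','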
CA
TX
NE
Every once in a while, we will try to split on some other character that means something else in regex. This leads to very unexpected results. If we change the comma to a period, we get a bunch of blank lines.
PS> 'CA.TX.NE' -split '.'
PS>
The reason is that . will match any character, so in this case it matches every character. It ends up splitting at every character and giving us 9 empty values.
PS> ('CA.TX.NE' -split '.').count
9
This is why it is important to remember which commands use regex.
-isplit and -csplit are the variants on this operator.
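If you do need to split on a period with -split, here is a sketch of three ways to make the delimiter literal. The SimpleMatch option is part of the documented -split syntax (delimiter, max substrings, options); a max of 0 means "return everything".

```powershell
$states = 'CA.TX.NE'

# 1. Escape the period by hand.
$byHand = $states -split '\.'

# 2. Let .NET escape the delimiter.
$byEscape = $states -split [regex]::Escape('.')

# 3. Tell -split to treat the delimiter as plain text.
$bySimpleMatch = $states -split '.', 0, 'SimpleMatch'

$byHand          # CA, TX, NE
```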
string.split()
Like with the replace command, there is a string.Split() function that does not use regex. It will be faster when splitting on a character (or substring) and give you the same results.
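A minimal sketch of the same split with the string method. It sticks to a single-character delimiter on purpose, because multi-character delimiters are resolved to different Split() overloads depending on the PowerShell version.

```powershell
# The period is treated as a literal character here.
$parts = 'CA.TX.NE'.Split('.')

$parts         # CA, TX, NE
$parts.Count   # 3
```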
Switch
By default, the switch statement does exact matches. But it does have a -regex option to use regex matches instead.
switch -regex ($message)
{
'\d\d\d-\d\d-\d\d\d\d' {
Write-Warning 'message may contain a SSN'
}
'\d\d\d\d-\d\d\d\d-\d\d\d\d-\d\d\d\d' {
Write-Warning 'message may contain a credit card number'
}
'\d\d\d-\d\d\d-\d\d\d\d' {
Write-Warning 'message may contain a phone number'
}
}
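This feature of switch is often overlooked.
Multiple switch matches
The interesting thing about using regex in a switch is that it tests every pattern, so you can get multiple matches from one switch.
Run this example with the switch statement above: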
PS> $message = "Hey, call me at 123-456-1234, there is an issue with my 1234-5678-8765-4321 card"
WARNING: message may contain a credit card number
WARNING: message may contain a phone number
Even though we had one string in $message, 2 of the switch conditions executed.
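If you only want the first matching pattern to run, a break inside the script block stops the switch from testing the remaining patterns. This is a sketch of that variation; it returns short labels instead of writing warnings so the result can be captured in a variable.

```powershell
$message = 'Hey, call me at 123-456-1234, there is an issue with my 1234-5678-8765-4321 card'

$firstOnly = switch -Regex ($message)
{
    '\d\d\d\d-\d\d\d\d-\d\d\d\d-\d\d\d\d' {
        'credit card number'
        break   # stop here; the phone pattern below is never tested
    }
    '\d\d\d-\d\d\d-\d\d\d\d' {
        'phone number'
        break
    }
}

$firstOnly   # credit card number
```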
ValidatePattern
When creating an advanced function, you can add a [ValidatePattern()] attribute to your parameter. This validates that the incoming value has the pattern that you expect.
function Get-Data
{
[cmdletbinding()]
param(
[ValidatePattern('\d\d\d-\d\d-\d\d\d\d')]
[string]
$SSN
)
# ... #
}
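This example takes an SSN from the user and validates the input. If the value is not valid, the user gets an error message. My issue with this is that it does not give a good error message by default.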
PS> Get-Data 'Kevin'
get-data : Cannot validate argument on parameter 'SSN'. The argument "Kevin" does not match
the "\d\d\d-\d\d-\d\d\d\d" pattern. Supply an argument that matches "\d\d\d-\d\d-\d\d\d\d"
and try the command again.
ValidateScript
One way around that is to use a [ValidateScript({...})] instead that throws a custom error message.
[ValidateScript({
if( $_ -match '\d\d\d-\d\d-\d\d\d\d')
{
$true
}
else
{
throw 'Please provide a valid SSN (ex 123-45-5678)'
}
})]
Now we get this error message:
PS> get-data 'Kevin'
get-data : Cannot validate argument on parameter 'SSN'.
Please provide a valid SSN (ex 123-45-5678)
It may complicate our parameter, but it is much easier for our users to understand.
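For reference, here is a sketch of that attribute placed inside the complete Get-Data function from earlier. The function body is a placeholder invented for this example, and the try/catch is only there to show the message a caller would see.

```powershell
function Get-Data
{
    [CmdletBinding()]
    param(
        [ValidateScript({
            if ($_ -match '\d\d\d-\d\d-\d\d\d\d')
            {
                $true
            }
            else
            {
                throw 'Please provide a valid SSN (ex 123-45-5678)'
            }
        })]
        [string]
        $SSN
    )

    # Placeholder body for illustration only.
    "Looking up data for $SSN"
}

# A failed validation is a terminating error, so it can be caught.
try
{
    Get-Data -SSN 'Kevin'
}
catch
{
    $errorText = $_.Exception.Message
    $errorText
}
```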
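ValidatePattern ErrorMessage in PS 6
Using a validate script just to give a good error message is ugly. A new feature in PS 6 is that you can specify a custom error message for ValidatePattern with the ErrorMessage argument. Here is how you would specify the ErrorMessage:
[ValidatePattern('\d\d\d-\d\d-\d\d\d\d',ErrorMessage = 'The pattern does not match a valid US SSN format.')]
When the value does not match, the user is presented with the error below.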
Get-Data : Cannot validate argument on parameter 'SSN'. The pattern does not match a valid US SSN format.
While this is a nice new feature, it makes the code invalid in Windows PowerShell. If I run the same code in Windows PowerShell, I get this error message:
Property 'ErrorMessage' cannot be found for type 'System.Management.Automation.ValidatePatternAttribute'.
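Validators on variables
We mostly think of validators as part of advanced functions, but the reality is that they apply to variables and can be used outside of advanced functions.
PS> [ValidatePattern('\d\d\d-\d\d-\d\d\d\d')][string]$SSN = '123-45-6789'
PS> $SSN = "I don't know"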
The variable cannot be validated because the value `I don't know`
is not a valid value for the SSN variable.
I can't say that I really do this myself, but it is good to know.
$Matches
When you use the -match operator, an automatic variable called $Matches contains the results of the match. If you have any sub expressions in your regex, those sub matches are also listed.
$message = 'My SSN is 123-45-6789.'
$message -match 'My SSN is (\d\d\d-\d\d-\d\d\d\d)\.'
$Matches[0]
$Matches[1]
My SSN is 123-45-6789.
123-45-6789
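One thing to watch for: a failed -match does not clear $Matches, so it can still hold the values from an earlier successful match. The sketch below shows that behavior and the usual guard, which is to read $Matches only inside an if that tested the match. The sample strings are made up.

```powershell
$pattern = '(\d\d\d)-(\d\d)-(\d\d\d\d)'

# A successful match populates $Matches.
$null = 'My SSN is 123-45-6789.' -match $pattern
$afterSuccess = $Matches[3]          # 6789

# This match fails, so $Matches still holds the previous result.
$null = 'no numbers here' -match $pattern
$afterFailure = $Matches[3]          # still 6789

# Guard with if so that stale values are never read.
$lastFour = if ('no numbers here' -match $pattern) { $Matches[3] }
$null -eq $lastFour                  # True
```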
Named matches
This is one of my favorite features that most people don't know about. If you use a named regex match, you can access that match by name on $Matches.
$message = 'My Name is Kevin and my SSN is 123-45-6789.'
if($message -match 'My Name is (?<Name>.+) and my SSN is (?<SSN>\d\d\d-\d\d-\d\d\d\d)\.')
{
$Matches.Name
$Matches.SSN
}
In the example above, the (?<Name>.+) is a named sub expression. This value is then placed in the $Matches.Name property. Same goes for SSN.
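Named matches pair nicely with custom objects. This sketch turns the two named sub expressions into properties of a [pscustomobject]; the $person variable name is just an illustration.

```powershell
$message = 'My Name is Kevin and my SSN is 123-45-6789.'
$pattern = 'My Name is (?<Name>.+) and my SSN is (?<SSN>\d\d\d-\d\d-\d\d\d\d)\.'

if ($message -match $pattern)
{
    # Copy the named matches into an object that can travel down the pipeline.
    $person = [pscustomobject]@{
        Name = $Matches.Name
        SSN  = $Matches.SSN
    }
    $person
}
```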
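.NET Regex
Because this is PowerShell, we have full access to the .NET regex object. Most of its functionality is covered by the features above. If you get into more advanced regex where you need custom options, then reach for this object.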
[regex]::new($pattern) | Get-Member
All the .NET regex methods are case-sensitive by default.
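As a sketch of what "custom options" can look like, the example below builds a regex object with RegexOptions.IgnoreCase and compares it to the default behavior. The sample text is made up.

```powershell
$pattern = 'error'
$text    = 'There is an ERROR'    # hypothetical sample text

# Default .NET behavior is case-sensitive, unlike -match.
$strict = [regex]::new($pattern)
$strict.IsMatch($text)     # False

# Pass RegexOptions to change that.
$relaxed = [regex]::new($pattern, [System.Text.RegularExpressions.RegexOptions]::IgnoreCase)
$relaxed.IsMatch($text)    # True

# The inline (?i) flag is another way to ask for the same thing.
[regex]::IsMatch($text, '(?i)error')   # True
```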
I'm going to touch on [regex]::Escape() because there is not a PowerShell equivalent.
Escaping regex
Regex is a complex language with common symbols and a shorthand syntax. There are times where you may want to match a literal value instead of a pattern. The [regex]::Escape() method will escape all the regex syntax for you.
Take this string for example: (123)456-7890. It contains regex syntax that may not be obvious to you.
PS> $message = 'My phone is (123)456-7890'
PS> $message -match '(123)456-7890'
False
You may think this is matching a specific phone number, but the thing it would match is 123456-7890. My point is that when you use a literal string where a regex is expected, you will get unexpected results. This is where [regex]::Escape() solves that issue.
PS> $message -match [regex]::Escape('(123)456-7890')
True
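I don't want to dwell on this too much because this is an anti-pattern. If you need to regex escape your entire pattern before you match it, then you should use the string.Contains() method instead.
The only time you should be escaping a regex is when you are placing that value inside a more complex regex. Even then, it can often be solved with a more complex regex pattern.
If you are using this in your code, rethink why you need it, because odds are that you are using the wrong operator or method.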
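To make the "inside a more complex regex" case concrete, here is a sketch where a literal value is escaped and then surrounded by anchors. The $phone value stands in for text that comes from somewhere else, such as user input.

```powershell
$phone = '(123)456-7890'   # literal text that happens to contain regex syntax

# Only the literal part is escaped; the anchors around it stay as regex.
$pattern = '^My phone is ' + [regex]::Escape($phone) + '$'

'My phone is (123)456-7890' -match $pattern   # True
'My phone is 123456-7890'   -match $pattern   # False
```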
Multiple matches per line
The -match operator will only match once per line, so the $Matches variable only contains that first match. There are times where I want to grab every occurrence of a pattern even if there are multiples per line. I have 2 ways that I approach this scenario.
-AllMatches
Select-String offers support for this with the -AllMatches parameter. In this case the returned object contains a Matches property with an entry for every match.
PS> $data = 'The event runs from 2018-10-06 to 2018-10-09'
PS> $datePattern = '\d\d\d\d-\d\d-\d\d'
PS> $results = $data | Select-String $datePattern -AllMatches
PS> $results.Matches.Value
2018-10-06
2018-10-09
Regex Matches()
The [regex] object's Matches() method is the other option.
PS> $data = 'The event runs from 2018-10-06 to 2018-10-09'
PS> $datePattern = [regex]::new('\d\d\d\d-\d\d-\d\d')
PS> $results = $datePattern.Matches($data)
PS> $results.Value
2018-10-06
2018-10-09
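Each item returned by Matches() also exposes its sub expressions through the Groups property. This sketch combines that with named groups to break every date into parts; the group names Year, Month, and Day are chosen for this example.

```powershell
$data    = 'The event runs from 2018-10-06 to 2018-10-09'
$pattern = '(?<Year>\d\d\d\d)-(?<Month>\d\d)-(?<Day>\d\d)'

# The static Matches() method returns one Match object per occurrence.
$dates = foreach ($match in [regex]::Matches($data, $pattern))
{
    [pscustomobject]@{
        Year  = $match.Groups['Year'].Value
        Month = $match.Groups['Month'].Value
        Day   = $match.Groups['Day'].Value
    }
}

$dates.Day   # 06, 09
```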
Should Match
When using Pester tests, the Should Match assertion uses a regular expression.
It "contains a SSN"{
$message = Get-Data
$message | Should Match '\d\d\d-\d\d-\d\d\d\d'
}
Working with Pester is the exception to the rule of not using [regex]::Escape(). Pester does not have a substring match alternative.
It "contains $subString"{
$message = Get-Data
$message | Should Match ([regex]::Escape($subString))
}
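Putting it all together
As you can see, there are a lot of places where you can use regex, or where you may already be using regex without knowing it. PowerShell did a good job of integrating it into the language. But be careful about using these commands where performance is a concern and you are not actually using a regex pattern.
If you find any other common ways to use regex in PowerShell, please let me know. I would love to hear about them and add them to my list.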