DataArray.str.extract(pat, dim, case=None, flags=0)
Extract the first match of capture groups in the regex pat as a new dimension in a DataArray.
For each string in the DataArray, extract groups from the first match of regular expression pat.
If pat is array-like, it is broadcast against the array and applied elementwise.
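The sketch below reimplements the first-match semantics for a flat list of strings using only Python's `re` module. It is not xarray's implementation. `extract_first` is a hypothetical helper, and a list of patterns stands in for an array-like `pat` of the same length (real broadcasting is more general). Converting groups that did not take part in the match into `''` is an assumption of the sketch.

```python
import re


def extract_first(values, pat):
    """Hypothetical helper: capture groups of the first match of `pat` in each string.

    A string with no match yields '' for every group (as in the documented
    example). Turning non-participating groups into '' is an assumption here.
    `pat` may be one pattern or a list of patterns, one per string
    (a simple 1-D stand-in for broadcasting).
    """
    pats = pat if isinstance(pat, list) else [pat] * len(values)
    result = []
    for s, p in zip(values, pats):
        regex = re.compile(p)
        m = regex.search(s)  # first match only, anywhere in the string
        if m is None:
            result.append([""] * regex.groups)
        else:
            result.append([g if g is not None else "" for g in m.groups()])
    return result


print(extract_first(["a_Xy_0", "no match here"], r"(\w+)_Xy_(\d*)"))
# [['a', '0'], ['', '']]
print(extract_first(["x1", "y22"], [r"x(\d)", r"y(\d+)"]))
# [['1'], ['22']]
```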
Parameters:
pat (str or re.Pattern or array-like of str or re.Pattern) – A string containing a regular expression or a compiled regular expression object. If array-like, it is broadcast.
dim (hashable or None) – Name of the new dimension that holds the captured strings. If None, the pattern must have exactly one capture group, and the resulting DataArray has the same dimensions as the original.
case (bool, default: True) – If True, matching is case-sensitive. Cannot be set if pat is a compiled regex. Setting case=False is equivalent to setting the re.IGNORECASE flag.
flags (int, default: 0) – Flags to pass through to the re module, e.g. re.IGNORECASE; see the re module's compilation flags. 0 means no flags. Flags can be combined with the bitwise OR operator |. Cannot be set if pat is a compiled regex.
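One way to picture how `case`, `flags`, and compiled patterns interact is the sketch below. `compile_pattern` is a hypothetical helper written from the parameter descriptions above, not part of xarray's public API.

```python
import re


def compile_pattern(pat, case=None, flags=0):
    """Hypothetical helper: combine `case` and `flags` as the parameters describe."""
    if isinstance(pat, re.Pattern):
        # A compiled pattern already carries its own flags, so neither may be set.
        if case is not None:
            raise ValueError("case cannot be set when pat is a compiled regex")
        if flags != 0:
            raise ValueError("flags cannot be set when pat is a compiled regex")
        return pat
    if case is None:
        case = True  # documented default: case-sensitive
    if not case:
        flags |= re.IGNORECASE  # case=False is the same as adding re.IGNORECASE
    return re.compile(pat, flags=flags)


# case=False and flags=re.IGNORECASE produce equivalent patterns.
print(compile_pattern(r"_xy_", case=False).search("ab_xY_10") is not None)  # True
print(compile_pattern(r"_xy_").search("ab_xY_10") is not None)  # False
# Flags combine with |.
print(compile_pattern(r"^b", flags=re.IGNORECASE | re.MULTILINE).search("a\nB") is not None)  # True
```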
Returns:
extracted (same type as values or object array)
Raises:
ValueError – pat has no capture groups.
ValueError – dim is None and there is more than one capture group.
ValueError – case is set when pat is a compiled regular expression.
KeyError – The given dimension is already present in the DataArray.
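Each documented error condition maps to a simple check on the compiled pattern. The sketch below uses `re.Pattern.groups`, which counts capturing groups. `check_extract_args` and its `existing_dims` argument are hypothetical stand-ins for whatever xarray checks internally.

```python
import re


def check_extract_args(pat, dim, existing_dims):
    """Hypothetical pre-flight check mirroring the documented errors."""
    regex = re.compile(pat)
    if regex.groups == 0:
        raise ValueError("pat has no capture groups")
    if dim is None and regex.groups > 1:
        raise ValueError("dim must be given when pat has more than one capture group")
    if dim is not None and dim in existing_dims:
        raise KeyError(f"dimension {dim!r} is already present")
    return regex.groups  # number of values each string will produce


print(check_extract_args(r"(\w+)_Xy_(\d*)", "match", ("X", "Y")))  # 2
print(check_extract_args(r"_Xy_(\d*)", None, ("X", "Y")))  # 1
```

Non-capturing groups such as `(?:...)` are not counted by `groups`, so a pattern built only from them would presumably trigger the "no capture groups" error.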
Create a string array
>>> value = xr.DataArray(
...     [
...         [
...             "a_Xy_0",
...             "ab_xY_10-bab_Xy_110-baab_Xy_1100",
...             "abc_Xy_01-cbc_Xy_2210",
...         ],
...         [
...             "abcd_Xy_-dcd_Xy_33210-dccd_Xy_332210",
...             "",
...             "abcdef_Xy_101-fef_Xy_5543210",
...         ],
...     ],
...     dims=["X", "Y"],
... )
>>> value.str.extract(r"(\w+)_Xy_(\d*)", dim="match")
<xarray.DataArray (X: 2, Y: 3, match: 2)>
array([[['a', '0'],
        ['bab', '110'],
        ['abc', '01']],

       [['abcd', ''],
        ['', ''],
        ['abcdef', '101']]], dtype='<U6')
Dimensions without coordinates: X, Y, match
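To trace where each value in this output comes from, the same extraction can be replayed on plain nested lists with the standard library. This sketch reproduces the result above with `re.search`. It is not how xarray computes it.

```python
import re

# The same strings as the `value` array above, as nested lists (dims X, Y).
value = [
    ["a_Xy_0", "ab_xY_10-bab_Xy_110-baab_Xy_1100", "abc_Xy_01-cbc_Xy_2210"],
    ["abcd_Xy_-dcd_Xy_33210-dccd_Xy_332210", "", "abcdef_Xy_101-fef_Xy_5543210"],
]
regex = re.compile(r"(\w+)_Xy_(\d*)")  # case-sensitive, like the default case=True


def first_groups(s):
    # First match only; a string with no match gives '' for each group.
    m = regex.search(s)
    return list(m.groups()) if m else [""] * regex.groups


result = [[first_groups(s) for s in row] for row in value]
print(result)
# [[['a', '0'], ['bab', '110'], ['abc', '01']], [['abcd', ''], ['', ''], ['abcdef', '101']]]
```

Matching is case-sensitive by default, so the leading `ab_xY_10` in the second string is skipped and the first match is `bab_Xy_110`. `\w+` also cannot cross the `-` separators. Compiling the same pattern with `re.IGNORECASE` (the effect of `case=False`) makes the first match `ab_xY_10`, which yields `('ab', '10')`.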