xarray.core.accessor_str.StringAccessor.extract
- StringAccessor.extract(pat, dim, case=None, flags=0)
Extract the first match of capture groups in the regex pat as a new dimension in a DataArray.
For each string in the DataArray, extract groups from the first match of regular expression pat.
If pat is array-like, it is broadcast against the array and applied elementwise.
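The per-element behavior can be sketched in plain Python with the standard `re` module. This is a simplified model assuming `re.search` semantics, not xarray's implementation. The helper `extract_first` is hypothetical, and a list of patterns paired one-to-one with the strings stands in for a broadcast array-like `pat`; full broadcasting is not modeled.

```python
from __future__ import annotations

import re


def extract_first(value: str, pat: str | re.Pattern[str]) -> list[str]:
    """Hypothetical helper: capture groups of the first match of pat in value."""
    regex = re.compile(pat)  # an already compiled pattern is returned unchanged
    match = regex.search(value)  # search returns the first match anywhere in value
    if match is None:
        # No match: one empty string per capture group, as in the example output
        return [""] * regex.groups
    # Groups that did not take part in the match also become "" (an assumption)
    return list(match.groups(default=""))


values = ["a_Xy_0", "ab_xY_10", "no match here"]
# One pattern per element: a simplified stand-in for a broadcast array-like pat
pats = [r"(\w+)_Xy_(\d*)", r"(\w+)_xY_(\d*)", r"(\w+)_Xy_(\d*)"]
rows = [extract_first(v, p) for v, p in zip(values, pats)]
print(rows)  # [['a', '0'], ['ab', '10'], ['', '']]
```

Each string contributes exactly one list of groups, which is why the captured values fit into a single new dimension of fixed length (the number of capture groups).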
- Parameters
pat (str or re.Pattern or array-like of str or re.Pattern) – A string containing a regular expression or a compiled regular expression object. If array-like, it is broadcast.
dim (hashable or None) – Name of the new dimension to store the captured strings in. If None, the pattern must have only one capture group and the resulting DataArray will have the same size as the original.
case (bool, default: True) – If True, the match is case sensitive; case=False is equivalent to setting the re.IGNORECASE flag. Cannot be set if pat is a compiled regex.
flags (int, default: 0) – Flags to pass through to the re module, e.g. re.IGNORECASE. See the re module's compilation flags. 0 means no flags. Flags can be combined with the bitwise OR operator |. Cannot be set if pat is a compiled regex.
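In terms of the standard `re` module, `case=False` corresponds to compiling with `re.IGNORECASE`, and several flags combine with `|`. The sketch below illustrates this with `re` alone; the single-group helper `first_group` (a hypothetical name) stands in for `dim=None`, where each string maps to one value instead of a list, so the result keeps the input's size. It is an illustration under these assumptions, not xarray code.

```python
import re

text = "ab_xY_10-bab_Xy_110-baab_Xy_1100"
pattern = r"(\w+)_Xy_(\d*)"

# Case sensitive (case=True): "_xY_" does not match "_Xy_", so the first
# match starts at "bab".
sensitive = re.search(pattern, text).groups()

# case=False behaves like adding re.IGNORECASE: "ab_xY_10" now matches first.
insensitive = re.search(pattern, text, flags=re.IGNORECASE).groups()

# Flags combine with bitwise OR: ignore case, and let "^" match at every line start.
line_starts = re.compile(r"^(\w+)_xy_", flags=re.IGNORECASE | re.MULTILINE)
prefixes = line_starts.findall("a_Xy_0\nbb_XY_1")


# dim=None analogue: exactly one capture group, so each string maps to one value
# and the result has the same length as the input.
def first_group(s: str) -> str:
    m = re.search(r"_Xy_(\d+)", s)
    return m.group(1) if m else ""


single = [first_group(s) for s in ["a_Xy_0", "abc_Xy_01", ""]]

print(sensitive, insensitive)  # ('bab', '110') ('ab', '10')
print(prefixes)  # ['a', 'bb']
print(single)  # ['0', '01', '']
```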
- Returns
extracted (same type as values or object array)
- Raises
ValueError – pat has no capture groups.
ValueError – dim is None and there is more than one capture group.
ValueError – case is set when pat is a compiled regular expression.
KeyError – The given dimension is already present in the DataArray.
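The three ValueError conditions above can be mirrored by a small validation function. This is a minimal sketch: `check_extract_args` is a hypothetical name, the messages are illustrative, and raising ValueError when flags is set alongside a compiled regex is an assumption (the list above names only case). The KeyError for an existing dimension is omitted because it depends on the DataArray itself.

```python
from __future__ import annotations

import re
from collections.abc import Hashable


def check_extract_args(
    pat: str | re.Pattern[str],
    dim: Hashable | None,
    case: bool | None = None,
    flags: int = 0,
) -> re.Pattern[str]:
    """Hypothetical validator mirroring the documented ValueError conditions."""
    if isinstance(pat, re.Pattern):
        # A compiled regex already carries its own flags, so case must stay unset
        if case is not None:
            raise ValueError("case cannot be set when pat is a compiled regex")
        # Assumption: flags is rejected the same way
        if flags != 0:
            raise ValueError("flags cannot be set when pat is a compiled regex")
        regex = pat
    else:
        # case=None is treated as case sensitive; case=False adds re.IGNORECASE
        if case is False:
            flags |= re.IGNORECASE
        regex = re.compile(pat, flags)
    if regex.groups == 0:
        raise ValueError("pat has no capture groups")
    if dim is None and regex.groups > 1:
        raise ValueError("dim is None but pat has more than one capture group")
    return regex
```

Checking `regex.groups` on the compiled pattern covers both string and precompiled inputs with one code path.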
Examples
Create a string array
>>> value = xr.DataArray(
...     [
...         [
...             "a_Xy_0",
...             "ab_xY_10-bab_Xy_110-baab_Xy_1100",
...             "abc_Xy_01-cbc_Xy_2210",
...         ],
...         [
...             "abcd_Xy_-dcd_Xy_33210-dccd_Xy_332210",
...             "",
...             "abcdef_Xy_101-fef_Xy_5543210",
...         ],
...     ],
...     dims=["X", "Y"],
... )
Extract matches
>>> value.str.extract(r"(\w+)_Xy_(\d*)", dim="match")
<xarray.DataArray (X: 2, Y: 3, match: 2)> Size: 288B
array([[['a', '0'],
        ['bab', '110'],
        ['abc', '01']],

       [['abcd', ''],
        ['', ''],
        ['abcdef', '101']]], dtype='<U6')
Dimensions without coordinates: X, Y, match
See also
DataArray.str.extractall, DataArray.str.findall, re.compile, re.search, pandas.Series.str.extract
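The difference between extract and the related extractall and findall methods listed above parallels `re.search` versus `re.findall`: extract keeps only the first match, while the others keep every match. The stdlib-only sketch below shows that contrast on one string from the example (its first match agrees with the `['abcd', '']` entry in the output above); the extra dimensions and padding that the xarray methods produce are not modeled.

```python
import re

pattern = re.compile(r"(\w+)_Xy_(\d*)")
text = "abcd_Xy_-dcd_Xy_33210-dccd_Xy_332210"

# extract: groups of the first match only (like re.search)
first = pattern.search(text).groups()

# extractall / findall: groups of every non-overlapping match (like re.findall)
every = pattern.findall(text)

print(first)  # ('abcd', '')
print(every)  # [('abcd', ''), ('dcd', '33210'), ('dccd', '332210')]
```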