xarray.core.accessor_str.StringAccessor.extractall

StringAccessor.extractall(pat, group_dim, match_dim, case=None, flags=0)

Extract all matches of capture groups in the regex pat as new dimensions in a DataArray.

For each string in the DataArray, extract groups from all matches of regular expression pat. Equivalent to applying re.findall() to all the elements in the DataArray and splitting the results across dimensions.
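The equivalence with re.findall() can be sketched with the standard library alone. The helper `pad_matches` below is hypothetical (it is not part of xarray) and only illustrates how per-string results of different lengths might be padded with empty strings into a rectangular block; xarray's actual implementation may differ.

```python
import re

pat = re.compile(r"(\w+)_Xy_(\d*)")
strings = ["a_Xy_0", "ab_xY_10-bab_Xy_110-baab_Xy_1100", ""]

# With more than one capture group, findall returns one tuple per match.
found = [pat.findall(s) for s in strings]


def pad_matches(found, n_groups):
    """Hypothetical helper: pad every match list to the longest one."""
    width = max((len(m) for m in found), default=0)
    filler = ("",) * n_groups  # one empty string per capture group
    return [list(m) + [filler] * (width - len(m)) for m in found]


padded = pad_matches(found, pat.groups)
for row in padded:
    print(row)
```

Strings with fewer matches than the longest one, including strings with no match at all, end up filled with empty strings, which mirrors the `''` entries in the output of the example further down.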

If pat is array-like, it is broadcast against the array and applied elementwise.

Parameters:
  • pat (str or re.Pattern, or array-like of str or re.Pattern) – A string containing a regular expression or a compiled regular expression object. If array-like, it is broadcast against the array.

  • group_dim (hashable) – Name of the new dimension corresponding to the capture groups. This dimension is added to the new DataArray first.

  • match_dim (hashable) – Name of the new dimension corresponding to the matches for each group. This dimension is added to the new DataArray second.

  • case (bool, default: True) – If True, matching is case sensitive. Setting case=False is equivalent to setting the re.IGNORECASE flag. Cannot be set if pat is a compiled regex.

  • flags (int, default: 0) – Flags to pass through to the re module, e.g. re.IGNORECASE. See the compilation flags section of the re documentation. 0 means no flags. Flags can be combined with the bitwise or operator |. Cannot be set if pat is a compiled regex.
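The effect of case and flags can be illustrated with the re module directly. This is a minimal sketch that assumes case=False corresponds to re.IGNORECASE, as the parameter description indicates; it does not call xarray itself.

```python
import re

text = "ab_xY_10-bab_Xy_110"
pat = r"(\w+)_Xy_(\d*)"

# Case sensitive (no flags): "_xY_" does not match "_Xy_".
sensitive = re.findall(pat, text)

# Case insensitive: the first segment now matches as well.
insensitive = re.findall(pat, text, flags=re.IGNORECASE)

# Flags are combined with the bitwise or operator.
combined = re.IGNORECASE | re.MULTILINE

# A compiled pattern carries its own flags, so re rejects extra ones.
compiled = re.compile(pat, flags=combined)
rejected = None
try:
    re.compile(compiled, flags=re.IGNORECASE)
except ValueError as err:
    rejected = str(err)

print(sensitive)
print(insensitive)
print(rejected)
```

The last step shows why case and flags cannot be combined with a compiled regex: any flags have to be baked in when the pattern is compiled.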

Returns:

extracted (same type as values or object array)

Raises:
  • ValueError – pat has no capture groups.

  • ValueError – case is set when pat is a compiled regular expression.

  • KeyError – Either of the given dimensions is already present in the DataArray.

  • KeyError – The given dimension names are the same.
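The first ValueError can be anticipated before calling extractall. The sketch below uses the `groups` attribute of a compiled pattern from the standard library; `has_capture_groups` is a hypothetical helper name, not an xarray function.

```python
import re


def has_capture_groups(pat):
    """Hypothetical pre-check: True if pat defines at least one capture group."""
    # re.compile accepts both pattern strings and already compiled patterns.
    return re.compile(pat).groups > 0


two_groups = has_capture_groups(r"(\w+)_Xy_(\d*)")
no_groups = has_capture_groups(r"\w+_Xy_\d*")
# Non-capturing groups, written (?:...), do not count as capture groups.
non_capturing = has_capture_groups(r"(?:\w+)_Xy_\d*")

print(two_groups, no_groups, non_capturing)
```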

Examples

Create a string array

>>> value = xr.DataArray(
...     [
...         [
...             "a_Xy_0",
...             "ab_xY_10-bab_Xy_110-baab_Xy_1100",
...             "abc_Xy_01-cbc_Xy_2210",
...         ],
...         [
...             "abcd_Xy_-dcd_Xy_33210-dccd_Xy_332210",
...             "",
...             "abcdef_Xy_101-fef_Xy_5543210",
...         ],
...     ],
...     dims=["X", "Y"],
... )

Extract matches

>>> value.str.extractall(
...     r"(\w+)_Xy_(\d*)", group_dim="group", match_dim="match"
... )
<xarray.DataArray (X: 2, Y: 3, group: 3, match: 2)>
array([[[['a', '0'],
         ['', ''],
         ['', '']],

        [['bab', '110'],
         ['baab', '1100'],
         ['', '']],

        [['abc', '01'],
         ['cbc', '2210'],
         ['', '']]],


       [[['abcd', ''],
         ['dcd', '33210'],
         ['dccd', '332210']],

        [['', ''],
         ['', ''],
         ['', '']],

        [['abcdef', '101'],
         ['fef', '5543210'],
         ['', '']]]], dtype='<U7')
Dimensions without coordinates: X, Y, group, match

See also

DataArray.str.extract, DataArray.str.findall, re.compile, re.findall, pandas.Series.str.extractall