hong2p.olf module

Functions for loading YAML metadata created by my tom-f-oconnell/olfactometer repo, and dealing with the resulting representations of odors delivered during an experiment.

These functions are kept here rather than in the olfactometer repo because that repo has other, somewhat heavy dependencies that the analysis side of things will generally not need.

hong2p.olf.abbrev(odor_str, abbrevs=None, *, component_delim=' + ', conc_delim='@')[source]

Abbreviates odor name in input, when an abbreviation is available.

Parameters
  • odor_str (str) – can optionally contain concentration information (followed by olf.conc_delimiter, if so)

  • abbrevs (Optional[Dict[str, str]]) – dict mapping from input names to the names (abbreviations) you want. if not passed, the dict olf.odor2abbrev is used

Return type

str
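A minimal sketch of the behavior described above, not the library's implementation. It assumes the concentration is everything after the first conc_delim, uses a hypothetical ODOR2ABBREV dict in place of olf.odor2abbrev, and ignores component_delim (mixtures).

```python
from typing import Dict, Optional

# Hypothetical stand-in for olf.odor2abbrev; the entry is illustrative only.
ODOR2ABBREV: Dict[str, str] = {'ethyl acetate': 'ea'}


def abbrev_sketch(odor_str: str, abbrevs: Optional[Dict[str, str]] = None,
                  conc_delim: str = '@') -> str:
    """Abbreviate the name part of odor_str, keeping any concentration."""
    if abbrevs is None:
        abbrevs = ODOR2ABBREV

    # Split off the concentration, if present (e.g. 'ethyl acetate @ -2').
    name, delim, conc = odor_str.partition(conc_delim)
    name = name.strip()

    if name not in abbrevs:
        # No abbreviation available: return the input unchanged.
        return odor_str

    if not delim:
        return abbrevs[name]

    return f'{abbrevs[name]} {conc_delim} {conc.strip()}'
```

With this sketch, 'ethyl acetate @ -2' becomes 'ea @ -2', while a name without an entry in the dict is returned as-is.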

hong2p.olf.add_abbrevs_from_odor_lists(odor_lists, name2abbrev=None, yaml_path=None, *, if_abbrev_mismatch='warn', verbose=False)[source]

Adds name->abbreviation mappings in odor_lists to odor2abbrev input.

Parameters

yaml_path (Union[str, Path, None]) – this is included in some print/warning messages, but is not loaded.

Return type

None

hong2p.olf.add_mix_str_index_level(df, mix_col='odor')[source]
Return type

DataFrame

hong2p.olf.format_mix_from_strs(odor_strs, *, delim=' + ', warn_unused_levels=False)[source]
hong2p.olf.format_odor(odor_dict, conc=True, name_conc_delim=None, conc_key='log10_conc', cast_int_concs=False)[source]

Takes a dict representation of an odor to a pretty str.

Expected to have at least ‘name’ key, but will also use ‘log10_conc’ (or conc_key) if available, unless conc=False.

Parameters

cast_int_concs (bool) – if True, will convert (log10) concentrations to integer if they are np.isclose to their nearest integer.

>>> odor = {'name': 'ethyl acetate', 'log10_conc': -2}
>>> format_odor(odor)
'ethyl acetate @ -2'
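To illustrate cast_int_concs, here is a rough sketch of the formatting logic. It is an approximation under stated assumptions, not the library's code: the delimiter is hardcoded to ' @ ' (inferred from the doctest above, since the real default for name_conc_delim is None), and math.isclose stands in for np.isclose.

```python
import math


def format_odor_sketch(odor_dict, conc=True, name_conc_delim=' @ ',
                       conc_key='log10_conc', cast_int_concs=False):
    """Format an odor dict as '<name> @ <log10_conc>' (assumed format)."""
    name = odor_dict['name']

    # Fall back to just the name if concentration is unwanted or missing.
    if not conc or odor_dict.get(conc_key) is None:
        return name

    log10_conc = odor_dict[conc_key]

    # math.isclose is a stand-in for np.isclose here (an assumption).
    if cast_int_concs and math.isclose(log10_conc, round(log10_conc)):
        log10_conc = int(round(log10_conc))

    return f'{name}{name_conc_delim}{log10_conc}'
```

In this sketch, a float concentration such as -2.0 is rendered as '-2.0' unless cast_int_concs=True is passed.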
hong2p.olf.format_odor_list(odor_list, *, delim=' + ', **kwargs)[source]

Takes list of dicts representing odors for one trial to pretty str.

Return type

str

hong2p.olf.is_odor_component_level(level_name)[source]

Returns True if column/level name or Series-key is named to store odor metadata

Values for matching keys should store strings representing one, of potentially multiple, component odors presented (simultaneously) on a given trial. My convention for representing multiple components presented together on one trial is to make multiple variables (e.g. columns), named such as [‘odor1’, ‘odor2’, …], with a different suffix number for each component.

Return type

bool

hong2p.olf.is_odor_var(var_name)[source]

Returns True if column/level name or Series-key is named to store odor metadata

Values for matching keys should store strings representing one, of potentially multiple, component odors presented (simultaneously) on a given trial. My convention for representing multiple components presented together on one trial is to make multiple variables (e.g. columns), named such as [‘odor1’, ‘odor2’, …], with a different suffix number for each component.

Return type

bool
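The naming convention above could be checked with a regular expression along these lines. The exact pattern the library uses is not documented here, so the rule below ('odor' followed by one or more digits) is an assumption.

```python
import re

# Assumed pattern: 'odor' followed by a suffix number, e.g. 'odor1', 'odor2'.
_ODOR_VAR_PATTERN = re.compile(r'odor\d+')


def is_odor_var_sketch(var_name) -> bool:
    """Return True if var_name looks like 'odor<n>' (assumed convention)."""
    # Non-str names (e.g. None for unnamed index levels) never match.
    if not isinstance(var_name, str):
        return False

    return _ODOR_VAR_PATTERN.fullmatch(var_name) is not None
```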

hong2p.olf.load_stimulus_yaml(yaml_path)[source]
hong2p.olf.n_odor_component_levels(df)[source]
Return type

int

hong2p.olf.odor_index_sort_key(level, sort_names=True, names_first=True, name_order=None, require_in_name_order=False, warn=True, _debug=False)[source]
Parameters
  • level (Index) – one level from a pd.MultiIndex with odor metadata. elements should be odor strings (in the format accepted by parse_odor_name() and parse_log10_conc()).

  • sort_names (bool) – whether to use odor names as part of sort key. If False, only sorts on concentrations.

  • names_first (bool) – if True, sorts on names primarily, otherwise sorts on concentrations primarily. Ignored if sort_names is False.

  • name_order (Optional[List[str]]) – list of odor names to use as a fixed order for the names. Concentrations will be sorted within each name.

  • require_in_name_order (bool) – if True, raises ValueError if odors not in name_order are present. Otherwise sorts such odors alphabetically after those in name_order.

  • warn (bool) – if True and require_in_name_order=False, warns about which odors were not in name_order

Return type

Index
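The interaction of sort_names, names_first and name_order can be sketched on a plain list of odor strings. This is a simplified illustration, not the library's implementation: it works on lists rather than a pandas Index, assumes every string contains '@', and does not handle solvent_str, require_in_name_order or warn.

```python
from typing import List, Optional, Tuple


def _parse(odor_str: str) -> Tuple[str, float]:
    """Split '<name> @ <log10_conc>' into (name, concentration)."""
    name, _, conc = odor_str.partition('@')
    return name.strip(), float(conc)


def sort_odor_strs_sketch(odor_strs: List[str], sort_names: bool = True,
                          names_first: bool = True,
                          name_order: Optional[List[str]] = None) -> List[str]:
    def key(odor_str):
        name, conc = _parse(odor_str)
        if not sort_names:
            # Only concentrations matter.
            return (conc,)

        if name_order is not None and name in name_order:
            # Names in name_order come first, in that fixed order.
            name_key = (0, name_order.index(name), '')
        elif name_order is not None:
            # Other names follow, alphabetically.
            name_key = (1, 0, name)
        else:
            name_key = (0, 0, name)

        return (name_key, conc) if names_first else (conc, name_key)

    return sorted(odor_strs, key=key)
```

For example, with name_order=['B', 'A'], the input ['B @ -2', 'A @ -2', 'A @ -3'] is ordered so that 'B @ -2' comes first and the two 'A' entries follow in increasing concentration.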

hong2p.olf.odor_lists_to_multiindex(odor_lists, *, sort_components=True, pad_to_n_odors=None, **format_odor_kwargs)[source]
Parameters

pad_to_n_odors (Optional[int]) – if int, returned MultiIndex will have at least this many levels dedicated to odor components (+ the 1 ‘repeat’ level always included).

Return type

MultiIndex

hong2p.olf.odordict_sort_key(odor_dict)[source]

Returns a hashable key for sorting odors by name, then concentration.

Return type

Tuple[str, float]

hong2p.olf.pad_odor_index_to_n_components(df, n)[source]

Pads dataframe odor index, so that it has n ‘odor<n>’ component levels.

Parameters

n (int) – target number of odor levels

Odors presented together (e.g. in one trial, mixed in air) should each have their own level in the odor MultiIndex, with olf.solvent_str used to fill when a given trial had fewer components presented at once.

Return type

DataFrame

hong2p.olf.pad_odor_indices_to_max_components(dfs)[source]

Pads the odor index of each dataframe to the max number of input component levels.

Return type

Sequence[DataFrame]
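The padding rule used by the two functions above can be illustrated without pandas, using one tuple of odor strings per trial. This only sketches the fill logic: the real functions operate on DataFrame index levels, and the value 'solvent' for olf.solvent_str is an assumption.

```python
from typing import List, Sequence, Tuple

SOLVENT_STR = 'solvent'  # assumed value of olf.solvent_str


def pad_components_sketch(trial_odors: Sequence[str], n: int) -> Tuple[str, ...]:
    """Pad one trial's odor strings with SOLVENT_STR up to n components."""
    if len(trial_odors) > n:
        raise ValueError(f'trial already has more than {n} components')

    return tuple(trial_odors) + (SOLVENT_STR,) * (n - len(trial_odors))


def pad_to_max_components_sketch(
        trials: Sequence[Sequence[str]]) -> List[Tuple[str, ...]]:
    """Pad every trial to the largest number of components in any trial."""
    n = max(len(trial) for trial in trials)
    return [pad_components_sketch(trial, n) for trial in trials]
```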

hong2p.olf.panel_odor_orders(df, panel2name_order=None, **kwargs)[source]

Returns dict of panel names to ordered unique odor strs (with concentration).

Parameters
  • df (DataFrame) – DataFrame with columns ‘panel’ and >=1 matching is_odor_var

  • panel2name_order (Optional[Dict[str, List[str]]]) – dict mapping panels to lists of odor names, each in the desired order

  • **kwargs – passed through to sort_odors

hong2p.olf.parse_log10_conc(odor_str, *, require=False)[source]

Takes formatted odor string to float log10 vol/vol concentration.

Returns None if input does not contain olf.conc_delimiter.

Parameters
  • odor_str (str) – contains odor name, and generally also concentration

  • require (bool) – if True, raises ValueError if olf.conc_delimiter is not in input

>>> parse_log10_conc('ethyl acetate @ -2')
-2
Return type

Optional[float]
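A short sketch of how the require flag might behave. It assumes olf.conc_delimiter is '@' and that the concentration is the text after it. It is not the library's implementation.

```python
from typing import Optional

CONC_DELIMITER = '@'  # assumed value of olf.conc_delimiter


def parse_log10_conc_sketch(odor_str: str, require: bool = False) -> Optional[float]:
    """Return the log10 concentration in odor_str, or None if absent."""
    if CONC_DELIMITER not in odor_str:
        if require:
            raise ValueError(f'{odor_str!r} has no {CONC_DELIMITER!r}')
        return None

    # float() tolerates the whitespace around the number.
    _, _, conc = odor_str.partition(CONC_DELIMITER)
    return float(conc)
```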

hong2p.olf.parse_odor(odor_str, *, require_conc=False)[source]
Return type

dict

hong2p.olf.parse_odor_list(trial_odors_str, *, delim=' + ', **parse_odor_kwargs)[source]
Return type

Sequence[OdorDict]

hong2p.olf.parse_odor_name(odor_str, *, require_conc=True)[source]

Takes formatted odor string to just the name of the odor.

Returns None if input matches olf.solvent_str, but otherwise raises ValueError if odor_str does not contain olf.conc_delimiter.

Parameters
  • odor_str (str) – contains odor name and concentration. name and concentration must be separated by olf.conc_delimiter (‘@’), with whitespace on either side of it.

  • require_conc (bool) – if False, will return odor_str if it contains no olf.conc_delimiter

>>> parse_odor_name('ethyl acetate @ -2')
'ethyl acetate'
>>> parse_odor_name(solvent_str) is None
True
Return type

Optional[str]
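The three cases described above (solvent, missing delimiter, normal odor string) could be handled roughly as follows. The constants are assumptions standing in for olf.solvent_str and olf.conc_delimiter, and the sketch is not the library's code.

```python
from typing import Optional

SOLVENT_STR = 'solvent'  # assumed value of olf.solvent_str
CONC_DELIMITER = '@'     # assumed value of olf.conc_delimiter


def parse_odor_name_sketch(odor_str: str, require_conc: bool = True) -> Optional[str]:
    """Return the odor name in odor_str, or None for the solvent."""
    if odor_str == SOLVENT_STR:
        return None

    if CONC_DELIMITER not in odor_str:
        if require_conc:
            raise ValueError(f'{odor_str!r} has no {CONC_DELIMITER!r}')
        # Without a concentration, the whole string is taken as the name.
        return odor_str

    name, _, _ = odor_str.partition(CONC_DELIMITER)
    return name.strip()
```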

hong2p.olf.remove_consecutive_repeats(odor_lists)[source]

Returns a list without any consecutive repeats and int # of consecutive repeats.

Raises ValueError if there is a variable number of consecutive repeats.

Assumed that all elements of odor_lists are repeated the same number of times, for each consecutive group of repeats. As long as any repeats are to full n_repeats and consecutive, it is ok for a particular odor (e.g. solvent control) to be repeated n_repeats times in each of several different positions.

>>> without_repeats, n = remove_consecutive_repeats(['a','a','a','b','b','b'])
>>> without_repeats
['a', 'b']
>>> n
3
>>> without_repeats, n = remove_consecutive_repeats(['a','a','b','b','a','a'])
>>> without_repeats
['a', 'b', 'a']
>>> n
2
>>> without_repeats, n = remove_consecutive_repeats(['a','a','a','b','b'])
Traceback (most recent call last):
ValueError: variable number of consecutive repeats

I also wanted this to accept a list-of-lists-of-dicts, where each dict represents one odor and each inner list represents all of the odors on one trial, but neither the inner lists nor the dicts they contain are hashable, so they cannot work with Counter as-is.

Return type

Tuple[List[Hashable], int]
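The doctests above can be reproduced with a short sketch built on itertools.groupby. This is a swapped-in technique, not the library's implementation (the note above indicates the real function relies on Counter). One difference to keep in mind: the sketch treats back-to-back blocks of the same odor as a single longer run, so it would raise for such input.

```python
from itertools import groupby
from typing import Any, List, Sequence, Tuple


def remove_consecutive_repeats_sketch(odor_lists: Sequence[Any]) -> Tuple[List[Any], int]:
    """Collapse runs of equal consecutive elements; return (list, run length)."""
    # groupby yields one (key, run) pair per run of equal consecutive elements.
    runs = [(key, len(list(run))) for key, run in groupby(odor_lists)]

    # Every run must have the same length (empty input also raises).
    run_lengths = {n for _, n in runs}
    if len(run_lengths) != 1:
        raise ValueError('variable number of consecutive repeats')

    return [key for key, _ in runs], run_lengths.pop()
```

Because groupby compares elements with == rather than hashing them, this sketch also accepts unhashable elements such as lists of dicts.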

hong2p.olf.save_odor2abbrev_cache()[source]
hong2p.olf.sort_odor_list(odor_list)[source]

Returns a sorted list of dicts representing odors for one trial

Name takes priority over concentration, so with the same set of odor names in each trial’s odor_list, this should produce a consistent ordering (and same indexes can be used assuming equal length of all)
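A minimal sketch of name-then-concentration sorting, in the spirit of odordict_sort_key. It assumes every dict has both 'name' and 'log10_conc' keys. How the library handles a missing concentration is not specified here.

```python
from typing import Dict, List, Tuple


def odordict_sort_key_sketch(odor_dict: Dict) -> Tuple[str, float]:
    """Key that orders odors by name first, then by log10 concentration."""
    return (odor_dict['name'], float(odor_dict['log10_conc']))


def sort_odor_list_sketch(odor_list: List[Dict]) -> List[Dict]:
    """Return a new list of one trial's odor dicts, sorted by the key above."""
    return sorted(odor_list, key=odordict_sort_key_sketch)
```

Since the name is compared first, two trials containing the same set of odor names end up with their components in the same positions, whatever their concentrations.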

hong2p.olf.sort_odors(df, *, panel_order=None, panel2name_order=None, panel=None, if_panel_missing='warn', axis=None, _debug=False, **kwargs)[source]

Sorts DataFrame by odor index/columns.

Parameters
  • df (DataFrame) – should have columns/index-level names where olf.is_odor_var(<col name>) returns True

  • panel_order (Optional[List[str]]) – list of str panel names. If passed, must also provide panel2name_order. Will sort panels first, then odors within each panel.

  • panel2name_order (Optional[Dict[str, List[str]]]) – maps str panel names to lists of odor name orders, for each. If passed, must also pass panel_order.

  • panel (Optional[str]) – to specify panel for input data, if it does not have separate index level(s) / column indicating which panel each odor belongs to. must have a matching key in panel2name_order. all data will be assumed to belong to this panel.

  • if_panel_missing – ‘warn’|‘err’|None

  • axis (Optional[str]) – if None, detect which axes to sort (and may sort both). otherwise, expecting ‘columns’|‘index’

  • **kwargs – passed through to odor_index_sort_key().

Notes: Index will be checked first, and if it contains odor information, will sort on that. Otherwise, will check and sort on matching columns.

By default, sorts by name, then by concentration within each name. solvent_str is treated as less than all odors.

>>> df = pd.DataFrame({
...     'odor1': ['B @ -2', 'A @ -2', 'A @ -3'],
...     'odor2': ['solvent'] * 3,
...     'delta_f': [1.1, 1.2, 0.9]
... }).set_index(['odor1', 'odor2'])

Names are sorted alphabetically by default, then within each name they are sorted by concentration. Pass sort_names=False to only sort on concentration, or names_first=False to sort on concentrations first.

>>> sort_odors(df)
                delta_f
odor1  odor2
A @ -3 solvent      0.9
A @ -2 solvent      1.2
B @ -2 solvent      1.1

>>> sort_odors(df, name_order=['B','A'])
                delta_f
odor1  odor2
B @ -2 solvent      1.1
A @ -3 solvent      0.9
A @ -2 solvent      1.2
Return type

DataFrame

hong2p.olf.strip_concs_from_odor_str(odor_str, **kwargs)[source]

Works with input representing either single components or air mixtures of multiple.

Parameters

**kwargs – passed thru to format_odor

Return type

str

hong2p.olf.yaml_data2odor_lists(yaml_data, *, sort=True)[source]

Returns a list-of-lists of dictionary representation of odors.

Each dictionary will have at least the key ‘name’ and generally also ‘log10_conc’.

The i-th list contains all of the odors presented simultaneously on the i-th odor presentation.

Parameters
  • yaml_data (dict) – parsed contents of stimulus YAML file

  • sort (bool) – (default=True) whether to sort odors within each trial. Irrelevant if there is only ever a single odor presented on each trial.

hong2p.olf.yaml_data2pin_lists(yaml_data)[source]

Pins used as balances can be part of these lists despite not having a corresponding odor in ‘pins2odors’.