Title: | A Parser for Human Names |
---|---|
Description: | Human names are complicated and nonstandard things. Humaniformat, which is based on Anthony Ettinger's 'humanparser' project <https://github.com/chovy/humanparser>, provides functions for parsing human names, making a best-guess attempt to distinguish sub-components such as prefixes, suffixes, middle names and salutations. |
Authors: | Oliver Keyes [aut, cre] |
Maintainer: | Oliver Keyes <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.6.0 |
Built: | 2025-02-21 03:44:51 UTC |
Source: | https://github.com/ironholds/humaniformat |
As in the lubridate package, individual components of a name can be either extracted or set using the relevant function call; see the examples.
first_name(x)
first_name(x) <- value
x | a name, or vector of names |
value | a replacement value for x's first name. |
salutation, middle_name, last_name and suffix for other accessors.
# Get a first name
example_name <- "Mr Jim Jeffries"
first_name(example_name)

# Set a first name
first_name(example_name) <- "Prof"
A common pattern is for first and middle names to be represented by initials. Unfortunately, depending on how this is done, parsing can become difficult: "G. K. Chesterton" is easy to parse, but "G.K. Chesterton" or "G.K.Chesterton" is not. format_period takes names that are period-separated in this fashion and reformats them to ensure there is a space after each initial. Periods after any space in the name are preserved, so "G.K. Chesterton, M.D." does not become "G. K. Chesterton, M. D. ".
format_period(names)
names | a vector of names following this convention. Names that lack periods will be returned entirely intact, so assuming you don't have (legitimate) periods in names not following this format, there's no need to worry if your vector has mixed formatting. |
format_reverse for names stored as "Lastname, Firstname", and parse_names to parse the output of this function.
format_period("G.K.Chesterton")
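Because format_period accepts a vector, mixed input can be cleaned in a single call. The following is a minimal sketch, assuming the humaniformat package is installed; the variable names and sample names are illustrative only.

```r
library(humaniformat)

# Illustrative input: two period-separated names and one with no periods.
mixed_names <- c("G.K.Chesterton", "G.K. Chesterton", "Oliver Keyes")

# Names without periods should come back unchanged, as the description
# of the 'names' argument states.
cleaned_names <- format_period(mixed_names)
print(cleaned_names)
```

The result has one element per input name, so it can be passed straight on to parse_names.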
A common pattern for names is 'Lastname Suffix, Salutation Firstname' or, to put that more practically, 'Jeffries PhD, Mr Bernard'. format_reverse takes these reversed names and reformats them to a form that parse_names can handle.
format_reverse(names)
names | a vector of names following this convention. Names that lack commas will be returned entirely intact, so assuming you don't have (legitimate) commas in names not following this format, there's no need to worry if your vector has mixed formatting. |
a vector containing the reformatted names
parse_names, which works more reliably if reversed names have been reformatted, and format_period for period-separated names.
# Take a reversed name and un-reverse it
format_reverse("Keyes, Oliver")
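The two formatting helpers can be chained before parsing. The sketch below assumes the humaniformat package is installed; the sample names, the variable names and the order of the two formatting calls are illustrative choices rather than anything the package prescribes.

```r
library(humaniformat)

# Illustrative input mixing reversed, period-separated and plain names.
raw_names <- c("Keyes, Oliver", "G.K.Chesterton", "Mr Jim Toby Jeffries")

# Space out initials first, then un-reverse "Lastname, Firstname" entries.
# Names that match neither convention should pass through unchanged.
tidy_names <- format_reverse(format_period(raw_names))

# Split the normalised names into their components.
parsed <- parse_names(tidy_names)
print(parsed)
```

Note that a name containing a legitimate comma, such as one ending in ", M.D.", would be treated as reversed by format_reverse, so input like that needs separate handling.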
Human names are complicated and nonstandard things. Humaniformat attempts to provide functions for parsing those names, making a best-guess attempt to distinguish sub-components such as prefixes, suffixes, middle names and salutations.
As in the lubridate package, individual components of a name can be either extracted or set using the relevant function call; see the examples.
last_name(x)
last_name(x) <- value
x | a name, or vector of names |
value | a replacement value for x's last name. |
salutation, first_name, middle_name and suffix for other accessors.
# Get a last name
example_name <- "Mr Jim Toby Jeffries"
last_name(example_name)

# Set a last name
last_name(example_name) <- "Smith"
As in the lubridate package, individual components of a name can be either extracted or set using the relevant function call; see the examples.
middle_name(x)
middle_name(x) <- value
x | a name, or vector of names |
value | a replacement value for x's middle name. |
salutation, first_name, last_name and suffix for other accessors.
# Get a middle name
example_name <- "Mr Jim Toby Jeffries"
middle_name(example_name)

# Set a middle name
middle_name(example_name) <- "Richard"
Human names are complex things; sometimes people have honorifics, or not. Or a single middle name, or many. Or a compound surname, or not a compound surname but 'PhD' at the end of their name, and augh.
parse_names provides a simple function for taking consistently formatted human names and splitting them into salutation, first_name, middle_name, last_name and suffix. It is capable of dealing with compound surnames, multiple middle names, and similar variations, and is fully vectorised.
parse_names(names)
names | a character vector of names to parse. |
a data.frame with the columns salutation, first_name, middle_name, last_name, suffix and full_name (which contains the original name). In the event that a name doesn't have a salutation, middle name, suffix, or so on, an NA will appear in the corresponding column.
# Parse a simple name
parse_names("Oliver Keyes")

# Parse a more complex name
parse_names("Hon. Oliver Timothy Keyes Esq.")
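Since parse_names is vectorised and marks missing components with NA, the result can be filtered like any other data.frame. This is a minimal sketch, assuming the humaniformat package is installed; the variable names are illustrative.

```r
library(humaniformat)

# Parse several names in one call; the result has one row per name.
people <- c("Oliver Keyes", "Hon. Oliver Timothy Keyes Esq.")
parsed <- parse_names(people)
nrow(parsed)

# List the original names for which no middle name was found.
parsed$full_name[is.na(parsed$middle_name)]
```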
As in the lubridate package, individual components of a name can be either extracted or set using the relevant function call; see the examples. If you attempt to set a component to NA, no modification will be made; if you try to get a component that isn't present, an NA will be returned.
salutation(x)
salutation(x) <- value
x | a name, or vector of names |
value | a replacement value for x's salutation. |
first_name, middle_name, last_name and suffix for other accessors.
# Get a salutation
example_name <- "Mr Jim Jeffries"
salutation(example_name)

# Set a salutation
salutation(example_name) <- "Prof"
As in the lubridate package, individual components of a name can be either extracted or set using the relevant function call; see the examples.
suffix(x)
suffix(x) <- value
x | a name, or vector of names |
value | a replacement value for x's suffix. |
salutation, first_name, middle_name and last_name for other accessors.
# Get a suffix
example_name <- "Mr Jim Toby Jeffries Esq"
suffix(example_name)

# Set a suffix
suffix(example_name) <- "PhD"