Cast

data types

Interprets and changes a column's data to another (semantic) type.

This has two consequences:

  1. It allows the resulting column to be used by steps that only accept the new type. For example, casting a column of texts to the "Url" type lets it be used wherever Urls are expected (e.g. in the step fetch_url_content).
  2. It changes any values that do not conform to the new type to the missing value (NaN). For example, casting a column of mixed data to the "Number" type replaces all values that cannot be read as numbers with NaN.

Note that for each possible type a column can be cast to (via the "type" parameter, e.g. "Number", "Category" etc.), the step accepts different configuration parameters. See the subsections under Parameters below for further details.

Example

E.g. to simply convert a Text column to a Category column, use:

cast(ds.text, {"type": "Category"}) -> (ds.new_cat)

Usage

The following are the step's expected inputs and outputs and their specific types.

cast(input: column, {"param": value}) -> (output: column)

Inputs


input: column

The column you wish to cast.

Outputs


output: column

A new column with original data cast to the desired type.

Parameters


The input column can be cast to any of the specified types, each having their own parameters controlling exactly how to cast the input values.

type: string

Desired semantic type of the converted data. Make data numerical with "type": "Number"

Must be one of: "Number", "number"


dtype: string | null

The specific Pandas/Numpy Dtype. If null, will be inferred automatically from the data itself, using the smallest possible type (e.g. int8, uint16 etc.) that can hold all valid numerical input values.

Note that for integers, a dtype starting with "u" or "U" means "unsigned", i.e. it can hold only 0 and positive values. A capitalized initial letter is a Pandas convention indicating that the corresponding type can also hold missing values (NaN), which "normal" integer columns cannot.

Must be one of: "int8", "uint8", "Int8", "UInt8", "int16", "uint16", "Int16", "UInt16", "int32", "uint32", "Int32", "UInt32", "int64", "uint64", "Int64", "UInt64", "float32", "float64"
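
The automatic inference for integers can be pictured as a search for the narrowest type whose range covers every value. The following is a minimal Python sketch of that idea, not the step's actual implementation. The function name smallest_int_dtype is hypothetical, and trying the unsigned type before the signed type of the same width is an assumption.

```python
def smallest_int_dtype(values):
    """Return the name of the narrowest integer dtype that holds all values."""
    lo, hi = min(values), max(values)
    for bits in (8, 16, 32, 64):
        # Unsigned types cover 0 .. 2**bits - 1 (assumed to be tried first).
        if lo >= 0 and hi <= 2**bits - 1:
            return f"uint{bits}"
        # Signed types cover -2**(bits-1) .. 2**(bits-1) - 1.
        if lo >= -(2 ** (bits - 1)) and hi <= 2 ** (bits - 1) - 1:
            return f"int{bits}"
    raise OverflowError("values do not fit into a 64-bit integer")


print(smallest_int_dtype([0, 200]))    # uint8
print(smallest_int_dtype([-1, 200]))   # int16
print(smallest_int_dtype([0, 70000]))  # uint32
```

If the column also contains missing values, the capitalized variants (e.g. "Int16") described above would be the ones able to hold them.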


decimal: string = "."

Separator to mark the decimal part. Use "." or "," to indicate how decimal values are separated when parsing text strings into numerical format. It is automatically assumed that the other character is used as the thousands separator. E.g. "decimal": "." assumes that the period "." is used to separate decimals and "," thousands, as in the number string "12,173.12".

Must be one of: ".", ","
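
To make the role of the two separators concrete, here is a small Python sketch of the parsing rule described above. It is an illustration only, not the step's implementation, and the helper name parse_number is hypothetical.

```python
import math


def parse_number(text, decimal="."):
    """Parse a number string, treating the other separator as thousands."""
    thousands = "," if decimal == "." else "."
    # Drop thousands separators, then normalise the decimal mark to ".".
    cleaned = text.strip().replace(thousands, "").replace(decimal, ".")
    try:
        return float(cleaned)
    except ValueError:
        # Anything that cannot be read as a number becomes missing (NaN).
        return math.nan


print(parse_number("12,173.12"))               # 12173.12
print(parse_number("12.173,12", decimal=","))  # 12173.12
print(parse_number("abc"))                     # nan
```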


unit: string | null

Prefix or suffix indicating the number's units. By default ("unit": null), any strings containing unit symbols, or any characters other than digits, decimal or thousands separators, will be converted to missing data (NaN).

To ignore/remove any potential unit suffixes or prefixes, and to convert only the numerical parts of the data, use "unit": "remove". Note that this may create a column with mixed/incompatible data if the original data contained mixed units (e.g. different currencies of money).

Alternatively, to detect the most common unit, convert only those numbers containing it, and set the rest to NaN, use "unit": "detect". This ensures only data with the single most common unit is kept, the rest being removed.

As a last option, you may explicitly state a specific unit string identifying the numbers to be kept (converted). This can be a simple string (e.g. "$") or a regular expression, in which case this needs to be indicated with the unit_regex parameter (see below). Only matching data will then be converted; any data containing non-digit characters that do not match the pattern will be removed.

Must be one of: "detect", "remove"
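
The four ways of handling units (null, "remove", "detect", or an explicit unit string) can be compared side by side with a rough Python sketch. This is not the step's implementation: the helper names split_unit and cast_numbers and the simplified number pattern are hypothetical, regular-expression units are left out, and treating values without any unit as always convertible is an assumption.

```python
import math
import re
from collections import Counter

# Hypothetical, simplified pattern: digits with optional "." decimals.
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def split_unit(text):
    """Split a string into (number, unit); number is None without digits."""
    match = NUMBER.search(text)
    if match is None:
        return None, text.strip()
    unit = (text[:match.start()] + text[match.end():]).strip()
    return float(match.group()), unit


def cast_numbers(values, unit=None):
    parts = [split_unit(v) for v in values]
    if unit == "detect":
        # Most common non-empty unit among values that contain a number.
        counts = Counter(u for n, u in parts if n is not None and u)
        unit = counts.most_common(1)[0][0] if counts else None
    result = []
    for number, found in parts:
        # Assumption: values without any unit are always convertible.
        keep = number is not None and (
            unit == "remove" or found == "" or found == unit
        )
        result.append(number if keep else math.nan)
    return result


prices = ["10 EUR", "12.5 EUR", "7 USD", "n/a"]
print(cast_numbers(prices))            # [nan, nan, nan, nan]
print(cast_numbers(prices, "remove"))  # [10.0, 12.5, 7.0, nan]
print(cast_numbers(prices, "detect"))  # [10.0, 12.5, nan, nan]
print(cast_numbers(prices, "USD"))     # [nan, nan, 7.0, nan]
```

The "remove" result shows the risk mentioned above: euro and dollar amounts end up in the same numeric column.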


unit_regex: boolean = False

Interpret unit parameter as regular expression. Use to indicate that the provided unit string should be used as a regular expression to match more complicated non-digit strings in the original data that should be removed.

type: string

Desired semantic type of the converted data. Make data a currency with "type": "Currency"

Must be one of: "Currency", "currency"


iso_code: string

The ISO 4217 currency code. Identifies the single currency in this column. In the future, we may detect this automatically from any currency symbol, pre- or postfix present in the column.

Must be one of: "AED", "AFN", "ALL", "AMD", "ANG", "AOA", "ARS", "AUD", "AWG", "AZN", "BAM", "BBD", "BDT", "BGN", "BHD", "BIF", "BMD", "BND", "BOB", "BOV", "BRL", "BSD", "BTN", "BWP", "BYN", "BZD", "CAD", "CDF", "CHE", "CHF", "CHW", "CLF", "CLP", "CNY", "COP", "COU", "CRC", "CUC", "CUP", "CVE", "CZK", "DJF", "DKK", "DOP", "DZD", "EGP", "ERN", "ETB", "EUR", "FJD", "FKP", "GBP", "GEL", "GHS", "GIP", "GMD", "GNF", "GTQ", "GYD", "HKD", "HNL", "HRK", "HTG", "HUF", "IDR", "ILS", "INR", "IQD", "IRR", "ISK", "JMD", "JOD", "JPY", "KES", "KGS", "KHR", "KMF", "KPW", "KRW", "KWD", "KYD", "KZT", "LAK", "LBP", "LKR", "LRD", "LSL", "LYD", "MAD", "MDL", "MGA", "MKD", "MMK", "MNT", "MOP", "MRU", "MUR", "MVR", "MWK", "MXN", "MXV", "MYR", "MZN", "NAD", "NGN", "NIO", "NOK", "NPR", "NZD", "OMR", "PAB", "PEN", "PGK", "PHP", "PKR", "PLN", "PYG", "QAR", "RON", "RSD", "RUB", "RWF", "SAR", "SBD", "SCR", "SDG", "SEK", "SGD", "SHP", "SLL", "SOS", "SRD", "SSP", "STN", "SVC", "SYP", "SZL", "THB", "TJS", "TMT", "TND", "TOP", "TRY", "TTD", "TWD", "TZS", "UAH", "UGX", "USD", "USN", "UYI", "UYU", "UYW", "UZS", "VES", "VND", "VUV", "WST", "XAF", "XAG", "XAU", "XBA", "XBB", "XBC", "XBD", "XCD", "XDR", "XOF", "XPD", "XPF", "XPT", "XSU", "XTS", "XUA", "XXX", "YER", "ZAR", "ZMW", "ZWL"


decimal: string = "."

Separator to mark the decimal part. Use "." or "," to indicate how decimal values are separated when parsing text strings into numerical format. It is automatically assumed that the other character is used as the thousands separator. E.g. "decimal": "." assumes that the period "." is used to separate decimals and "," thousands, as in the number string "12,173.12".

Must be one of: ".", ","


unit: string | null

Prefix or suffix indicating the number's units. By default ("unit": null), any strings containing unit symbols, or any characters other than digits, decimal or thousands separators, will be converted to missing data (NaN).

To ignore/remove any potential unit suffixes or prefixes, and to convert only the numerical parts of the data, use "unit": "remove". Note that this may create a column with mixed/incompatible data if the original data contained mixed units (e.g. different currencies of money).

Alternatively, to detect the most common unit, convert only those numbers containing it, and set the rest to NaN, use "unit": "detect". This ensures only data with the single most common unit is kept, the rest being removed.

As a last option, you may explicitly state a specific unit string identifying the numbers to be kept (converted). This can be a simple string (e.g. "$") or a regular expression, in which case this needs to be indicated with the unit_regex parameter (see below). Only matching data will then be converted; any data containing non-digit characters that do not match the pattern will be removed.

Must be one of: "detect", "remove"


unit_regex: boolean = False

Interpret unit parameter as regular expression. Use to indicate that the provided unit string should be used as a regular expression to match more complicated non-digit strings in the original data that should be removed.

type: string

Desired semantic type of the converted data. Convert data to the Date type with "type": "Date". This will allow e.g. the extraction of particular components of the date, like year, month, or day of week (with extract_date_components), the calculation of elapsed time since a given date (time_interval), as well as enable the use of the Trends section in graphext's interface.

Must be one of: "Date", "date"


format: string

Format to parse date strings. When input data contains strings (dates in text format), indicate how these strings are constructed. E.g. if dates are in the format "21/07/2020", use "format": "%d/%m/%Y" to indicate the day, month, year order and the use of "/" as the separator of date components. For more details on how to indicate the different components of the date format see e.g. Python's strftime.
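
Since the format codes follow Python's strftime conventions, a format string can be tried out in plain Python before using it in the step. The snippet below only uses the standard library; whether the step itself parses dates this way is not stated here.

```python
from datetime import datetime

# "%d/%m/%Y" reads day, month and four-digit year separated by "/".
parsed = datetime.strptime("21/07/2020", "%d/%m/%Y")
print(parsed.date())  # 2020-07-21

# Text in a different layout does not match the same format string.
try:
    datetime.strptime("2020-07-21", "%d/%m/%Y")
except ValueError:
    print("does not match the format")
```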


unit: string

Unit of timestamp data. When input data is numeric, indicates whether the numbers correspond to seconds, milliseconds, microseconds or nanoseconds. Dates will be interpreted as so many elapsed units since the origin (see origin parameter below).

For example, with "unit": "ms" and "origin": "unix" (the default), this would calculate the date corresponding to x milliseconds since 01/01/1970, where x denotes the input numbers.

Must be one of: "D", "s", "ms", "us", "ns"
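
The arithmetic behind numeric timestamps can be checked with a few lines of standard-library Python. This is a sketch for the "unix" origin only; the names UNIX_ORIGIN, UNIT_TO_KWARG and from_timestamp are hypothetical, and "ns" is omitted because Python's timedelta has no nanosecond resolution.

```python
from datetime import datetime, timedelta, timezone

UNIX_ORIGIN = datetime(1970, 1, 1, tzinfo=timezone.utc)

# timedelta keyword for each unit it supports directly.
UNIT_TO_KWARG = {
    "D": "days",
    "s": "seconds",
    "ms": "milliseconds",
    "us": "microseconds",
}


def from_timestamp(x, unit="ms"):
    """Interpret x as the number of elapsed units since the Unix origin."""
    return UNIX_ORIGIN + timedelta(**{UNIT_TO_KWARG[unit]: x})


print(from_timestamp(1595289600000, "ms"))  # 2020-07-21 00:00:00+00:00
print(from_timestamp(18464, "D"))           # 2020-07-21 00:00:00+00:00
```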


origin: string = "unix"

Reference date for timestamp data. The input numbers would be parsed as so many units (defined by unit) since this reference date.

Must be one of: "unix", "julian"


tz: string | null = "UTC"

Timezone of the parsed dates. "tz": null will make the cast data timezone-naive, i.e. it will assume that all dates are relative to an arbitrary local timezone. To convert or ensure a single valid timezone, specify one of the below values. This will convert any dates with existing timezone information to the specified timezone, or add the new timezone if none was already present.
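
The difference between adding a timezone and converting to one is sketched below in standard-library Python. The function name with_timezone is hypothetical, and fixed UTC offsets are used instead of named zones so that the example does not depend on a timezone database.

```python
from datetime import datetime, timedelta, timezone


def with_timezone(dt, tz):
    """Attach tz to a naive datetime, or convert an aware one to tz."""
    if dt.tzinfo is None:
        # No timezone yet: add it, the clock time stays the same.
        return dt.replace(tzinfo=tz)
    # Existing timezone: convert, the instant in time stays the same.
    return dt.astimezone(tz)


naive = datetime(2020, 7, 21, 12, 0)
aware = datetime(2020, 7, 21, 12, 0, tzinfo=timezone(timedelta(hours=2)))

print(with_timezone(naive, timezone.utc))  # 2020-07-21 12:00:00+00:00
print(with_timezone(aware, timezone.utc))  # 2020-07-21 10:00:00+00:00
```

In Python 3.9 and later, named zones such as "Europe/Madrid" can be created with zoneinfo.ZoneInfo and passed in the same way.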

Must be one of: "Africa/Abidjan", "Africa/Accra", "Africa/Addis_Ababa", "Africa/Algiers", "Africa/Asmara", "Africa/Asmera", "Africa/Bamako", "Africa/Bangui", "Africa/Banjul", "Africa/Bissau", "Africa/Blantyre", "Africa/Brazzaville", "Africa/Bujumbura", "Africa/Cairo", "Africa/Casablanca", "Africa/Ceuta", "Africa/Conakry", "Africa/Dakar", "Africa/Dar_es_Salaam", "Africa/Djibouti", "Africa/Douala", "Africa/El_Aaiun", "Africa/Freetown", "Africa/Gaborone", "Africa/Harare", "Africa/Johannesburg", "Africa/Juba", "Africa/Kampala", "Africa/Khartoum", "Africa/Kigali", "Africa/Kinshasa", "Africa/Lagos", "Africa/Libreville", "Africa/Lome", "Africa/Luanda", "Africa/Lubumbashi", "Africa/Lusaka", "Africa/Malabo", "Africa/Maputo", "Africa/Maseru", "Africa/Mbabane", "Africa/Mogadishu", "Africa/Monrovia", "Africa/Nairobi", "Africa/Ndjamena", "Africa/Niamey", "Africa/Nouakchott", "Africa/Ouagadougou", "Africa/Porto-Novo", "Africa/Sao_Tome", "Africa/Timbuktu", "Africa/Tripoli", "Africa/Tunis", "Africa/Windhoek", "America/Adak", "America/Anchorage", "America/Anguilla", "America/Antigua", "America/Araguaina", "America/Argentina/Buenos_Aires", "America/Argentina/Catamarca", "America/Argentina/ComodRivadavia", "America/Argentina/Cordoba", "America/Argentina/Jujuy", "America/Argentina/La_Rioja", "America/Argentina/Mendoza", "America/Argentina/Rio_Gallegos", "America/Argentina/Salta", "America/Argentina/San_Juan", "America/Argentina/San_Luis", "America/Argentina/Tucuman", "America/Argentina/Ushuaia", "America/Aruba", "America/Asuncion", "America/Atikokan", "America/Atka", "America/Bahia", "America/Bahia_Banderas", "America/Barbados", "America/Belem", "America/Belize", "America/Blanc-Sablon", "America/Boa_Vista", "America/Bogota", "America/Boise", "America/Buenos_Aires", "America/Cambridge_Bay", "America/Campo_Grande", "America/Cancun", "America/Caracas", "America/Catamarca", "America/Cayenne", "America/Cayman", "America/Chicago", "America/Chihuahua", "America/Coral_Harbour", 
"America/Cordoba", "America/Costa_Rica", "America/Creston", "America/Cuiaba", "America/Curacao", "America/Danmarkshavn", "America/Dawson", "America/Dawson_Creek", "America/Denver", "America/Detroit", "America/Dominica", "America/Edmonton", "America/Eirunepe", "America/El_Salvador", "America/Ensenada", "America/Fort_Nelson", "America/Fort_Wayne", "America/Fortaleza", "America/Glace_Bay", "America/Godthab", "America/Goose_Bay", "America/Grand_Turk", "America/Grenada", "America/Guadeloupe", "America/Guatemala", "America/Guayaquil", "America/Guyana", "America/Halifax", "America/Havana", "America/Hermosillo", "America/Indiana/Indianapolis", "America/Indiana/Knox", "America/Indiana/Marengo", "America/Indiana/Petersburg", "America/Indiana/Tell_City", "America/Indiana/Vevay", "America/Indiana/Vincennes", "America/Indiana/Winamac", "America/Indianapolis", "America/Inuvik", "America/Iqaluit", "America/Jamaica", "America/Jujuy", "America/Juneau", "America/Kentucky/Louisville", "America/Kentucky/Monticello", "America/Knox_IN", "America/Kralendijk", "America/La_Paz", "America/Lima", "America/Los_Angeles", "America/Louisville", "America/Lower_Princes", "America/Maceio", "America/Managua", "America/Manaus", "America/Marigot", "America/Martinique", "America/Matamoros", "America/Mazatlan", "America/Mendoza", "America/Menominee", "America/Merida", "America/Metlakatla", "America/Mexico_City", "America/Miquelon", "America/Moncton", "America/Monterrey", "America/Montevideo", "America/Montreal", "America/Montserrat", "America/Nassau", "America/New_York", "America/Nipigon", "America/Nome", "America/Noronha", "America/North_Dakota/Beulah", "America/North_Dakota/Center", "America/North_Dakota/New_Salem", "America/Ojinaga", "America/Panama", "America/Pangnirtung", "America/Paramaribo", "America/Phoenix", "America/Port-au-Prince", "America/Port_of_Spain", "America/Porto_Acre", "America/Porto_Velho", "America/Puerto_Rico", "America/Punta_Arenas", "America/Rainy_River", "America/Rankin_Inlet", 
"America/Recife", "America/Regina", "America/Resolute", "America/Rio_Branco", "America/Rosario", "America/Santa_Isabel", "America/Santarem", "America/Santiago", "America/Santo_Domingo", "America/Sao_Paulo", "America/Scoresbysund", "America/Shiprock", "America/Sitka", "America/St_Barthelemy", "America/St_Johns", "America/St_Kitts", "America/St_Lucia", "America/St_Thomas", "America/St_Vincent", "America/Swift_Current", "America/Tegucigalpa", "America/Thule", "America/Thunder_Bay", "America/Tijuana", "America/Toronto", "America/Tortola", "America/Vancouver", "America/Virgin", "America/Whitehorse", "America/Winnipeg", "America/Yakutat", "America/Yellowknife", "Antarctica/Casey", "Antarctica/Davis", "Antarctica/DumontDUrville", "Antarctica/Macquarie", "Antarctica/Mawson", "Antarctica/McMurdo", "Antarctica/Palmer", "Antarctica/Rothera", "Antarctica/South_Pole", "Antarctica/Syowa", "Antarctica/Troll", "Antarctica/Vostok", "Arctic/Longyearbyen", "Asia/Aden", "Asia/Almaty", "Asia/Amman", "Asia/Anadyr", "Asia/Aqtau", "Asia/Aqtobe", "Asia/Ashgabat", "Asia/Ashkhabad", "Asia/Atyrau", "Asia/Baghdad", "Asia/Bahrain", "Asia/Baku", "Asia/Bangkok", "Asia/Barnaul", "Asia/Beirut", "Asia/Bishkek", "Asia/Brunei", "Asia/Calcutta", "Asia/Chita", "Asia/Choibalsan", "Asia/Chongqing", "Asia/Chungking", "Asia/Colombo", "Asia/Dacca", "Asia/Damascus", "Asia/Dhaka", "Asia/Dili", "Asia/Dubai", "Asia/Dushanbe", "Asia/Famagusta", "Asia/Gaza", "Asia/Harbin", "Asia/Hebron", "Asia/Ho_Chi_Minh", "Asia/Hong_Kong", "Asia/Hovd", "Asia/Irkutsk", "Asia/Istanbul", "Asia/Jakarta", "Asia/Jayapura", "Asia/Jerusalem", "Asia/Kabul", "Asia/Kamchatka", "Asia/Karachi", "Asia/Kashgar", "Asia/Kathmandu", "Asia/Katmandu", "Asia/Khandyga", "Asia/Kolkata", "Asia/Krasnoyarsk", "Asia/Kuala_Lumpur", "Asia/Kuching", "Asia/Kuwait", "Asia/Macao", "Asia/Macau", "Asia/Magadan", "Asia/Makassar", "Asia/Manila", "Asia/Muscat", "Asia/Nicosia", "Asia/Novokuznetsk", "Asia/Novosibirsk", "Asia/Omsk", "Asia/Oral", "Asia/Phnom_Penh", 
"Asia/Pontianak", "Asia/Pyongyang", "Asia/Qatar", "Asia/Qostanay", "Asia/Qyzylorda", "Asia/Rangoon", "Asia/Riyadh", "Asia/Saigon", "Asia/Sakhalin", "Asia/Samarkand", "Asia/Seoul", "Asia/Shanghai", "Asia/Singapore", "Asia/Srednekolymsk", "Asia/Taipei", "Asia/Tashkent", "Asia/Tbilisi", "Asia/Tehran", "Asia/Tel_Aviv", "Asia/Thimbu", "Asia/Thimphu", "Asia/Tokyo", "Asia/Tomsk", "Asia/Ujung_Pandang", "Asia/Ulaanbaatar", "Asia/Ulan_Bator", "Asia/Urumqi", "Asia/Ust-Nera", "Asia/Vientiane", "Asia/Vladivostok", "Asia/Yakutsk", "Asia/Yangon", "Asia/Yekaterinburg", "Asia/Yerevan", "Atlantic/Azores", "Atlantic/Bermuda", "Atlantic/Canary", "Atlantic/Cape_Verde", "Atlantic/Faeroe", "Atlantic/Faroe", "Atlantic/Jan_Mayen", "Atlantic/Madeira", "Atlantic/Reykjavik", "Atlantic/South_Georgia", "Atlantic/St_Helena", "Atlantic/Stanley", "Australia/ACT", "Australia/Adelaide", "Australia/Brisbane", "Australia/Broken_Hill", "Australia/Canberra", "Australia/Currie", "Australia/Darwin", "Australia/Eucla", "Australia/Hobart", "Australia/LHI", "Australia/Lindeman", "Australia/Lord_Howe", "Australia/Melbourne", "Australia/NSW", "Australia/North", "Australia/Perth", "Australia/Queensland", "Australia/South", "Australia/Sydney", "Australia/Tasmania", "Australia/Victoria", "Australia/West", "Australia/Yancowinna", "Brazil/Acre", "Brazil/DeNoronha", "Brazil/East", "Brazil/West", "CET", "CST6CDT", "Canada/Atlantic", "Canada/Central", "Canada/Eastern", "Canada/Mountain", "Canada/Newfoundland", "Canada/Pacific", "Canada/Saskatchewan", "Canada/Yukon", "Chile/Continental", "Chile/EasterIsland", "Cuba", "EET", "EST", "EST5EDT", "Egypt", "Eire", "Etc/GMT", "Etc/GMT+0", "Etc/GMT+1", "Etc/GMT+10", "Etc/GMT+11", "Etc/GMT+12", "Etc/GMT+2", "Etc/GMT+3", "Etc/GMT+4", "Etc/GMT+5", "Etc/GMT+6", "Etc/GMT+7", "Etc/GMT+8", "Etc/GMT+9", "Etc/GMT-0", "Etc/GMT-1", "Etc/GMT-10", "Etc/GMT-11", "Etc/GMT-12", "Etc/GMT-13", "Etc/GMT-14", "Etc/GMT-2", "Etc/GMT-3", "Etc/GMT-4", "Etc/GMT-5", "Etc/GMT-6", "Etc/GMT-7", 
"Etc/GMT-8", "Etc/GMT-9", "Etc/GMT0", "Etc/Greenwich", "Etc/UCT", "Etc/UTC", "Etc/Universal", "Etc/Zulu", "Europe/Amsterdam", "Europe/Andorra", "Europe/Astrakhan", "Europe/Athens", "Europe/Belfast", "Europe/Belgrade", "Europe/Berlin", "Europe/Bratislava", "Europe/Brussels", "Europe/Bucharest", "Europe/Budapest", "Europe/Busingen", "Europe/Chisinau", "Europe/Copenhagen", "Europe/Dublin", "Europe/Gibraltar", "Europe/Guernsey", "Europe/Helsinki", "Europe/Isle_of_Man", "Europe/Istanbul", "Europe/Jersey", "Europe/Kaliningrad", "Europe/Kiev", "Europe/Kirov", "Europe/Lisbon", "Europe/Ljubljana", "Europe/London", "Europe/Luxembourg", "Europe/Madrid", "Europe/Malta", "Europe/Mariehamn", "Europe/Minsk", "Europe/Monaco", "Europe/Moscow", "Europe/Nicosia", "Europe/Oslo", "Europe/Paris", "Europe/Podgorica", "Europe/Prague", "Europe/Riga", "Europe/Rome", "Europe/Samara", "Europe/San_Marino", "Europe/Sarajevo", "Europe/Saratov", "Europe/Simferopol", "Europe/Skopje", "Europe/Sofia", "Europe/Stockholm", "Europe/Tallinn", "Europe/Tirane", "Europe/Tiraspol", "Europe/Ulyanovsk", "Europe/Uzhgorod", "Europe/Vaduz", "Europe/Vatican", "Europe/Vienna", "Europe/Vilnius", "Europe/Volgograd", "Europe/Warsaw", "Europe/Zagreb", "Europe/Zaporozhye", "Europe/Zurich", "GB", "GB-Eire", "GMT", "GMT+0", "GMT-0", "GMT0", "Greenwich", "HST", "Hongkong", "Iceland", "Indian/Antananarivo", "Indian/Chagos", "Indian/Christmas", "Indian/Cocos", "Indian/Comoro", "Indian/Kerguelen", "Indian/Mahe", "Indian/Maldives", "Indian/Mauritius", "Indian/Mayotte", "Indian/Reunion", "Iran", "Israel", "Jamaica", "Japan", "Kwajalein", "Libya", "MET", "MST", "MST7MDT", "Mexico/BajaNorte", "Mexico/BajaSur", "Mexico/General", "NZ", "NZ-CHAT", "Navajo", "PRC", "PST8PDT", "Pacific/Apia", "Pacific/Auckland", "Pacific/Bougainville", "Pacific/Chatham", "Pacific/Chuuk", "Pacific/Easter", "Pacific/Efate", "Pacific/Enderbury", "Pacific/Fakaofo", "Pacific/Fiji", "Pacific/Funafuti", "Pacific/Galapagos", "Pacific/Gambier", 
"Pacific/Guadalcanal", "Pacific/Guam", "Pacific/Honolulu", "Pacific/Johnston", "Pacific/Kiritimati", "Pacific/Kosrae", "Pacific/Kwajalein", "Pacific/Majuro", "Pacific/Marquesas", "Pacific/Midway", "Pacific/Nauru", "Pacific/Niue", "Pacific/Norfolk", "Pacific/Noumea", "Pacific/Pago_Pago", "Pacific/Palau", "Pacific/Pitcairn", "Pacific/Pohnpei", "Pacific/Ponape", "Pacific/Port_Moresby", "Pacific/Rarotonga", "Pacific/Saipan", "Pacific/Samoa", "Pacific/Tahiti", "Pacific/Tarawa", "Pacific/Tongatapu", "Pacific/Truk", "Pacific/Wake", "Pacific/Wallis", "Pacific/Yap", "Poland", "Portugal", "ROC", "ROK", "Singapore", "Turkey", "UCT", "US/Alaska", "US/Aleutian", "US/Arizona", "US/Central", "US/East-Indiana", "US/Eastern", "US/Hawaii", "US/Indiana-Starke", "US/Michigan", "US/Mountain", "US/Pacific", "US/Samoa", "UTC", "Universal", "W-SU", "WET", "Zulu"

type: string

Desired semantic type of the converted data. Convert data to the Text type with "type": "Text". This allows the resulting column to be used e.g. in steps involving natural language processing (NLP).

Must be one of: "Text", "text"

type: string

Desired semantic type of the converted data. Convert data to the Url type with "type": "Url". This will allow e.g. fetching of any textual content found at the specified Url (with fetch_url_content), or linking of a network node in the interface to the given website (configure_node_url).

Must be one of: "Url", "url"

type: string

Desired semantic type of the converted data. Convert data to the List type with "type": "List". This allows e.g. for strings like "keyword1, keyword2, ..." to be treated as multivalued categorical data, i.e. indicating that the corresponding rows belong to more than one category.

Must be one of: "List", "list"


element_semantic: string

Coerce list elements to this semantic type. If null, the element semantic will be inferred.

Must be one of: "Number", "Date", "Url", "Category", "Boolean"


element_dtype: string | null

If elements are numeric, which specific numpy dtype to use. If null, will be inferred.

Must be one of: "int8", "uint8", "Int8", "UInt8", "int16", "uint16", "Int16", "UInt16", "int32", "uint32", "Int32", "UInt32", "int64", "uint64", "Int64", "UInt64", "float32", "float64"


brackets: string | null = "[]"

A 2-character string identifying the opening and closing brackets used to identify list strings. For example "[]", "()", "{}" etc. If null, any possible bracket characters at the beginning and end of a string will be removed before parsing the elements.


separator: string = ","

Which separation character to use to split the input string into list elements. Note that spaces will always be stripped from individual elements.
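
How brackets and separator work together can be illustrated with a short Python sketch. It is not the step's implementation: the function name parse_list is hypothetical, returning None for strings that lack the expected brackets is an assumption, and so is the exact set of characters removed when brackets is null.

```python
def parse_list(text, brackets="[]", separator=","):
    """Split a list-like string into elements stripped of spaces."""
    text = text.strip()
    if brackets is None:
        # Assumption: "any possible bracket characters" means these six.
        text = text.strip("[](){}")
    else:
        opening, closing = brackets
        if not (text.startswith(opening) and text.endswith(closing)):
            return None  # assumption: not recognised as a list string
        text = text[1:-1]
    if not text.strip():
        return []
    return [element.strip() for element in text.split(separator)]


print(parse_list("[keyword1, keyword2]"))                     # ['keyword1', 'keyword2']
print(parse_list("(1; 2; 3)", brackets="()", separator=";"))  # ['1', '2', '3']
print(parse_list("a, b", brackets=None))                      # ['a', 'b']
```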

type: string

Desired semantic type of the converted data. Convert data to the Category type with "type": "Category". This will influence how the column is presented in graphext's interface, and enables the use of steps like trim_frequencies, merge_categories etc.

Must be one of: "Category", "category"


categories: array | null

Which categories to keep. If null, all unique values encountered in the input are kept. If a list of strings is specified explicitly, the resulting categorical column will have only these categories, and any remaining values will be replaced with NaN (missing).


ordered: boolean = False

Whether the categories are to be ordered or not. If true, the order is given by the categories parameter (a list).
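
The effect of the two parameters can be pictured with a small Python sketch. It is an illustration only; the function name cast_category and the sample data are hypothetical.

```python
import math


def cast_category(values, categories=None):
    """Keep only the listed categories; everything else becomes NaN."""
    if categories is None:
        return list(values)
    allowed = set(categories)
    return [v if v in allowed else math.nan for v in values]


sizes = ["low", "high", "n/a", "medium"]
levels = ["low", "medium", "high"]

result = cast_category(sizes, levels)
print(result)  # ['low', 'high', nan, 'medium']

# With "ordered": true, the position in `levels` defines the ordering.
kept = [v for v in result if isinstance(v, str)]
print(sorted(kept, key=levels.index))  # ['low', 'medium', 'high']
```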

type: string

Desired semantic type of the converted data. Convert data to the Sex type with "type": "Sex". This is essentially a categorical type with two predefined values for male and female. The labels used to represent the two categories can be configured with the parameters below.

Must be one of: "Sex", "sex"


labels: object = {'female': 'Female', 'male': 'Male'}

The labels used to identify female and male categories. An object of the form {"female": "female_label", "male": "male_label"}, indicating how to represent each sex in the data. E.g. as F/M or ♀️/♂️ etc.

Items in labels

female: string = "Female"

Label for the female category.


male: string = "Male"

Label for the male category.

type: string

Desired semantic type of the converted data. Convert data to the Boolean (logical) type with "type": "Boolean". If the input data is numeric, 0s will be treated as False and all other values as True. If the input data contains text strings, the values {"t", "true", "1", "1.0"} in lower- or uppercase will be interpreted as True, and the values {"f", "false", "0", "0.0"} as False. Any remaining values will be converted to NaN (missing).

Must be one of: "Boolean", "boolean"
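
The rules above translate into the following Python sketch. It is not the step's implementation; the function name cast_boolean is hypothetical, and keeping numeric NaN as missing rather than treating it as True is an assumption.

```python
import math

TRUE_STRINGS = {"t", "true", "1", "1.0"}
FALSE_STRINGS = {"f", "false", "0", "0.0"}


def cast_boolean(value):
    """Apply the numeric and text rules; unknown values become NaN."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        if isinstance(value, float) and math.isnan(value):
            return math.nan  # assumption: missing stays missing
        return value != 0
    if isinstance(value, str):
        lowered = value.lower()
        if lowered in TRUE_STRINGS:
            return True
        if lowered in FALSE_STRINGS:
            return False
    return math.nan


print([cast_boolean(v) for v in ["TRUE", "f", "0.0", "yes", 3, 0]])
# [True, False, False, nan, True, False]
```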

type_name: string

Select a type using a shortened yet fully specified name. Instead of passing a type name and various parameters, use one of the abbreviated forms below. Except for the type of elements in lists, the values of parameters mentioned under a specific type will be set to their defaults.

Must be one of: "category", "date", "number", "boolean", "url", "sex", "text", "list[number]", "list[category]", "list[url]", "list[boolean]", "list[date]"
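
One way to read the abbreviated names is as shorthand for a parameter object in which only "type" (and, for lists, the element type) is set. The Python sketch below illustrates that reading. The function expand_type_name is hypothetical and the mapping is an assumption based on the parameter names documented above, not a description of how the step resolves type_name internally.

```python
import re


def expand_type_name(type_name):
    """Translate a shorthand name into an explicit parameter object."""
    match = re.fullmatch(r"list\[(\w+)\]", type_name)
    if match:
        # Only the element type is taken from the shorthand.
        return {"type": "List", "element_semantic": match.group(1).capitalize()}
    return {"type": type_name.capitalize()}


print(expand_type_name("number"))        # {'type': 'Number'}
print(expand_type_name("list[number]"))  # {'type': 'List', 'element_semantic': 'Number'}
```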