urltools - Vectorised Tools for URL Handling and Parsing
A toolkit for all URL-handling needs, including encoding and decoding, parsing, and parameter extraction and modification. All functions are designed to be both fast and entirely vectorised. It is intended for people working with web-related datasets, such as server-side logs, although it may also be useful in other situations involving large sets of URLs.
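To illustrate what vectorised parameter extraction and modification mean, here is a minimal sketch in Python using only the standard library's `urllib.parse`. It is not urltools' own R API. The helpers `get_param` and `set_param` are hypothetical names, and each one maps a single operation over a whole list of URLs.

```python
from urllib.parse import parse_qs, parse_qsl, urlencode, urlsplit, urlunsplit


def get_param(urls, name):
    """Hypothetical helper: first value of query parameter `name` for each URL, or None."""
    out = []
    for url in urls:
        # parse_qs percent-decodes values, e.g. "r%20lang" -> "r lang"
        values = parse_qs(urlsplit(url).query).get(name)
        out.append(values[0] if values else None)
    return out


def set_param(urls, name, value):
    """Hypothetical helper: set (or add) query parameter `name` to `value` in each URL."""
    out = []
    for url in urls:
        parts = urlsplit(url)
        # Keep every other parameter, drop any existing `name`, then append the new value.
        query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != name]
        query.append((name, value))
        out.append(urlunsplit(parts._replace(query=urlencode(query))))
    return out


print(get_param(["https://example.com/?q=r%20lang&page=2", "https://example.com/"], "q"))
print(set_param(["https://example.com/search?q=r&page=1"], "page", "2"))
```

As a simplification, `set_param` moves a modified parameter to the end of the query string.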
Last updated 4 years ago
access-logs, data-import, url, cpp
13.24 score, 131 stars, 254 dependents, 968 scripts, 24k downloads

triebeard - 'Radix' Trees in 'Rcpp'
'Radix trees', or 'tries', are key-value data structures optimised for efficient lookups, similar in purpose to hash tables. 'triebeard' provides an implementation of 'radix trees' for use in R programming and in developing packages with 'Rcpp'.
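Radix trees save space by merging chains of single-child nodes so that each edge carries a whole substring. The following is a minimal Python sketch of such a compressed trie with exact and longest-prefix lookup. It illustrates the data structure only and is not triebeard's C++/Rcpp implementation or its R interface. The class and method names are hypothetical.

```python
class RadixNode:
    """A node in a compressed (radix) trie; each outgoing edge is labelled with a string."""

    def __init__(self):
        self.children = {}  # first character of edge label -> (label, child node)
        self.value = None
        self.has_value = False


class RadixTree:
    def __init__(self):
        self.root = RadixNode()

    def insert(self, key, value):
        node = self.root
        while True:
            if not key:
                node.value, node.has_value = value, True
                return
            edge = node.children.get(key[0])
            if edge is None:
                # No edge starts with this character: hang the whole remaining key as one edge.
                leaf = RadixNode()
                leaf.value, leaf.has_value = value, True
                node.children[key[0]] = (key, leaf)
                return
            label, child = edge
            # Length of the common prefix between the remaining key and the edge label.
            i = 0
            while i < len(label) and i < len(key) and label[i] == key[i]:
                i += 1
            if i < len(label):
                # Split the edge at the point where key and label diverge.
                mid = RadixNode()
                mid.children[label[i]] = (label[i:], child)
                node.children[key[0]] = (label[:i], mid)
                child = mid
            node, key = child, key[i:]

    def get(self, key, default=None):
        """Exact-match lookup."""
        node = self.root
        while key:
            edge = node.children.get(key[0])
            if edge is None or not key.startswith(edge[0]):
                return default
            node, key = edge[1], key[len(edge[0]):]
        return node.value if node.has_value else default

    def longest_match(self, key, default=None):
        """Value of the longest stored key that is a prefix of `key`."""
        node = self.root
        best = node.value if node.has_value else default
        while key:
            edge = node.children.get(key[0])
            if edge is None or not key.startswith(edge[0]):
                break
            node, key = edge[1], key[len(edge[0]):]
            if node.has_value:
                best = node.value
        return best


tree = RadixTree()
for k, v in [("cat", 1), ("car", 2), ("cart", 3)]:
    tree.insert(k, v)
print(tree.get("car"))                    # 2
print(tree.longest_match("cartography"))  # 3
```

Longest-prefix matching is where tries have an edge over hash tables. A hash table can only test whole keys, so it would need one probe per candidate prefix.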
Last updated 2 years ago
data-structrues, radix-trie, trie, cpp
10.26 score, 32 stars, 258 dependents, 15 scripts, 25k downloads

WikipediR - A MediaWiki API Wrapper
A wrapper for the MediaWiki API, aimed particularly at the Wikimedia 'production' wikis, such as Wikipedia. It can be used to retrieve page text, information about users or the history of pages, and elements of the category tree.
Last updated 11 months ago
api-client, api-wrapper, mediawiki
8.85 score, 68 stars, 20 dependents, 81 scripts, 1.4k downloads

humaniformat - A Parser for Human Names
Human names are complicated and nonstandard things. Humaniformat, which is based on Anthony Ettinger's 'humanparser' project <https://github.com/chovy/humanparser>, provides functions for parsing human names, making a best-guess attempt to distinguish sub-components such as prefixes, suffixes, middle names and salutations.
Last updated 8 years ago
names, parser, cpp
7.40 score, 55 stars, 7 dependents, 43 scripts, 763 downloads

webreadr - Tools for Reading Formatted Access Log Files
Read and tidy various common forms of web request log, including the Common and Combined Web Log formats and various Amazon access log types.
Last updated 4 years ago
access-logs, cpp
6.18 score, 50 stars, 12 scripts, 209 downloads

pageviews - An API Client for Wikimedia Traffic Data
Retrieves pageview data from the 'Wikimedia' sites, such as 'Wikipedia' <https://www.wikipedia.org/>, at levels of granularity ranging from entire projects to individual articles, through the RESTful API and data source <https://wikimedia.org/api/rest_v1/?doc>.
Last updated 12 months ago
mediawiki, pageview, pageview-data, wikimedia, wikipedia
5.87 score, 24 stars, 104 scripts, 259 downloads

piton - Parsing Expression Grammars in Rcpp
A wrapper around the 'Parsing Expression Grammar Template Library', a C++11 library for generating Parsing Expression Grammars, that makes it accessible within Rcpp. With this, developers can implement their own grammars and easily expose them in R packages.
Last updated 4 years ago
parsing-engine, parsing-expression-grammar, cpp
5.67 score, 17 stars, 14 dependents, 4 scripts, 1.3k downloads

reconstructr - Session Reconstruction and Analysis
Functions to reconstruct sessions from web logs or other user trace data and to calculate various metrics about them, producing tabular output compatible with workflows built around 'dplyr' or 'data.table'.
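A common way to reconstruct sessions from trace data is an inactivity timeout: a user's events belong to one session until the gap between consecutive events exceeds a threshold. The Python sketch below shows that technique under stated assumptions. The 3600-second default is an assumption. The function names `sessionise` and `session_lengths` are hypothetical. The sketch does not claim to reproduce reconstructr's exact rules, for example its handling of single-event sessions, which this sketch simply treats as having zero length.

```python
from collections import defaultdict


def sessionise(events, threshold=3600):
    """Group (user_id, timestamp_in_seconds) events into sessions.

    A new session starts whenever the gap between consecutive events from the
    same user exceeds `threshold` seconds (an assumed inactivity timeout).
    Returns rows of (user_id, session_id, timestamp), sorted by user then time.
    """
    by_user = defaultdict(list)
    for user, ts in events:
        by_user[user].append(ts)

    rows = []
    for user in sorted(by_user):
        timestamps = sorted(by_user[user])
        session = 0
        # Pair each event with its predecessor (None for the first event).
        for prev, ts in zip([None] + timestamps[:-1], timestamps):
            if prev is not None and ts - prev > threshold:
                session += 1
            rows.append((user, f"{user}-{session}", ts))
    return rows


def session_lengths(rows):
    """Length of each session in seconds: last event minus first event."""
    bounds = {}
    for _, sid, ts in rows:
        lo, hi = bounds.get(sid, (ts, ts))
        bounds[sid] = (min(lo, ts), max(hi, ts))
    return {sid: hi - lo for sid, (lo, hi) in bounds.items()}


rows = sessionise([("a", 0), ("a", 600), ("a", 5000), ("b", 100), ("b", 200)])
print(session_lengths(rows))
```

The output rows are flat (one event per row, tagged with a session ID), which is the kind of tabular shape that downstream data-frame tools can group and summarise.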
Last updated 3 years ago
log-analysis, session-reconstruction, cpp
5.62 score, 29 stars, 29 scripts, 356 downloads

batman - Convert categorical representations of logicals to actual logicals
Survey systems and other third-party data sources commonly use non-standard representations of logical values when it comes to qualitative data, such as "Yes", "No" and "N/A". batman is a package designed to seamlessly convert these into logicals. It is highly localised, and contains equivalents to boolean values in languages including German, French, Spanish, Italian, Turkish, Chinese and Polish.
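The core idea is a lookup from normalised strings to true, false or missing. The Python sketch below shows this under an explicit assumption: its word lists are a small hypothetical sample written for illustration, not batman's actual localisation tables, and `to_logical` is a hypothetical name rather than the package's R function.

```python
# Hypothetical, deliberately small word lists; a real localisation table would be far larger.
TRUE_WORDS = {"yes", "y", "true", "ja", "oui", "sí", "si", "sì", "evet", "tak", "是"}
FALSE_WORDS = {"no", "n", "false", "nein", "non", "hayır", "nie", "否"}


def to_logical(values):
    """Map each value to True, False, or None (unknown / missing), element by element."""
    out = []
    for v in values:
        # Normalise case and surrounding whitespace; non-strings are treated as missing.
        key = v.strip().lower() if isinstance(v, str) else None
        if key in TRUE_WORDS:
            out.append(True)
        elif key in FALSE_WORDS:
            out.append(False)
        else:
            out.append(None)  # e.g. "N/A", "" or unrecognised text
    return out


print(to_logical(["Yes", "No", "N/A", "Oui", "nein", None]))
```

Mapping unrecognised strings to a missing value rather than raising an error matches how survey answers such as "N/A" are usually meant to be read.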
Last updated 8 years ago
5.28 score, 11 stars, 70 scripts, 213 downloads

olctools - Open Location Code Handling in R
'Open Location Codes' (https://openlocationcode.com/) are a Google-created standard for identifying geographic locations. olctools provides utilities for validating, encoding and decoding entries that follow this standard.
Last updated 9 years ago
cpp
5.16 score, 13 stars, 11 scripts, 162 downloads

rdian - Client Library for The Guardian
A client library for 'The Guardian' (https://www.guardian.com/) and its API. This package allows users to search for Guardian articles and retrieve both their content and metadata.
Last updated 9 years ago
3.40 score, 5 stars, 6 scripts, 156 downloads

muckrock - Data on Freedom of Information Act Requests
A data package containing public domain information on requests made by the 'MuckRock' (https://www.muckrock.com/) project under the United States Freedom of Information Act.
Last updated 9 years ago
2.70 score, 1 script, 134 downloads