urltools - Vectorised Tools for URL Handling and Parsing
A toolkit for all URL-handling needs, including encoding and decoding, parsing, and parameter extraction and modification. All functions are designed to be both fast and entirely vectorised. It is intended for people dealing with web-related datasets, such as server-side logs, although it may also be useful in other situations involving large sets of URLs.
Last updated 4 years ago
access-logs data-import url
13.30 score 131 stars 259 packages 960 scripts 27k downloads
triebeard - 'Radix' Trees in 'Rcpp'
'Radix trees', or 'tries', are key-value data structures optimised for efficient lookups, similar in purpose to hash tables. 'triebeard' provides an implementation of 'radix trees' for use in R programming and in developing packages with 'Rcpp'.
Last updated 2 years ago
data-structrues radix-trie trie
10.29 score 32 stars 263 packages 14 scripts 27k downloads
WikipediR - A MediaWiki API Wrapper
A wrapper for the MediaWiki API, aimed particularly at the Wikimedia 'production' wikis, such as Wikipedia. It can be used to retrieve page text, information about users or the history of pages, and elements of the category tree.
Last updated 8 months ago
api-client api-wrapper mediawiki
9.34 score 68 stars 35 packages 78 scripts 2.6k downloads
humaniformat - A Parser for Human Names
Human names are complicated and nonstandard things. Humaniformat, which is based on Anthony Ettinger's 'humanparser' project <https://github.com/chovy/humanparser>, provides functions for parsing human names, making a best-guess attempt to distinguish sub-components such as prefixes, suffixes, middle names and salutations.
Last updated 8 years ago
names parser
7.36 score 54 stars 7 packages 40 scripts 942 downloads
webreadr - Tools for Reading Formatted Access Log Files
Read and tidy various common forms of web request log, including the Common and Combined Web Log formats and various Amazon access log types.
Last updated 4 years ago
access-logs
6.18 score 50 stars 12 scripts 141 downloads
pageviews - An API Client for Wikimedia Traffic Data
Retrieves pageview data from the 'Wikimedia' sites, such as 'Wikipedia' <https://www.wikipedia.org/>, at levels of granularity ranging from entire projects to individual articles, through the new RESTful API and data source <https://wikimedia.org/api/rest_v1/?doc>.
Last updated 9 months ago
mediawiki pageview pageview-data wikimedia wikipedia
5.88 score 23 stars 109 scripts 481 downloads
reconstructr - Session Reconstruction and Analysis
Functions to reconstruct sessions from web log or other user trace data and calculate various metrics around them, producing tabular output that is compatible with 'dplyr'- or 'data.table'-centered processes.
Last updated 3 years ago
log-analysis session-reconstruction
5.62 score 29 stars 29 scripts 319 downloads
piton - Parsing Expression Grammars in Rcpp
A wrapper around the 'Parsing Expression Grammar Template Library', a C++11 library for generating Parsing Expression Grammars, that makes it accessible within Rcpp. With this, developers can implement their own grammars and easily expose them in R packages.
Last updated 4 years ago
parsing-engine parsing-expression-grammar
5.53 score 17 stars 11 packages 4 scripts 1.2k downloads
batman - Convert categorical representations of logicals to actual logicals
Survey systems and other third-party data sources commonly use non-standard representations of logical values when it comes to qualitative data - "Yes", "No" and "N/A", say. batman is a package designed to seamlessly convert these into logicals. It is highly localised, and contains equivalents to boolean values in languages including German, French, Spanish, Italian, Turkish, Chinese and Polish.
Last updated 8 years ago
5.28 score 11 stars 69 scripts 177 downloads
olctools - Open Location Code Handling in R
'Open Location Codes' (https://openlocationcode.com/) are a Google-created standard for identifying geographic locations. olctools provides utilities for validating, encoding and decoding entries that follow this standard.
Last updated 9 years ago
5.16 score 13 stars 11 scripts 133 downloads
rdian - Client Library for The Guardian
A client library for 'The Guardian' (https://www.guardian.com/) and its API. This package allows users to search for Guardian articles and retrieve both their content and metadata.
Last updated 9 years ago
3.40 score 5 stars 6 scripts 160 downloads
muckrock - Data on Freedom of Information Act Requests
A data package containing public domain information on requests made by the 'MuckRock' (https://www.muckrock.com/) project under the United States Freedom of Information Act.
Last updated 8 years ago
2.70 score 1 script 125 downloads