I recently caught up with Or Lenchner, CEO at Luminati, to discuss his company’s Data Collector product, an automated data collection tool that allows customers to collect accurate data at scale quickly, easily, and without getting blocked. Since his appointment as CEO, Lenchner has continued to expand the company’s market base as a data collection operator dedicated to maintaining the openness, transparency and integrity of the online ecosystem. Among Luminati’s 10,000+ customers are Fortune 500 companies, major e-commerce firms and sites, prominent travel companies, ad agencies, security firms, financial services companies, government offices and more. Prior to his career at Luminati, Lenchner founded and managed several web-based businesses, developing digital assets and online marketing programs. He joined Luminati as head of product development, and his career at the company has been driven by his firm belief in a transparent, ethical-by-design web environment that benefits both businesses and consumers.
insideBIGDATA: Many of our readers are data scientists and data engineers who are hungry for new sources of data. Web scraping is a solution that keeps growing in popularity. Can you give us a high-level view of your Data Collector product and how it satisfies the accelerating desire for more data?
Or Lenchner: The Data Collector is a first-of-its-kind automated data collection tool that allows our customers to collect the most accurate data at scale quickly, easily, and without getting blocked. The Data Collector integrates and automates all stages of the data collection process for customers but leaves them in full control over the data they collect. Customers can choose from hundreds of existing collection templates. They can “click and collect” a new target with the Data Collector extension without any prior coding knowledge. Alternatively, they can use the IDE code editor to write more complex crawl code, and the system extracts the data they need without any manual intervention or onboarding.
For customers, building a system from which to collect data effectively usually takes months and requires ongoing management and maintenance. The Data Collector solves this issue and allows them to collect the most reliable data at scale in a matter of minutes.
insideBIGDATA: How do your customers approach the scraping process in terms of customizing data collection: where to search, what to do, and how you want your data extracted?
Or Lenchner: Every customer wants a data collection process that is as easy, fast and reliable as possible. We serve thousands of customers from multiple sectors with a wide range of data needs. Even so, data collection requirements usually fit the same few clear frameworks across the board. Once you have served 20-30 customers, you have probably covered most of the different cases, apart from the edge cases, which will always exist.
Our self-serve Data Collector interface provides you with multiple options that allow you to:
- Select a collection template
- Choose the data you want to collect
- Select a schedule or frequency for collection, or trigger it by API
- Code your own templates (an option for advanced users)
insideBIGDATA: Data format is a big deal to data scientists. What data formats do you support?
Or Lenchner: We deliver data in CSV or JSON formats. Our customers can choose which data points to collect, the column or entry ordering, and an option to change the naming of the objects in the output.
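Editor's sketch: to make the output options above concrete, here is a minimal Python example of choosing data points, fixing the column order, renaming a field, and writing the result as CSV or JSON. It is an illustration under stated assumptions, not Luminati's implementation; the function names (`shape_records`, `to_csv`, `to_json`) and the sample records are hypothetical.

```python
import csv
import io
import json


def shape_records(records, fields, rename=None):
    """Keep only `fields`, in the given order, applying optional renames."""
    rename = rename or {}
    return [
        {rename.get(name, name): record.get(name) for name in fields}
        for record in records
    ]


def to_json(rows):
    """Serialize the shaped rows as a JSON array."""
    return json.dumps(rows, indent=2)


def to_csv(rows):
    """Serialize the shaped rows as CSV, using the first row's keys as columns."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical collected records; a real collection would supply these.
records = [
    {"title": "Widget", "price": 9.99, "url": "https://example.com/widget", "sku": "W-1"},
    {"title": "Gadget", "price": 24.5, "url": "https://example.com/gadget", "sku": "G-2"},
]

# Pick three data points, put `sku` first, and rename `title`.
rows = shape_records(
    records,
    fields=["sku", "title", "price"],
    rename={"title": "product_name"},
)

print(to_csv(rows))
print(to_json(rows))
```

Shaping the records once and serializing them twice keeps the CSV columns and the JSON keys in the same order and under the same names.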
insideBIGDATA: In terms of scalability, what if a customer has a requirement for “big data” resources? How does your offering support the acquisition of large data sets?
Or Lenchner: We can always scale up. Relying on our technology has made scaling very easy. Our offering provides the flexibility needed. We have over 1000 browser servers and over 2000 code servers and these can be increased at any time and as required.
We have been “living” in the online data collection ecosystem for over six years now, gathering data from the largest source in the world: the internet. Having gained experience with thousands of customers, including some of the biggest enterprises and most data-driven companies in the world, we know how to scale very rapidly, and we do it with ease and simplicity. Looking at the exponential growth of the data domain as a whole, we made sure we can serve everyone, big and small: the well-known, data-driven companies as well as those just starting their data collection journey. Our products and tech are designed for it from the start.
insideBIGDATA: How would your solution be part of a robust data pipeline? Do you have an API available so that data sources can be refreshed so machine learning algorithms can be retrained for increased accuracy?
Or Lenchner: We have APIs both for triggering collections by code and for delivering collected data via API hooks that can easily be integrated into existing solutions. Our customers define their data requirements at the very start and set up what they need, and we adjust our solutions accordingly, so there is very little set-up time involved.
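Editor's sketch: the trigger-and-deliver pattern described above might look roughly like this in a pipeline. Everything specific here is an assumption made for illustration: the base URL `https://api.example.com/v1`, the `/collections/trigger` path, the payload shape, and the function names are hypothetical placeholders, not Luminati's actual API. The request is built but not sent.

```python
import json
import urllib.request

# Hypothetical placeholder, not a real endpoint.
API_BASE = "https://api.example.com/v1"


def build_trigger_request(collector_id, inputs, token):
    """Build (but do not send) a POST request that would start a collection."""
    body = json.dumps({"collector": collector_id, "inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/collections/trigger",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def handle_delivery(raw_body, required_fields):
    """Parse a delivered webhook body (assumed to be a JSON array of records)
    and keep only records where every required field is present and non-null."""
    records = json.loads(raw_body)
    return [
        record
        for record in records
        if all(record.get(name) is not None for name in required_fields)
    ]


# Triggering side: urllib.request.urlopen(req) would send the request;
# it is omitted because the endpoint above is a placeholder.
req = build_trigger_request(
    "c_123", [{"url": "https://example.com/widget"}], token="TEST"
)

# Delivery side: a hypothetical payload as it might arrive at a webhook.
delivered = b'[{"sku": "W-1", "price": 9.99}, {"sku": "G-2", "price": null}]'
fresh = handle_delivery(delivered, required_fields=["sku", "price"])
print(fresh)
```

Filtering incomplete records at the delivery step is one simple way to keep a refreshed training set clean before a model is retrained on it.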
insideBIGDATA: Can you tell us what’s coming from Luminati Networks down the road? Any new plans you’d like to discuss?
Or Lenchner: 2021 brings with it a lot of exciting, new data-driven developments for us. We will continue to expand our automation-based technology as well as our educational reach. We believe that, as market leaders, we must take the added responsibility to promote a responsible and ethical data collection practice. For this reason, we have recently partnered with over 45 leading educational and academic institutions to help secure a data collection practice that is data-bright, transparency-driven and ethical by design all the way through. This is our commitment to the market and to our customers, and we intend to stand by it.