Topics for Study | Lucid Apache Solr/Lucene Developer Certification
To prepare for the Lucid Solr/Lucene Developer Certification exam, you should be comfortable with the following topics:
- Lucene Background
- Indexing
- Searching
- Debugging Solr
- General Solr knowledge
- Architecting a Deployment
- General search and web application environment topics
The concepts covered in each topic area are detailed below.
Lucene Background
- Understand Lucene scoring methods such as TF-IDF and how to debug Lucene’s output.
- Understand Lucene payloads.
- Understand merge algorithms used by Lucene.
- Be familiar with the Lucene index file format.
- Understand how Lucene uses segments and merging, and how this impacts the performance of an application.
- Be familiar with the different types of queries, such as term queries, phrase queries, and wildcard queries.
- Be familiar with the different options available when defining a field, such as storage, indexing, analysis, vectors, positions, and frequencies.
- Know what Lucene options can be tuned through Solr.
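As a study aid for the scoring topic above, the following is a minimal sketch of TF-IDF scoring. It is modeled on the formulas used by Lucene's classic TF-IDF similarity (term frequency as a square root, inverse document frequency as 1 + ln(numDocs / (docFreq + 1)), with idf applied twice), but it leaves out norms, boosts, the coordination factor, and query normalization. The function names and the sample documents are assumptions made for illustration, not Lucene APIs.

```python
import math

def tf(freq):
    """Term-frequency factor: square root of the raw count."""
    return math.sqrt(freq)

def idf(doc_freq, num_docs):
    """Inverse document frequency: rarer terms get a larger weight."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def score(query_terms, doc_tokens, doc_freqs, num_docs):
    """Sum tf * idf^2 over the query terms that occur in the document."""
    total = 0.0
    for term in query_terms:
        freq = doc_tokens.count(term)
        if freq == 0:
            continue  # a term absent from the document contributes nothing
        total += tf(freq) * idf(doc_freqs[term], num_docs) ** 2
    return total

# Hypothetical, already-analyzed documents (lists of tokens).
docs = [
    ["solr", "search", "server"],
    ["lucene", "search", "library", "search"],
    ["apache", "lucene"],
]

# Document frequency: the number of documents containing each term.
doc_freqs = {}
for tokens in docs:
    for term in set(tokens):
        doc_freqs[term] = doc_freqs.get(term, 0) + 1

scores = [score(["search", "solr"], tokens, doc_freqs, len(docs)) for tokens in docs]
for doc_id, value in enumerate(scores):
    print(doc_id, round(value, 3))
```

The first document matches both query terms, including the rarer "solr", so it ranks above the second document even though "search" occurs twice there. Comparing hand-computed factors like these against the explain output Solr returns in debug mode is one way to practice reading score breakdowns.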
Indexing
- Understand the advantages and disadvantages of distributed indexing strategies, and how they affect performance and capabilities of an application.
- Know how to configure indexing so that faceting is available.
- Be familiar with indexing best practices, such as batching adds, multi-threaded adds, and using a streaming update server.
- Understand how indexing best practices can affect searching, highlighting, faceting, and sorting.
- Be familiar with the capabilities of Solr Cell and the parameters involved.
- Understand the DataImportHandler configuration file format.
- Be familiar with available DataImportHandler entity processors and transformers.
- Know how to debug the DataImportHandler.
- Understand the parameters and trade-offs of various commit strategies.
- Be familiar with the update processor chain.
- Understand use cases for full builds and incremental builds.
- Be able to recognize when a full rebuild is necessary.
- Understand index writer configuration.
- Be familiar with the various update request handlers, such as XML, CSV, and so on, and their default URLs.
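To make the "batching adds" practice concrete, here is a small sketch that groups documents into batches and renders each batch as one XML update message in Solr's `<add><doc><field name="...">` format. The helper names, the batch size, and the sample field names (`id`, `title`) are assumptions chosen for illustration. The sketch only builds the messages; sending them to an update handler and committing are left out.

```python
import xml.etree.ElementTree as ET

def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def add_message(docs):
    """Render a list of dicts as a single <add> message with one <doc> each."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)  # ElementTree escapes &, <, > on output
    return ET.tostring(add, encoding="unicode")

# Hypothetical documents; the ampersand shows that special characters are escaped.
docs = [{"id": i, "title": "Doc %d & more" % i} for i in range(5)]
messages = [add_message(batch) for batch in batches(docs, 2)]

for message in messages:
    print(message)
```

Sending a few larger messages instead of one request per document reduces per-request overhead, which is the point of the batching recommendation. Building the XML with a library rather than string concatenation also avoids the unescaped-character errors mentioned under the XML troubleshooting topic.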
Searching
- Understand security filters.
- Understand how an analyzer works.
- Be familiar with the syntax for the Lucene query parser.
- Be familiar with the syntax for the Dismax query parser.
- Be familiar with common query parameters.
- Understand how Solr's cache is used by features such as filters and faceting.
- Understand the local parameter syntax.
- Know what kinds of query parsers are available to Solr.
- Be familiar with the best usage for each type of query parser.
- Be familiar with use cases for faceting.
- Understand the parameters necessary to use search faceting.
- Understand faceting algorithms as implemented in Solr.
- Be familiar with how various features such as searching, faceting, sorting, caching, and warming affect memory requirements in Solr.
- Know when to use filters instead of modifying the main query.
- Be familiar with the parameters necessary to use highlighting.
- Understand Solr spatial search, including field types, functions, and request parameters.
- Understand the use cases for field collapsing.
- Be familiar with the parameters involved in field collapsing.
- Understand how Solr implements field collapsing.
- Be familiar with Solr's boost function, and how it can be used to affect the position of a returned result based on factors such as recency, popularity, price, and so on.
- Understand boosting based on proximity.
- Be familiar with the syntax used to find the time elapsed between date values.
- Be familiar with the search request handler, and how it can be used with search components.
- Be able to list the built-in search components included with Solr, such as elevated queries, faceting, highlighting, “more like this”, and spellcheck.
- Be able to list the built-in response writers, such as XML, JSON, Java, and PHP.
- Understand function queries and how they can be used in a search.
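For the topics on common query parameters, filters, and faceting, the sketch below assembles a search request URL from the standard parameters `q`, `fq`, `rows`, `wt`, `facet`, and `facet.field`. The base URL, the helper name `select_url`, and the field names `title`, `inStock`, and `category` are assumptions used only for illustration; the handler path and fields depend on your own deployment and schema.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def select_url(base, query, filters=(), facet_fields=(), rows=10):
    """Build a search URL; each filter becomes its own fq parameter."""
    params = [("q", query), ("rows", str(rows)), ("wt", "json")]
    params += [("fq", f) for f in filters]
    if facet_fields:
        params.append(("facet", "true"))
        params += [("facet.field", f) for f in facet_fields]
    return base + "?" + urlencode(params)

# Hypothetical base URL and field names.
url = select_url(
    "http://localhost:8983/solr/select",
    "title:lucene",
    filters=["inStock:true"],
    facet_fields=["category"],
)
parsed = parse_qs(urlsplit(url).query)

print(url)
print(parsed)
```

Keeping the restriction in `fq` rather than appending it to `q` leaves the main query's scoring untouched and lets Solr cache the filter independently, which is the trade-off behind the "filters instead of modifying the main query" topic.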
Debugging Solr
- Understand the debug parameters provided by Solr, and how they affect output.
- Be familiar with the debugging tools provided with Solr, such as analysis.jsp, the schema browser, the Luke request handler, and the stats page.
General Solr
- Understand what security options are available within Solr, and how to configure them.
- Be familiar with the various configuration files needed to run Solr, such as solr.xml, solrconfig.xml, and schema.xml.
- Know how to deploy Solr to an existing web container.
- Know how to propagate environment variables through to a Solr configuration.
- Know how to tell Solr to store data in a non-default location.
- Understand the field types built into a standard Solr installation, and their options.
- Understand the difference between query analyzers and index analyzers, and how they relate to each other in the context of an application.
- Be familiar with techniques Solr uses to perform analysis.
- Be familiar with best practices related to search, such as domain modeling and knowing when to use features such as omitNorms.
- Know when to use index boosting rather than query boosting, and vice versa.
- Understand how to add custom code to the Solr classpath.
- Understand Solr cache settings and how they impact performance.
- Understand why dynamic fields exist, and when you should use them.
- Be familiar with the Solr external API.
- Be familiar with the various Solr client libraries, and how they can be used to build an application.
- Know where to find the Solr logs, and how they can be used in troubleshooting an application.
- Be familiar with various relevancy manipulation strategies.
- Understand the "cost" of a commit, and how it affects an application.
- Be familiar with Solr plug-in hooks and capabilities.
- Be familiar with the techniques necessary for writing a Solr plug-in.
- Understand the parameters related to replicating a Solr server.
- Understand the failure modes of various components, such as replication and the DataImportHandler.
- Know what features are available in various Solr releases.
- Know where various components can be found within the Solr directory structure.
- Understand the core Solr admin APIs.
- Be familiar with the Solr source code.
- Be familiar with the configuration options available in solrconfig.xml, and where to find them.
- Know how to reference external file resources from within Solr's configuration.
- Understand analysis techniques for both Western languages and Asian scripts.
Architecting a Deployment
- Know how to set up distributed search so that a single request can be processed by multiple Solr instances.
- Understand the organization of a cluster of Solr servers.
- Know when it is appropriate to use distributed search.
- Be familiar with use cases that call for replication, and the best practices involved.
General Background Topics
- Know how to execute a Java application from a command line environment.
- Know how to install and test a Java development environment.
- Know how to create, compile, and run a Java application.
- Understand and be able to manage JNDI initialization parameters.
- Understand Java configurations, such as garbage collection options and memory allocation, and how they can impact performance.
- Know how to use techniques such as debugging, memory tools, or examining stack traces or GC logs to troubleshoot Java applications.
- Understand how Java libraries are packaged and deployed.
- Be familiar with configuring a web container in order to perform tasks such as changing port numbers.
- Understand performance monitoring techniques such as measuring query speed or query throughput.
- Be familiar with network protocols such as HTTP.
- Be familiar with common application server internals, such as classloader hierarchies and logging configurations.
- Be familiar with XML, and be able to troubleshoot issues such as markup errors or unescaped special characters.
- Be comfortable with basic administrative tasks for a *nix system, such as installing packages and managing file permissions.
- Understand how Solr interacts with various file systems such as ext*, HFS Plus, and NTFS, and how those interactions can impact performance.
- Understand the differences between various operating systems, filesystems, and Java virtual machines, and how they can impact performance.
- Understand general information retrieval techniques, such as inverted indexes, calculating relevance, and common techniques for implementing search.
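As a refresher on the inverted index mentioned in the last topic, the following toy sketch maps each term to the set of document IDs that contain it and answers an all-terms-required query by intersecting those sets. It is a teaching aid under simplifying assumptions (whitespace tokenization, lowercasing only, no positions or frequencies), not a description of Lucene's on-disk format; the function names and sample documents are hypothetical.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercased token to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

def search_all(index, query):
    """Return sorted IDs of documents that contain every query term."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    if not postings:
        return []
    return sorted(set.intersection(*postings))

# Hypothetical documents keyed by ID.
docs = {
    1: "Solr search server",
    2: "Lucene search library",
    3: "Apache Lucene",
}
index = build_index(docs)

print(search_all(index, "search"))
print(search_all(index, "lucene search"))
print(search_all(index, "missing"))
```

Because lookups go from term to documents rather than scanning each document, query cost depends on the length of the posting lists involved, not on the total size of the collection.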
