SharePoint Saturday Philly
Session: Search for SharePoint 2010 and FAST ESP by Natalya Voskresenskaya
Natalya is a SharePoint MVP. Today she spoke about SharePoint Search and FAST ESP.
Agenda
Enterprise Search Value
Architectural Changes
FAST search linguistic features
Lemmatization
Entity Extraction and Synonyms
Structural Analysis
Dictionaries
New Search experience
Configure, extend, create new
Enterprise Search Value
What’s the true value in Enterprise search?
Enterprise search imposes multiple layers of information architecture on top of structured and unstructured content, which helps you build a 360-degree view of what’s happening in the company. File shares and hard drives are full of information, but no one knows what’s in there. People don’t think about organizing information, although they should, and that’s where taxonomy comes in. Companies also replicate information. If there is no easy access to information, then you effectively don’t have that information. The catch is that all the documents have to be tagged with metadata. This is called meta tagging.
SharePoint Search 2007
The problem with SharePoint Search: it’s locked. You can index data from all custom databases, but you cannot impose your business rules on top of the search engine. SharePoint has its own indexing engine, but it is used only for people search.
FAST Search extends SharePoint server
Advantages:
High availability!
Why is the new search so great? Why did the 2007 search fail so badly? An important part of a search engine is pre-processing. When a document is normalized, all characters are normalized to the language in which they are searchable. Documents must be properly indexed before they can be searched.
Pre-processing
Doc id
Text normalization
Processing
All the magic stuff
Post-processing
Signals to indexer
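To make the text normalization step above more concrete, here is a minimal sketch in Python. It assumes Unicode NFKC normalization plus case folding, which is my own choice for illustration and not necessarily what FAST does internally. The function name normalize_text is hypothetical.

```python
import unicodedata

def normalize_text(text):
    """Toy text normalization (an assumption, not FAST's actual pipeline)."""
    # NFKC folds compatibility characters (ligatures, non-breaking spaces,
    # full-width letters) into their plain equivalents.
    folded = unicodedata.normalize("NFKC", text)
    # casefold() is a more aggressive lowercase meant for caseless matching.
    folded = folded.casefold()
    # Collapse any run of whitespace into a single space.
    return " ".join(folded.split())

print(normalize_text("SharePoint\u00a02010"))
print(normalize_text("  \ufb01le   Share "))
```

After this step, a document containing a non-breaking space or a typographic ligature is indexed under the same tokens a user would type in the search box.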
Common Scenario
Search string: “looking for SP architect”
Query: “looking” and “sp” and “architect”
First result: “set of 4 dunlop sp sport 6000 tires ….”
Search engines remove noise words (here, “for”) from the search string, so every remaining term, including the ambiguous “sp”, is matched literally.
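The scenario above can be sketched in a few lines of Python. This is only an illustration: the noise-word list and the function names are hypothetical, and a real engine ships a much larger list per language.

```python
# Hypothetical noise-word list, for illustration only.
NOISE_WORDS = {"for", "the", "a", "an", "of", "and"}

def query_terms(search_string):
    """Lowercase the search string, split on whitespace, drop noise words."""
    return [t for t in search_string.lower().split() if t not in NOISE_WORDS]

def to_and_query(search_string):
    """Build the AND query shown in the notes from the remaining terms."""
    return " and ".join('"' + t + '"' for t in query_terms(search_string))

def shared_terms(document, search_string):
    """Return the query terms that literally appear as words in the document."""
    words = set(document.lower().split())
    return [t for t in query_terms(search_string) if t in words]

print(to_and_query("looking for SP architect"))
print(shared_terms("set of 4 dunlop sp sport 6000 tires",
                   "looking for SP architect"))
```

The second call shows why the tire listing can surface at all: without any context, the token “sp” in the query is indistinguishable from the “sp” in the product name.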
Contextual insight
Contextual insight annotates the scope with entity metadata by extracting all the entities it can find to provide the answers. FAST understands paragraphs, addresses, names, company names and phone numbers. It also has natural language processing. The main thing is contextual support. It’s amazing.
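As a rough illustration of what entity extraction means, here is a toy Python sketch that pulls phone numbers out of text with a regular expression. It is an assumption-laden stand-in: FAST's extractors are far richer, and the pattern below only handles one US-style format. The names PHONE_PATTERN and extract_entities are hypothetical.

```python
import re

# Matches only the NNN-NNN-NNNN format; a deliberately narrow assumption.
PHONE_PATTERN = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")

def extract_entities(text):
    """Return the extracted entities as metadata keyed by entity type."""
    return {"phone": PHONE_PATTERN.findall(text)}

sample = "Call the Contoso help desk at 215-555-0100 or 215-555-0199."
print(extract_entities(sample))
```

The extracted values would then be stored as metadata next to the document, which is what lets the search experience filter or refine on them later.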
Linguistics
Determine the language
Tokenization and normalization
Spelling and spelling variation
Anti-phrasing and stop words
Synonyms, acronyms
Pronunciation
Entity extraction
Lemmatization
Lemmatization maps a word to its base form and to the variations of that base form. “Walk” is the base form of the word “walking”, so this pair matches under both stemming and lemmatization.
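The difference between the two approaches shows up on irregular words. The sketch below is a toy comparison in Python, assuming a tiny hand-written lemma dictionary and a naive suffix-stripping stemmer. Neither reflects how FAST implements these features, and all names are hypothetical.

```python
# Hypothetical lemma dictionary; real ones hold many thousands of entries.
LEMMAS = {
    "walking": "walk",
    "walked": "walk",
    "walks": "walk",
    "went": "go",
    "mice": "mouse",
}

def naive_stem(word):
    """Strip one common suffix; knows nothing about irregular forms."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def lemmatize(word):
    """Look the word up in the dictionary; fall back to the word itself."""
    word = word.lower()
    return LEMMAS.get(word, word)

for w in ("walking", "went", "mice"):
    print(w, naive_stem(w), lemmatize(w))
```

For “walking” both functions return “walk”, as in the notes. For “went” and “mice” only the dictionary lookup finds the base form, which is why lemmatization depends on good dictionaries.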
Lemmatization dictionaries
Dictionaries
Nouns
Adjectives
Verbs
Blacklist
Dictionaries in SP 2010
Now you can create your own dictionaries. In SP 2010 you can do it through the UI.
Notes/Interesting topics and points
- Relevance tuning
- Proximity boosts
- Context boosts
- Boosts are based on context. If the information is found in the metadata, then the document’s boost goes up.
- Now we can extend search capabilities
- Now you can use BCS (Business Connectivity Services) in SP 2010. No need to create ADF files! BDC was difficult to use. Searching external data in SP 2010 is easy.
- Now you can create solutions in VS
- 118 PowerShell Search cmdlets
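The context boost mentioned in the notes above can be illustrated with a toy scoring function in Python. The weights and the document structure are arbitrary assumptions for the sketch. This is not FAST's ranking formula.

```python
def score(doc, term, body_weight=1.0, metadata_boost=2.0):
    """Count body hits, then add a fixed boost if the term is in any metadata field."""
    term = term.lower()
    body_hits = doc["body"].lower().split().count(term)
    in_metadata = any(
        term in str(value).lower().split() for value in doc["metadata"].values()
    )
    return body_hits * body_weight + (metadata_boost if in_metadata else 0.0)

plain = {"body": "the architect reviewed the plan",
         "metadata": {"title": "Meeting notes"}}
tagged = {"body": "the architect reviewed the plan",
          "metadata": {"title": "SharePoint architect role"}}

print(score(plain, "architect"))
print(score(tagged, "architect"))
```

Both documents have the same body text, but the second one ranks higher because the term also appears in its title metadata.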
Follow Natalya on Twitter: @natalyvo
Blog: http://spforsquirrels.blogspot.com
(Note: These notes were taken during live blogging. You may find typos and other mistakes. Please ignore them. Thanks.)
To see more pictures of this event, visit my facebook profile: saifullah_shafiq@hotmail.com