Sunday, November 8, 2009

Notes from SPSPhilly: Search for SP 2010 and FAST ESP

SharePoint Saturday Philly Session: Search for SharePoint 2010 and FAST ESP, by Natalya Voskresenskaya

Natalya is a SharePoint MVP. Today she speaks about SharePoint Search and FAST ESP.

Agenda
  1. Enterprise Search Value
  2. Architectural Changes
  3. FAST search linguistic features: lemmatization, entity extraction and synonyms, structural analysis, dictionaries
  4. New Search experience
  5. Configure, extend, create new

Enterprise Search Value

What’s the true value of enterprise search? It imposes multiple layers of information architecture on top of structured and unstructured content. It helps you build a 360-degree view of what’s happening in the company.

File shares and hard drives are full of information, but no one knows what’s in there. People don’t think about organizing information, but they should. That’s where taxonomy comes in. Companies replicate information. If there is no easy access to information, then you effectively don’t have that information. The problem is that all the documents have to be tagged with metadata. This is called meta tagging.

SharePoint Search 2007

The problem with SharePoint Search: it’s locked. You can index data from all custom databases, but you cannot impose your business rules on top of the search engine.

SharePoint has its own indexing engine, but it is used only for people search.

FAST Search extends SharePoint Server

Advantages: high availability!

Why is the new search so great? Why was the 2007 search failing so badly? An important part of a search engine is preprocessing. When a document is normalized, all characters are normalized to the language in which they are searchable. Documents have to be properly indexed to be searchable.
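To make the normalization step a little more concrete, here is a minimal Python sketch of what text normalization before indexing could look like. It is only an illustration under my own assumptions (Unicode NFKC folding, case folding, whitespace collapsing); the session did not show how FAST ESP implements this, and the function name `normalize_text` is made up for the example.

```python
import unicodedata


def normalize_text(text: str) -> str:
    """Return a canonical, caseless form of text suitable for indexing.

    Illustrative only: this is not FAST ESP's normalizer.
    """
    # NFKC folds compatibility characters (full-width letters, ligatures)
    # into their standard equivalents.
    folded = unicodedata.normalize("NFKC", text)
    # casefold() is a stronger lower() intended for caseless matching.
    folded = folded.casefold()
    # Collapse any run of whitespace into a single space and trim the ends.
    return " ".join(folded.split())
```

The important design point is that the same function has to run on both the documents and the query, so that differently typed forms of the same text end up as the same searchable string.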
Pre-processing: doc ID, text normalization
Processing: all the magic stuff
Post-processing: signals to indexer

Common Scenario

Search string: “looking for SP architect”
Query: “looking” and “sp” and “architect”
First result: “set of 4 dunlop sp sport 6000 tires ….”

Search engines remove noise words from the sentences.

Contextual insight

Contextual insight annotates the scope with entity metadata by extracting all the entities it can find to provide the answers. FAST understands paragraphs, addresses, names, company names, and phone numbers. It also has natural language processing. The main thing is contextual support. It’s amazing.

Linguistics
  1. Determine the language
  2. Tokenization and normalization
  3. Spelling and spelling variation
  4. Anti-phrasing and stop words
  5. Synonyms, acronyms
  6. Pronunciation
  7. Entity extraction

Lemmatization

Lemmatization maps a word to its base form and to the variations of that base form. “Walk” is the base form of the word “walking”, so this pair matches in both stemming and lemmatization.

Lemmatization dictionaries

Dictionaries: nouns, adjectives, verbs, blacklist

Dictionaries in SP 2010

Now you can create your own dictionaries. In SP 2010 you can do it through the UI.

Notes/Interesting topics and points
  1. Relevance tuning
  2. Proximity boosts
  3. Context boosts
  4. Boosts are based on context: if the information is found in the metadata, the document's boost goes up.
  5. Now we can extend search capabilities
  6. Now you can use BCS (Business Connectivity Services) in SP 2010. No need to create ADF files! BDC was difficult to use. Searching external data in SP 2010 is easy.
  7. Now you can create solutions in VS
  8. 118 PowerShell search cmdlets
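To tie the boosts in this list back to the “looking for SP architect” scenario, here is a toy Python sketch of noise-word removal combined with a proximity boost and a metadata (context) boost. Everything in it is an assumption made for illustration: the stop-word list, the weights, the two sample documents and the scoring formula are invented, and this is not how FAST ESP or SharePoint actually computes relevance.

```python
# Hypothetical noise-word list and weights, chosen only for this example.
STOP_WORDS = {"looking", "for", "a", "an", "the", "of"}
PROXIMITY_WEIGHT = 1.0
METADATA_WEIGHT = 0.5

# Made-up documents: "title" stands in for metadata, "body" for content.
DOCS = [
    {"title": "dunlop tires",
     "body": "set of 4 dunlop sp sport 6000 tires"},
    {"title": "sp architect opening",
     "body": "we are hiring an sp architect for our sharepoint team"},
]


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def query_terms(query):
    """Drop noise words so that only the meaningful terms are matched."""
    return [t for t in tokenize(query) if t not in STOP_WORDS]


def proximity_bonus(tokens, terms):
    """Reward consecutive query terms that occur close together."""
    bonus = 0.0
    for first, second in zip(terms, terms[1:]):
        first_pos = [i for i, t in enumerate(tokens) if t == first]
        second_pos = [i for i, t in enumerate(tokens) if t == second]
        if first_pos and second_pos:
            gap = min(abs(a - b) for a in first_pos for b in second_pos)
            # max(..., 1) guards against a zero gap for repeated terms.
            bonus += PROXIMITY_WEIGHT / max(gap, 1)
    return bonus


def score(doc, query):
    """Term hits in the body, plus proximity and metadata boosts."""
    terms = query_terms(query)
    body = tokenize(doc["body"])
    title = tokenize(doc["title"])
    term_hits = sum(1 for t in terms if t in body)
    metadata_hits = sum(1 for t in terms if t in title)
    return (term_hits
            + proximity_bonus(body, terms)
            + METADATA_WEIGHT * metadata_hits)


def rank(docs, query):
    """Return the documents ordered from most to least relevant."""
    return sorted(docs, key=lambda d: score(d, query), reverse=True)
```

With these made-up numbers, `rank(DOCS, "looking for SP architect")` puts the architect document ahead of the tire listing: the tire listing matches only “sp”, while the other document matches both terms, has them next to each other, and repeats them in its title.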
Follow Natalya on Twitter: @natalyvo
Blog: http://spforsquirrels.blogspot.com

(Note: These notes were taken during live blogging. You may find typos and other mistakes. Please ignore them. Thanks.)

To see more pictures of this event, visit my Facebook profile: saifullah_shafiq@hotmail.com