Information Retrieval Applications in Software Development

Abstract Information retrieval (IR) extracts and organizes natural-language information found in unstructured text. Many of the challenges faced by software engineers can be addressed using IR techniques on the unstructured text provided by source code and its associated documents. A survey of IR-based techniques applied to software engineering challenges during the initial development process is presented.

Related Papers

Abstract There is growing interest in creating tools that can assist engineers in all phases of the software life cycle. This assistance requires techniques that go beyond traditional static and dynamic analysis. One such technique is information retrieval (IR), which exploits information found in a project's natural language. Such information can be extracted from the source code's identifiers and comments and from artifacts associated with the project, such as the requirements.
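As a rough illustration of extracting natural-language terms from identifiers, the sketch below splits camelCase and snake_case names into lowercase words. The function name `split_identifier` and the splitting rules are assumptions made for this example; they are not taken from the paper, and real tools typically add abbreviation expansion and stop-word removal.

```python
import re

# Hypothetical helper: break a source-code identifier into lowercase terms.
# Matches, in order: runs of capitals (acronyms), capitalized or lowercase
# words, and digit runs.
_TERM = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")

def split_identifier(identifier):
    terms = []
    # Split on underscores and any other non-word characters first.
    for part in re.split(r"[_\W]+", identifier):
        terms.extend(_TERM.findall(part))
    return [term.lower() for term in terms]

print(split_identifier("parseHTTPResponse"))  # ['parse', 'http', 'response']
print(split_identifier("max_line_count"))     # ['max', 'line', 'count']
```

The resulting terms could then be indexed like words in any other text document.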

Abstract During software evolution, a collection of related artifacts with different representations is created. Some of these are composed of structured data (e.g., analysis data), some contain semi-structured information (e.g., source code), and many include unstructured information (e.g., text). Research efforts exist that aim to extract, represent, and analyze the unstructured information in software.

Abstract More than twenty distinct software engineering tasks have been addressed with text retrieval (TR) techniques, including traceability link recovery, feature location, refactoring, and reuse. A common issue with all TR applications is that the results of the retrieval depend largely on the quality of the query. When a query performs poorly, it has to be reformulated, and this is a difficult task for someone who had trouble writing a good query in the first place.

Mining textual artifacts is important for a large array of software engineering tasks: software reuse, software maintenance, and software quality assurance, to name a few. Much of the work on mining software repositories has tended to "exclude" such non-structured artifacts. At the same time, we find these items to be rich in semantic information and believe that mining techniques should treat text as software and address how to mine it efficiently. We investigate the application of information retrieval (IR) techniques to the tracing of textual elements in the software repository. Some text-mining activities are critical (e.g., tracing artifacts to assure satisfaction of safety requirements) and require analyst participation. We describe our approach to eliciting and processing analyst feedback for the tracing of textual elements of a repository. We then present a study showing that standard IR methods combined with analyst feedback outperform IR methods alone in terms of coverage (recall: did we find all the relevant links?) and signal-to-noise ratio (precision: were the links we found relevant?). With the analyst "in the loop," it is necessary to ensure that the tracing software possesses quality from the perspective of the analyst. We examined standard measures for evaluating IR methods and found that they do not always suffice for examining a tool from the analyst's perspective. To address this, we developed a set of secondary measures for evaluating the tracing software. We show, by counterexamples from two projects, that standard measures alone do not provide the detail necessary for adequately evaluating mining tools from the analyst's perspective.
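The two standard measures named above can be made concrete with a small sketch. The trace links below (requirement ID, file name) are invented purely for illustration, and `precision_recall` is a hypothetical helper, not part of any tool described in the abstract.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of candidate trace links."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # links that are both returned and correct
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical links returned by an IR method.
retrieved_links = {("R1", "auth.c"), ("R1", "log.c"), ("R2", "db.c"), ("R3", "ui.c")}
# Hypothetical ground-truth links confirmed by an analyst.
relevant_links = {("R1", "auth.c"), ("R2", "db.c"), ("R3", "net.c")}

precision, recall = precision_recall(retrieved_links, relevant_links)
print(f"precision={precision:.2f}, recall={recall:.2f}")  # precision=0.50, recall=0.67
```

Here two of the four returned links are correct (precision 0.50), and two of the three true links were found (recall 0.67).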

Empirical Software Engineering

Empirical Software Engineering

For over two decades, software engineering (SE) researchers have been importing tools and techniques from information retrieval (IR). Initial results have been quite positive. For example, when applied to problems such as feature location or re-establishing traceability links, IR techniques work well on their own, and often even better in combination with more traditional source code analysis techniques such as static and dynamic analysis. Recently, however, there has been growing awareness among SE researchers that IR tools and techniques are designed to work under a different set of assumptions than those that hold for a software system. Thus, it may be beneficial to consider IR-inspired tools and techniques that are specifically designed to work with software. One aim of this work is to provide quantitative empirical evidence in support of this observation. To do so, a new technique is introduced that captures the level of difficulty found in an information need, the true, often lat.