Mining software for source code

Navigating the vast amount of source code archived by Software Heritage can be daunting, and we are working to provide appropriate tools to search inside it. As a first step in this direction, we have made it possible to search among the tens of millions of URLs where the source code comes from. This is already quite useful, as these URLs usually contain the project name as well as the name of the hosting organization, but we want more. The next step has been to make software metadata searchable too. This metadata is extracted from packaging information. It seems quite easy, right?
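To illustrate why origin URLs are already useful for search, here is a minimal sketch that pulls a hosting organization and project name out of a URL and does a keyword match over a list of URLs. The function names `describe_origin` and `search_origins` are hypothetical, and the assumption that the last two path segments are the organization and the project holds for many forges but not all; this is not how Software Heritage implements its search.

```python
from urllib.parse import urlparse


def describe_origin(url):
    """Split an origin URL into host, organization, and project name.

    Assumes the last two path segments are <organization>/<project>,
    which is common on code hosting sites but not universal.
    """
    parsed = urlparse(url)
    parts = [p for p in parsed.path.split("/") if p]
    project = parts[-1] if parts else ""
    if project.endswith(".git"):
        project = project[:-4]
    organization = parts[-2] if len(parts) >= 2 else ""
    return {"host": parsed.netloc, "organization": organization, "project": project}


def search_origins(urls, keyword):
    """Return the URLs that contain the keyword, ignoring case."""
    keyword = keyword.lower()
    return [u for u in urls if keyword in u.lower()]
```

A plain substring match like this only sees what the URL happens to spell out, which is why searchable metadata is the natural next step.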





We present the first method for automatically mining code idioms from a corpus of previously written, idiomatic software projects. We take the view that a code idiom is a syntactic fragment that recurs across projects and has a single semantic purpose.

Idioms may have metavariables, such as the body of a for loop. Modern IDEs commonly provide facilities for manually defining idioms and inserting them on demand, but this does not help programmers to write idiomatic code in languages or using libraries with which they are unfamiliar.

We present Haggis, a system for mining code idioms that builds on recent advanced techniques from statistical natural language processing, namely, nonparametric Bayesian probabilistic tree substitution grammars. We apply Haggis to several of the most popular open source projects from GitHub. Manual examination of the most common idioms indicates that they describe important program concepts, including object creation, exception handling, and resource management.
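The idea of a syntactic fragment that recurs across projects can be sketched with plain frequency counting over Python ASTs. This is a deliberately simple stand-in, not the probabilistic tree substitution grammar that Haggis uses: the helper names `shape` and `mine_fragments`, the fixed depth of 2, and the "seen in at least two sources" threshold are all assumptions made for illustration.

```python
import ast
from collections import Counter


def shape(node, depth=2):
    """Render an AST node as nested node-type names, down to a fixed depth.

    Identifiers and literals are dropped, so the result plays the role of a
    fragment with metavariables.
    """
    name = type(node).__name__
    if depth == 0:
        return name
    children = [shape(child, depth - 1) for child in ast.iter_child_nodes(node)]
    return f"{name}({', '.join(children)})" if children else name


def mine_fragments(sources, min_sources=2):
    """Map each statement shape to the number of sources it occurs in,
    keeping only shapes that occur in at least `min_sources` of them."""
    seen_in = Counter()
    for src in sources:
        tree = ast.parse(src)
        # Use a set so a shape is counted once per source, not once per use.
        shapes = {shape(n) for n in ast.walk(tree) if isinstance(n, ast.stmt)}
        seen_in.update(shapes)
    return {s: n for s, n in seen_in.items() if n >= min_sources}
```

Raw frequency tends to favour trivial fragments, which is one reason a statistical model of which fragments are worth keeping is attractive.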

During maintenance, developers spend a lot of time transforming existing code: refactoring, optimizing, and adding checks to make it more robust. Much of this work is the drudgery of identifying and replacing specific patterns, yet it resists automation, because meaningful patterns are hard to find automatically.

We present a technique for mining loop idioms, surprisingly probable semantic patterns that occur in loops, from big code to find meaningful patterns. First, we show that automatically identifiable patterns exist, in great numbers, with a large-scale empirical study of loops over 25 MLOC.

Encouraged by this result, we coil loops to abstract away syntactic diversity and define information-rich loop idioms. We show how loop idioms can help tool developers identify and prioritize refactorings. We also show how our framework opens the door to data-driven tool and language design, discovering opportunities to introduce new API calls and language constructs: loop idioms show that LINQ would benefit from an Enumerate operator, a result confirmed by the fact that precisely this feature is one of the most requested features on StackOverflow with votes and 95k views.
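The Enumerate finding concerns loops that need both an index and the element at that index. A minimal sketch of spotting that one idiom in Python source follows; it matches the literal pattern `for i in range(len(xs))` with `xs[i]` in the body, which is far narrower than the semantic loop idioms described above. The function name `find_index_loops` is hypothetical, and the code assumes Python 3.9 or later, where a subscript index is stored directly as an expression node.

```python
import ast


def find_index_loops(source):
    """Return line numbers of `for i in range(len(xs))` loops that read xs[i]."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.For) and isinstance(node.target, ast.Name)):
            continue
        it = node.iter
        is_range_len = (
            isinstance(it, ast.Call)
            and isinstance(it.func, ast.Name) and it.func.id == "range"
            and len(it.args) == 1
            and isinstance(it.args[0], ast.Call)
            and isinstance(it.args[0].func, ast.Name)
            and it.args[0].func.id == "len"
            and len(it.args[0].args) == 1
            and isinstance(it.args[0].args[0], ast.Name)
        )
        if not is_range_len:
            continue
        seq, idx = it.args[0].args[0].id, node.target.id
        # Look for a subscript xs[i] anywhere inside the loop.
        for sub in ast.walk(node):
            if (isinstance(sub, ast.Subscript)
                    and isinstance(sub.value, ast.Name) and sub.value.id == seq
                    and isinstance(sub.slice, ast.Name) and sub.slice.id == idx):
                hits.append(node.lineno)
                break
    return sorted(hits)
```

Each hit is a candidate for rewriting with Python's built-in `enumerate`, which is the kind of refactoring opportunity a loop idiom can surface.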

Existing API mining algorithms can be difficult to use as they require expensive parameter tuning and the returned set of API calls can be large, highly redundant and difficult to understand. Moreover, we focus on libraries for which the developers have explicitly provided code examples, yielding over , LOC of hand-written API example code from the client projects in the data set. This evaluation suggests that the hand-written examples actually have limited coverage of real API usages.

Developers spend much of their time reading and browsing source code, raising new opportunities for summarization methods. Indeed, modern code editors provide code folding, which allows one to selectively hide blocks of code. However, this is impractical to use, as folding decisions must be made manually or based on simple rules. We introduce the autofolding problem, which is to automatically create a code summary by folding less informative code regions. We present a novel solution by formulating the problem as a sequence of AST folding decisions, leveraging a scoped topic model for code tokens.

Furthermore, we find through a case study that our summarizer is strongly preferred by experienced developers. More broadly, we hope this work will aid program comprehension by turning code folding into a usable and valuable tool.

The system then extracts facts that match the predefined ontology.

We propose an unsupervised model that jointly learns a latent ontological structure of an input corpus, and identifies facts from the corpus that match the learned structure. Our approach combines mixed membership stochastic block models and topic models to infer a structure by jointly modeling text, a latent concept hierarchy, and latent semantic relationships among the entities mentioned in the text.

As a case study, we apply the model to a corpus of Web documents from the software domain, and evaluate the accuracy of the various components of the learned ontology.

We present a Bayesian statistical approach to the problem of automatic program synthesis. Our synthesizer starts by learning, offline and from an existing corpus, a probabilistic model of real-world programs.

During synthesis, it is given some ambiguous and incomplete evidence about the nature of the programming task that the user wants automated, for example sets of API calls or data types that are relevant for the task.

Given this input, the synthesizer infers a posterior distribution over type-safe programs that assigns higher likelihood to programs that, according to the learned model, are more likely to match the evidence. We realize this approach using two key ideas. First, our learning techniques operate not over code but syntactic abstractions, or sketches, of programs.

During synthesis, we infer a posterior distribution over sketches, then concretize samples from this distribution into type-safe programs using combinatorial techniques. Second, our statistical model explicitly models the full intent behind a synthesis task as a latent variable.

To infer sketches, we first estimate a posterior distribution on the intent, then use samples from this posterior to generate a distribution over possible sketches. We show that our model can be implemented effectively using the new neural architecture of Bayesian encoder-decoders, which can be trained with stochastic gradient descent and yields a simple inference procedure.

We train BAYOU on a large corpus of Android apps, and find that the trained system can often synthesize complex methods given just a few API method names or data types as evidence. The experiments also justify the design choice of using a latent intent variable and the levels of abstraction at which sketches and evidence are defined.

We present a Bayesian framework for learning probabilistic specifications from large, unstructured code corpora, and a method to use this framework to statically detect anomalous, hence likely buggy, program behavior.

The distinctive insight here is to build a statistical model that correlates all specifications hidden inside a corpus with the syntax and observed behavior of programs that implement these specifications. During the analysis of a particular program, this model is conditioned into a posterior distribution that prioritizes specifications that are relevant to this program.

This allows accurate program analysis even if the corpus is highly heterogeneous. We present a concrete embodiment of our framework that combines a topic model and a neural network model to learn specifications, and queries the learned models to compute anomaly scores.

We evaluate this implementation on the task of detecting anomalous usage of Android APIs. Our encouraging experimental results show that the method can automatically discover subtle errors in Android applications in the wild, and has high precision and recall compared to competing probabilistic approaches.
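The notion of an anomaly score for API usage can be sketched with a much simpler model than the one described: a smoothed bigram model over API call sequences, where a sequence is scored by its average negative log-probability. This swaps in plain n-gram counting for the topic model and neural network of the framework above. The function names, the `<start>` and `<end>` markers, the smoothing constant `alpha`, and any call names passed in are assumptions for illustration.

```python
import math
from collections import Counter


def train_bigrams(sequences):
    """Count adjacent call pairs over a corpus of API call sequences."""
    pair_counts, first_counts = Counter(), Counter()
    for seq in sequences:
        padded = ["<start>"] + list(seq) + ["<end>"]
        for a, b in zip(padded, padded[1:]):
            pair_counts[(a, b)] += 1
            first_counts[a] += 1
    return pair_counts, first_counts


def anomaly_score(seq, model, vocab_size, alpha=1.0):
    """Average negative log-probability per transition; higher is more unusual.

    Uses add-alpha smoothing so unseen transitions get a small, nonzero
    probability instead of making the score infinite.
    """
    pair_counts, first_counts = model
    padded = ["<start>"] + list(seq) + ["<end>"]
    total = 0.0
    for a, b in zip(padded, padded[1:]):
        p = (pair_counts[(a, b)] + alpha) / (first_counts[a] + alpha * vocab_size)
        total -= math.log(p)
    return total / (len(padded) - 1)
```

A sequence that stops before a call the corpus almost always makes next, such as a missing close after a read, receives a higher score than the common usage, which is the intuition behind treating anomalous behavior as likely buggy.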

Word2Vec is a class of neural network models that, once trained on a large corpus of text, produce for each unique word a corresponding vector in a continuous space in which the linguistic contexts of words can be observed. First, we build a tool that mines the pairs of API elements that share the same usage relations among them.

The other applications are in the code migration domain. Finally, as another application in code migration, we are able to migrate equivalent API usages from Java to C with up to …

Software defect prediction, which predicts defective code regions, can help developers find bugs and prioritize their testing efforts.

To build accurate prediction models, previous studies focus on manually designing features that encode the characteristics of programs and exploring different machine learning algorithms. Existing traditional features often fail to capture the semantic differences of programs, and such a capability is needed for building accurate prediction models.

Our evaluation on ten open source projects shows that our automatically learned semantic features significantly improve both within-project defect prediction (WPDP) and cross-project defect prediction (CPDP) compared to traditional features. Our semantic features improve WPDP on average by …

Code clone detection is an important problem for software maintenance and evolution.

Many approaches consider either structure or identifiers, but none of the existing detection techniques model both sources of information. These techniques also depend on generic, handcrafted features to represent code fragments. We introduce learning-based detection techniques where everything for representing terms and fragments in source code is mined from the repository.

Our code analysis supports a framework, which relies on deep learning, for automatically linking patterns mined at the lexical level with patterns mined at the syntactic level. We evaluated our novel learning-based approach for code clone detection with respect to feasibility from the point of view of software maintainers. Among the true positives, we found pairs mapping to all four clone types. We compared our approach to a traditional structure-oriented technique and found that our learning-based approach detected clones that were either undetected or suboptimally reported by the prominent tool Deckard.
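For contrast with the learning-based approach, here is a minimal sketch of a purely lexical baseline: identifiers and literals are replaced by placeholders, and two fragments are compared by the Jaccard similarity of their token trigrams. This catches clones that differ only in naming (commonly called Type-2 clones) and uses exactly the kind of handcrafted representation the approach above aims to replace. The function names, the placeholder tokens, and the choice of n = 3 are assumptions.

```python
import io
import keyword
import tokenize

# Token types that carry no lexical content for this comparison.
_SKIP = (tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
         tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER)


def normalized_tokens(source):
    """Tokenize Python source, replacing identifiers and literals by placeholders."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIP:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            out.append("ID")
        elif tok.type == tokenize.NUMBER:
            out.append("NUM")
        elif tok.type == tokenize.STRING:
            out.append("STR")
        else:
            out.append(tok.string)
    return out


def similarity(a, b, n=3):
    """Jaccard similarity between the token n-gram sets of two code fragments."""
    def grams(tokens):
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    ga, gb = grams(normalized_tokens(a)), grams(normalized_tokens(b))
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0
```

Because indentation tokens are discarded and every identifier collapses to the same placeholder, this baseline ignores both structure and naming, the two sources of information the learning-based technique models together.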

Our results affirm that our learning-based approach is suitable for clone detection and a tenable technique for researchers.

Pattern Mining Models

Pattern mining models infer, without supervision, a likely latent structure within code.

These models are an instantiation of clustering in the code domain; they can find reusable and human-interpretable patterns.




95 Open Source Miner Software Projects

The computer code undergirding each major cryptocurrency and open blockchain project is developed as open source software. Regulators and policymakers looking into cryptocurrencies but unfamiliar with open source software may have an incorrect mental model: one wherein software-based systems are and must be developed by one or a handful of for-profit companies. Plenty of important software projects are developed in this way. Open source software, by contrast, is collaboratively produced, shared freely, published transparently, and developed to be a community good rather than the property or business of a single company or person.

On Ethereum, you can write code that controls money, and build applications accessible anywhere. Ethereum and its apps are transparent and open source.

Mining Code Change Patterns to Aid Software Development

Monitors crypto mining pools in real time to find the most profitable one for your machine. Controls any miner that is available via the command line. Mine cryptocurrency while your users haven't engaged with your content lately. GPU miner. This application was created as a proof of concept for how to scan local network traffic for HTTP requests and then inject various JavaScript cryptocurrency miners into the response payloads. Includes basic persistence. The Monero Miner can be used with any CoinHive address and is a proof of concept of an alternative to ad banners and interstitials for mobile app developers who want to be paid for their work without spamming their users with bad advertisements. A simple script that will watch a stream for you and earn the channel points. I customize code to run CUDA at maximum GPU performance.


5. Mining Repositories


Software repositories hold a large number of open source projects that developers can reuse. During software development and maintenance, developers can leverage good interfaces in these projects to establish the framework of a new project quickly. However, to reuse them, developers need to read a lot of code files and learn which interfaces can be reused. To help developers better take advantage of the interfaces available in software repositories, we previously proposed an approach to automatically recommend interfaces by mining existing open source projects.

Abstract: Mining Software Repositories (MSR) has become a complete and mature research field, partly owing to the increasing number of publicly available open source projects.

MSR '20: Proceedings of the 17th International Conference on Mining Software Repositories

Repository hosting services such as GitHub provide unprecedented access to millions of events generated during development activities.





Best Bitcoin Mining Software

Sheng, Ermyas Abebe, M. Ali Babar, Andi Zhou. Developers nowadays can leverage existing systems to build their own applications. However, a lack of documentation hinders the process of software system reuse.


open-source mining firmware


Khatoon, Guohui Li, A. Mahmood. Program source code is substantially structured and contains semantically rich programming constructs such as variables, functions, data structures, and program structures, which indicate patterns. Mining source code with different data mining techniques to extract valuable hidden patterns is the new revolution in software engineering. Over the last decade, many tools and techniques have been proposed by researchers to extract pertinent information and uncover relationships and trends…

There have been some pretty creative attempts to mine bitcoins using unconventional means over the years, from lightbulbs to web browsers.

Software development environments were originally considered mostly as a kind of text editor for manipulating source code. Nowadays this view is changing, as a software system is made up not only of source code but also of different kinds of artefacts such as models and configurations. Development environments are therefore becoming a multi-faceted set of user interfaces designed to support various tasks, such as navigation, restructuring, debugging, and delegation, for different user groups. The interaction of low-code developers with the development tool and its associated user interfaces, in particular when it comes to citizen programming, produces a continuous stream of interaction events, which provides a promising data basis for improving software development. For instance, the development processes can be reconstructed and aligned or even improved with respect to the observed behaviour. The main objective of this project is to provide an interaction mining framework that allows for scalable analyses of LCEP interactions. Such a framework requires effective and efficient analysis algorithms that can deal with a huge amount of interaction history in both off-line and online processing settings.

Data mining is one of the popular terms in machine learning. It is the process of extracting hidden, previously unknown, and potentially useful information from large data sets. The outcome can be analysed to gain meaningful insights for the development of an organisation.

